feat: hybrid pipeline — gpt-realtime-whisper + gpt-5.4-nano + TTS

Hybrid approach gives streaming STT at ~/usr/bin/bash.017/min + cheap brain
at ~/usr/bin/bash.001/min + TTS at ~/usr/bin/bash.015/min = ~/usr/bin/bash.033/min total.

- gpt-realtime-whisper handles streaming transcription with VAD
- gpt-5.4-nano handles response generation (chat completions)
- OpenAI TTS (nova) for voice output
- Server VAD detects utterance boundaries
- Honcho memory context injected into system prompt
- Removed old full Realtime relay service
This commit is contained in:
2026-06-04 13:48:06 -04:00
parent 1c15d42e06
commit 274d04ea10
4 changed files with 281 additions and 284 deletions
+4
View File
@@ -142,6 +142,10 @@ export function useConversation() {
addMessage(msg.role === 'user' ? 'user' : 'kira', msg.text);
break;
case 'transcript_delta':
// Streaming partial transcript — could show as typing indicator
break;
case 'speaking_start':
setIsKiraSpeaking(true);
break;