e2332af8d0
Replaced the 3-step sequential pipeline (Whisper STT → DeepSeek LLM → OpenAI TTS) with a single OpenAI Realtime API WebSocket using gpt-4o-mini-realtime-preview.

- ~300-800 ms latency vs 1-3 s
- Server VAD for automatic turn detection
- Streaming audio chunks during playback
- Interruptions: user can speak over Kira mid-response
- Honcho memory still injected into session instructions
- Frontend captures PCM16 mono 24 kHz via AudioContext
- Backend relays client ↔ OpenAI Realtime API
- Supports both voice (PCM16) and text input
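As a rough illustration of the message shapes such a relay exchanges with the Realtime API, the sketch below builds a `session.update` event (server VAD, PCM16 audio, memory appended to the instructions) and an `input_audio_buffer.append` event carrying base64 PCM16. This is not the code from this commit. The function names, the way memory text is appended, and the Python-side float conversion (the commit does this in the browser) are assumptions, and the field names follow the preview-era event format, so check them against the API version in use.

```python
import base64
import json
import struct


def build_session_update(base_instructions: str, memory_context: str) -> str:
    """Return a session.update event enabling server VAD and PCM16 audio."""
    instructions = base_instructions
    if memory_context:
        # Hypothetical layout for injected memory; the real format is not shown in the commit.
        instructions += "\n\nKnown context about the user:\n" + memory_context
    event = {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            # Server-side voice activity detection decides when a turn ends.
            "turn_detection": {"type": "server_vad"},
        },
    }
    return json.dumps(event)


def float_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def build_audio_append(pcm16: bytes) -> str:
    """Wrap a PCM16 chunk in an input_audio_buffer.append event."""
    event = {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16).decode("ascii"),
    }
    return json.dumps(event)
```

A relay would send the first event once after the upstream WebSocket opens, then forward one append event per audio chunk received from the client.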
11 lines
194 B
Plaintext
fastapi>=0.115.0
uvicorn[standard]>=0.34.0
python-dotenv>=1.1.0
openai>=1.55.0
websockets>=14.1
pydantic>=2.10.0
pydantic-settings>=2.7.0
httpx>=0.28.0
honcho-ai>=2.1.0
openai[realtime]>=2.41.0