feat(audio): Gemini Live API replaces Whisper+GPT+ElevenLabs

Single WebSocket proxy: frontend PCM16 16kHz → backend → Gemini Live API.
Gemini returns PCM16 24kHz audio + text. Playback via Web Audio API queue.
Removed OpenAI/DeepSeek deps. Model: gemini-3.1-flash-live-preview.
Voice: Aoede. Streaming bidirectional audio with silence gating.
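
The silence gating mentioned above is not shown in this part of the diff. A minimal sketch of one common approach, assuming an RMS energy threshold on little-endian mono PCM16 chunks, might look like the following. The names `pcm16_rms`, `SilenceGate`, `RMS_THRESHOLD`, and `HANGOVER_CHUNKS` are hypothetical and do not come from the commit, and the threshold values are illustrative only.

```python
import math
import struct

# Hypothetical tuning values; the commit does not state the real ones.
RMS_THRESHOLD = 500.0   # amplitude on the int16 scale (0..32767)
HANGOVER_CHUNKS = 3     # chunks still forwarded after speech drops off


def pcm16_rms(chunk: bytes) -> float:
    """Return the root-mean-square amplitude of little-endian mono PCM16."""
    n = len(chunk) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", chunk[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


class SilenceGate:
    """Decide per chunk whether audio is worth forwarding upstream."""

    def __init__(self, threshold: float = RMS_THRESHOLD,
                 hangover: int = HANGOVER_CHUNKS) -> None:
        self.threshold = threshold
        self.hangover = hangover
        self._remaining = 0  # hangover chunks left to forward

    def should_forward(self, chunk: bytes) -> bool:
        if pcm16_rms(chunk) >= self.threshold:
            # Loud enough: forward and re-arm the hangover window.
            self._remaining = self.hangover
            return True
        if self._remaining > 0:
            # Quiet, but still inside the hangover window after speech.
            self._remaining -= 1
            return True
        return False
```

In a proxy loop, the backend would call `should_forward(chunk)` on each incoming 16kHz chunk and skip the upstream send when it returns `False`. The hangover window keeps trailing syllables from being clipped.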
2026-06-05 23:36:29 -04:00
parent d2bde65645
commit 83a990e838
6 changed files with 331 additions and 286 deletions
+2 -7
@@ -1,13 +1,8 @@
 from pydantic_settings import BaseSettings
 class Settings(BaseSettings):
-    # OpenAI (used for STT + TTS)
-    openai_api_key: str = ""
-    # DeepSeek (LLM)
-    deepseek_api_key: str = ""
-    deepseek_base_url: str = "https://api.deepseek.com/v1"
-    deepseek_model: str = "deepseek-chat"
+    # Gemini Live API
+    gemini_api_key: str = ""
     # Honcho (memory)
     honcho_api_key: str = ""