clean: archive legacy stt/tts/llm services; update ARCHITECTURE.md + README.md to current stack (REST gpt-4o-transcribe, nano, sage, Honcho, incremental TTS, white noise)

Per PLAN item 6. Legacy files moved to archive/legacy-pipeline/.
2026-06-04 15:54:12 -04:00
parent 59b72aa184
commit 1bfc8333e9
5 changed files with 24 additions and 22 deletions
+12 -12
@@ -27,23 +27,24 @@ Browser-based real-time AI body double. She talks to Kira (microphone → STT
``` ```
┌─────────────────────────────────────────────────────────┐ ┌─────────────────────────────────────────────────────────┐
│ Browser │ Browser (React + Live2D + girly UI)
│ │ │ │
│ [Mic] → MediaRecorder → audio chunks │ [Mic] → MediaRecorder (webm/opus full utterance)
│ ↓ (WebSocket) │ ↓ (WebSocket /api/ws)
│ [FastAPI Backend] │ │ [FastAPI Backend] │
│ ↓ │ │ ↓ │
│ 1. Whisper API → text transcript │ │ 1. REST gpt-4o-transcribe → full text (with delta emit)|
│ 2. DeepSeek V4 (system prompt: "You are Kira...") │ 2. gpt-5.4-nano + Honcho memory suffix
│ 3. OpenAI TTS → audio buffer │ 3. OpenAI TTS (sage) streaming response → Opus chunks
│ ↑ (WebSocket) │ │ ↑ (WebSocket) │
│ [Audio Player + Live2D Lip-Sync] │ [Incremental Audio playback + Live2D + "Hearing" UI]
│ │ │ │
Kira's idle animations run between conversation turns White noise (Web Audio) and lofi run independently.
│ VAD is client-button based (toggle Talk). │
└─────────────────────────────────────────────────────────┘ └─────────────────────────────────────────────────────────┘
``` ```
Lo-fi music runs independently via YouTube embed — no backend involvement. Current: REST STT for reliability (Realtime WS attempted but blocked by model access).
--- ---
@@ -65,9 +66,8 @@ ai-body-double/
│ │ └── assets.py # REST: outfit/pet state, backgrounds │ │ └── assets.py # REST: outfit/pet state, backgrounds
│ ├── services/ │ ├── services/
│ │ ├── __init__.py │ │ ├── __init__.py
│ │ ├── stt.py # Whisper API client │ │ ├── memory.py # Honcho wrapper (context, prefs, summaries)
│ │ ├── llm.py # DeepSeek chat client │ │ └── whisper_stream.py # Realtime WS attempt (archived use; fallback to REST)
│ │ └── tts.py # OpenAI TTS client
│ └── models/ │ └── models/
│ ├── __init__.py │ ├── __init__.py
│ └── schemas.py # Pydantic models │ └── schemas.py # Pydantic models
+12 -10
@@ -34,7 +34,7 @@ cd ~/Projects/ai-body-double
cp .env.example .env cp .env.example .env
# Edit .env with your API keys: # Edit .env with your API keys:
# OPENAI_API_KEY=sk-... # OPENAI_API_KEY=sk-...
# DEEPSEEK_API_KEY=sk-... # HONCHO_API_KEY=...
``` ```
### 3. Run ### 3. Run
@@ -56,17 +56,19 @@ kira.hobokenchicken.com {
## Architecture ## Architecture
``` ```
Browser ──WebSocket──▶ Backend (FastAPI) Browser (React + Live2D + WhiteNoise + Notes) ──WebSocket──▶ Backend (FastAPI)
│ │
├─ Mic audio ──────────▶ ├─ Whisper API (STT) ├─ Mic (MediaRecorder full webm on stop) ────────────────────▶ ├─ gpt-4o-transcribe (REST STT, emits delta)
├─ DeepSeek (LLM) ├─ gpt-5.4-nano + Honcho memory context
│ ◀── TTS audio ──────── ├─ OpenAI TTS │ ◀── Incremental Opus chunks (play on arrival) ─────────────── ├─ OpenAI TTS streaming (sage)
│ │
├─ YouTube embed (lo-fi) ├─ YouTube lofi + Web Audio noise (independent)
├─ Timer / Notes / Cats ├─ Timer / Notes / Pets / Wardrobe / Scenes
└─ Animated avatar └─ Live2D (or fallback) avatar
``` ```
Note: Realtime WebSocket STT (gpt-realtime-whisper) was attempted for true streaming but blocked by model access — current REST path is stable and cheap.
## Live2D Model Setup ## Live2D Model Setup
Kira currently uses a CSS/SVG animated placeholder avatar. To add a Live2D model: Kira currently uses a CSS/SVG animated placeholder avatar. To add a Live2D model: