clean: archive legacy stt/tts/llm services; update ARCHITECTURE.md + README.md to current stack (REST gpt-4o-transcribe, nano, sage, Honcho, incremental TTS, white noise)

Per PLAN item 6. Legacy files moved to archive/legacy-pipeline/.
2026-06-04 15:54:12 -04:00
parent 59b72aa184
commit 1bfc8333e9
5 changed files with 24 additions and 22 deletions
@@ -27,23 +27,24 @@ Browser-based real-time AI body double. She talks to Kira (microphone → STT

 ```
 ┌─────────────────────────────────────────────────────────┐
-│  Browser                                                 │
+│  Browser (React + Live2D + girly UI)                     │
 │                                                          │
-│  [Mic] → MediaRecorder → audio chunks                   │
-│       ↓ (WebSocket)                                      │
+│  [Mic] → MediaRecorder (webm/opus full utterance)       │
+│       ↓ (WebSocket /api/ws)                              │
 │  [FastAPI Backend]                                       │
 │       ↓                                                 │
-│  1. Whisper API → text transcript                       │
-│  2. DeepSeek V4 (system prompt: "You are Kira...")     │
-│  3. OpenAI TTS → audio buffer                           │
+│  1. REST gpt-4o-transcribe → full text (with delta emit)|
+│  2. gpt-5.4-nano + Honcho memory suffix                 │
+│  3. OpenAI TTS (sage) streaming response → Opus chunks  │
 │       ↑ (WebSocket)                                      │
-│  [Audio Player + Live2D Lip-Sync]                       │
+│  [Incremental Audio playback + Live2D + "Hearing" UI]   │
 │                                                          │
-│  Kira's idle animations run between conversation turns   │
+│  White noise (Web Audio) and lofi run independently.     │
+│  VAD is client-button based (toggle Talk).               │
 └─────────────────────────────────────────────────────────┘
 ```

-Lo-fi music runs independently via YouTube embed — no backend involvement.
+Current: REST STT for reliability (Realtime WS attempted but blocked by model access).

 ---

@@ -65,9 +66,8 @@ ai-body-double/
 │   │   └── assets.py         # REST: outfit/pet state, backgrounds
 │   ├── services/
 │   │   ├── __init__.py
-│   │   ├── stt.py            # Whisper API client
-│   │   ├── llm.py            # DeepSeek chat client
-│   │   └── tts.py            # OpenAI TTS client
+│   │   ├── memory.py         # Honcho wrapper (context, prefs, summaries)
+│   │   └── whisper_stream.py # Realtime WS attempt (archived use; fallback to REST)
 │   └── models/
 │       ├── __init__.py
 │       └── schemas.py        # Pydantic models
@@ -34,7 +34,7 @@ cd ~/Projects/ai-body-double
 cp .env.example .env
 # Edit .env with your API keys:
 #   OPENAI_API_KEY=sk-...
-#   DEEPSEEK_API_KEY=sk-...
+#   HONCHO_API_KEY=...
 ```

 ### 3. Run
@@ -56,17 +56,19 @@ kira.hobokenchicken.com {
 ## Architecture

 ```
-Browser ──WebSocket──▶ Backend (FastAPI)
+Browser (React + Live2D + WhiteNoise + Notes) ──WebSocket──▶ Backend (FastAPI)
  │                                                              │
-  ├─ Mic audio ──────────▶ ├─ Whisper API (STT)
-  │                        ├─ DeepSeek (LLM)
-  │ ◀── TTS audio ──────── ├─ OpenAI TTS
+  ├─ Mic (MediaRecorder full webm on stop) ────────────────────▶ ├─ gpt-4o-transcribe (REST STT, emits delta)
+  │                                                              ├─ gpt-5.4-nano + Honcho memory context
+  │ ◀── Incremental Opus chunks (play on arrival) ─────────────── ├─ OpenAI TTS streaming (sage)
  │                                                              │
-  ├─ YouTube embed (lo-fi) │
-  ├─ Timer / Notes / Cats  │
-  └─ Animated avatar       │
+  ├─ YouTube lofi + Web Audio noise (independent)                │
+  ├─ Timer / Notes / Pets / Wardrobe / Scenes                    │
+  └─ Live2D (or fallback) avatar                                 │
 ```

+Note: Realtime WebSocket STT (gpt-realtime-whisper) was attempted for true streaming but blocked by model access — current REST path is stable and cheap.
+
 ## Live2D Model Setup

 Kira currently uses a CSS/SVG animated placeholder avatar. To add a Live2D model: