clean: archive legacy stt/tts/llm services; update ARCHITECTURE.md + README.md to current stack (REST gpt-4o-transcribe, nano, sage, Honcho, incremental TTS, white noise)

Per PLAN item 6. Legacy files moved to archive/legacy-pipeline/.
2026-06-04 15:54:12 -04:00
parent 59b72aa184
commit 1bfc8333e9
5 changed files with 24 additions and 22 deletions
@@ -27,23 +27,24 @@ Browser-based real-time AI body double. She talks to Kira (microphone → STT

 ```
 ┌─────────────────────────────────────────────────────────┐
-│  Browser                                                 │
+│  Browser (React + Live2D + girly UI)                     │
 │                                                          │
-│  [Mic] → MediaRecorder → audio chunks                   │
-│       ↓ (WebSocket)                                      │
+│  [Mic] → MediaRecorder (webm/opus full utterance)       │
+│       ↓ (WebSocket /api/ws)                              │
 │  [FastAPI Backend]                                       │
 │       ↓                                                 │
-│  1. Whisper API → text transcript                       │
-│  2. DeepSeek V4 (system prompt: "You are Kira...")     │
-│  3. OpenAI TTS → audio buffer                           │
+│  1. REST gpt-4o-transcribe → full text (with delta emit)│
+│  2. gpt-5.4-nano + Honcho memory suffix                 │
+│  3. OpenAI TTS (sage) streaming response → Opus chunks  │
 │       ↑ (WebSocket)                                      │
-│  [Audio Player + Live2D Lip-Sync]                       │
+│  [Incremental Audio playback + Live2D + "Hearing" UI]   │
 │                                                          │
-│  Kira's idle animations run between conversation turns   │
+│  White noise (Web Audio) and lofi run independently.     │
+│  VAD is client-button based (toggle Talk).               │
 └─────────────────────────────────────────────────────────┘
 ```

-Lo-fi music runs independently via YouTube embed — no backend involvement.
+Current: REST STT for reliability (Realtime WS attempted but blocked by model access).

 ---

@@ -65,9 +66,8 @@ ai-body-double/
 │   │   └── assets.py         # REST: outfit/pet state, backgrounds
 │   ├── services/
 │   │   ├── __init__.py
-│   │   ├── stt.py            # Whisper API client
-│   │   ├── llm.py            # DeepSeek chat client
-│   │   └── tts.py            # OpenAI TTS client
+│   │   ├── memory.py         # Honcho wrapper (context, prefs, summaries)
+│   │   └── whisper_stream.py # Realtime WS attempt (kept for reference; REST STT is used instead)
 │   └── models/
 │       ├── __init__.py
 │       └── schemas.py        # Pydantic models