clean: archive legacy stt/tts/llm services; update ARCHITECTURE.md + README.md to current stack (REST gpt-4o-transcribe, nano, sage, Honcho, incremental TTS, white noise)
Per PLAN item 6. Legacy files moved to archive/legacy-pipeline/.
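
For orientation, the current stack named in the title (REST transcription with delta emit, an LLM call with a Honcho memory suffix, incremental TTS) can be read as one turn handler behind the WebSocket. The following is a minimal sketch under stated assumptions, not the repository's code: `run_turn`, the event names (`stt_delta`, `audio_chunk`, and so on), and the stub services are all hypothetical, and no real OpenAI or Honcho calls are made.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Hypothetical names throughout: run_turn, the event dicts, and the
# fake_* services are illustrative and not taken from the repository.
Send = Callable[[dict], Awaitable[None]]


async def run_turn(
    utterance: bytes,
    transcribe: Callable[[bytes], AsyncIterator[str]],
    reply: Callable[[str, str], Awaitable[str]],
    synthesize: Callable[[str], AsyncIterator[bytes]],
    memory_suffix: str,
    send: Send,
) -> None:
    """Run one turn: STT -> LLM (with memory suffix) -> streamed TTS."""
    parts: list[str] = []
    # 1. STT: forward each transcript delta so the UI can show progress.
    async for delta in transcribe(utterance):
        parts.append(delta)
        await send({"type": "stt_delta", "text": delta})
    transcript = "".join(parts)
    await send({"type": "stt_final", "text": transcript})

    # 2. LLM: the memory context travels alongside the transcript.
    answer = await reply(transcript, memory_suffix)
    await send({"type": "reply_text", "text": answer})

    # 3. TTS: forward audio chunks as they arrive instead of buffering.
    async for chunk in synthesize(answer):
        await send({"type": "audio_chunk", "data": chunk})
    await send({"type": "audio_end"})


async def fake_transcribe(_audio: bytes) -> AsyncIterator[str]:
    # Stand-in for a transcription service that emits text deltas.
    for piece in ("hello ", "Kira"):
        yield piece


async def fake_reply(text: str, suffix: str) -> str:
    # Stand-in for the LLM call; echoes its inputs.
    return f"echo: {text} [{suffix}]"


async def fake_synthesize(text: str) -> AsyncIterator[bytes]:
    # Stand-in for streaming TTS; yields 8-byte "audio" chunks.
    data = text.encode()
    for i in range(0, len(data), 8):
        yield data[i:i + 8]


async def demo() -> list[dict]:
    events: list[dict] = []

    async def send(event: dict) -> None:
        events.append(event)

    await run_turn(b"\x00", fake_transcribe, fake_reply,
                   fake_synthesize, "likes lofi", send)
    return events


if __name__ == "__main__":
    for event in asyncio.run(demo()):
        print(event["type"])
```

Passing the three services in as callables keeps the sketch runnable offline; in a real backend, `send` would presumably wrap the WebSocket connection and the stubs would be replaced by the actual clients.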
+12 -12
@@ -27,23 +27,24 @@ Browser-based real-time AI body double. She talks to Kira (microphone → STT
 
 ```
 ┌─────────────────────────────────────────────────────────┐
-│ Browser │
+│ Browser (React + Live2D + girly UI) │
 │ │
-│ [Mic] → MediaRecorder → audio chunks │
-│ ↓ (WebSocket) │
+│ [Mic] → MediaRecorder (webm/opus full utterance) │
+│ ↓ (WebSocket /api/ws) │
 │ [FastAPI Backend] │
 │ ↓ │
-│ 1. Whisper API → text transcript │
-│ 2. DeepSeek V4 (system prompt: "You are Kira...") │
-│ 3. OpenAI TTS → audio buffer │
+│ 1. REST gpt-4o-transcribe → full text (with delta emit)|
+│ 2. gpt-5.4-nano + Honcho memory suffix │
+│ 3. OpenAI TTS (sage) streaming response → Opus chunks │
 │ ↑ (WebSocket) │
-│ [Audio Player + Live2D Lip-Sync] │
+│ [Incremental Audio playback + Live2D + "Hearing" UI] │
 │ │
-│ Kira's idle animations run between conversation turns │
+│ White noise (Web Audio) and lofi run independently. │
+│ VAD is client-button based (toggle Talk). │
 └─────────────────────────────────────────────────────────┘
 ```
 
-Lo-fi music runs independently via YouTube embed — no backend involvement.
+Current: REST STT for reliability (Realtime WS attempted but blocked by model access).
 
 ---
 
@@ -65,9 +66,8 @@ ai-body-double/
 │ │ └── assets.py # REST: outfit/pet state, backgrounds
 │ ├── services/
 │ │ ├── __init__.py
-│ │ ├── stt.py # Whisper API client
-│ │ ├── llm.py # DeepSeek chat client
-│ │ └── tts.py # OpenAI TTS client
+│ │ ├── memory.py # Honcho wrapper (context, prefs, summaries)
+│ │ └── whisper_stream.py # Realtime WS attempt (archived use; fallback to REST)
 │ └── models/
 │ ├── __init__.py
 │ └── schemas.py # Pydantic models
@@ -34,7 +34,7 @@ cd ~/Projects/ai-body-double
 cp .env.example .env
 # Edit .env with your API keys:
 # OPENAI_API_KEY=sk-...
-# DEEPSEEK_API_KEY=sk-...
+# HONCHO_API_KEY=...
 ```
 
 ### 3. Run
@@ -56,17 +56,19 @@ kira.hobokenchicken.com {
 ## Architecture
 
 ```
-Browser ──WebSocket──▶ Backend (FastAPI)
+Browser (React + Live2D + WhiteNoise + Notes) ──WebSocket──▶ Backend (FastAPI)
 │ │
-├─ Mic audio ──────────▶ ├─ Whisper API (STT)
-│ ├─ DeepSeek (LLM)
-│ ◀── TTS audio ──────── ├─ OpenAI TTS
+├─ Mic (MediaRecorder full webm on stop) ────────────────────▶ ├─ gpt-4o-transcribe (REST STT, emits delta)
+│ ├─ gpt-5.4-nano + Honcho memory context
+│ ◀── Incremental Opus chunks (play on arrival) ─────────────── ├─ OpenAI TTS streaming (sage)
 │ │
-├─ YouTube embed (lo-fi) │
-├─ Timer / Notes / Cats │
-└─ Animated avatar │
+├─ YouTube lofi + Web Audio noise (independent) │
+├─ Timer / Notes / Pets / Wardrobe / Scenes │
+└─ Live2D (or fallback) avatar │
 ```
 
+Note: Realtime WebSocket STT (gpt-realtime-whisper) was attempted for true streaming but blocked by model access — current REST path is stable and cheap.
+
 ## Live2D Model Setup
 
 Kira currently uses a CSS/SVG animated placeholder avatar. To add a Live2D model: