Kira — Architecture Plan
Overview
Browser-based real-time AI body double. She (the user) talks to Kira (microphone → STT → LLM → TTS → speaker), and Kira talks back with lip-sync. Lofi Girl music streams in the background, two cats hang out on screen, and the app adds customizable background scenes, timers, notes, and a full wardrobe/accessory system.
Tech Stack
| Layer | Choice | Why |
|---|---|---|
| Frontend | React 18 + Vite + TypeScript | Fast dev loop, runs in any browser |
| Styling | TailwindCSS + custom girly-pop theme | Pink/lavender/mint palette, rounded everything |
| Live2D | Cubism 4 Web SDK via pixi-live2d-display | Full lip-sync, gestures, blink, idle anims |
| Backend | Python FastAPI + WebSockets | Async-native, easy AI API integration |
| STT | OpenAI Whisper API (current pipeline: gpt-4o-transcribe via REST) | Accurate transcription, <$0.01/min |
| LLM | DeepSeek V4 (cloud API); current pipeline: gpt-5.4-nano + Honcho memory | Smart, fast reasoning, good personality adherence |
| TTS | OpenAI TTS API (sage voice) | Clean female voice, low latency |
| Music | YouTube IFrame Player API (Lofi Girl channel) | Free, endless streams, no backend proxy |
| Audio Pipeline | MediaRecorder → full utterance (webm/opus) → WebSocket → backend | Low-latency real-time conversation |
| Infrastructure | Docker Compose + Caddy reverse proxy | Deployable in homelab immediately |
Data Flow (Conversation)
```
┌───────────────────────────────────────────────────────────┐
│ Browser (React + Live2D + girly UI)                       │
│                                                           │
│ [Mic] → MediaRecorder (webm/opus full utterance)          │
│   ↓ (WebSocket /api/ws)                                   │
│ [FastAPI Backend]                                         │
│   ↓                                                       │
│ 1. REST gpt-4o-transcribe → full text (with delta emit)   │
│ 2. gpt-5.4-nano + Honcho memory suffix                    │
│ 3. OpenAI TTS (sage) streaming response → Opus chunks     │
│   ↑ (WebSocket)                                           │
│ [Incremental Audio playback + Live2D + "Hearing" UI]      │
│                                                           │
│ White noise (Web Audio) and lofi run independently.       │
│ VAD is client-button based (toggle Talk).                 │
└───────────────────────────────────────────────────────────┘
```
Current status: STT uses the REST transcription endpoint for reliability. A Realtime WebSocket approach was attempted but was blocked by lack of model access.
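The diagram implies a small message protocol on /api/ws. It carries transcript deltas for the "Hearing" UI, the final transcript, the reply, and streamed audio.

Below is a minimal client-side sketch of how src/api/ws.ts could fold those messages into state. It assumes JSON text frames. Every message type name and the base64 audio encoding are hypothetical, not taken from the backend:

```typescript
// Hypothetical server → client messages for /api/ws (all names are assumptions).
type ServerMessage =
  | { type: "transcript_delta"; text: string } // partial STT text for the "Hearing" UI
  | { type: "transcript_final"; text: string } // full utterance once STT completes
  | { type: "reply_text"; text: string } // LLM reply text
  | { type: "audio_chunk"; data: string } // one TTS Opus chunk, base64-encoded (assumption)
  | { type: "audio_end" }; // TTS stream finished

interface ConversationState {
  hearing: string; // accumulated deltas shown while STT runs
  transcript: string; // final user utterance
  reply: string; // Kira's reply text
  audioQueue: string[]; // chunks waiting for playback, in arrival order
  streaming: boolean; // true while TTS chunks are still arriving
}

const initialState: ConversationState = {
  hearing: "",
  transcript: "",
  reply: "",
  audioQueue: [],
  streaming: false,
};

// Pure reducer: usable with React's useReducer or called directly from ws.onmessage.
function reduce(state: ConversationState, msg: ServerMessage): ConversationState {
  switch (msg.type) {
    case "transcript_delta":
      return { ...state, hearing: state.hearing + msg.text };
    case "transcript_final":
      return { ...state, hearing: "", transcript: msg.text };
    case "reply_text":
      return { ...state, reply: msg.text };
    case "audio_chunk":
      return { ...state, audioQueue: [...state.audioQueue, msg.data], streaming: true };
    case "audio_end":
      return { ...state, streaming: false };
  }
}
```

Wiring it up would take one line in the socket handler, e.g. `state = reduce(state, JSON.parse(event.data))`. A pure reducer keeps the protocol logic testable without a live socket. If the backend sends audio as binary frames instead, the audio branch would handle ArrayBuffer data rather than strings.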
Directory Structure
```
ai-body-double/
├── docker-compose.yml  # Single compose for both services
├── .env.example  # API keys template
├── backend/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py  # FastAPI app + WebSocket handler
│   ├── config.py  # Environment/API config
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── conversation.py  # WS: mic → STT → LLM → TTS roundtrip
│   │   ├── tools.py  # REST: timers, notes, backgrounds
│   │   └── assets.py  # REST: outfit/pet state, backgrounds
│   ├── services/
│   │   ├── __init__.py
│   │   ├── memory.py  # Honcho wrapper (context, prefs, summaries)
│   │   └── whisper_stream.py  # Realtime WS attempt (archived; pipeline falls back to REST)
│   └── models/
│       ├── __init__.py
│       └── schemas.py  # Pydantic models
├── frontend/
│   ├── Dockerfile  # nginx
│   ├── package.json
│   ├── vite.config.ts
│   ├── tailwind.config.js
│   ├── index.html
│   ├── public/
│   │   └── live2d/  # Live2D model files
│   │       ├── kira.model3.json
│   │       ├── kira.moc3
│   │       └── textures/
│   └── src/
│       ├── main.tsx
│       ├── App.tsx
│       ├── api/
│       │   └── ws.ts  # WebSocket client
│       ├── components/
│       │   ├── KiraAvatar.tsx  # Live2D canvas wrapper
│       │   ├── BackgroundScene.tsx  # Scene selector + overlay
│       │   ├── MusicPlayer.tsx  # Lofi Girl YouTube embed
│       │   ├── Timer.tsx  # Pomodoro + countdown
│       │   ├── Notes.tsx  # Quick notes widget
│       │   ├── Clock.tsx  # Digital clock
│       │   ├── PetZone.tsx  # Two cats 🐱🐱
│       │   ├── Toolbar.tsx  # Bottom nav
│       │   └── Wardrobe.tsx  # Outfit/accessory picker
│       ├── hooks/
│       │   ├── useAudio.ts  # Mic recording + playback
│       │   └── useKiraState.ts  # Shared state
│       ├── styles/
│       │   └── theme.css  # Girly-pop palette
│       └── types/
│           └── index.ts
```
Phase 1 Build Order
I'll build this in dependency order so each piece is testable:
Phase 1a — Skeleton (this session)
- Project scaffold (frontend + backend + docker-compose)
- Basic UI layout with girly-pop styling
- Clock widget
- Background scene selector (CSS gradient scenes)
- Music player (Lofi Girl YouTube embed)
- Timer widget (Pomodoro + countdown)
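The Pomodoro timer is mostly state logic. Below is a minimal sketch, assuming the conventional 25-minute focus / 5-minute break split (the plan does not fix durations); all names are hypothetical.

The sketch stores the phase's end timestamp rather than decrementing a counter on every tick. That keeps the display correct even when the browser throttles timers in a background tab:

```typescript
// Minimal Pomodoro state machine; the 25/5 minute defaults are conventional, not from the plan.
type Phase = "focus" | "break";

interface PomodoroConfig {
  focusMs: number;
  breakMs: number;
}

interface TimerState {
  phase: Phase;
  endsAt: number; // epoch ms at which the current phase ends
}

const DEFAULT_CONFIG: PomodoroConfig = { focusMs: 25 * 60_000, breakMs: 5 * 60_000 };

function startFocus(now: number, cfg: PomodoroConfig = DEFAULT_CONFIG): TimerState {
  return { phase: "focus", endsAt: now + cfg.focusMs };
}

// Remaining time in the current phase, clamped at zero.
function remainingMs(state: TimerState, now: number): number {
  return Math.max(0, state.endsAt - now);
}

// Call on every tick: once the current phase has ended, switch to the other phase.
// Advances at most one phase per call; the next phase starts where the previous one ended.
function tick(state: TimerState, now: number, cfg: PomodoroConfig = DEFAULT_CONFIG): TimerState {
  if (now < state.endsAt) return state;
  const next: Phase = state.phase === "focus" ? "break" : "focus";
  const duration = next === "focus" ? cfg.focusMs : cfg.breakMs;
  return { phase: next, endsAt: state.endsAt + duration };
}

// Render as MM:SS, rounding partial seconds up so "00:00" only shows at the very end.
function formatMmSs(ms: number): string {
  const totalSec = Math.ceil(ms / 1000);
  const m = Math.floor(totalSec / 60);
  const s = totalSec % 60;
  return `${String(m).padStart(2, "0")}:${String(s).padStart(2, "0")}`;
}
```

A plain countdown is the same idea with a single phase: store `now + duration` and render `remainingMs` on each tick.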
Phase 1b — Audio Pipeline
- Backend: FastAPI + WebSocket handler
- Backend: STT service (Whisper API)
- Backend: LLM service (DeepSeek)
- Backend: TTS service (OpenAI)
- Frontend: Microphone recording → WebSocket
- Frontend: Audio playback + conversation UI
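For incremental playback, TTS chunks should play back-to-back without gaps or overlaps. A common Web Audio technique is to schedule each decoded buffer to start where the previous one ends.

Below is a minimal sketch of just that timing logic. The names are hypothetical, and plain numbers stand in for the audio clock so the logic can be tested on its own:

```typescript
// Gapless scheduling for TTS chunks that arrive one at a time.
// `now` stands in for AudioContext.currentTime and `duration` for AudioBuffer.duration (seconds).
interface Scheduler {
  nextStart: number; // clock time at which the previously queued chunk ends
}

function scheduleChunk(
  scheduler: Scheduler,
  now: number,
  duration: number,
): { startAt: number; scheduler: Scheduler } {
  // Queue right after the previous chunk. If the queue has drained (the clock has moved past
  // nextStart), start immediately instead of scheduling in the past.
  const startAt = Math.max(now, scheduler.nextStart);
  return { startAt, scheduler: { nextStart: startAt + duration } };
}
```

In the browser, each chunk would first be decoded with `AudioContext.decodeAudioData`, which assumes every chunk is independently decodable (for example a complete Ogg or WebM segment). It would then be played through a buffer source: set `source.buffer`, connect it to `ctx.destination`, and call `source.start(startAt)`.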
Phase 1c — Kira the Avatar
- Integrate Live2D SDK with placeholder model
- Idle animations (blink, breath, random gestures)
- Lip-sync from TTS audio
- Wardrobe/outfit system
Phase 1d — Cats & Polish
- Pet zone with two cats (CSS-animated sprites)
- Notes widget
- Color polish, transitions, responsive layout
- Docker compose finalization + Caddy config
Key Design Decisions
Live2D Model
- We'll use a free sample Live2D model (Hiyori or Mao from Cubism SDK samples) as a placeholder
- Custom "Kira" model can be commissioned and swapped in later — the SDK integration is identical
- Lip-sync driven by analyzing TTS audio amplitude (simplified: no phoneme mapping needed)
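Amplitude-driven lip-sync works in three steps. Take the RMS of the latest TTS audio frame, scale it into 0..1, and smooth it so the mouth does not flicker.

This sketch assumes the model uses the standard Cubism parameter ID ParamMouthOpenY. The gain, attack, and release values are untuned guesses, and the function names are hypothetical:

```typescript
// Amplitude-based lip-sync: RMS of the current frame → smoothed 0..1 mouth-open value.
// Assumes the model exposes the standard Cubism parameter ID below.
const MOUTH_PARAM_ID = "ParamMouthOpenY";

// Root-mean-square level of one frame of time-domain samples (values in -1..1).
function rms(samples: ArrayLike<number>): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Exponential smoothing: the mouth opens quickly (attack) and closes more slowly (release).
function nextMouthOpen(
  prev: number,
  samples: ArrayLike<number>,
  gain = 4,
  attack = 0.6,
  release = 0.2,
): number {
  const target = Math.min(1, rms(samples) * gain);
  const alpha = target > prev ? attack : release;
  return prev + (target - prev) * alpha;
}
```

In the browser, an AnalyserNode in the TTS playback chain can fill a Float32Array via `getFloatTimeDomainData` on each animation frame. The returned value is then written to `MOUTH_PARAM_ID`. How that parameter is set depends on the pixi-live2d-display and Cubism framework versions in use, so that call is left out here.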
Background Scenes
- CSS gradient + pattern scenes for Phase 1 (no image assets needed)
- Scene types: Cozy Room (warm), Coffee Shop (browns), Garden (greens), Rainy Window (blues), Starry Night (dark purples), Sakura Spring (pinks)
- Scene = animated CSS background + subtle particle overlay (rain, stars, petals)
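The scene list can live as plain data that BackgroundScene.tsx renders.

In this sketch, the ids, hex colors, and the kira-drift keyframes name are placeholders chosen to match the descriptions above, not final values. Assigning no particles to the first three scenes is also an assumption:

```typescript
// Scene definitions as data; colors are placeholders, not values from the plan.
type Particle = "none" | "rain" | "stars" | "petals";

interface Scene {
  id: string;
  label: string;
  background: string; // CSS background value
  particles: Particle; // overlay drawn on top of the gradient
}

const SCENES: readonly Scene[] = [
  { id: "cozy-room", label: "Cozy Room", background: "linear-gradient(160deg, #FFE4C4, #F4A261)", particles: "none" },
  { id: "coffee-shop", label: "Coffee Shop", background: "linear-gradient(160deg, #EAD7C3, #8B5E3C)", particles: "none" },
  { id: "garden", label: "Garden", background: "linear-gradient(160deg, #D9F99D, #65A30D)", particles: "none" },
  { id: "rainy-window", label: "Rainy Window", background: "linear-gradient(160deg, #BFDBFE, #1E3A8A)", particles: "rain" },
  { id: "starry-night", label: "Starry Night", background: "linear-gradient(160deg, #4C1D95, #1E1B4B)", particles: "stars" },
  { id: "sakura-spring", label: "Sakura Spring", background: "linear-gradient(160deg, #FFE4EC, #F9A8D4)", particles: "petals" },
];

// Inline style for the scene container; unknown ids fall back to the first scene.
function sceneStyle(id: string): { background: string; backgroundSize: string; animation: string } {
  const scene = SCENES.find((s) => s.id === id) ?? SCENES[0];
  return {
    background: scene.background,
    // An oversized gradient whose position drifts slowly gives the "animated background" effect.
    // "kira-drift" is a hypothetical @keyframes rule that would live in theme.css.
    backgroundSize: "400% 400%",
    animation: "kira-drift 30s ease-in-out infinite",
  };
}
```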
The Two Cats
- CSS/Canvas-animated sprites (not Live2D — overkill for cats)
- Orange fluffy: larger, follows cursor sometimes, stretches out
- Black shorthair: smaller, sleeps curled up, occasional tail twitch
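One way to give each cat its own personality is a weighted random pick of the next idle behavior every few seconds.

The action names, weights, and scale values in this sketch are hypothetical. `rand` is injectable so the choice can be tested deterministically:

```typescript
// Per-cat idle behaviors with relative weights (guesses, to be tuned by eye).
type CatAction = "idle" | "follow-cursor" | "stretch" | "sleep" | "tail-twitch";

interface CatProfile {
  name: string;
  scale: number; // relative sprite size
  actions: ReadonlyArray<readonly [CatAction, number]>; // [action, weight]
}

const ORANGE: CatProfile = {
  name: "orange",
  scale: 1.2,
  actions: [["idle", 4], ["follow-cursor", 2], ["stretch", 1]],
};

const BLACK: CatProfile = {
  name: "black",
  scale: 0.9,
  actions: [["sleep", 6], ["tail-twitch", 1]],
};

// Weighted random choice; `rand` must return a number in [0, 1).
function pickAction(cat: CatProfile, rand: () => number = Math.random): CatAction {
  const total = cat.actions.reduce((sum, [, w]) => sum + w, 0);
  let r = rand() * total;
  for (const [action, weight] of cat.actions) {
    if (r < weight) return action;
    r -= weight;
  }
  // Guard against floating-point edge cases at the top of the range.
  return cat.actions[cat.actions.length - 1][0];
}
```

PetZone.tsx could call `pickAction` on an interval and map the result to a CSS animation class. That keeps both cats on cheap CSS/Canvas animation rather than Live2D.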
Outfit System
- Cubism 4 models list their texture files in model3.json (FileReferences.Textures), so outfits can be changed by swapping texture files
- Phase 1: pre-defined color palettes/outfits that swap texture files
- Outfits: Cozy Hoodie, Girly Dress, Pajama Set, Study Sweater, Going Out
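Texture-based outfits can be sketched as a per-outfit copy of the model settings that points FileReferences.Textures at a different texture set. The file paths below are hypothetical.

One real constraint applies: every outfit texture must keep the same atlas layout as the original. The .moc3 mesh's texture coordinates do not change when the images do:

```typescript
// Hypothetical outfit → texture mapping; file paths are placeholders.
type OutfitId = "cozy-hoodie" | "girly-dress" | "pajama-set" | "study-sweater" | "going-out";

const OUTFIT_TEXTURES: Record<OutfitId, readonly string[]> = {
  "cozy-hoodie": ["textures/cozy-hoodie/texture_00.png"],
  "girly-dress": ["textures/girly-dress/texture_00.png"],
  "pajama-set": ["textures/pajama-set/texture_00.png"],
  "study-sweater": ["textures/study-sweater/texture_00.png"],
  "going-out": ["textures/going-out/texture_00.png"],
};

// Minimal slice of the model3.json shape used here.
interface Model3Json {
  Version: number;
  FileReferences: { Moc: string; Textures: string[]; [key: string]: unknown };
  [key: string]: unknown;
}

// Returns a copy with the outfit's textures; the base settings are left untouched
// so they can be reused for every outfit.
function withOutfit(base: Model3Json, outfit: OutfitId): Model3Json {
  return {
    ...base,
    FileReferences: { ...base.FileReferences, Textures: [...OUTFIT_TEXTURES[outfit]] },
  };
}
```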
Conversation Personality
- System prompt defines: female, kind, encouraging, ADHD-aware, body double presence
- Kira checks in: "How's it going? Need a timer? Want me to pick a scene?"
- Encouragement: "You've got this! 15 minutes down." / "Time for a stretch break?"
- Never judgmental. Always supportive.
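The personality rules above translate into a system prompt, with the Honcho memory suffix from the data-flow diagram appended.

This is shown in TypeScript to match the other sketches, although in this architecture the prompt is assembled in the FastAPI backend. The wording and the function name are illustrative:

```typescript
// Illustrative prompt rules; wording is an assumption, not the project's actual prompt.
const PERSONA_RULES: readonly string[] = [
  "You are Kira, a female AI body double keeping the user company while she works.",
  "Be kind, warm, and encouraging, and stay aware that the user has ADHD.",
  "Check in now and then, e.g. offer a timer or a new background scene.",
  "Celebrate progress (\"15 minutes down!\") and suggest stretch breaks.",
  "Never be judgmental. Always be supportive.",
];

// Base persona plus an optional memory block (e.g. the Honcho memory suffix).
function buildSystemPrompt(memorySuffix?: string): string {
  const base = PERSONA_RULES.map((rule) => `- ${rule}`).join("\n");
  const memory = memorySuffix?.trim();
  return memory ? `${base}\n\nWhat you remember about the user:\n${memory}` : base;
}
```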
Palette (Girly-Pop)
| Role | Hex | Notes |
|---|---|---|
| Primary bg | #FFF5F5 | Soft pink-white |
| Accent pink | #FFB6C1 | Light pink |
| Accent lavender | #D8B4FE | Lavender |
| Accent mint | #A7F3D0 | Mint |
| Text primary | #4A1942 | Deep plum |
| Text soft | #7C3AED | Violet |
| Card bg | #FFFFFF | White, with soft shadow |
| Highlight | #FDF2F8 | Pink glow |
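To use the palette from both theme.css and Tailwind, one option is to emit it as CSS custom properties and point Tailwind theme colors at them (e.g. a color whose value is var(--kira-accent-pink)).

In this sketch, the --kira- prefix and the key names are hypothetical; the hex values are the palette above:

```typescript
// Girly-pop palette as data; keys are hypothetical, hex values come from the palette.
const PALETTE = {
  "bg-primary": "#FFF5F5",
  "accent-pink": "#FFB6C1",
  "accent-lav": "#D8B4FE",
  "accent-mint": "#A7F3D0",
  "text-primary": "#4A1942",
  "text-soft": "#7C3AED",
  "card-bg": "#FFFFFF",
  "highlight": "#FDF2F8",
} as const;

// Render the palette as a :root block of CSS custom properties for theme.css.
function paletteToCss(palette: Record<string, string>, prefix = "kira"): string {
  const lines = Object.entries(palette).map(([name, hex]) => `  --${prefix}-${name}: ${hex};`);
  return `:root {\n${lines.join("\n")}\n}`;
}
```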
Deployment
```shell
docker compose up -d
```
Frontend: http://kira.hobokenchicken.com (via Caddy)
Backend: ws://kira.hobokenchicken.com/api/ws (WebSocket)
Runs on existing homelab stack. Add Caddy entry pointing to the frontend container.
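The frontend can derive the WebSocket URL from the page it was served from instead of hardcoding ws://. Browsers block insecure ws:// connections from an https:// page.

Caddy also enables automatic HTTPS by default for site addresses that are domain names, so the site may well end up served over https. In the sketch below, the function name is hypothetical and /api/ws is the path from the plan:

```typescript
// Build the WebSocket URL from the current page location so the same build works
// over http (ws://) and https (wss://).
function wsUrl(location: { protocol: string; host: string }, path = "/api/ws"): string {
  const scheme = location.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${location.host}${path}`;
}
```

In ws.ts this would be used as `new WebSocket(wsUrl(window.location))`.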