Files
kira/ARCHITECTURE.md

200 lines
8.1 KiB
Markdown

# Kira — Architecture Plan
## Overview
Browser-based real-time AI body double. She talks to Kira (microphone → STT → LLM → TTS → speaker), Kira talks back with lip-sync. Lo-fi from Lofi Girl streaming in the background, two cats hanging out, customizable background scenes, timers, notes, and a full wardrobe/accessory system.
---
## Tech Stack
| Layer | Choice | Why |
|-------|--------|-----|
| **Frontend** | React 18 + Vite + TypeScript | Fast dev loop, runs in any browser |
| **Styling** | TailwindCSS + custom girly-pop theme | Pink/lavender/mint palette, rounded everything |
| **Live2D** | Cubism 4 Web SDK via pixi-live2d-display | Full lip-sync, gestures, blink, idle anims |
| **Backend** | Python FastAPI + WebSockets | Async-native, easy AI API integration |
| **STT** | OpenAI Whisper API | Best transcription, <$0.01/min |
| **LLM** | DeepSeek V4 (cloud API) | Smart, fast reasoning, good personality adherence |
| **TTS** | OpenAI TTS API | Clean female voice, low latency |
| **Music** | YouTube IFrame Player API (Lofi Girl channel) | Free, endless streams, no backend proxy |
| **Audio Pipeline** | MediaRecorder → chunks → WebSocket → backend | Low-latency real-time conversation |
| **Infrastructure** | Docker Compose + Caddy reverse proxy | Deployable in homelab immediately |
---
## Data Flow (Conversation)
```
┌─────────────────────────────────────────────────────────┐
│ Browser (React + Live2D + girly UI) │
│ │
│ [Mic] → MediaRecorder (webm/opus full utterance) │
│ ↓ (WebSocket /api/ws) │
│ [FastAPI Backend] │
│ ↓ │
│ 1. REST gpt-4o-transcribe → full text (with delta emit)|
│ 2. gpt-5.4-nano + Honcho memory suffix │
│ 3. OpenAI TTS (sage) streaming response → Opus chunks │
│ ↑ (WebSocket) │
│ [Incremental Audio playback + Live2D + "Hearing" UI] │
│ │
│ White noise (Web Audio) and lofi run independently. │
│ VAD is client-button based (toggle Talk). │
└─────────────────────────────────────────────────────────┘
```
Current: REST STT for reliability (Realtime WS attempted but blocked by model access).
---
## Directory Structure
```
ai-body-double/
├── docker-compose.yml # Single compose for both services
├── .env.example # API keys template
├── backend/
│ ├── Dockerfile
│ ├── requirements.txt
│ ├── main.py # FastAPI app + WebSocket handler
│ ├── config.py # Environment/API config
│ ├── routers/
│ │ ├── __init__.py
│ │ ├── conversation.py # WS: mic → STT → LLM → TTS roundtrip
│ │ ├── tools.py # REST: timers, notes, backgrounds
│ │ └── assets.py # REST: outfit/pet state, backgrounds
│ ├── services/
│ │ ├── __init__.py
│ │ ├── memory.py # Honcho wrapper (context, prefs, summaries)
│ │ └── whisper_stream.py # Realtime WS attempt (archived use; fallback to REST)
│ └── models/
│ ├── __init__.py
│ └── schemas.py # Pydantic models
├── frontend/
│ ├── Dockerfile (nginx)
│ ├── package.json
│ ├── vite.config.ts
│ ├── tailwind.config.js
│ ├── index.html
│ ├── public/
│ │ └── live2d/ # Live2D model files
│ │ ├── kira.model3.json
│ │ ├── kira.moc3
│ │ └── textures/
│ └── src/
│ ├── main.tsx
│ ├── App.tsx
│ ├── api/
│ │ └── ws.ts # WebSocket client
│ ├── components/
│ │ ├── KiraAvatar.tsx # Live2D canvas wrapper
│ │ ├── BackgroundScene.tsx # Scene selector + overlay
│ │ ├── MusicPlayer.tsx # Lofi Girl YouTube embed
│ │ ├── Timer.tsx # Pomodoro + countdown
│ │ ├── Notes.tsx # Quick notes widget
│ │ ├── Clock.tsx # Digital clock
│ │ ├── PetZone.tsx # Two cats 🐱🐱
│ │ ├── Toolbar.tsx # Bottom nav
│ │ └── Wardrobe.tsx # Outfit/accessory picker
│ ├── hooks/
│ │ ├── useAudio.ts # Mic recording + playback
│ │ └── useKiraState.ts # Shared state
│ ├── styles/
│ │ └── theme.css # Girly-pop palette
│ └── types/
│ └── index.ts
```
---
## Phase 1 Build Order
I'll build this in dependency order so each piece is testable:
### Phase 1a — Skeleton (this session)
1. Project scaffold (frontend + backend + docker-compose)
2. Basic UI layout with girly-pop styling
3. Clock widget
4. Background scene selector (CSS gradient scenes)
5. Music player (Lofi Girl YouTube embed)
6. Timer widget (Pomodoro + countdown)
### Phase 1b — Audio Pipeline
7. Backend: FastAPI + WebSocket handler
8. Backend: STT service (Whisper API)
9. Backend: LLM service (DeepSeek)
10. Backend: TTS service (OpenAI)
11. Frontend: Microphone recording → WebSocket
12. Frontend: Audio playback + conversation UI
### Phase 1c — Kira the Avatar
13. Integrate Live2D SDK with placeholder model
14. Idle animations (blink, breath, random gestures)
15. Lip-sync from TTS audio
16. Wardrobe/outfit system
### Phase 1d — Cats & Polish
17. Pet zone with two cats (CSS-animated sprites)
18. Notes widget
19. Color polish, transitions, responsive layout
20. Docker compose finalization + Caddy config
---
## Key Design Decisions
### Live2D Model
- We'll use a free sample Live2D model (Hiyori or Mao from Cubism SDK samples) as a placeholder
- Custom "Kira" model can be commissioned and swapped in later — the SDK integration is identical
- Lip-sync driven by analyzing TTS audio amplitude (simplified: no phoneme mapping needed)
### Background Scenes
- CSS gradient + pattern scenes for Phase 1 (no image assets needed)
- Scene types: Cozy Room (warm), Coffee Shop (browns), Garden (greens), Rainy Window (blues), Starry Night (dark purples), Sakura Spring (pinks)
- Scene = animated CSS background + subtle particle overlay (rain, stars, petals)
### The Two Cats
- CSS/Canvas-animated sprites (not Live2D — overkill for cats)
- Orange fluffy: larger, follows cursor sometimes, stretches out
- Black shorthair: smaller, sleeps curled up, occasional tail twitch
### Outfit System
- Live2D models support model.json texture swapping
- Phase 1: pre-defined color palettes/outfits that swap texture files
- Outfits: Cozy Hoodie, Girly Dress, Pajama Set, Study Sweater, Going Out
### Conversation Personality
- System prompt defines: female, kind, encouraging, ADHD-aware, body double presence
- Kira checks in: "How's it going? Need a timer? Want me to pick a scene?"
- Encouragement: "You've got this! 15 minutes down." / "Time for a stretch break?"
- Never judgmental. Always supportive.
---
## Palette (Girly-Pop)
```
Primary bg: #FFF5F5 (soft pink-white)
Accent pink: #FFB6C1 (light pink)
Accent lav: #D8B4FE (lavender)
Accent mint: #A7F3D0 (mint)
Text primary: #4A1942 (deep plum)
Text soft: #7C3AED (violet)
Card bg: #FFFFFF (with soft shadow)
Highlight: #FDF2F8 (pink glow)
```
---
## Deployment
```
docker compose up -d
Frontend: http://kira.hobokenchicken.com (via Caddy)
Backend: ws://kira.hobokenchicken.com/api/ws (WebSocket)
```
Runs on existing homelab stack. Add Caddy entry pointing to the frontend container.