# Kira — Architecture Plan ## Overview Browser-based real-time AI body double. She talks to Kira (microphone → STT → LLM → TTS → speaker), Kira talks back with lip-sync. Lo-fi from Lofi Girl streaming in the background, two cats hanging out, customizable background scenes, timers, notes, and a full wardrobe/accessory system. --- ## Tech Stack | Layer | Choice | Why | |-------|--------|-----| | **Frontend** | React 18 + Vite + TypeScript | Fast dev loop, runs in any browser | | **Styling** | TailwindCSS + custom girly-pop theme | Pink/lavender/mint palette, rounded everything | | **Live2D** | Cubism 4 Web SDK via pixi-live2d-display | Full lip-sync, gestures, blink, idle anims | | **Backend** | Python FastAPI + WebSockets | Async-native, easy AI API integration | | **STT** | OpenAI Whisper API | Best transcription, <$0.01/min | | **LLM** | DeepSeek V4 (cloud API) | Smart, fast reasoning, good personality adherence | | **TTS** | OpenAI TTS API | Clean female voice, low latency | | **Music** | YouTube IFrame Player API (Lofi Girl channel) | Free, endless streams, no backend proxy | | **Audio Pipeline** | MediaRecorder → chunks → WebSocket → backend | Low-latency real-time conversation | | **Infrastructure** | Docker Compose + Caddy reverse proxy | Deployable in homelab immediately | --- ## Data Flow (Conversation) ``` ┌─────────────────────────────────────────────────────────┐ │ Browser │ │ │ │ [Mic] → MediaRecorder → audio chunks │ │ ↓ (WebSocket) │ │ [FastAPI Backend] │ │ ↓ │ │ 1. Whisper API → text transcript │ │ 2. DeepSeek V4 (system prompt: "You are Kira...") │ │ 3. OpenAI TTS → audio buffer │ │ ↑ (WebSocket) │ │ [Audio Player + Live2D Lip-Sync] │ │ │ │ Kira's idle animations run between conversation turns │ └─────────────────────────────────────────────────────────┘ ``` Lo-fi music runs independently via YouTube embed — no backend involvement. --- ## Directory Structure ``` ai-body-double/ ├── docker-compose.yml # Single compose for both services ├── .env.example # API keys template ├── backend/ │ ├── Dockerfile │ ├── requirements.txt │ ├── main.py # FastAPI app + WebSocket handler │ ├── config.py # Environment/API config │ ├── routers/ │ │ ├── __init__.py │ │ ├── conversation.py # WS: mic → STT → LLM → TTS roundtrip │ │ ├── tools.py # REST: timers, notes, backgrounds │ │ └── assets.py # REST: outfit/pet state, backgrounds │ ├── services/ │ │ ├── __init__.py │ │ ├── stt.py # Whisper API client │ │ ├── llm.py # DeepSeek chat client │ │ └── tts.py # OpenAI TTS client │ └── models/ │ ├── __init__.py │ └── schemas.py # Pydantic models ├── frontend/ │ ├── Dockerfile (nginx) │ ├── package.json │ ├── vite.config.ts │ ├── tailwind.config.js │ ├── index.html │ ├── public/ │ │ └── live2d/ # Live2D model files │ │ ├── kira.model3.json │ │ ├── kira.moc3 │ │ └── textures/ │ └── src/ │ ├── main.tsx │ ├── App.tsx │ ├── api/ │ │ └── ws.ts # WebSocket client │ ├── components/ │ │ ├── KiraAvatar.tsx # Live2D canvas wrapper │ │ ├── BackgroundScene.tsx # Scene selector + overlay │ │ ├── MusicPlayer.tsx # Lofi Girl YouTube embed │ │ ├── Timer.tsx # Pomodoro + countdown │ │ ├── Notes.tsx # Quick notes widget │ │ ├── Clock.tsx # Digital clock │ │ ├── PetZone.tsx # Two cats 🐱🐱 │ │ ├── Toolbar.tsx # Bottom nav │ │ └── Wardrobe.tsx # Outfit/accessory picker │ ├── hooks/ │ │ ├── useAudio.ts # Mic recording + playback │ │ └── useKiraState.ts # Shared state │ ├── styles/ │ │ └── theme.css # Girly-pop palette │ └── types/ │ └── index.ts ``` --- ## Phase 1 Build Order I'll build this in dependency order so each piece is testable: ### Phase 1a — Skeleton (this session) 1. Project scaffold (frontend + backend + docker-compose) 2. Basic UI layout with girly-pop styling 3. Clock widget 4. Background scene selector (CSS gradient scenes) 5. Music player (Lofi Girl YouTube embed) 6. Timer widget (Pomodoro + countdown) ### Phase 1b — Audio Pipeline 7. Backend: FastAPI + WebSocket handler 8. Backend: STT service (Whisper API) 9. Backend: LLM service (DeepSeek) 10. Backend: TTS service (OpenAI) 11. Frontend: Microphone recording → WebSocket 12. Frontend: Audio playback + conversation UI ### Phase 1c — Kira the Avatar 13. Integrate Live2D SDK with placeholder model 14. Idle animations (blink, breath, random gestures) 15. Lip-sync from TTS audio 16. Wardrobe/outfit system ### Phase 1d — Cats & Polish 17. Pet zone with two cats (CSS-animated sprites) 18. Notes widget 19. Color polish, transitions, responsive layout 20. Docker compose finalization + Caddy config --- ## Key Design Decisions ### Live2D Model - We'll use a free sample Live2D model (Hiyori or Mao from Cubism SDK samples) as a placeholder - Custom "Kira" model can be commissioned and swapped in later — the SDK integration is identical - Lip-sync driven by analyzing TTS audio amplitude (simplified: no phoneme mapping needed) ### Background Scenes - CSS gradient + pattern scenes for Phase 1 (no image assets needed) - Scene types: Cozy Room (warm), Coffee Shop (browns), Garden (greens), Rainy Window (blues), Starry Night (dark purples), Sakura Spring (pinks) - Scene = animated CSS background + subtle particle overlay (rain, stars, petals) ### The Two Cats - CSS/Canvas-animated sprites (not Live2D — overkill for cats) - Orange fluffy: larger, follows cursor sometimes, stretches out - Black shorthair: smaller, sleeps curled up, occasional tail twitch ### Outfit System - Live2D models support model.json texture swapping - Phase 1: pre-defined color palettes/outfits that swap texture files - Outfits: Cozy Hoodie, Girly Dress, Pajama Set, Study Sweater, Going Out ### Conversation Personality - System prompt defines: female, kind, encouraging, ADHD-aware, body double presence - Kira checks in: "How's it going? Need a timer? Want me to pick a scene?" - Encouragement: "You've got this! 15 minutes down." / "Time for a stretch break?" - Never judgmental. Always supportive. --- ## Palette (Girly-Pop) ``` Primary bg: #FFF5F5 (soft pink-white) Accent pink: #FFB6C1 (light pink) Accent lav: #D8B4FE (lavender) Accent mint: #A7F3D0 (mint) Text primary: #4A1942 (deep plum) Text soft: #7C3AED (violet) Card bg: #FFFFFF (with soft shadow) Highlight: #FDF2F8 (pink glow) ``` --- ## Deployment ``` docker compose up -d Frontend: http://kira.hobokenchicken.com (via Caddy) Backend: ws://kira.hobokenchicken.com/api/ws (WebSocket) ``` Runs on existing homelab stack. Add Caddy entry pointing to the frontend container.