init: Kira — AI body double with Honcho memory

Full voice pipeline (Whisper STT -> DeepSeek LLM -> OpenAI TTS), animated SVG avatar (Live2D-ready), girly-pop UI, lofi music, timer/notes/pets/wardrobe widgets, 10 background scenes with particle effects, Honcho cross-session memory.
2026-06-04 10:51:38 -04:00
commit 97424cb98f
47 changed files with 5691 additions and 0 deletions
@@ -0,0 +1,199 @@
+# Kira — Architecture Plan
+
+## Overview
+
+Browser-based real-time AI body double. She talks to Kira (microphone → STT → LLM → TTS → speaker), Kira talks back with lip-sync. Lo-fi from Lofi Girl streaming in the background, two cats hanging out, customizable background scenes, timers, notes, and a full wardrobe/accessory system.
+
+---
+
+## Tech Stack
+
+| Layer | Choice | Why |
+|-------|--------|-----|
+| **Frontend** | React 18 + Vite + TypeScript | Fast dev loop, runs in any browser |
+| **Styling** | TailwindCSS + custom girly-pop theme | Pink/lavender/mint palette, rounded everything |
+| **Live2D** | Cubism 4 Web SDK via pixi-live2d-display | Full lip-sync, gestures, blink, idle anims |
+| **Backend** | Python FastAPI + WebSockets | Async-native, easy AI API integration |
+| **STT** | OpenAI Whisper API | Best transcription, <$0.01/min |
+| **LLM** | DeepSeek V4 (cloud API) | Smart, fast reasoning, good personality adherence |
+| **TTS** | OpenAI TTS API | Clean female voice, low latency |
+| **Music** | YouTube IFrame Player API (Lofi Girl channel) | Free, endless streams, no backend proxy |
+| **Audio Pipeline** | MediaRecorder → chunks → WebSocket → backend | Low-latency real-time conversation |
+| **Infrastructure** | Docker Compose + Caddy reverse proxy | Deployable in homelab immediately |
+
+---
+
+## Data Flow (Conversation)
+
+```
+┌─────────────────────────────────────────────────────────┐
+│  Browser                                                 │
+│                                                          │
+│  [Mic] → MediaRecorder → audio chunks                   │
+│       ↓ (WebSocket)                                      │
+│  [FastAPI Backend]                                       │
+│       ↓                                                 │
+│  1. Whisper API → text transcript                       │
+│  2. DeepSeek V4 (system prompt: "You are Kira...")     │
+│  3. OpenAI TTS → audio buffer                           │
+│       ↑ (WebSocket)                                      │
+│  [Audio Player + Live2D Lip-Sync]                       │
+│                                                          │
+│  Kira's idle animations run between conversation turns   │
+└─────────────────────────────────────────────────────────┘
+```
+
+Lo-fi music runs independently via YouTube embed — no backend involvement.
+
+---
+
+## Directory Structure
+
+```
+ai-body-double/
+├── docker-compose.yml        # Single compose for both services
+├── .env.example              # API keys template
+├── backend/
+│   ├── Dockerfile
+│   ├── requirements.txt
+│   ├── main.py               # FastAPI app + WebSocket handler
+│   ├── config.py             # Environment/API config
+│   ├── routers/
+│   │   ├── __init__.py
+│   │   ├── conversation.py   # WS: mic → STT → LLM → TTS roundtrip
+│   │   ├── tools.py          # REST: timers, notes, backgrounds
+│   │   └── assets.py         # REST: outfit/pet state, backgrounds
+│   ├── services/
+│   │   ├── __init__.py
+│   │   ├── stt.py            # Whisper API client
+│   │   ├── llm.py            # DeepSeek chat client
+│   │   └── tts.py            # OpenAI TTS client
+│   └── models/
+│       ├── __init__.py
+│       └── schemas.py        # Pydantic models
+├── frontend/
+│   ├── Dockerfile (nginx)
+│   ├── package.json
+│   ├── vite.config.ts
+│   ├── tailwind.config.js
+│   ├── index.html
+│   ├── public/
+│   │   └── live2d/           # Live2D model files
+│   │       ├── kira.model3.json
+│   │       ├── kira.moc3
+│   │       └── textures/
+│   └── src/
+│       ├── main.tsx
+│       ├── App.tsx
+│       ├── api/
+│       │   └── ws.ts         # WebSocket client
+│       ├── components/
+│       │   ├── KiraAvatar.tsx     # Live2D canvas wrapper
+│       │   ├── BackgroundScene.tsx # Scene selector + overlay
+│       │   ├── MusicPlayer.tsx    # Lofi Girl YouTube embed
+│       │   ├── Timer.tsx          # Pomodoro + countdown
+│       │   ├── Notes.tsx          # Quick notes widget
+│       │   ├── Clock.tsx          # Digital clock
+│       │   ├── PetZone.tsx        # Two cats 🐱🐱
+│       │   ├── Toolbar.tsx        # Bottom nav
+│       │   └── Wardrobe.tsx       # Outfit/accessory picker
+│       ├── hooks/
+│       │   ├── useAudio.ts        # Mic recording + playback
+│       │   └── useKiraState.ts    # Shared state
+│       ├── styles/
+│       │   └── theme.css          # Girly-pop palette
+│       └── types/
+│           └── index.ts
+```
+
+---
+
+## Phase 1 Build Order
+
+I'll build this in dependency order so each piece is testable:
+
+### Phase 1a — Skeleton (this session)
+1. Project scaffold (frontend + backend + docker-compose)
+2. Basic UI layout with girly-pop styling
+3. Clock widget
+4. Background scene selector (CSS gradient scenes)
+5. Music player (Lofi Girl YouTube embed)
+6. Timer widget (Pomodoro + countdown)
+
+### Phase 1b — Audio Pipeline
+7. Backend: FastAPI + WebSocket handler
+8. Backend: STT service (Whisper API)
+9. Backend: LLM service (DeepSeek)
+10. Backend: TTS service (OpenAI)
+11. Frontend: Microphone recording → WebSocket
+12. Frontend: Audio playback + conversation UI
+
+### Phase 1c — Kira the Avatar
+13. Integrate Live2D SDK with placeholder model
+14. Idle animations (blink, breath, random gestures)
+15. Lip-sync from TTS audio
+16. Wardrobe/outfit system
+
+### Phase 1d — Cats & Polish
+17. Pet zone with two cats (CSS-animated sprites)
+18. Notes widget
+19. Color polish, transitions, responsive layout
+20. Docker compose finalization + Caddy config
+
+---
+
+## Key Design Decisions
+
+### Live2D Model
+- We'll use a free sample Live2D model (Hiyori or Mao from Cubism SDK samples) as a placeholder
+- Custom "Kira" model can be commissioned and swapped in later — the SDK integration is identical
+- Lip-sync driven by analyzing TTS audio amplitude (simplified: no phoneme mapping needed)
+
+### Background Scenes
+- CSS gradient + pattern scenes for Phase 1 (no image assets needed)
+- Scene types: Cozy Room (warm), Coffee Shop (browns), Garden (greens), Rainy Window (blues), Starry Night (dark purples), Sakura Spring (pinks)
+- Scene = animated CSS background + subtle particle overlay (rain, stars, petals)
+
+### The Two Cats
+- CSS/Canvas-animated sprites (not Live2D — overkill for cats)
+- Orange fluffy: larger, follows cursor sometimes, stretches out
+- Black shorthair: smaller, sleeps curled up, occasional tail twitch
+
+### Outfit System
+- Live2D models support model.json texture swapping
+- Phase 1: pre-defined color palettes/outfits that swap texture files
+- Outfits: Cozy Hoodie, Girly Dress, Pajama Set, Study Sweater, Going Out
+
+### Conversation Personality
+- System prompt defines: female, kind, encouraging, ADHD-aware, body double presence
+- Kira checks in: "How's it going? Need a timer? Want me to pick a scene?"
+- Encouragement: "You've got this! 15 minutes down." / "Time for a stretch break?"
+- Never judgmental. Always supportive.
+
+---
+
+## Palette (Girly-Pop)
+
+```
+Primary bg:    #FFF5F5  (soft pink-white)
+Accent pink:   #FFB6C1  (light pink)
+Accent lav:    #D8B4FE  (lavender)
+Accent mint:   #A7F3D0  (mint)
+Text primary:  #4A1942  (deep plum)
+Text soft:     #7C3AED  (violet)
+Card bg:       #FFFFFF  (with soft shadow)
+Highlight:     #FDF2F8  (pink glow)
+```
+
+---
+
+## Deployment
+
+```
+docker compose up -d
+
+Frontend: http://kira.hobokenchicken.com (via Caddy)
+Backend:  ws://kira.hobokenchicken.com/api/ws (WebSocket)
+```
+
+Runs on existing homelab stack. Add Caddy entry pointing to the frontend container.