kira/ARCHITECTURE.md

# Kira — Architecture Plan

## Overview

Browser-based real-time AI body double. The user talks to Kira (microphone → STT → LLM → TTS → speaker), and Kira talks back with lip-sync. Lofi Girl streams in the background, alongside two cats, customizable background scenes, timers, notes, and a full wardrobe/accessory system.

---

## Tech Stack

| Layer | Choice | Why |
|-------|--------|-----|
| **Frontend** | React 18 + Vite + TypeScript | Fast dev loop, runs in any browser |
| **Styling** | TailwindCSS + custom girly-pop theme | Pink/lavender/mint palette, rounded everything |
| **Live2D** | Cubism 4 Web SDK via pixi-live2d-display | Full lip-sync, gestures, blink, idle anims |
| **Backend** | Python FastAPI + WebSockets | Async-native, easy AI API integration |
| **STT** | OpenAI Whisper API | Accurate transcription at under $0.01/min |
| **LLM** | DeepSeek V4 (cloud API) | Smart, fast reasoning, good personality adherence |
| **TTS** | OpenAI TTS API | Clean female voice, low latency |
| **Music** | YouTube IFrame Player API (Lofi Girl channel) | Free, endless streams, no backend proxy |
| **Audio Pipeline** | MediaRecorder → chunks → WebSocket → backend | Low-latency real-time conversation |
| **Infrastructure** | Docker Compose + Caddy reverse proxy | Deployable in homelab immediately |

---

## Data Flow (Conversation)

```
┌─────────────────────────────────────────────────────────┐
│  Browser (React + Live2D + girly UI)                     │
│                                                          │
│  [Mic] → MediaRecorder (webm/opus full utterance)       │
│       ↓ (WebSocket /api/ws)                              │
│  [FastAPI Backend]                                       │
│       ↓                                                 │
│  1. REST gpt-4o-transcribe → full text (with delta emit)│
│  2. gpt-5.4-nano + Honcho memory suffix                 │
│  3. OpenAI TTS (sage) streaming response → Opus chunks  │
│       ↑ (WebSocket)                                      │
│  [Incremental Audio playback + Live2D + "Hearing" UI]   │
│                                                          │
│  White noise (Web Audio) and lofi run independently.     │
│  VAD is client-button based (toggle Talk).               │
└─────────────────────────────────────────────────────────┘
```

Current state: STT uses the REST endpoint for reliability. A Realtime WebSocket approach was attempted but was blocked by lack of model access.
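
To make the round trip above concrete, here is a minimal TypeScript sketch of how the client (`src/api/ws.ts`) might fold server messages into UI state. The message names (`stt_delta`, `stt_final`, `llm_text`, `tts_chunk`, `done`) and the `ConversationState` shape are assumptions for illustration, not the backend's actual wire format.

```typescript
// Hypothetical server → client messages; the real protocol may differ.
type ServerMessage =
  | { type: "stt_delta"; text: string }      // partial transcript for the "Hearing" UI
  | { type: "stt_final"; text: string }      // full transcript of the utterance
  | { type: "llm_text"; text: string }       // Kira's reply as text
  | { type: "tts_chunk"; audio: Uint8Array } // one Opus chunk to queue for playback
  | { type: "done" };                        // no more chunks for this turn

interface ConversationState {
  hearing: string;          // live transcript shown while the user talks
  userText: string;         // last finalized user utterance
  kiraText: string;         // last reply text
  audioQueue: Uint8Array[]; // chunks waiting for incremental playback
  receiving: boolean;       // true while TTS chunks are still arriving
}

const initialState: ConversationState = {
  hearing: "",
  userText: "",
  kiraText: "",
  audioQueue: [],
  receiving: false,
};

// Pure reducer: returns a new state and never mutates the old one.
function applyMessage(state: ConversationState, msg: ServerMessage): ConversationState {
  switch (msg.type) {
    case "stt_delta":
      return { ...state, hearing: state.hearing + msg.text };
    case "stt_final":
      return { ...state, hearing: "", userText: msg.text };
    case "llm_text":
      return { ...state, kiraText: msg.text };
    case "tts_chunk":
      return { ...state, audioQueue: [...state.audioQueue, msg.audio], receiving: true };
    case "done":
      return { ...state, receiving: false };
  }
}
```

Keeping this logic in a pure function means the "Hearing" UI and the playback queue can be unit-tested without a live socket.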

---

## Directory Structure

```
ai-body-double/
├── docker-compose.yml        # Single compose for both services
├── .env.example              # API keys template
├── backend/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py               # FastAPI app + WebSocket handler
│   ├── config.py             # Environment/API config
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── conversation.py   # WS: mic → STT → LLM → TTS roundtrip
│   │   ├── tools.py          # REST: timers, notes, backgrounds
│   │   └── assets.py         # REST: outfit/pet state, backgrounds
│   ├── services/
│   │   ├── __init__.py
│   │   ├── memory.py         # Honcho wrapper (context, prefs, summaries)
│   │   └── whisper_stream.py # Realtime WS attempt (archived; REST STT is used instead)
│   └── models/
│       ├── __init__.py
│       └── schemas.py        # Pydantic models
├── frontend/
│   ├── Dockerfile            # nginx
│   ├── package.json
│   ├── vite.config.ts
│   ├── tailwind.config.js
│   ├── index.html
│   ├── public/
│   │   └── live2d/           # Live2D model files
│   │       ├── kira.model3.json
│   │       ├── kira.moc3
│   │       └── textures/
│   └── src/
│       ├── main.tsx
│       ├── App.tsx
│       ├── api/
│       │   └── ws.ts         # WebSocket client
│       ├── components/
│       │   ├── KiraAvatar.tsx     # Live2D canvas wrapper
│       │   ├── BackgroundScene.tsx # Scene selector + overlay
│       │   ├── MusicPlayer.tsx    # Lofi Girl YouTube embed
│       │   ├── Timer.tsx          # Pomodoro + countdown
│       │   ├── Notes.tsx          # Quick notes widget
│       │   ├── Clock.tsx          # Digital clock
│       │   ├── PetZone.tsx        # Two cats 🐱🐱
│       │   ├── Toolbar.tsx        # Bottom nav
│       │   └── Wardrobe.tsx       # Outfit/accessory picker
│       ├── hooks/
│       │   ├── useAudio.ts        # Mic recording + playback
│       │   └── useKiraState.ts    # Shared state
│       ├── styles/
│       │   └── theme.css          # Girly-pop palette
│       └── types/
│           └── index.ts
```

---

## Phase 1 Build Order

I'll build this in dependency order so each piece is testable:

### Phase 1a — Skeleton (this session)
1. Project scaffold (frontend + backend + docker-compose)
2. Basic UI layout with girly-pop styling
3. Clock widget
4. Background scene selector (CSS gradient scenes)
5. Music player (Lofi Girl YouTube embed)
6. Timer widget (Pomodoro + countdown)

### Phase 1b — Audio Pipeline
7. Backend: FastAPI + WebSocket handler
8. Backend: STT service (Whisper API)
9. Backend: LLM service (DeepSeek)
10. Backend: TTS service (OpenAI)
11. Frontend: Microphone recording → WebSocket
12. Frontend: Audio playback + conversation UI

### Phase 1c — Kira the Avatar
13. Integrate Live2D SDK with placeholder model
14. Idle animations (blink, breath, random gestures)
15. Lip-sync from TTS audio
16. Wardrobe/outfit system

### Phase 1d — Cats & Polish
17. Pet zone with two cats (CSS-animated sprites)
18. Notes widget
19. Color polish, transitions, responsive layout
20. Docker compose finalization + Caddy config

---

## Key Design Decisions

### Live2D Model
- We'll use a free sample Live2D model (Hiyori or Mao from Cubism SDK samples) as a placeholder
- Custom "Kira" model can be commissioned and swapped in later — the SDK integration is identical
- Lip-sync driven by analyzing TTS audio amplitude (simplified: no phoneme mapping needed)
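
Amplitude-driven lip-sync can be sketched in a few lines. The version below is a minimal illustration, assuming audio arrives as frames of PCM samples in the range -1 to 1; the function names, the `gain` of 4, and the `smoothing` of 0.5 are placeholder choices, not tuned values.

```typescript
// Root-mean-square level of one frame of PCM samples in [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumOfSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumOfSquares / samples.length);
}

// Maps a loudness level to a mouth-open value in [0, 1], easing toward
// the target so the mouth does not flicker between frames.
// gain and smoothing are illustrative defaults (assumptions).
function nextMouthOpen(
  level: number,
  previous: number,
  gain: number = 4,
  smoothing: number = 0.5,
): number {
  const target = Math.min(1, Math.max(0, level * gain));
  return previous + (target - previous) * (1 - smoothing);
}
```

In the browser, a frame could be read with `AnalyserNode.getFloatTimeDomainData()` on each animation frame, and the result written to the model's mouth-open parameter (`ParamMouthOpenY` in the standard Cubism parameter set). How that parameter is set through pixi-live2d-display depends on the library version in use.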

### Background Scenes
- CSS gradient + pattern scenes for Phase 1 (no image assets needed)
- Scene types: Cozy Room (warm), Coffee Shop (browns), Garden (greens), Rainy Window (blues), Starry Night (dark purples), Sakura Spring (pinks)
- Scene = animated CSS background + subtle particle overlay (rain, stars, petals)
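
One way to represent these scenes is a small registry that pairs each scene with a gradient and a particle overlay. This is a sketch only: the `Scene` shape, the ids, and every gradient color below are placeholders, and the particle assignments simply follow the scene names (rain for Rainy Window, stars for Starry Night, petals for Sakura Spring).

```typescript
type ParticleKind = "none" | "rain" | "stars" | "petals";

interface Scene {
  id: string;
  label: string;
  gradient: string;        // CSS gradient; colors are placeholders
  particles: ParticleKind; // overlay drawn on top of the gradient
}

const DEFAULT_SCENE: Scene = {
  id: "cozy-room",
  label: "Cozy Room",
  gradient: "linear-gradient(160deg, #FFE4C4, #FFB6A0)",
  particles: "none",
};

const SCENES: readonly Scene[] = [
  DEFAULT_SCENE,
  { id: "coffee-shop", label: "Coffee Shop", gradient: "linear-gradient(160deg, #D7B899, #8B5E3C)", particles: "none" },
  { id: "garden", label: "Garden", gradient: "linear-gradient(160deg, #D9F99D, #4ADE80)", particles: "none" },
  { id: "rainy-window", label: "Rainy Window", gradient: "linear-gradient(160deg, #BFDBFE, #3B82F6)", particles: "rain" },
  { id: "starry-night", label: "Starry Night", gradient: "linear-gradient(160deg, #312E81, #581C87)", particles: "stars" },
  { id: "sakura-spring", label: "Sakura Spring", gradient: "linear-gradient(160deg, #FCE7F3, #F9A8D4)", particles: "petals" },
];

// Looks up a scene by id, falling back to the default for unknown ids.
function getScene(id: string): Scene {
  return SCENES.find((scene) => scene.id === id) ?? DEFAULT_SCENE;
}

// Style object that a React component could spread onto a wrapper element.
function sceneStyle(id: string): { backgroundImage: string } {
  return { backgroundImage: getScene(id).gradient };
}
```

A component such as `BackgroundScene.tsx` could call `sceneStyle(selectedId)` for the background and switch on `getScene(selectedId).particles` to choose the overlay.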

### The Two Cats
- CSS/Canvas-animated sprites (not Live2D — overkill for cats)
- Orange fluffy: larger, follows cursor sometimes, stretches out
- Black shorthair: smaller, sleeps curled up, occasional tail twitch

### Outfit System
- Cubism 4 models list their texture files in `model3.json` (`FileReferences.Textures`), which makes texture swapping possible
- Phase 1: pre-defined color palettes/outfits that swap texture files
- Outfits: Cozy Hoodie, Girly Dress, Pajama Set, Study Sweater, Going Out
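
Texture swapping can be sketched as a pure function over the model settings. Cubism 4 settings files (`*.model3.json`) list their textures under `FileReferences.Textures`; the outfit ids and texture paths below are hypothetical, and the sketch assumes every outfit is painted onto the same texture layout as the base model.

```typescript
// Minimal slice of a Cubism 4 model settings file (*.model3.json).
interface ModelSettings {
  Version: number;
  FileReferences: {
    Moc: string;
    Textures: string[];
  };
}

type OutfitId = "cozy-hoodie" | "girly-dress" | "pajama-set" | "study-sweater" | "going-out";

// Hypothetical texture locations; the real asset layout is not decided here.
const OUTFIT_TEXTURES: Record<OutfitId, string[]> = {
  "cozy-hoodie": ["textures/cozy-hoodie/texture_00.png"],
  "girly-dress": ["textures/girly-dress/texture_00.png"],
  "pajama-set": ["textures/pajama-set/texture_00.png"],
  "study-sweater": ["textures/study-sweater/texture_00.png"],
  "going-out": ["textures/going-out/texture_00.png"],
};

// Returns a copy of the settings with the outfit's textures swapped in.
// The original settings object is left untouched.
function withOutfit(settings: ModelSettings, outfit: OutfitId): ModelSettings {
  return {
    ...settings,
    FileReferences: {
      ...settings.FileReferences,
      Textures: [...OUTFIT_TEXTURES[outfit]],
    },
  };
}
```

The returned settings object could then be used to reload the model, or the same mapping could drive an in-place texture replacement if the rendering library supports it.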

### Conversation Personality
- System prompt defines: female, kind, encouraging, ADHD-aware, body double presence
- Kira checks in: "How's it going? Need a timer? Want me to pick a scene?"
- Encouragement: "You've got this! 15 minutes down." / "Time for a stretch break?"
- Never judgmental. Always supportive.

---

## Palette (Girly-Pop)

```
Primary bg:    #FFF5F5  (soft pink-white)
Accent pink:   #FFB6C1  (light pink)
Accent lav:    #D8B4FE  (lavender)
Accent mint:   #A7F3D0  (mint)
Text primary:  #4A1942  (deep plum)
Text soft:     #7C3AED  (violet)
Card bg:       #FFFFFF  (with soft shadow)
Highlight:     #FDF2F8  (pink glow)
```
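
To keep these values in one place, the palette could be held as a TypeScript object and emitted as CSS custom properties for `theme.css`. The token names and the `kira` prefix below are assumptions; only the hex values come from the palette above.

```typescript
// Hex values from the palette; token names are illustrative.
const palette: Record<string, string> = {
  "bg-primary": "#FFF5F5",
  "accent-pink": "#FFB6C1",
  "accent-lav": "#D8B4FE",
  "accent-mint": "#A7F3D0",
  "text-primary": "#4A1942",
  "text-soft": "#7C3AED",
  "card-bg": "#FFFFFF",
  "highlight": "#FDF2F8",
};

// Renders the tokens as a :root block of CSS custom properties.
function toCssVariables(tokens: Record<string, string>, prefix: string = "kira"): string {
  const lines = Object.keys(tokens).map((name) => `  --${prefix}-${name}: ${tokens[name]};`);
  return [":root {", ...lines, "}"].join("\n");
}
```

The same object could also be passed to `theme.extend.colors` in the Tailwind config so that utility classes and plain CSS share one source of truth.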

---

## Deployment

```
docker compose up -d

Frontend: http://kira.hobokenchicken.com (via Caddy)
Backend:  ws://kira.hobokenchicken.com/api/ws (WebSocket)
```

Runs on the existing homelab stack. Add a Caddy entry pointing to the frontend container.
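
Rather than hard-coding the WebSocket address, the client could derive it from the page's own location. The helper below is a small sketch under that assumption; `wsUrl` is a hypothetical name, and the `/api/ws` default matches the path shown above.

```typescript
// Builds the WebSocket URL from a location-like object.
// Uses wss: when the page is served over https:, otherwise ws:.
function wsUrl(
  loc: { protocol: string; host: string },
  path: string = "/api/ws",
): string {
  const scheme = loc.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${loc.host}${path}`;
}
```

In the browser this would be called as `wsUrl(window.location)`. Matching the scheme matters because browsers block `ws://` connections from pages loaded over `https://`.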