Files
kira/ARCHITECTURE.md

8.1 KiB

Kira — Architecture Plan

Overview

Browser-based real-time AI body double. She talks to Kira (microphone → STT → LLM → TTS → speaker), Kira talks back with lip-sync. Lo-fi from Lofi Girl streaming in the background, two cats hanging out, customizable background scenes, timers, notes, and a full wardrobe/accessory system.


Tech Stack

Layer Choice Why
Frontend React 18 + Vite + TypeScript Fast dev loop, runs in any browser
Styling TailwindCSS + custom girly-pop theme Pink/lavender/mint palette, rounded everything
Live2D Cubism 4 Web SDK via pixi-live2d-display Full lip-sync, gestures, blink, idle anims
Backend Python FastAPI + WebSockets Async-native, easy AI API integration
STT OpenAI Whisper API Best transcription, <$0.01/min
LLM DeepSeek V4 (cloud API) Smart, fast reasoning, good personality adherence
TTS OpenAI TTS API Clean female voice, low latency
Music YouTube IFrame Player API (Lofi Girl channel) Free, endless streams, no backend proxy
Audio Pipeline MediaRecorder → chunks → WebSocket → backend Low-latency real-time conversation
Infrastructure Docker Compose + Caddy reverse proxy Deployable in homelab immediately

Data Flow (Conversation)

┌─────────────────────────────────────────────────────────┐
│  Browser (React + Live2D + girly UI)                     │
│                                                          │
│  [Mic] → MediaRecorder (webm/opus full utterance)       │
│       ↓ (WebSocket /api/ws)                              │
│  [FastAPI Backend]                                       │
│       ↓                                                 │
│  1. REST gpt-4o-transcribe → full text (with delta emit)|
│  2. gpt-5.4-nano + Honcho memory suffix                 │
│  3. OpenAI TTS (sage) streaming response → Opus chunks  │
│       ↑ (WebSocket)                                      │
│  [Incremental Audio playback + Live2D + "Hearing" UI]   │
│                                                          │
│  White noise (Web Audio) and lofi run independently.     │
│  VAD is client-button based (toggle Talk).               │
└─────────────────────────────────────────────────────────┘

Current: REST STT for reliability (Realtime WS attempted but blocked by model access).


Directory Structure

ai-body-double/
├── docker-compose.yml        # Single compose for both services
├── .env.example              # API keys template
├── backend/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py               # FastAPI app + WebSocket handler
│   ├── config.py             # Environment/API config
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── conversation.py   # WS: mic → STT → LLM → TTS roundtrip
│   │   ├── tools.py          # REST: timers, notes, backgrounds
│   │   └── assets.py         # REST: outfit/pet state, backgrounds
│   ├── services/
│   │   ├── __init__.py
│   │   ├── memory.py         # Honcho wrapper (context, prefs, summaries)
│   │   └── whisper_stream.py # Realtime WS attempt (archived use; fallback to REST)
│   └── models/
│       ├── __init__.py
│       └── schemas.py        # Pydantic models
├── frontend/
│   ├── Dockerfile (nginx)
│   ├── package.json
│   ├── vite.config.ts
│   ├── tailwind.config.js
│   ├── index.html
│   ├── public/
│   │   └── live2d/           # Live2D model files
│   │       ├── kira.model3.json
│   │       ├── kira.moc3
│   │       └── textures/
│   └── src/
│       ├── main.tsx
│       ├── App.tsx
│       ├── api/
│       │   └── ws.ts         # WebSocket client
│       ├── components/
│       │   ├── KiraAvatar.tsx     # Live2D canvas wrapper
│       │   ├── BackgroundScene.tsx # Scene selector + overlay
│       │   ├── MusicPlayer.tsx    # Lofi Girl YouTube embed
│       │   ├── Timer.tsx          # Pomodoro + countdown
│       │   ├── Notes.tsx          # Quick notes widget
│       │   ├── Clock.tsx          # Digital clock
│       │   ├── PetZone.tsx        # Two cats 🐱🐱
│       │   ├── Toolbar.tsx        # Bottom nav
│       │   └── Wardrobe.tsx       # Outfit/accessory picker
│       ├── hooks/
│       │   ├── useAudio.ts        # Mic recording + playback
│       │   └── useKiraState.ts    # Shared state
│       ├── styles/
│       │   └── theme.css          # Girly-pop palette
│       └── types/
│           └── index.ts

Phase 1 Build Order

I'll build this in dependency order so each piece is testable:

Phase 1a — Skeleton (this session)

  1. Project scaffold (frontend + backend + docker-compose)
  2. Basic UI layout with girly-pop styling
  3. Clock widget
  4. Background scene selector (CSS gradient scenes)
  5. Music player (Lofi Girl YouTube embed)
  6. Timer widget (Pomodoro + countdown)

Phase 1b — Audio Pipeline

  1. Backend: FastAPI + WebSocket handler
  2. Backend: STT service (Whisper API)
  3. Backend: LLM service (DeepSeek)
  4. Backend: TTS service (OpenAI)
  5. Frontend: Microphone recording → WebSocket
  6. Frontend: Audio playback + conversation UI

Phase 1c — Kira the Avatar

  1. Integrate Live2D SDK with placeholder model
  2. Idle animations (blink, breath, random gestures)
  3. Lip-sync from TTS audio
  4. Wardrobe/outfit system

Phase 1d — Cats & Polish

  1. Pet zone with two cats (CSS-animated sprites)
  2. Notes widget
  3. Color polish, transitions, responsive layout
  4. Docker compose finalization + Caddy config

Key Design Decisions

Live2D Model

  • We'll use a free sample Live2D model (Hiyori or Mao from Cubism SDK samples) as a placeholder
  • Custom "Kira" model can be commissioned and swapped in later — the SDK integration is identical
  • Lip-sync driven by analyzing TTS audio amplitude (simplified: no phoneme mapping needed)

Background Scenes

  • CSS gradient + pattern scenes for Phase 1 (no image assets needed)
  • Scene types: Cozy Room (warm), Coffee Shop (browns), Garden (greens), Rainy Window (blues), Starry Night (dark purples), Sakura Spring (pinks)
  • Scene = animated CSS background + subtle particle overlay (rain, stars, petals)

The Two Cats

  • CSS/Canvas-animated sprites (not Live2D — overkill for cats)
  • Orange fluffy: larger, follows cursor sometimes, stretches out
  • Black shorthair: smaller, sleeps curled up, occasional tail twitch

Outfit System

  • Live2D models support model.json texture swapping
  • Phase 1: pre-defined color palettes/outfits that swap texture files
  • Outfits: Cozy Hoodie, Girly Dress, Pajama Set, Study Sweater, Going Out

Conversation Personality

  • System prompt defines: female, kind, encouraging, ADHD-aware, body double presence
  • Kira checks in: "How's it going? Need a timer? Want me to pick a scene?"
  • Encouragement: "You've got this! 15 minutes down." / "Time for a stretch break?"
  • Never judgmental. Always supportive.

Palette (Girly-Pop)

Primary bg:    #FFF5F5  (soft pink-white)
Accent pink:   #FFB6C1  (light pink)
Accent lav:    #D8B4FE  (lavender)
Accent mint:   #A7F3D0  (mint)
Text primary:  #4A1942  (deep plum)
Text soft:     #7C3AED  (violet)
Card bg:       #FFFFFF  (with soft shadow)
Highlight:     #FDF2F8  (pink glow)

Deployment

docker compose up -d

Frontend: http://kira.hobokenchicken.com (via Caddy)
Backend:  ws://kira.hobokenchicken.com/api/ws (WebSocket)

Runs on existing homelab stack. Add Caddy entry pointing to the frontend container.