Files
kira/ARCHITECTURE.md
T
hobokenchicken 97424cb98f init: Kira — AI body double with Honcho memory
Full voice pipeline (Whisper STT -> DeepSeek LLM -> OpenAI TTS),
animated SVG avatar (Live2D-ready), girly-pop UI, lofi music,
timer/notes/pets/wardrobe widgets, 10 background scenes with
particle effects, Honcho cross-session memory.
2026-06-04 10:51:38 -04:00

8.0 KiB

Kira — Architecture Plan

Overview

Browser-based real-time AI body double. She talks to Kira (microphone → STT → LLM → TTS → speaker), Kira talks back with lip-sync. Lo-fi from Lofi Girl streaming in the background, two cats hanging out, customizable background scenes, timers, notes, and a full wardrobe/accessory system.


Tech Stack

Layer Choice Why
Frontend React 18 + Vite + TypeScript Fast dev loop, runs in any browser
Styling TailwindCSS + custom girly-pop theme Pink/lavender/mint palette, rounded everything
Live2D Cubism 4 Web SDK via pixi-live2d-display Full lip-sync, gestures, blink, idle anims
Backend Python FastAPI + WebSockets Async-native, easy AI API integration
STT OpenAI Whisper API Best transcription, <$0.01/min
LLM DeepSeek V4 (cloud API) Smart, fast reasoning, good personality adherence
TTS OpenAI TTS API Clean female voice, low latency
Music YouTube IFrame Player API (Lofi Girl channel) Free, endless streams, no backend proxy
Audio Pipeline MediaRecorder → chunks → WebSocket → backend Low-latency real-time conversation
Infrastructure Docker Compose + Caddy reverse proxy Deployable in homelab immediately

Data Flow (Conversation)

┌─────────────────────────────────────────────────────────┐
│  Browser                                                 │
│                                                          │
│  [Mic] → MediaRecorder → audio chunks                   │
│       ↓ (WebSocket)                                      │
│  [FastAPI Backend]                                       │
│       ↓                                                 │
│  1. Whisper API → text transcript                       │
│  2. DeepSeek V4 (system prompt: "You are Kira...")     │
│  3. OpenAI TTS → audio buffer                           │
│       ↑ (WebSocket)                                      │
│  [Audio Player + Live2D Lip-Sync]                       │
│                                                          │
│  Kira's idle animations run between conversation turns   │
└─────────────────────────────────────────────────────────┘

Lo-fi music runs independently via YouTube embed — no backend involvement.


Directory Structure

ai-body-double/
├── docker-compose.yml        # Single compose for both services
├── .env.example              # API keys template
├── backend/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py               # FastAPI app + WebSocket handler
│   ├── config.py             # Environment/API config
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── conversation.py   # WS: mic → STT → LLM → TTS roundtrip
│   │   ├── tools.py          # REST: timers, notes, backgrounds
│   │   └── assets.py         # REST: outfit/pet state, backgrounds
│   ├── services/
│   │   ├── __init__.py
│   │   ├── stt.py            # Whisper API client
│   │   ├── llm.py            # DeepSeek chat client
│   │   └── tts.py            # OpenAI TTS client
│   └── models/
│       ├── __init__.py
│       └── schemas.py        # Pydantic models
├── frontend/
│   ├── Dockerfile (nginx)
│   ├── package.json
│   ├── vite.config.ts
│   ├── tailwind.config.js
│   ├── index.html
│   ├── public/
│   │   └── live2d/           # Live2D model files
│   │       ├── kira.model3.json
│   │       ├── kira.moc3
│   │       └── textures/
│   └── src/
│       ├── main.tsx
│       ├── App.tsx
│       ├── api/
│       │   └── ws.ts         # WebSocket client
│       ├── components/
│       │   ├── KiraAvatar.tsx     # Live2D canvas wrapper
│       │   ├── BackgroundScene.tsx # Scene selector + overlay
│       │   ├── MusicPlayer.tsx    # Lofi Girl YouTube embed
│       │   ├── Timer.tsx          # Pomodoro + countdown
│       │   ├── Notes.tsx          # Quick notes widget
│       │   ├── Clock.tsx          # Digital clock
│       │   ├── PetZone.tsx        # Two cats 🐱🐱
│       │   ├── Toolbar.tsx        # Bottom nav
│       │   └── Wardrobe.tsx       # Outfit/accessory picker
│       ├── hooks/
│       │   ├── useAudio.ts        # Mic recording + playback
│       │   └── useKiraState.ts    # Shared state
│       ├── styles/
│       │   └── theme.css          # Girly-pop palette
│       └── types/
│           └── index.ts

Phase 1 Build Order

I'll build this in dependency order so each piece is testable:

Phase 1a — Skeleton (this session)

  1. Project scaffold (frontend + backend + docker-compose)
  2. Basic UI layout with girly-pop styling
  3. Clock widget
  4. Background scene selector (CSS gradient scenes)
  5. Music player (Lofi Girl YouTube embed)
  6. Timer widget (Pomodoro + countdown)

Phase 1b — Audio Pipeline

  1. Backend: FastAPI + WebSocket handler
  2. Backend: STT service (Whisper API)
  3. Backend: LLM service (DeepSeek)
  4. Backend: TTS service (OpenAI)
  5. Frontend: Microphone recording → WebSocket
  6. Frontend: Audio playback + conversation UI

Phase 1c — Kira the Avatar

  1. Integrate Live2D SDK with placeholder model
  2. Idle animations (blink, breath, random gestures)
  3. Lip-sync from TTS audio
  4. Wardrobe/outfit system

Phase 1d — Cats & Polish

  1. Pet zone with two cats (CSS-animated sprites)
  2. Notes widget
  3. Color polish, transitions, responsive layout
  4. Docker compose finalization + Caddy config

Key Design Decisions

Live2D Model

  • We'll use a free sample Live2D model (Hiyori or Mao from Cubism SDK samples) as a placeholder
  • Custom "Kira" model can be commissioned and swapped in later — the SDK integration is identical
  • Lip-sync driven by analyzing TTS audio amplitude (simplified: no phoneme mapping needed)

Background Scenes

  • CSS gradient + pattern scenes for Phase 1 (no image assets needed)
  • Scene types: Cozy Room (warm), Coffee Shop (browns), Garden (greens), Rainy Window (blues), Starry Night (dark purples), Sakura Spring (pinks)
  • Scene = animated CSS background + subtle particle overlay (rain, stars, petals)

The Two Cats

  • CSS/Canvas-animated sprites (not Live2D — overkill for cats)
  • Orange fluffy: larger, follows cursor sometimes, stretches out
  • Black shorthair: smaller, sleeps curled up, occasional tail twitch

Outfit System

  • Live2D models support model.json texture swapping
  • Phase 1: pre-defined color palettes/outfits that swap texture files
  • Outfits: Cozy Hoodie, Girly Dress, Pajama Set, Study Sweater, Going Out

Conversation Personality

  • System prompt defines: female, kind, encouraging, ADHD-aware, body double presence
  • Kira checks in: "How's it going? Need a timer? Want me to pick a scene?"
  • Encouragement: "You've got this! 15 minutes down." / "Time for a stretch break?"
  • Never judgmental. Always supportive.

Palette (Girly-Pop)

Primary bg:    #FFF5F5  (soft pink-white)
Accent pink:   #FFB6C1  (light pink)
Accent lav:    #D8B4FE  (lavender)
Accent mint:   #A7F3D0  (mint)
Text primary:  #4A1942  (deep plum)
Text soft:     #7C3AED  (violet)
Card bg:       #FFFFFF  (with soft shadow)
Highlight:     #FDF2F8  (pink glow)

Deployment

docker compose up -d

Frontend: http://kira.hobokenchicken.com (via Caddy)
Backend:  ws://kira.hobokenchicken.com/api/ws (WebSocket)

Runs on existing homelab stack. Add Caddy entry pointing to the frontend container.