init: Kira — AI body double with Honcho memory
Full voice pipeline (Whisper STT -> DeepSeek LLM -> OpenAI TTS), animated SVG avatar (Live2D-ready), girly-pop UI, lofi music, timer/notes/pets/wardrobe widgets, 10 background scenes with particle effects, Honcho cross-session memory.
This commit is contained in:
+199
@@ -0,0 +1,199 @@
|
||||
# Kira — Architecture Plan
|
||||
|
||||
## Overview
|
||||
|
||||
Browser-based real-time AI body double. She talks to Kira (microphone → STT → LLM → TTS → speaker), Kira talks back with lip-sync. Lo-fi from Lofi Girl streaming in the background, two cats hanging out, customizable background scenes, timers, notes, and a full wardrobe/accessory system.
|
||||
|
||||
---
|
||||
|
||||
## Tech Stack
|
||||
|
||||
| Layer | Choice | Why |
|
||||
|-------|--------|-----|
|
||||
| **Frontend** | React 18 + Vite + TypeScript | Fast dev loop, runs in any browser |
|
||||
| **Styling** | TailwindCSS + custom girly-pop theme | Pink/lavender/mint palette, rounded everything |
|
||||
| **Live2D** | Cubism 4 Web SDK via pixi-live2d-display | Full lip-sync, gestures, blink, idle anims |
|
||||
| **Backend** | Python FastAPI + WebSockets | Async-native, easy AI API integration |
|
||||
| **STT** | OpenAI Whisper API | Best transcription, <$0.01/min |
|
||||
| **LLM** | DeepSeek V4 (cloud API) | Smart, fast reasoning, good personality adherence |
|
||||
| **TTS** | OpenAI TTS API | Clean female voice, low latency |
|
||||
| **Music** | YouTube IFrame Player API (Lofi Girl channel) | Free, endless streams, no backend proxy |
|
||||
| **Audio Pipeline** | MediaRecorder → chunks → WebSocket → backend | Low-latency real-time conversation |
|
||||
| **Infrastructure** | Docker Compose + Caddy reverse proxy | Deployable in homelab immediately |
|
||||
|
||||
---
|
||||
|
||||
## Data Flow (Conversation)
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ Browser │
|
||||
│ │
|
||||
│ [Mic] → MediaRecorder → audio chunks │
|
||||
│ ↓ (WebSocket) │
|
||||
│ [FastAPI Backend] │
|
||||
│ ↓ │
|
||||
│ 1. Whisper API → text transcript │
|
||||
│ 2. DeepSeek V4 (system prompt: "You are Kira...") │
|
||||
│ 3. OpenAI TTS → audio buffer │
|
||||
│ ↑ (WebSocket) │
|
||||
│ [Audio Player + Live2D Lip-Sync] │
|
||||
│ │
|
||||
│ Kira's idle animations run between conversation turns │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
Lo-fi music runs independently via YouTube embed — no backend involvement.
|
||||
|
||||
---
|
||||
|
||||
## Directory Structure
|
||||
|
||||
```
|
||||
ai-body-double/
|
||||
├── docker-compose.yml # Single compose for both services
|
||||
├── .env.example # API keys template
|
||||
├── backend/
|
||||
│ ├── Dockerfile
|
||||
│ ├── requirements.txt
|
||||
│ ├── main.py # FastAPI app + WebSocket handler
|
||||
│ ├── config.py # Environment/API config
|
||||
│ ├── routers/
|
||||
│ │ ├── __init__.py
|
||||
│ │ ├── conversation.py # WS: mic → STT → LLM → TTS roundtrip
|
||||
│ │ ├── tools.py # REST: timers, notes, backgrounds
|
||||
│ │ └── assets.py # REST: outfit/pet state, backgrounds
|
||||
│ ├── services/
|
||||
│ │ ├── __init__.py
|
||||
│ │ ├── stt.py # Whisper API client
|
||||
│ │ ├── llm.py # DeepSeek chat client
|
||||
│ │ └── tts.py # OpenAI TTS client
|
||||
│ └── models/
|
||||
│ ├── __init__.py
|
||||
│ └── schemas.py # Pydantic models
|
||||
├── frontend/
|
||||
│ ├── Dockerfile (nginx)
|
||||
│ ├── package.json
|
||||
│ ├── vite.config.ts
|
||||
│ ├── tailwind.config.js
|
||||
│ ├── index.html
|
||||
│ ├── public/
|
||||
│ │ └── live2d/ # Live2D model files
|
||||
│ │ ├── kira.model3.json
|
||||
│ │ ├── kira.moc3
|
||||
│ │ └── textures/
|
||||
│ └── src/
|
||||
│ ├── main.tsx
|
||||
│ ├── App.tsx
|
||||
│ ├── api/
|
||||
│ │ └── ws.ts # WebSocket client
|
||||
│ ├── components/
|
||||
│ │ ├── KiraAvatar.tsx # Live2D canvas wrapper
|
||||
│ │ ├── BackgroundScene.tsx # Scene selector + overlay
|
||||
│ │ ├── MusicPlayer.tsx # Lofi Girl YouTube embed
|
||||
│ │ ├── Timer.tsx # Pomodoro + countdown
|
||||
│ │ ├── Notes.tsx # Quick notes widget
|
||||
│ │ ├── Clock.tsx # Digital clock
|
||||
│ │ ├── PetZone.tsx # Two cats 🐱🐱
|
||||
│ │ ├── Toolbar.tsx # Bottom nav
|
||||
│ │ └── Wardrobe.tsx # Outfit/accessory picker
|
||||
│ ├── hooks/
|
||||
│ │ ├── useAudio.ts # Mic recording + playback
|
||||
│ │ └── useKiraState.ts # Shared state
|
||||
│ ├── styles/
|
||||
│ │ └── theme.css # Girly-pop palette
|
||||
│ └── types/
|
||||
│ └── index.ts
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 Build Order
|
||||
|
||||
I'll build this in dependency order so each piece is testable:
|
||||
|
||||
### Phase 1a — Skeleton (this session)
|
||||
1. Project scaffold (frontend + backend + docker-compose)
|
||||
2. Basic UI layout with girly-pop styling
|
||||
3. Clock widget
|
||||
4. Background scene selector (CSS gradient scenes)
|
||||
5. Music player (Lofi Girl YouTube embed)
|
||||
6. Timer widget (Pomodoro + countdown)
|
||||
|
||||
### Phase 1b — Audio Pipeline
|
||||
7. Backend: FastAPI + WebSocket handler
|
||||
8. Backend: STT service (Whisper API)
|
||||
9. Backend: LLM service (DeepSeek)
|
||||
10. Backend: TTS service (OpenAI)
|
||||
11. Frontend: Microphone recording → WebSocket
|
||||
12. Frontend: Audio playback + conversation UI
|
||||
|
||||
### Phase 1c — Kira the Avatar
|
||||
13. Integrate Live2D SDK with placeholder model
|
||||
14. Idle animations (blink, breath, random gestures)
|
||||
15. Lip-sync from TTS audio
|
||||
16. Wardrobe/outfit system
|
||||
|
||||
### Phase 1d — Cats & Polish
|
||||
17. Pet zone with two cats (CSS-animated sprites)
|
||||
18. Notes widget
|
||||
19. Color polish, transitions, responsive layout
|
||||
20. Docker compose finalization + Caddy config
|
||||
|
||||
---
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
### Live2D Model
|
||||
- We'll use a free sample Live2D model (Hiyori or Mao from Cubism SDK samples) as a placeholder
|
||||
- Custom "Kira" model can be commissioned and swapped in later — the SDK integration is identical
|
||||
- Lip-sync driven by analyzing TTS audio amplitude (simplified: no phoneme mapping needed)
|
||||
|
||||
### Background Scenes
|
||||
- CSS gradient + pattern scenes for Phase 1 (no image assets needed)
|
||||
- Scene types: Cozy Room (warm), Coffee Shop (browns), Garden (greens), Rainy Window (blues), Starry Night (dark purples), Sakura Spring (pinks)
|
||||
- Scene = animated CSS background + subtle particle overlay (rain, stars, petals)
|
||||
|
||||
### The Two Cats
|
||||
- CSS/Canvas-animated sprites (not Live2D — overkill for cats)
|
||||
- Orange fluffy: larger, follows cursor sometimes, stretches out
|
||||
- Black shorthair: smaller, sleeps curled up, occasional tail twitch
|
||||
|
||||
### Outfit System
|
||||
- Live2D models support model.json texture swapping
|
||||
- Phase 1: pre-defined color palettes/outfits that swap texture files
|
||||
- Outfits: Cozy Hoodie, Girly Dress, Pajama Set, Study Sweater, Going Out
|
||||
|
||||
### Conversation Personality
|
||||
- System prompt defines: female, kind, encouraging, ADHD-aware, body double presence
|
||||
- Kira checks in: "How's it going? Need a timer? Want me to pick a scene?"
|
||||
- Encouragement: "You've got this! 15 minutes down." / "Time for a stretch break?"
|
||||
- Never judgmental. Always supportive.
|
||||
|
||||
---
|
||||
|
||||
## Palette (Girly-Pop)
|
||||
|
||||
```
|
||||
Primary bg: #FFF5F5 (soft pink-white)
|
||||
Accent pink: #FFB6C1 (light pink)
|
||||
Accent lav: #D8B4FE (lavender)
|
||||
Accent mint: #A7F3D0 (mint)
|
||||
Text primary: #4A1942 (deep plum)
|
||||
Text soft: #7C3AED (violet)
|
||||
Card bg: #FFFFFF (with soft shadow)
|
||||
Highlight: #FDF2F8 (pink glow)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Deployment
|
||||
|
||||
```
|
||||
docker compose up -d
|
||||
|
||||
Frontend: http://kira.hobokenchicken.com (via Caddy)
|
||||
Backend: ws://kira.hobokenchicken.com/api/ws (WebSocket)
|
||||
```
|
||||
|
||||
Runs on existing homelab stack. Add Caddy entry pointing to the frontend container.
|
||||
Reference in New Issue
Block a user