hobokenchicken 0e74a16b40 fix(stt): revert to reliable REST gpt-4o-transcribe + MediaRecorder full-blob (Realtime WS not accessible on key)
- Backend: added transcribe_audio (gpt-4o-transcribe), switched audio handler to full blob -> REST -> LLM -> streaming TTS
- Frontend: MediaRecorder (webm/opus) full recording sent on stop (one blob per utterance)
- Removed dead WhisperStream callbacks and pending_transcript/lock
- This unblocks voice per AUDIT item 1 (Option B fallback). Deltas will come in later item.
- Also preps for deprecation fix (MediaRecorder is the good path).
2026-06-04 15:23:57 -04:00

Kira

AI body double — a girly-pop focus companion with real-time voice conversation, lo-fi music, ADHD tools, and a customizable avatar.

Your wife's very own focus bestie who's always there when her real body double isn't available.

Features

  • 🎙️ Voice conversation — Push-to-talk microphone → STT (Whisper) → LLM (DeepSeek) → TTS (OpenAI Nova voice)
  • 💬 Text chat fallback — Type when you don't want to speak
  • 🎶 Lo-fi music — Streaming from Lofi Girl YouTube channel
  • 🍅 Timer — Pomodoro, countdown, stopwatch modes
  • 📝 Notes — Quick task list / check-in notes
  • 🎨 10 background scenes — Cozy room, coffee shop, garden, rainy window, starry night, sakura, ocean, autumn, winter cabin, sunset
  • Particle effects — Rain, stars, cherry petals, snow per scene
  • 👘 Wardrobe — 5 outfits + 5 accessories (bow, glasses, flower crown, earrings, scarf)
  • 🐱 Pet zone — Two cats: Mochi (orange fluffy) and Luna (sleepy black)
  • 🕐 Live clock — Time + date
  • 🌸 Animated avatar — Blinking, speaking mouth, waving, outfit colors (Live2D-ready)

Quick Start

1. Get API keys

Service What for Where
OpenAI Whisper STT + TTS (Nova voice) https://platform.openai.com/api-keys
DeepSeek LLM brain https://platform.deepseek.com/api_keys

2. Configure

cd ~/Projects/ai-body-double
cp .env.example .env
# Edit .env with your API keys:
#   OPENAI_API_KEY=sk-...
#   DEEPSEEK_API_KEY=sk-...

3. Run

docker compose up -d

Open http://kira.hobokenchicken.com:3000 (or wherever you deploy it).

4. Add a Caddy entry (homelab)

kira.hobokenchicken.com {
    reverse_proxy 172.20.0.X:3000
}

Architecture

Browser ──WebSocket──▶ Backend (FastAPI)
  │                        │
  ├─ Mic audio ──────────▶ ├─ Whisper API (STT)
  │                        ├─ DeepSeek (LLM)
  │ ◀── TTS audio ──────── ├─ OpenAI TTS
  │                        │
  ├─ YouTube embed (lo-fi) │
  ├─ Timer / Notes / Cats  │
  └─ Animated avatar       │

Live2D Model Setup

Kira currently uses a CSS/SVG animated placeholder avatar. To add a Live2D model:

  1. Commission or obtain a .model3.json (Cubism 4.x format) model
  2. Place the model directory in frontend/public/live2d/models/
  3. Rename the model entry point to kira.model3.json
  4. The WebGL renderer will auto-detect and switch

Required model files:

  • kira.model3.json — model definition
  • kira.moc3 — mesh/deformer data
  • kira.cdi3.json — display info (optional)
  • textures/ — PNG texture files
  • motions/ — animation files (optional)
  • expressions/ — face expression files (optional)

Recommended creators for custom Live2D models: search VGen, VTube Studio model artists, or Fiverr.

Color Palette

Token Hex Usage
Kira Pink #FFB6C1 Primary accent
Kira Lavender #D8B4FE Secondary accent
Kira Mint #A7F3D0 Success/status
Background #FFF5F5 Card/comfy bg
Text Plum #4A1942 Body text
Text Violet #7C3AED Soft text

Project Structure

ai-body-double/
├── docker-compose.yml
├── .env
├── backend/          # FastAPI + WebSocket
│   ├── main.py       # WS handler (STT→LLM→TTS pipeline)
│   ├── services/
│   │   ├── stt.py    # OpenAI Whisper
│   │   ├── llm.py    # DeepSeek
│   │   └── tts.py    # OpenAI TTS
│   └── config.py     # Env config
├── frontend/         # React + Vite + TailwindCSS
│   ├── src/
│   │   ├── App.tsx           # Main layout
│   │   ├── components/
│   │   │   ├── AnimatedAvatar.tsx  # SVG animated character
│   │   │   ├── KiraAvatar.tsx      # Live2D loader + fallback
│   │   │   ├── ChatBubble.tsx      # Conversation display
│   │   │   ├── Timer.tsx           # Pomodoro/stopwatch
│   │   │   ├── MusicPlayer.tsx     # Lofi Girl embed
│   │   │   ├── PetZone.tsx         # Two CSS cats
│   │   │   ├── Wardrobe.tsx        # Outfit + accessories
│   │   │   └── Particles.tsx       # Rain/stars/petals/snow
│   │   └── hooks/
│   │       └── useConversation.ts  # WS + audio management
│   └── public/live2d/     # Cubism SDK + model slot
└── README.md
S
Description
No description provided
Readme 136 MiB
Languages
TypeScript 73.8%
Python 24.5%
CSS 0.8%
HTML 0.5%
Dockerfile 0.4%