
hobokenchicken 0e74a16b40 fix(stt): revert to reliable REST gpt-4o-transcribe + MediaRecorder full-blob (Realtime WS not accessible on key)

- Backend: added transcribe_audio (gpt-4o-transcribe), switched audio handler to full blob -> REST -> LLM -> streaming TTS
- Frontend: MediaRecorder (webm/opus) full recording sent on stop (one blob per utterance)
- Removed dead WhisperStream callbacks and pending_transcript/lock
- This unblocks voice per AUDIT item 1 (Option B fallback). Deltas will come in later item.
- Also preps for deprecation fix (MediaRecorder is the good path).

2026-06-04 15:23:57 -04:00

Name	Last commit	Date
backend	fix(stt): revert to reliable REST gpt-4o-transcribe + MediaRecorder full-blob (Realtime WS not accessible on key)	2026-06-04 15:23:57 -04:00
frontend	fix(stt): revert to reliable REST gpt-4o-transcribe + MediaRecorder full-blob (Realtime WS not accessible on key)	2026-06-04 15:23:57 -04:00
scripts	feat: Live2D outfit textures + expression system + canvas tweaks	2026-06-04 11:40:10 -04:00
.env.example	init: Kira — AI body double with Honcho memory	2026-06-04 10:51:38 -04:00
.gitignore	init: Kira — AI body double with Honcho memory	2026-06-04 10:51:38 -04:00
ARCHITECTURE.md	init: Kira — AI body double with Honcho memory	2026-06-04 10:51:38 -04:00
AUDIT.md	fix(stt): correct Realtime WS model to gpt-realtime-whisper + enhance event handling for deltas/completed	2026-06-04 15:14:26 -04:00
docker-compose.yml	init: Kira — AI body double with Honcho memory	2026-06-04 10:51:38 -04:00
PLAN.md	fix(stt): correct Realtime WS model to gpt-realtime-whisper + enhance event handling for deltas/completed	2026-06-04 15:14:26 -04:00
PROJECT_SCOPE.md	init: Kira — AI body double with Honcho memory	2026-06-04 10:51:38 -04:00
README.md	init: Kira — AI body double with Honcho memory	2026-06-04 10:51:38 -04:00

README.md

Kira ✨

AI body double — a girly-pop focus companion with real-time voice conversation, lo-fi music, ADHD tools, and a customizable avatar.

Your wife's very own focus bestie who's always there when her real body double isn't available.

Features

🎙️ Voice conversation — Push-to-talk microphone → STT (Whisper) → LLM (DeepSeek) → TTS (OpenAI Nova voice)
💬 Text chat fallback — Type when you don't want to speak
🎶 Lo-fi music — Streaming from Lofi Girl YouTube channel
🍅 Timer — Pomodoro, countdown, stopwatch modes
📝 Notes — Quick task list / check-in notes
🎨 10 background scenes — Cozy room, coffee shop, garden, rainy window, starry night, sakura, ocean, autumn, winter cabin, sunset
✨ Particle effects — Rain, stars, cherry petals, snow per scene
👘 Wardrobe — 5 outfits + 5 accessories (bow, glasses, flower crown, earrings, scarf)
🐱 Pet zone — Two cats: Mochi (orange fluffy) and Luna (sleepy black)
🕐 Live clock — Time + date
🌸 Animated avatar — Blinking, speaking mouth, waving, outfit colors (Live2D-ready)

Quick Start

1. Get API keys

Service	What for	Where
OpenAI	Whisper STT + TTS (Nova voice)	https://platform.openai.com/api-keys
DeepSeek	LLM brain	https://platform.deepseek.com/api_keys

2. Configure

cd ~/Projects/ai-body-double
cp .env.example .env
# Edit .env with your API keys:
#   OPENAI_API_KEY=sk-...
#   DEEPSEEK_API_KEY=sk-...
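
As a rough illustration of how backend/config.py might consume these two keys, here is a minimal sketch that assumes plain environment variables and fails fast when one is missing. The `Settings` class and `load_settings` function are hypothetical names chosen for this example, not taken from the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two API keys listed above."""
    openai_api_key: str
    deepseek_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required keys and fail fast if any is missing or empty."""
    required = ("OPENAI_API_KEY", "DEEPSEEK_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        deepseek_api_key=env["DEEPSEEK_API_KEY"],
    )
```

Raising at startup rather than on the first voice request makes a forgotten key show up in the container logs right after `docker compose up`.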

3. Run

docker compose up -d

Open http://kira.hobokenchicken.com:3000 (or wherever you deploy it).

4. Add a Caddy entry (homelab)

kira.hobokenchicken.com {
    reverse_proxy 172.20.0.X:3000
}

Architecture

Browser ──WebSocket──▶ Backend (FastAPI)
  │                        │
  ├─ Mic audio ──────────▶ ├─ Whisper API (STT)
  │                        ├─ DeepSeek (LLM)
  │ ◀── TTS audio ──────── ├─ OpenAI TTS
  │                        │
  ├─ YouTube embed (lo-fi) │
  ├─ Timer / Notes / Cats  │
  └─ Animated avatar       │

Live2D Model Setup

Kira currently uses a CSS/SVG animated placeholder avatar. To add a Live2D model:

1. Commission or obtain a model in Cubism 4.x format (its entry point is a .model3.json file).
2. Place the model directory in frontend/public/live2d/models/.
3. Rename the model entry point to kira.model3.json.
4. The WebGL renderer will then detect the model automatically and switch over from the placeholder avatar.

Required model files:

kira.model3.json — model definition
kira.moc3 — mesh/deformer data
kira.cdi3.json — display info (optional)
textures/ — PNG texture files
motions/ — animation files (optional)
expressions/ — face expression files (optional)
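
Before pointing the renderer at a new model, it can help to confirm that the mandatory pieces from the list above are actually in place. The following is a small sketch of such a check. `missing_model_parts` is a hypothetical helper rather than part of the repository, and it assumes textures are stored as lowercase `.png` files directly inside textures/.

```python
from pathlib import Path

# Names follow the "Required model files" list; entries marked optional
# there (kira.cdi3.json, motions/, expressions/) are deliberately not checked.
REQUIRED_FILES = ("kira.model3.json", "kira.moc3")
REQUIRED_DIRS = ("textures",)


def missing_model_parts(model_dir: Path) -> list[str]:
    """Return the mandatory files and directories absent from model_dir."""
    missing = [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
    for name in REQUIRED_DIRS:
        folder = model_dir / name
        # A textures/ folder without any PNG is as unusable as a missing one.
        if not folder.is_dir() or not any(folder.glob("*.png")):
            missing.append(name + "/")
    return missing
```

Calling `missing_model_parts(Path("frontend/public/live2d/models/kira"))` returns an empty list when everything mandatory is present; the `kira` subdirectory name here is an assumption made for the example.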

To find an artist for a custom Live2D model, search VGen, VTube Studio model artists, or Fiverr.

Color Palette

Token	Hex	Usage
Kira Pink	`#FFB6C1`	Primary accent
Kira Lavender	`#D8B4FE`	Secondary accent
Kira Mint	`#A7F3D0`	Success/status
Background	`#FFF5F5`	Card/comfy bg
Text Plum	`#4A1942`	Body text
Text Violet	`#7C3AED`	Soft text

Project Structure

ai-body-double/
├── docker-compose.yml
├── .env
├── backend/          # FastAPI + WebSocket
│   ├── main.py       # WS handler (STT→LLM→TTS pipeline)
│   ├── services/
│   │   ├── stt.py    # OpenAI Whisper
│   │   ├── llm.py    # DeepSeek
│   │   └── tts.py    # OpenAI TTS
│   └── config.py     # Env config
├── frontend/         # React + Vite + TailwindCSS
│   ├── src/
│   │   ├── App.tsx           # Main layout
│   │   ├── components/
│   │   │   ├── AnimatedAvatar.tsx  # SVG animated character
│   │   │   ├── KiraAvatar.tsx      # Live2D loader + fallback
│   │   │   ├── ChatBubble.tsx      # Conversation display
│   │   │   ├── Timer.tsx           # Pomodoro/stopwatch
│   │   │   ├── MusicPlayer.tsx     # Lofi Girl embed
│   │   │   ├── PetZone.tsx         # Two CSS cats
│   │   │   ├── Wardrobe.tsx        # Outfit + accessories
│   │   │   └── Particles.tsx       # Rain/stars/petals/snow
│   │   └── hooks/
│   │       └── useConversation.ts  # WS + audio management
│   └── public/live2d/     # Cubism SDK + model slot
└── README.md

Languages

TypeScript 73.8%

Python 24.5%

CSS 0.8%

HTML 0.5%

Dockerfile 0.4%