# Kira Body Double: Systematic Fix Plan (from AUDIT.md)

**Created:** 2026-06-04

**Source:** Full audit of codebase, running deployment (172.20.1.204), server logs, build outputs, component cross-references, and user-reported symptoms (33s+ delays, static, "Could not transcribe", console errors).

**Goal:** Turn the current non-functional voice pipeline + incomplete UI into a reliable, low-latency body-double experience matching the original specs (realtime voice, lo-fi + white noise, ADHD tools, avatar customization, notes, clock, pet, backgrounds).

**Approach:** Prioritize by dependency and impact. Fix core voice first (STT + audio playback + feedback). Then UI completeness. Then cleanup + polish. Deploy + verify after every change where possible. Re-test end-to-end on the live site at the end.

**Principles:**

- Prefer minimal, targeted patches over big refactors.
- Keep the cheap 3-step pipeline (or Realtime where it actually helps latency).
- Verify with real tool output (builds, docker logs, git, SSH health checks).
- Update docs and todo as we go.
- If a subtask blocks, note it and move to parallelizable items.

**Total items:** 9 (matching the current todo list). Dependencies noted.

---

## 1. fix-realtime-stt (CRITICAL: blocks everything voice-related)

**Status in audit:** Mic does nothing. WS connects, then fails with 4004 and `model_not_found`.

**Root cause:** Wrong base model in WS URL (`gpt-4o-mini-realtime-preview`). `gpt-realtime-whisper` exists per OpenAI announcements but must be used as the `?model=` in the Realtime WS connect for transcription mode. Session config may also need tweaking for pure STT (no LLM responses from the realtime model).

**Options:**

- **A.** Correct model + config for Realtime WS (preferred for true streaming partials + VAD).
- **B.** Revert to proven REST path (`gpt-4o-transcribe`) + simulate deltas if possible (cheaper, known working from history).

**Plan steps:**

1. Read current `backend/services/whisper_stream.py` and `backend/main.py` WS handling.
2. Update WS URL to `?model=gpt-realtime-whisper` (and adjust `session.update` if needed: remove the transcription model or set it correctly).
3. Add better error logging + a client-facing error on WS failure.
4. Rebuild/deploy.
5. Verify: SSH docker logs for "Whisper stream ready" + no 4004/`model_not_found`. Simulate or note user test.
6. If it still fails after 1-2 attempts: implement Option B (switch `main.py` back to REST transcription + keep the streaming delta plumbing for future use).

**Risks:** OpenAI access tier for Realtime Whisper. Cost (Realtime is more expensive than REST transcription).

**Success criteria:** Backend accepts audio chunks and emits `transcript_delta` + `on_done` / `transcript` without errors. No more "Could not transcribe".

**Dependencies:** None (first).

## 2. streaming-audio-playback (HIGH: fixes perceived latency)

**Status:** Chunks stream from backend, but frontend only plays on `speaking_end`.

**Plan steps:**

1. Read `frontend/src/hooks/useConversation.ts` audio handling + `backend/main.py` `synthesize_speech`.
2. Switch frontend to play chunks incrementally:
   - Option: Use a single Audio element with `srcObject` + MediaSource / SourceBuffer for Opus chunks (complex but true streaming).
   - Simpler, reliable: Create new `