kira/PLAN.md

# Kira Body Double — Systematic Fix Plan (from AUDIT.md)

**Created:** 2026-06-04
**Source:** Full audit of codebase, running deployment (172.20.1.204), server logs, build outputs, component cross-references, and user-reported symptoms (33s+ delays, static, "Could not transcribe", console errors).
**Goal:** Turn the current non-functional voice pipeline + incomplete UI into a reliable, low-latency body-double experience matching the original specs (realtime voice, lo-fi + white noise, ADHD tools, avatar customization, notes, clock, pet, backgrounds).
**Approach:** Prioritize by dependency and impact. Fix core voice first (STT + audio playback + feedback). Then UI completeness. Then cleanup + polish. Deploy + verify after every change where possible. Re-test end-to-end on the live site at the end.
**Principles:**
- Prefer minimal, targeted patches over big refactors.
- Keep the cheap 3-step pipeline (STT → LLM → TTS), or use Realtime where it actually helps latency.
- Verify with real tool output (builds, docker logs, git, SSH health checks).
- Update docs and todo as we go.
- If a subtask blocks, note it and move to parallelizable items.

**Total items:** 9 (matching the current todo list). Dependencies noted.

---

## 1. fix-realtime-stt (CRITICAL — blocks everything voice-related)
**Status in audit:** The mic does nothing. The WebSocket connects, then fails with a 4004 and `model_not_found`.
**Root cause:** Wrong base model in the WS URL (`gpt-4o-mini-realtime-preview`). Per OpenAI announcements, `gpt-realtime-whisper` exists, but it must be passed as the `?model=` query parameter on the Realtime WS connection to get transcription mode. The session config may also need tweaking for pure STT (no LLM responses from the realtime model).
**Options:**
  A. Correct model + config for Realtime WS (preferred for true streaming partials + VAD).
  B. Revert to proven REST path (`gpt-4o-transcribe`) + simulate deltas if possible (cheaper, known working from history).
**Plan steps:**
  1. Read current `backend/services/whisper_stream.py` and `backend/main.py` WS handling.
  2. Update the WS URL to `?model=gpt-realtime-whisper` and, if needed, adjust `session.update` (remove the transcription model setting or set it correctly).
  3. Add better error logging + client-facing error on WS failure.
  4. Rebuild/deploy.
  5. Verify: check the docker logs over SSH for "Whisper stream ready" and no 4004/`model_not_found`. Simulate audio input if possible; otherwise note that a user test is still needed.
  6. If still fails after 1-2 attempts: implement Option B (switch main.py back to REST transcription + keep the streaming delta plumbing for future).
**Risks:** OpenAI access tier for Realtime Whisper. Cost (Realtime is more expensive than REST transcription).
**Success criteria:** Backend accepts audio chunks and emits `transcript_delta` + `on_done` / `transcript` without errors. No more "Could not transcribe".
**Dependencies:** None (first).

## 2. streaming-audio-playback (HIGH — fixes perceived latency)
**Status:** Chunks stream from backend but frontend only plays on `speaking_end`.
**Plan steps:**
  1. Read `frontend/src/hooks/useConversation.ts` audio handling + `backend/main.py` synthesize_speech.
  2. Switch frontend to play chunks incrementally:
     - Option: Use a single `<audio>` element whose `src` is an object URL for a MediaSource, and append Opus chunks through a SourceBuffer (complex but true streaming).
     - Simpler and more reliable: create new `<audio>` elements or use the Web Audio API to decode and play each small chunk as it arrives (low latency for first words). This requires every chunk to be decodable on its own.
     - Keep lip-sync tied to `isKiraSpeaking`.
  3. Update backend to send smaller chunks or keep current streaming.
  4. Handle `speaking_end` only for final cleanup / state.
  5. Test build + deploy.
  6. Verify with logs + manual timing (first audio chunk arrives ~1-2s after the LLM response).
**Success criteria:** The user hears Kira's voice within ~1-2s of finishing her own utterance (first words audible while the rest is still generating).
**Dependencies:** 1 (need working STT to test TTS flow).
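
If the Web Audio route from step 2 is chosen, the core bookkeeping is scheduling each decoded chunk to start exactly where the previous one ends. The sketch below shows only that timing logic so it can run outside a browser; `ChunkScheduler` and `leadTime` are hypothetical names, not existing code in `useConversation.ts`.

```typescript
// Hypothetical helper: tracks where the next audio chunk should start so that
// chunks play back-to-back without gaps or overlap.
class ChunkScheduler {
  private nextStart = 0;
  private readonly leadTime: number;

  // leadTime: small safety margin in seconds so a chunk is never scheduled in the past.
  constructor(leadTime: number = 0.05) {
    this.leadTime = leadTime;
  }

  // now: current clock time in seconds (AudioContext.currentTime in a browser).
  // duration: length of the decoded chunk in seconds (AudioBuffer.duration).
  // Returns the time at which this chunk should start playing.
  schedule(now: number, duration: number): number {
    const startAt = Math.max(this.nextStart, now + this.leadTime);
    this.nextStart = startAt + duration;
    return startAt;
  }

  // Call on `speaking_end` (or on interruption) to forget the queue position.
  reset(): void {
    this.nextStart = 0;
  }
}
```

In the browser, the returned value would be passed to `AudioBufferSourceNode.start(when)`, with `now` read from `AudioContext.currentTime`. If the network stalls and the queue runs dry, the `Math.max` simply restarts playback slightly ahead of the current time.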

## 3. live-transcript-ui (MEDIUM — UX for body-double feel)
**Status:** Deltas arrive but are dropped.
**Plan steps:**
  1. In `useConversation.ts`: expose live partial transcript state (append deltas, clear on `transcript` or `speaking_start`).
  2. Pass the live partial to `ChatBubble.tsx` (or a new "Hearing..." bubble).
  3. In ChatBubble: show a temporary "user is speaking: [live text]" or "Kira hears: ..." line with typing indicator while listening.
  4. Clear it when final transcript is processed or on error.
  5. Style girly-pop (pink bubble or inline).
  6. Build/deploy/verify.
**Success criteria:** While holding Talk (or during server VAD), the user sees her words appear in real time as she speaks.
**Dependencies:** 1 (deltas only come from working STT).
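
The append/clear rule from steps 1 and 4 can be sketched as a small reducer. Only the message `type` names (`transcript_delta`, `transcript`, `speaking_start`) come from this plan; the payload shapes, the `error` message, and the `reducePartial` name are assumptions for illustration.

```typescript
// Assumed message shapes; only the `type` names are taken from the plan.
type VoiceEvent =
  | { type: "transcript_delta"; text: string }
  | { type: "transcript"; text: string }
  | { type: "speaking_start" }
  | { type: "error" };

// Returns the next value of the live partial transcript shown in the UI.
function reducePartial(partial: string, event: VoiceEvent): string {
  switch (event.type) {
    case "transcript_delta":
      // Append streamed text as it arrives.
      return partial + event.text;
    case "transcript":
    case "speaking_start":
    case "error":
      // Final text arrived, Kira started talking, or STT failed: clear the line.
      return "";
    default:
      return partial;
  }
}
```

Inside the hook this could back a `useState` or `useReducer` value that is passed to `ChatBubble.tsx`; keeping it a pure function makes the clearing rules easy to check without a microphone.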

## 4. integrate-notes (LOW — quick completeness win)
**Status:** Component exists and is local-state only. Dead import in App.
**Plan steps:**
  1. In `App.tsx`: import is already there — add `<Notes />` to the grid (e.g., replace or add alongside Timer or in a new row/column for balance).
  2. Consider making notes persistent (localStorage) so they survive refresh (optional quick win).
  3. Test layout on 1920x1080.
  4. Build/deploy.
**Success criteria:** Notes panel visible and functional in the main dashboard.
**Dependencies:** None.
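
For the optional persistence in step 2, a minimal sketch follows. It assumes notes are stored as an array of strings, which may not match the real `Notes` component state; `NOTES_KEY`, `loadNotes`, and `saveNotes` are hypothetical names. The storage object is passed in so the helpers also run outside a browser.

```typescript
// Minimal subset of the Web Storage API used by the helpers below.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const NOTES_KEY = "kira.notes"; // hypothetical storage key

// Reads saved notes, falling back to an empty list on missing or corrupted data.
function loadNotes(store: KeyValueStore): string[] {
  const raw = store.getItem(NOTES_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    // Accept only an array of strings; anything else is treated as corrupted.
    return Array.isArray(parsed) && parsed.every((n) => typeof n === "string")
      ? (parsed as string[])
      : [];
  } catch {
    return [];
  }
}

// Serializes the full notes list under a single key.
function saveNotes(store: KeyValueStore, notes: string[]): void {
  store.setItem(NOTES_KEY, JSON.stringify(notes));
}
```

In the component, `window.localStorage` satisfies `KeyValueStore`, so `loadNotes(window.localStorage)` could seed the initial state and `saveNotes` could run whenever the list changes.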

## 5. add-white-noise (MEDIUM — matches original spec)
**Status:** The spec calls for "white noise", but only the YouTube lofi player exists.
**Plan steps:**
  1. Decide on sources: either hosted public-domain / CC0 ambient tracks (rain, cafe, fan, brown noise), or simple noise generated with Web Audio (oscillator + filters; no external deps, reliable).
  2. Extend `MusicPlayer` or create a new `WhiteNoisePlayer.tsx` component (similar UX to MusicPlayer: preset buttons, volume, play/pause).
  3. Add to grid in `App.tsx` (perhaps replace or share space with MusicPlayer, or add a small ambient section).
  4. Integrate with scenes if possible (e.g., rain scene enables rain sound).
  5. Use `<audio>` elements with loop for tracks (host small mp3/wav in public/sounds or generate).
  6. For zero-asset version: use AudioContext + noise buffer (pink/brown noise generator).
  7. Build + deploy.
**Success criteria:** User can toggle white noise independently of (or layered with) lofi. Matches "white noise" in original request.
**Dependencies:** 4 (grid space).
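
For the zero-asset version in step 6, the noise samples can be generated directly into a buffer. This is a sketch only: the function names are hypothetical, and the brown-noise constants (`0.02`, `1.02`, `3.5`) are commonly used starting values rather than tuned ones.

```typescript
// Fills `out` with uniform white noise in [-1, 1).
function fillWhiteNoise(out: Float32Array, random: () => number = Math.random): void {
  for (let i = 0; i < out.length; i++) {
    out[i] = random() * 2 - 1;
  }
}

// Fills `out` with an approximation of brown noise: white noise passed through
// a leaky integrator, which attenuates high frequencies.
function fillBrownNoise(out: Float32Array, random: () => number = Math.random): void {
  let last = 0;
  for (let i = 0; i < out.length; i++) {
    const white = random() * 2 - 1;
    last = (last + 0.02 * white) / 1.02; // the leak keeps the signal from drifting
    // Gain compensation, clamped so samples stay in the valid [-1, 1] range.
    out[i] = Math.max(-1, Math.min(1, last * 3.5));
  }
}
```

In the browser, one way to use this would be to create a buffer of a few seconds with `AudioContext.createBuffer`, fill `getChannelData(0)` with one of these functions, and play it through an `AudioBufferSourceNode` with `loop = true` and a `GainNode` for volume.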

## 6. clean-legacy-dead-code (MEDIUM — hygiene)
**Status:** Old services + docs describe pre-Honcho, pre-nano, pre-Sage, pre-streaming world.
**Plan steps:**
  1. Delete (or move to `archive/`) `backend/services/stt.py`, `tts.py`, `llm.py`.
  2. Update `backend/requirements.txt` if any unique deps were only for them.
  3. Rewrite sections in:
     - `README.md` (features, quick start, pipeline, costs).
     - `ARCHITECTURE.md` (data flow, stack table, current Realtime attempt).
  4. Update `PLAN.md` and `AUDIT.md` references if needed.
  5. Remove unused imports and variables in `main.py` (time, pending_transcript, lock).
  6. Remove dead `recorderRef` in frontend hook.
  7. Commit + deploy (frontend/backend rebuild not strictly needed but do it).
**Success criteria:** No more references to the old Whisper+DeepSeek+Nova pipeline. Codebase matches reality.
**Dependencies:** Can be done in parallel with others.

## 7. fix-deprecations (MEDIUM — polish + warnings)
**Subtasks (can batch):**
  a. ScriptProcessorNode → AudioWorkletNode (or, at minimum, wrap the ScriptProcessorNode setup in a try/catch and log the deprecation once).
  b. YouTube music: suppress or fix the postMessage/CORS errors, which are common YouTube IFrame issues. Options: host a silent proxy, use a different lofi source such as a free stream, or add `origin` handling + user-gesture guards. For now, add try/catch around YT init and a note in the UI.
  c. Timer stopwatch: replace the hacky formula + logic with a proper elapsed counter (a separate `elapsed` state that only increments when `mode === 'stopwatch'`).
  d. Any other console warnings from build or previous user reports (Pixi interaction deprecation is already noted in history; check current build).
**Plan steps:**
  1. Fix one subtask at a time in frontend (Timer.tsx, useConversation.ts, MusicPlayer.tsx).
  2. Rebuild frontend after changes.
  3. Verify no new errors in `npm run build`.
**Success criteria:** Clean console on load + mic use. Stopwatch actually counts up.
**Dependencies:** None (frontend-only).
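
Subtask c can be sketched as a pure tick function plus a formatter. The `StopwatchState` shape, the `"countdown"` mode name, and the function names are assumptions; the real `Timer.tsx` state may differ.

```typescript
type TimerMode = "countdown" | "stopwatch"; // "countdown" is an assumed name for the other mode

interface StopwatchState {
  mode: TimerMode;
  running: boolean;
  elapsed: number; // whole seconds counted while in stopwatch mode
}

// Intended to be called once per second by the component's interval.
function tick(state: StopwatchState): StopwatchState {
  if (state.running && state.mode === "stopwatch") {
    return { ...state, elapsed: state.elapsed + 1 };
  }
  // Paused or in another mode: elapsed is left untouched.
  return state;
}

// Formats a second count as MM:SS.
function formatElapsed(totalSeconds: number): string {
  const m = Math.floor(totalSeconds / 60);
  const s = totalSeconds % 60;
  return `${String(m).padStart(2, "0")}:${String(s).padStart(2, "0")}`;
}
```

Because `elapsed` only changes inside the stopwatch branch, the countdown logic cannot affect it, which is the separation the subtask asks for.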

## 8. fix-welcome-nesting (LOW)
**Status:** When a saved ID exists, WelcomeScreen (full min-h-screen) is rendered inside a card div.
**Plan steps:**
  1. In `App.tsx`: adjust the conditional rendering for WelcomeScreen — either make it always full-bleed (portal or separate route-like) or simplify the "identify" flow so it doesn't nest.
  2. Test both new-user and returning-user paths.
  3. Build/deploy.
**Success criteria:** Welcome screen looks correct (full or nicely carded) for both paths. No layout breakage.
**Dependencies:** None.

## 9. audit-followup (FINAL)
**Plan steps:**
  1. After all above: full rebuild + deploy to 172.20.1.204.
  2. SSH: `docker compose logs --tail 20`, health check, git status on server.
  3. Update `AUDIT.md` with "Fixed" sections + remaining known issues.
  4. Update this PLAN.md with completion notes.
  5. Update todo list to all completed.
  6. Provide user with summary + "hard refresh and test mic" instructions.
  7. Optional: add a simple test script or note for future.
**Success criteria:** Live site has working (or clearly documented fallback) voice + all spec items visible. No critical items from original audit left open.
**Dependencies:** All previous.

---

## Execution Order & Parallelism
- **Phase 1 (Voice Core — do sequentially):** 1 → 2 → 3
- **Phase 2 (UI Completeness):** 4 → 5 (5 can start once 4 is done)
- **Phase 3 (Cleanup & Polish — parallelizable):** 6, 7, 8
- **Phase 4:** 9 (only after all)

**Estimated effort:** 1-2 hours of focused tool use + deploys (real time longer due to builds/SSH).

**Verification at each step:**
- `cd frontend && npm run build`
- `cd backend && python -c "import main; from services... import ..."`
- `git add -A && git commit -m "..." && git push`
- `ssh root@172.20.1.204 "cd /opt/kira && git pull && docker compose up -d --build 2>&1 | tail -5"`
- `ssh ... "docker logs kira-backend --tail 15 2>&1 | grep -iE 'whisper|stt|error|ready'"`
- Health: `curl http://localhost:8000/api/health` on server
- For frontend changes: note that nginx serves the new dist after rebuild.

**Rollback:** Every commit is atomic, so a bad change can be reverted on its own. The server has a restart policy. Keep a "before" tag if needed.

**Post-plan:** After 9, offer to save this workflow as a skill if it proves reusable.

Start execution now. First action: tackle item 1 (fix-realtime-stt) by correcting the model name.