12 Commits

Author SHA1 Message Date
hobokenchicken b06f20f9d8 chore: cleanup dead code and unused files
Deleted 7 unused frontend components (ChatBubble, PetZone, Live2DCat,
BackgroundScene, Clock, Toolbar, types/index.ts).
Deleted unused backend: whisper_stream.py, empty models/ and routers/ dirs.
Removed dead code: Message interface, messages/micError/sendText/addMessage,
conversation_text handler, 3 unused memory.py methods.
Fixed task persistence bug: added 'tasks' to DEFAULT_PREFERENCES.
Stale comment in Live2DStage updated.
2026-06-06 12:05:28 -04:00
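Note on b06f20f9d8: the commit does not say why a missing 'tasks' default broke persistence. One plausible mechanism, sketched here as an assumption, is that saved preferences are filtered against the keys of DEFAULT_PREFERENCES on load, so any key absent from the defaults is silently dropped. The default values and the `load_preferences` helper are hypothetical; only the `DEFAULT_PREFERENCES` name and the 'tasks', scene, outfit, and accessory keys come from the log.

```python
# Hypothetical default values; only the key names appear in the commit log.
DEFAULT_PREFERENCES = {
    "scene": "default",
    "outfit": "default",
    "accessory": None,
    "tasks": [],  # the key added by the fix
}


def load_preferences(saved: dict) -> dict:
    """Overlay saved values on the defaults, keeping only known keys.

    Hypothetical helper: a key missing from DEFAULT_PREFERENCES is dropped
    here, which is one way 'tasks' could have failed to persist.
    """
    # Copy list defaults so callers never mutate the shared default object.
    prefs = {
        key: (list(value) if isinstance(value, list) else value)
        for key, value in DEFAULT_PREFERENCES.items()
    }
    for key, value in saved.items():
        if key in prefs:
            prefs[key] = value
    return prefs
```

With 'tasks' present in the defaults, a saved task list survives the round trip; an unknown key would still be discarded.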
hobokenchicken 1bfc8333e9 clean: archive legacy stt/tts/llm services; update ARCHITECTURE.md + README.md to current stack (REST gpt-4o-transcribe, nano, sage, Honcho, incremental TTS, white noise)
Per PLAN item 6. Legacy files moved to archive/legacy-pipeline/.
2026-06-04 15:54:12 -04:00
hobokenchicken 188da1d52a fix(stt): try gpt-4o-realtime-preview as base session model + gpt-realtime-whisper for input_audio_transcription (per OpenAI error guidance)
2026-06-04 15:16:08 -04:00
hobokenchicken 191b7ad9b5 fix(stt): correct Realtime WS model to gpt-realtime-whisper + enhance event handling for deltas/completed
- URL now uses ?model=gpt-realtime-whisper (was invalid gpt-4o-mini-realtime-preview)
- Cleaned session.update (removed modalities that may not apply)
- Expanded _handle to catch input_audio_transcription.delta and .completed events
- on_error now forwards transcription errors to frontend client
- Per AUDIT + PLAN item 1
2026-06-04 15:14:26 -04:00
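Note on 191b7ad9b5: the delta/completed handling described in this commit could look roughly like the sketch below. This is not the project's `_handle` method. The `TranscriptBuffer` class is hypothetical, and the `delta` and `transcript` field names are assumptions about the event payloads; event types are matched by suffix because the log only gives the abbreviated names.

```python
class TranscriptBuffer:
    """Accumulates streaming transcript deltas until an utterance completes."""

    def __init__(self):
        self.partial = ""    # text received so far for the current utterance
        self.completed = []  # finished utterances, oldest first

    def handle(self, event: dict):
        """Process one event; return the final text when an utterance ends."""
        etype = event.get("type", "")
        if etype.endswith("input_audio_transcription.delta"):
            # Assumed payload field: 'delta' carries the new text fragment.
            self.partial += event.get("delta", "")
            return None
        if etype.endswith("input_audio_transcription.completed"):
            # Assumed payload field: 'transcript' carries the full text.
            # Fall back to the accumulated deltas if it is missing.
            text = event.get("transcript") or self.partial
            self.partial = ""
            self.completed.append(text)
            return text
        return None  # unrelated event types are ignored
```

A caller would forward `partial` to the client after each delta and pass the returned final text to the response step.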
hobokenchicken 7502f201c7 feat: Realtime WebSocket STT via gpt-realtime-whisper
Replaces REST-based transcription (gpt-4o-transcribe) with WebSocket
streaming via gpt-realtime-whisper. Frontend captures PCM16 audio and
streams it through the backend to a Realtime transcription session.

- Server-side VAD detects utterance boundaries automatically
- Word-level transcript deltas stream to the client in real-time
- On utterance end, gpt-5.4-nano generates a response
- TTS streams back via with_streaming_response
- Total pipeline: PCM16 → Realtime WS → LLM → streaming TTS
2026-06-04 14:26:19 -04:00
hobokenchicken f2a5416408 feat: cheapest pipeline — gpt-4o-mini-transcribe + gpt-5.4-nano + TTS
Simple 3-step chat completions pipeline at ~$0.019/min total.
Streams PCM16 audio from frontend, transcribes on release,
generates response via gpt-5.4-nano, speaks via OpenAI TTS.

Cost breakdown:
  gpt-4o-mini-transcribe: $0.003/min
  gpt-5.4-nano:          ~$0.001/min
  OpenAI TTS (nova):     $0.015/min
  Total:                 ~$0.019/min (~$0.57/day at 30min)
2026-06-04 13:51:35 -04:00
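Note on f2a5416408: the totals in the cost breakdown can be checked with a few lines of arithmetic. The sketch assumes the three figures are US dollars per minute of use and that "30min" means 30 minutes of use per day; the function names are illustrative.

```python
# Per-minute rates as listed in the commit message (assumed to be USD).
RATES_PER_MIN = {
    "gpt-4o-mini-transcribe": 0.003,
    "gpt-5.4-nano": 0.001,
    "OpenAI TTS (nova)": 0.015,
}


def cost_per_minute(rates: dict) -> float:
    """Sum the per-minute rates, rounded to a tenth of a cent."""
    return round(sum(rates.values()), 3)


def cost_per_day(rates: dict, minutes_per_day: float) -> float:
    """Daily cost for a given number of minutes of use, rounded to cents."""
    return round(sum(rates.values()) * minutes_per_day, 2)


print(cost_per_minute(RATES_PER_MIN))   # 0.019
print(cost_per_day(RATES_PER_MIN, 30))  # 0.57
```

Both results match the totals stated in the commit message.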
hobokenchicken 66e799a655 fix: connect directly via websockets to bypass OpenAI-Beta header
The openai library's beta.realtime.connect() hardcodes the obsolete
'OpenAI-Beta: realtime=v1' header which the GA API rejects. Connecting
directly via the websockets library with only the Authorization header
resolves the 'beta_api_shape_disabled' error.
2026-06-04 13:49:56 -04:00
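Note on 66e799a655: a minimal sketch of the connection arguments this fix describes, with only an Authorization header and no OpenAI-Beta header. The endpoint URL and the `realtime_connect_args` helper are assumptions, not taken from the repository. The block only builds the arguments so it runs without network access or third-party packages.

```python
from urllib.parse import urlencode

# Assumed Realtime WebSocket endpoint; not stated in the commit log.
REALTIME_BASE_URL = "wss://api.openai.com/v1/realtime"


def realtime_connect_args(api_key: str, model: str):
    """Return (url, headers) for a direct WebSocket connection.

    Hypothetical helper: the headers deliberately omit 'OpenAI-Beta',
    which the commit says the GA API rejects.
    """
    url = f"{REALTIME_BASE_URL}?{urlencode({'model': model})}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers
```

The resulting pair would then be passed to the websockets library's `connect` call. The keyword for custom headers differs between websockets releases (`extra_headers` in the legacy client, `additional_headers` in the newer asyncio client), so it is worth checking against the installed version.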
hobokenchicken 274d04ea10 feat: hybrid pipeline — gpt-realtime-whisper + gpt-5.4-nano + TTS
Hybrid approach gives streaming STT at ~$0.017/min + cheap brain
at ~$0.001/min + TTS at ~$0.015/min = ~$0.033/min total.

- gpt-realtime-whisper handles streaming transcription with VAD
- gpt-5.4-nano handles response generation (chat completions)
- OpenAI TTS (nova) for voice output
- Server VAD detects utterance boundaries
- Honcho memory context injected into system prompt
- Removed old full Realtime relay service
2026-06-04 13:48:06 -04:00
hobokenchicken 1c15d42e06 fix: remove OpenAI-Beta header, use gpt-realtime-2 GA model
The old OpenAI-Beta: realtime=v1 header is rejected by the GA API.
Removing it via extra_headers override. Using gpt-realtime-2 which
is the current production Realtime model.
2026-06-04 13:42:06 -04:00
hobokenchicken e2332af8d0 feat: OpenAI Realtime API pipeline
Replaced the 3-step sequential pipeline (Whisper STT → DeepSeek LLM
→ OpenAI TTS) with a single OpenAI Realtime API WebSocket using
gpt-4o-mini-realtime-preview.

- ~300-800ms latency vs 1-3s
- Server VAD for automatic turn detection
- Streaming audio chunks during playback
- Interruptions: user can speak over Kira mid-response
- Honcho memory still injected into session instructions
- Frontend captures PCM16 mono 24kHz via AudioContext
- Backend relays client ↔ OpenAI Realtime API
- Supports both voice (PCM16) and text input
2026-06-04 13:32:39 -04:00
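Note on e2332af8d0: the "PCM16 mono 24kHz" format mentioned above means signed 16-bit samples, one channel, 24,000 samples per second. The project does this capture in the browser via AudioContext; the Python sketch below only illustrates the conversion and the resulting byte rate. Little-endian byte order and the helper names are assumptions.

```python
import struct

SAMPLE_RATE_HZ = 24_000  # from the commit message
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 1             # mono


def float_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    # Clip first so out-of-range input cannot overflow the 16-bit range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def bytes_per_second() -> int:
    """Raw byte rate of the stream before any transport encoding."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


print(bytes_per_second())  # 48000
```

At 48,000 bytes per second, a 100 ms chunk of audio is 4,800 bytes before any encoding overhead.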
hobokenchicken 78ea059f08 feat: user personalization with Honcho-backed preferences
- WelcomeScreen: first-time name entry with cute onboarding
- identify WS message: sets user_id, loads saved prefs from Honcho
- set_preference WS message: saves scene/outfit/accessory to Honcho metadata
- Preferences auto-load on return visits via localStorage + Honcho peer meta
- Kira uses the user's name in greeting and prompts
- Backend: get/set preference methods in KiraMemory service
- Frontend: optimistic preference updates, synced to backend on change
2026-06-04 11:00:58 -04:00
hobokenchicken 97424cb98f init: Kira — AI body double with Honcho memory
Full voice pipeline (Whisper STT -> DeepSeek LLM -> OpenAI TTS),
animated SVG avatar (Live2D-ready), girly-pop UI, lofi music,
timer/notes/pets/wardrobe widgets, 10 background scenes with
particle effects, Honcho cross-session memory.
2026-06-04 10:51:38 -04:00