hobokenchicken/kira

Author	SHA1	Message	Date
hobokenchicken	b06f20f9d8	chore: cleanup dead code and unused files Deleted 7 unused frontend components (ChatBubble, PetZone, Live2DCat, BackgroundScene, Clock, Toolbar, types/index.ts). Deleted unused backend: whisper_stream.py, empty models/ and routers/ dirs. Removed dead code: Message interface, messages/micError/sendText/addMessage, conversation_text handler, 3 unused memory.py methods. Fixed task persistence bug: added 'tasks' to DEFAULT_PREFERENCES. Stale comment in Live2DStage updated.	2026-06-06 12:05:28 -04:00
hobokenchicken	1bfc8333e9	clean: archive legacy stt/tts/llm services; update ARCHITECTURE.md + README.md to current stack (REST gpt-4o-transcribe, nano, sage, Honcho, incremental TTS, white noise) Per PLAN item 6. Legacy files moved to archive/legacy-pipeline/.	2026-06-04 15:54:12 -04:00
hobokenchicken	188da1d52a	fix(stt): try gpt-4o-realtime-preview as base session model + gpt-realtime-whisper for input_audio_transcription (per OpenAI error guidance)	2026-06-04 15:16:08 -04:00
hobokenchicken	191b7ad9b5	fix(stt): correct Realtime WS model to gpt-realtime-whisper + enhance event handling for deltas/completed - URL now uses ?model=gpt-realtime-whisper (was invalid gpt-4o-mini-realtime-preview) - Cleaned session.update (removed modalities that may not apply) - Expanded _handle to catch input_audio_transcription.delta and .completed events - on_error now forwards transcription errors to frontend client - Per AUDIT + PLAN item 1	2026-06-04 15:14:26 -04:00
hobokenchicken	7502f201c7	feat: Realtime WebSocket STT via gpt-realtime-whisper Replaces REST-based transcription (gpt-4o-transcribe) with WebSocket streaming via gpt-realtime-whisper. Frontend captures PCM16 audio and streams it through the backend to a Realtime transcription session. - Server-side VAD detects utterance boundaries automatically - Word-level transcript deltas stream to the client in real-time - On utterance end, gpt-5.4-nano generates a response - TTS streams back via with_streaming_response - Total pipeline: PCM16 → Realtime WS → LLM → streaming TTS	2026-06-04 14:26:19 -04:00
hobokenchicken	f2a5416408	feat: cheapest pipeline — gpt-4o-mini-transcribe + gpt-5.4-nano + TTS Simple 3-step chat completions pipeline at ~$0.019/min total. Streams PCM16 audio from frontend, transcribes on release, generates response via gpt-5.4-nano, speaks via OpenAI TTS. Cost breakdown: gpt-4o-mini-transcribe: $0.003/min; gpt-5.4-nano: ~$0.001/min; OpenAI TTS (nova): $0.015/min; Total: ~$0.019/min (~$0.57/day at 30min)	2026-06-04 13:51:35 -04:00
hobokenchicken	66e799a655	fix: connect directly via websockets to bypass OpenAI-Beta header The openai library's beta.realtime.connect() hardcodes the obsolete 'OpenAI-Beta: realtime=v1' header which the GA API rejects. Connecting directly via the websockets library with only the Authorization header resolves the 'beta_api_shape_disabled' error.	2026-06-04 13:49:56 -04:00
hobokenchicken	274d04ea10	feat: hybrid pipeline — gpt-realtime-whisper + gpt-5.4-nano + TTS Hybrid approach gives streaming STT at ~$0.017/min + cheap brain at ~$0.001/min + TTS at ~$0.015/min = ~$0.033/min total. - gpt-realtime-whisper handles streaming transcription with VAD - gpt-5.4-nano handles response generation (chat completions) - OpenAI TTS (nova) for voice output - Server VAD detects utterance boundaries - Honcho memory context injected into system prompt - Removed old full Realtime relay service	2026-06-04 13:48:06 -04:00
hobokenchicken	1c15d42e06	fix: remove OpenAI-Beta header, use gpt-realtime-2 GA model The old OpenAI-Beta: realtime=v1 header is rejected by the GA API. Removing it via extra_headers override. Using gpt-realtime-2 which is the current production Realtime model.	2026-06-04 13:42:06 -04:00
hobokenchicken	e2332af8d0	feat: OpenAI Realtime API pipeline Replaced the 3-step sequential pipeline (Whisper STT → DeepSeek LLM → OpenAI TTS) with a single OpenAI Realtime API WebSocket using gpt-4o-mini-realtime-preview. - ~300-800ms latency vs 1-3s - Server VAD for automatic turn detection - Streaming audio chunks during playback - Interruptions: user can speak over Kira mid-response - Honcho memory still injected into session instructions - Frontend captures PCM16 mono 24kHz via AudioContext - Backend relays client ↔ OpenAI Realtime API - Supports both voice (PCM16) and text input	2026-06-04 13:32:39 -04:00
hobokenchicken	78ea059f08	feat: user personalization with Honcho-backed preferences - WelcomeScreen: first-time name entry with cute onboarding - identify WS message: sets user_id, loads saved prefs from Honcho - set_preference WS message: saves scene/outfit/accessory to Honcho metadata - Preferences auto-load on return visits via localStorage + Honcho peer meta - Kira uses the user's name in greeting and prompts - Backend: get/set preference methods in KiraMemory service - Frontend: optimistic preference updates, synced to backend on change	2026-06-04 11:00:58 -04:00
hobokenchicken	97424cb98f	init: Kira — AI body double with Honcho memory Full voice pipeline (Whisper STT -> DeepSeek LLM -> OpenAI TTS), animated SVG avatar (Live2D-ready), girly-pop UI, lofi music, timer/notes/pets/wardrobe widgets, 10 background scenes with particle effects, Honcho cross-session memory.	2026-06-04 10:51:38 -04:00

12 Commits
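
The per-minute cost figures quoted in commits f2a5416408 and 274d04ea10, read as US dollar amounts, can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the repository: the dictionary names and the `per_minute` / `per_day` helpers are hypothetical, and the prices are only the approximate values stated in those commit messages.

```python
# Approximate per-minute prices (USD) as quoted in the commit messages.
# The names below are illustrative and do not exist in the kira codebase.
CHEAPEST_PIPELINE = {
    "gpt-4o-mini-transcribe": 0.003,
    "gpt-5.4-nano": 0.001,
    "OpenAI TTS (nova)": 0.015,
}

HYBRID_PIPELINE = {
    "gpt-realtime-whisper": 0.017,
    "gpt-5.4-nano": 0.001,
    "OpenAI TTS (nova)": 0.015,
}


def per_minute(prices: dict[str, float]) -> float:
    """Sum the per-minute component prices, rounded to a tenth of a cent."""
    return round(sum(prices.values()), 3)


def per_day(prices: dict[str, float], minutes: int = 30) -> float:
    """Daily cost for a given number of minutes of use, rounded to cents."""
    return round(per_minute(prices) * minutes, 2)


if __name__ == "__main__":
    for name, prices in (("cheapest", CHEAPEST_PIPELINE), ("hybrid", HYBRID_PIPELINE)):
        print(f"{name}: ${per_minute(prices):.3f}/min, ${per_day(prices):.2f}/day at 30 min")
```

The sums agree with the totals in the commit messages: roughly $0.019/min ($0.57/day at 30 minutes) for the cheapest pipeline and roughly $0.033/min for the hybrid one. The hybrid per-day figure is derived here and is not stated in the log.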