PCM16 capture via AudioContext was streaming raw audio continuously,
causing massive accumulated buffers that took ~20s to transcribe.
Replaced with MediaRecorder which records compressed Opus/webm and
sends a single blob on release — much smaller, faster to transcribe.
Also removed all unused PCM16/WAV helper functions from both frontend
and backend.
The backend sends Opus-encoded audio from OpenAI TTS (tts-1 with
response_format=opus). The frontend was treating it as raw PCM16
and wrapping it in a WAV container, which corrupted the audio into
static. Now plays the Opus data directly as audio/ogg.
Hybrid approach gives streaming STT at ~/usr/bin/bash.017/min + cheap brain
at ~/usr/bin/bash.001/min + TTS at ~/usr/bin/bash.015/min = ~/usr/bin/bash.033/min total.
- gpt-realtime-whisper handles streaming transcription with VAD
- gpt-5.4-nano handles response generation (chat completions)
- OpenAI TTS (nova) for voice output
- Server VAD detects utterance boundaries
- Honcho memory context injected into system prompt
- Removed old full Realtime relay service
Replaced the 3-step sequential pipeline (Whisper STT → DeepSeek LLM
→ OpenAI TTS) with a single OpenAI Realtime API WebSocket using
gpt-4o-mini-realtime-preview.
- ~300-800ms latency vs 1-3s
- Server VAD for automatic turn detection
- Streaming audio chunks during playback
- Interruptions: user can speak over Kira mid-response
- Honcho memory still injected into session instructions
- Frontend captures PCM16 mono 24kHz via AudioContext
- Backend relays client ↔ OpenAI Realtime API
- Supports both voice (PCM16) and text input
navigator.mediaDevices.getUserMedia() requires a secure context
(HTTPS or localhost). When accessed over plain HTTP, the API
is undefined. Now shows a friendly chat message instead of a
cryptic TypeError in the console.
- Registered pixi Ticker via (Live2DModel as any).registerTicker()
to fix 'No Ticker registered' warning and animation issues
- Fixed outfit texture swap: textures live on model.textures[]
not model.internalModel.textures[]
KiraAvatar: Added Talk mic button to Live2D view (was only in
AnimatedAvatar fallback). Includes listening-pulse animation.
MusicPlayer: Replaced hidden YouTube iframe with proper IFrame
Player API. Now starts on explicit user click (Start Lo-Fi button),
complying with browser autoplay policies. Supports station
switching and volume control after playback starts.
model.internalModel.coreModel.setTexture() expects a raw WebGL
texture, not a PixiJS Texture. Instead, set the new PixiJS Texture
directly on the model's internalModel.textures[2] array. The render
loop's bindTexture() call extracts the WebGL handle from the PixiJS
BaseTexture and passes it to the Cubism core.
This eliminates the cascade of try-catch fallbacks and the
'coreModel.setTexture is not a function' TypeError.
- Added isInteractive() stub on Live2DModel to prevent
'e.isInteractive is not a function' errors in pixi v7
- Swapped outfit texture loading to use Assets.load() with
cascading fallbacks (model.setTexture -> internalModel -> coreModel)
- Removed unused Texture import
- Added Epsilon Live2D model (Cubism 4) with full motion/expression set
- KiraAvatar now loads Live2D via PixiJS + cubism4 renderer
- Idle animation auto-plays on load
- Lip-sync: PARAM_MOUTH_OPEN_Y driven by speaking state
- 8 expressions (Normal, Smile, Sad, Angry, Surprised, Blushing, f01, f02)
- 15 motion files including idle, tap, flick, shake
- Physics, eye blink, and LipSync parameter groups configured
- Falls back to animated SVG placeholder if model isn't available
- WelcomeScreen: first-time name entry with cute onboarding
- identify WS message: sets user_id, loads saved prefs from Honcho
- set_preference WS message: saves scene/outfit/accessory to Honcho metadata
- Preferences auto-load on return visits via localStorage + Honcho peer meta
- Kira uses the user's name in greeting and prompts
- Backend: get/set preference methods in KiraMemory service
- Frontend: optimistic preference updates, synced to backend on change