- Extract layout constants matching Tailwind config (PAD, LEFT_W, GAP, RIGHT_W)
- positionModels() helper computes exact pixel positions from layout
- Kira: centered in center panel at 78% of available space
- Mochi: 120px tall, centered in right sidebar, above status bar
- Both models reposition on window resize
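A minimal sketch of what a positionModels() helper along these lines could look like. LEFT_W and RIGHT_W are assumed to match the 288px and 256px sidebars; PAD, GAP, STATUS_H, the return shape, and reading "78% of available space" as 78% of the available height are all assumptions for illustration, not the committed values.

```typescript
// Layout constants. PAD, GAP and STATUS_H are hypothetical values;
// the real ones are whatever the Tailwind config defines.
const PAD = 16;
const LEFT_W = 288;
const GAP = 16;
const RIGHT_W = 256;
const STATUS_H = 32;    // assumed height of the bottom status bar
const KIRA_FILL = 0.78; // Kira uses 78% of the available height (assumed axis)
const MOCHI_H = 120;    // Mochi is 120px tall

interface Placement {
  x: number;      // horizontal center, px
  y: number;      // vertical center, px
  height: number; // target rendered height, px
}

function positionModels(
  viewW: number,
  viewH: number,
): { kira: Placement; mochi: Placement } {
  // The center panel spans the space between the two sidebars.
  const centerLeft = PAD + LEFT_W + GAP;
  const centerRight = viewW - PAD - RIGHT_W - GAP;
  const centerW = Math.max(0, centerRight - centerLeft);

  // Vertical space left after outer padding and the status bar.
  const availH = Math.max(0, viewH - 2 * PAD - STATUS_H);

  const kira: Placement = {
    x: centerLeft + centerW / 2,
    y: PAD + availH / 2,
    height: availH * KIRA_FILL,
  };

  // Mochi sits centered in the right sidebar, directly above the status bar.
  const sidebarLeft = viewW - PAD - RIGHT_W;
  const mochi: Placement = {
    x: sidebarLeft + RIGHT_W / 2,
    y: viewH - PAD - STATUS_H - MOCHI_H / 2,
    height: MOCHI_H,
  };

  return { kira, mochi };
}
```

Calling this from the window resize handler with the current viewport size and applying the result to each model would cover the reposition-on-resize case.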
Live2DStage creates ONE full-viewport transparent canvas (z-0, pointer-events:none).
Both Kira and Mochi cat models render on the same Pixi stage and WebGL context.
KiraAvatar is now UI-only (no canvas), receives model ref from stage.
PetZone is label-only. Eliminates all WebGL context conflict errors.
Cat gets its own canvas with WebGL1 context (Kira uses WebGL2 by default).
Different GL versions don't share buffers, so no bindBuffer spam.
Cat now renders in the PetZone section of the right sidebar where it belongs.
Removed all shared-context onAppReady plumbing.
Single WebGL context, no bindBuffer spam. Cat model loads onto
KiraAvatar's stage and positions itself at bottom-right corner.
PetZone passes app prop through to Live2DCat.
Reverts shared-context approach. Live2DCat gets its own canvas with
forceCanvas:true (Canvas2D renderer), which avoids the WebGL bindBuffer
spam entirely. Cleaned up onAppReady prop from KiraAvatar.
Eliminates WebGL bindBuffer/bindTexture spam from dual Application contexts.
Cat model now loads onto KiraAvatar's shared stage via onAppReady callback.
- Copied LittleCat model files to frontend/public/live2d/models/little-cat/
- Using the black alternate texture as default
- Created Live2DCat component that renders the model in a small canvas
- PetZone now shows a single Live2D cat instead of two SVG cats
React was potentially clearing the canvas on re-render because we
appended it manually to a div. Now using a <canvas ref={canvasRef}>
element directly in JSX that React manages. Pixi app uses .
Scale set to 82% of container.
Previous approach set CSS width:100% on a low-res canvas, causing the browser
to stretch/pixelate the model. Now using Pixi's built-in resizeTo so the
canvas internal resolution always matches the container. Model scaled to 90%
of container with centered anchor.
Pixi renderer.resize() overwrites canvas inline width/height styles,
locking the canvas to the initial size and leaving empty space below.
Now we re-apply width:100%;height:100% after every resize so the canvas
always fills its container. Removed unused appRef.
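The re-apply step can be sketched roughly as below. ResizableRenderer and StyledCanvas are hypothetical structural stand-ins for the Pixi renderer and the canvas element, so the snippet runs without pixi.js or DOM typings; resizeAndFill is an illustrative name, not the function in the component.

```typescript
// Minimal stand-ins (assumptions) for the parts of the renderer and
// canvas this sketch touches.
interface ResizableRenderer {
  resize(width: number, height: number): void;
}

interface StyledCanvas {
  style: { width: string; height: string };
}

function resizeAndFill(
  renderer: ResizableRenderer,
  canvas: StyledCanvas,
  width: number,
  height: number,
): void {
  // The resize may overwrite the inline width/height with pixel values...
  renderer.resize(width, height);
  // ...so restore the fluid sizing afterwards, every time.
  canvas.style.width = "100%";
  canvas.style.height = "100%";
}
```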
Problem: flex layout wasn't ready on first paint, so clientWidth fell back
to 400px. Canvas was 400px wide but parent was only 288px, causing the
avatar to be clipped on the right.
Fix: ResizeObserver measures real laid-out size before init. Canvas forced
to width/height 100% via CSS so it never overflows. Model scaled to 68%
with centered anchor. Resize handled dynamically.
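A rough sketch of the sizing math, assuming the 68% figure means "fit the model inside the container, then shrink to 68% of that fit". fitModel, Size, and Fit are hypothetical names. Returning null for an unmeasured container stands in for waiting on the first ResizeObserver callback instead of falling back to a guessed width.

```typescript
interface Size {
  width: number;
  height: number;
}

interface Fit {
  scale: number; // uniform scale to apply to the model
  x: number;     // model position; with a centered anchor this is the container center
  y: number;
}

function fitModel(container: Size, model: Size, fraction = 0.68): Fit | null {
  // A 0x0 container means layout has not happened yet: do not guess.
  if (container.width <= 0 || container.height <= 0) return null;
  if (model.width <= 0 || model.height <= 0) return null;

  // Largest uniform scale that keeps the model inside the container,
  // reduced to the requested fraction.
  const contain = Math.min(
    container.width / model.width,
    container.height / model.height,
  );

  return {
    scale: contain * fraction,
    x: container.width / 2,
    y: container.height / 2,
  };
}
```

The same function can be called again from each later resize notification, which keeps the initial layout and the dynamic resize on one code path.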
Replaced the hero + scrollable grid with a fixed-height three-column
workspace:
- Left (fixed 288px): Kira avatar + compact chat + text input
- Center (flex): Large focus timer + notes
- Right (fixed 256px): Music, white noise, wardrobe, pets
Thin top bar: scene selector dots + clock
Thin bottom bar: status + connection indicator
No cards, no scrollable grid, no wasted space. Clean, modern,
everything visible at once. Avatar fills full sidebar height.
All 15+ glass-card instances removed across every component (Timer, Music,
Notes, WhiteNoise, PetZone, Clock, ChatBubble, Wardrobe, Toolbar, KiraAvatar,
BackgroundScene, WelcomeScreen, App text input + bottom bar).
New design: widgets sit directly on the gradient background with only padding,
no frosted-glass backgrounds, borders, or shadows. Cleaner, more modern look.
- Avatar now centered in its own row above the tools grid (was crammed in column 1)
- KiraAvatar container: min-height 33vh, canvas up to 500px wide
- Tools reorganized into 4 columns below: Chat, Timer+Music, Notes+Noise, Clock+Pets+Wardrobe
- WelcomeScreen restored to full (not compact) for first-time users
The REST STT revert commit (0e74a16) deleted lines 78-262 including the message loop, identify handler, audio handler, text handler, and disconnect cleanup. This caused the WS to accept then immediately close, triggering a reconnect loop.
Refactored for clarity: transcribe_audio(), get_kira_response(), stream_tts() as standalone async helpers. Full pipeline restored.
All items finished and 'clean' applied (legacy archived, docs updated, deprecations suppressed, nesting fixed, white noise added, etc.).
Site at https://kira.kaylassafe.space should now have working voice (REST path), incremental TTS, live hearing display, notes, white noise, fixed timer, clean welcome, no legacy code pollution.
- REST STT now sends delta (full text) so UI lights up immediately with what was heard.
- Works as 'live' for the final transcript only (true partials would stream word by word if Realtime were available).
- Per PLAN item 3.
This makes the voice start playing the first words while the rest of the response is still generating (big win for perceived latency).
Per PLAN item 2.
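One common way to get speech started before the full reply exists is to cut the streamed LLM text into sentences and hand each one to TTS as soon as it completes. Whether this commit splits on sentences is not stated, so the sketch below is an assumption about the technique, and SentenceChunker is a hypothetical helper, not code from the repo.

```typescript
// Buffers streamed text deltas and releases whole sentences as soon as
// a terminator (. ! ?) followed by whitespace has been seen.
class SentenceChunker {
  private buffer = "";

  // Feed one streamed delta; returns any sentences completed by it.
  push(delta: string): string[] {
    this.buffer += delta;
    const sentences: string[] = [];
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;

    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(start, end).trim();
      if (sentence.length > 0) sentences.push(sentence);
      start = end;
    }

    // Keep the unfinished tail for the next delta.
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Call once the LLM stream ends to release whatever is left.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Each string returned by push() would go straight to the TTS call, so the first sentence is already being spoken while later ones are still being generated.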
- Backend: added transcribe_audio (gpt-4o-transcribe), switched audio handler to full blob -> REST -> LLM -> streaming TTS
- Frontend: MediaRecorder (webm/opus) full recording sent on stop (one blob per utterance)
- Removed dead WhisperStream callbacks and pending_transcript/lock
- This unblocks voice per AUDIT item 1 (Option B fallback). Deltas will come in a later item.
- Also preps for deprecation fix (MediaRecorder is the good path).
- URL now uses ?model=gpt-realtime-whisper (was invalid gpt-4o-mini-realtime-preview)
- Cleaned session.update (removed modalities that may not apply)
- Expanded _handle to catch input_audio_transcription.delta and .completed events
- on_error now forwards transcription errors to frontend client
- Per AUDIT + PLAN item 1
Replaces REST-based transcription (gpt-4o-transcribe) with WebSocket
streaming via gpt-realtime-whisper. Frontend captures PCM16 audio and
streams it through the backend to a Realtime transcription session.
- Server-side VAD detects utterance boundaries automatically
- Word-level transcript deltas stream to the client in real-time
- On utterance end, gpt-5.4-nano generates a response
- TTS streams back via with_streaming_response
- Total pipeline: PCM16 → Realtime WS → LLM → streaming TTS
Replaces gpt-4o-mini-transcribe ($0.003/min) with gpt-realtime-whisper
($0.017/min). Expected to reduce transcription latency from ~2.6s
to ~1s due to the model's realtime optimization.
Replaced synchronous TTS (waiting ~5.9s for the full audio) with
streaming TTS that sends audio chunks as they arrive. The frontend now
accumulates chunks in audioBufferRef and plays the complete stream
on speaking_end. Reduces TTS time-to-first-byte from ~6s to ~1s.
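The accumulate-then-play step could look roughly like this. AudioChunkBuffer is a hypothetical stand-in for whatever audioBufferRef holds; the sketch only covers collecting the chunks and joining them, not playback.

```typescript
// Collects binary audio chunks as they arrive and joins them on demand.
class AudioChunkBuffer {
  private chunks: Uint8Array[] = [];

  // Called for every audio chunk received from the server.
  push(chunk: Uint8Array): void {
    this.chunks.push(chunk);
  }

  // Called on speaking_end: returns one contiguous buffer and resets.
  drain(): Uint8Array {
    const total = this.chunks.reduce((sum, c) => sum + c.length, 0);
    const joined = new Uint8Array(total);
    let offset = 0;
    for (const chunk of this.chunks) {
      joined.set(chunk, offset);
      offset += chunk.length;
    }
    this.chunks = [];
    return joined;
  }
}
```

Resetting inside drain() means a chunk from the next response cannot end up appended to the previous one.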
The memory context was being rebuilt on every conversation turn via
build_system_prompt(), which calls Honcho's dialectic reasoning API
twice (get_user_context + get_kira_context). Each call takes 5-15s.
Now the memory suffix is computed ONCE during identify and cached in
a memory_suffix variable for the session duration. Per-turn latency
drops from ~37s to ~3s.
Also removed duplicated _pcm16_to_wav and cleaned up orphaned code.
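The compute-once-per-session idea behind the memory_suffix cache, sketched in TypeScript purely for illustration (the backend code itself is not shown here). cachePerSession is a hypothetical helper; the builder passed to it stands in for the slow context calls.

```typescript
// Wraps a slow async builder so it runs at most once; every later call
// reuses the same promise (and therefore the same result).
function cachePerSession<T>(build: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = build();
    }
    return cached;
  };
}
```

Creating one such wrapper when the client identifies and calling it on every turn gives the described behaviour: the first turn pays for the memory lookup and later turns do not. One caveat of this sketch is that a failed build stays cached too, so a real version would probably clear the cache on rejection.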
PCM16 capture via AudioContext was streaming raw audio continuously,
causing massive accumulated buffers that took ~20s to transcribe.
Replaced with MediaRecorder which records compressed Opus/webm and
sends a single blob on release — much smaller, faster to transcribe.
Also removed all unused PCM16/WAV helper functions from both frontend
and backend.
The backend sends Opus-encoded audio from OpenAI TTS (tts-1 with
response_format=opus). The frontend was treating it as raw PCM16
and wrapping it in a WAV container, which corrupted the audio into
static. Now plays the Opus data directly as audio/ogg.
Frontend captures PCM16 mono 24kHz audio. The transcription API
expects a proper audio container format (wav, webm, etc.), not raw
PCM16 data. Added _pcm16_to_wav() to wrap the raw bytes in a WAV
header before sending to gpt-4o-mini-transcribe.
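For reference, wrapping raw PCM16 in a WAV container only takes a 44-byte header. The sketch below shows that header in TypeScript as an illustration of the format; it is not the backend's _pcm16_to_wav(), and the pcm16ToWav name and default arguments (24kHz mono, matching the capture format above) are assumptions.

```typescript
// Prepends a canonical 44-byte RIFF/WAVE header to raw little-endian
// 16-bit PCM samples.
function pcm16ToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
): Uint8Array {
  const bytesPerSample = 2; // 16-bit samples
  const header = new ArrayBuffer(44);
  const view = new DataView(header);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true);  // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);              // fmt chunk size
  view.setUint16(20, 1, true);               // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  view.setUint16(32, channels * bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);              // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);      // payload size in bytes

  const wav = new Uint8Array(44 + pcm.length);
  wav.set(new Uint8Array(header), 0);
  wav.set(pcm, 44);
  return wav;
}
```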
The openai library's beta.realtime.connect() hardcodes the obsolete
'OpenAI-Beta: realtime=v1' header which the GA API rejects. Connecting
directly via the websockets library with only the Authorization header
resolves the 'beta_api_shape_disabled' error.
Hybrid approach gives streaming STT at ~$0.017/min + cheap brain
at ~$0.001/min + TTS at ~$0.015/min = ~$0.033/min total.
- gpt-realtime-whisper handles streaming transcription with VAD
- gpt-5.4-nano handles response generation (chat completions)
- OpenAI TTS (nova) for voice output
- Server VAD detects utterance boundaries
- Honcho memory context injected into system prompt
- Removed old full Realtime relay service
The old OpenAI-Beta: realtime=v1 header is rejected by the GA API.
Removing it via extra_headers override. Using gpt-realtime-2 which
is the current production Realtime model.
Replaced the 3-step sequential pipeline (Whisper STT → DeepSeek LLM
→ OpenAI TTS) with a single OpenAI Realtime API WebSocket using
gpt-4o-mini-realtime-preview.
- ~300-800ms latency vs 1-3s
- Server VAD for automatic turn detection
- Streaming audio chunks during playback
- Interruptions: user can speak over Kira mid-response
- Honcho memory still injected into session instructions
- Frontend captures PCM16 mono 24kHz via AudioContext
- Backend relays client ↔ OpenAI Realtime API
- Supports both voice (PCM16) and text input
navigator.mediaDevices.getUserMedia() requires a secure context
(HTTPS or localhost). When accessed over plain HTTP, the API
is undefined. Now shows a friendly chat message instead of a
cryptic TypeError in the console.
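A sketch of the guard, written against a plain object so it can run outside a browser. NavigatorLike and micUnavailableReason are hypothetical names, and the message wording is illustrative; in the app the arguments would be the real navigator and window.isSecureContext.

```typescript
// Only the part of navigator this check needs (assumed shape).
interface NavigatorLike {
  mediaDevices?: { getUserMedia?: unknown };
}

// Returns a user-facing explanation when the mic cannot be used,
// or null when it is safe to call getUserMedia().
function micUnavailableReason(
  nav: NavigatorLike,
  isSecureContext: boolean,
): string | null {
  if (!isSecureContext) {
    return "Voice needs a secure connection. Open the site over HTTPS or on localhost.";
  }
  if (!nav.mediaDevices || typeof nav.mediaDevices.getUserMedia !== "function") {
    return "This browser does not expose microphone access.";
  }
  return null;
}
```

Running this before touching navigator.mediaDevices lets the result be posted as a chat message, rather than letting the undefined access throw.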
- Registered pixi Ticker via (Live2DModel as any).registerTicker()
to fix 'No Ticker registered' warning and animation issues
- Fixed outfit texture swap: textures live on model.textures[]
not model.internalModel.textures[]
KiraAvatar: Added Talk mic button to Live2D view (was only in
AnimatedAvatar fallback). Includes listening-pulse animation.
MusicPlayer: Replaced hidden YouTube iframe with proper IFrame
Player API. Now starts on explicit user click (Start Lo-Fi button),
complying with browser autoplay policies. Supports station
switching and volume control after playback starts.
model.internalModel.coreModel.setTexture() expects a raw WebGL
texture, not a PixiJS Texture. Instead, set the new PixiJS Texture
directly on the model's internalModel.textures[2] array. The render
loop's bindTexture() call extracts the WebGL handle from the PixiJS
BaseTexture and passes it to the Cubism core.
This eliminates the cascade of try-catch fallbacks and the
'coreModel.setTexture is not a function' TypeError.