hobokenchicken/kira

Fork 0

Commit Graph

Author	SHA1	Message	Date
hobokenchicken	188da1d52a	fix(stt): try gpt-4o-realtime-preview as base session model + gpt-realtime-whisper for input_audio_transcription (per OpenAI error guidance)	2026-06-04 15:16:08 -04:00
hobokenchicken	191b7ad9b5	fix(stt): correct Realtime WS model to gpt-realtime-whisper + enhance event handling for deltas/completed - URL now uses ?model=gpt-realtime-whisper (was invalid gpt-4o-mini-realtime-preview) - Cleaned session.update (removed modalities that may not apply) - Expanded _handle to catch input_audio_transcription.delta and .completed events - on_error now forwards transcription errors to frontend client - Per AUDIT + PLAN item 1	2026-06-04 15:14:26 -04:00
hobokenchicken	7502f201c7	feat: Realtime WebSocket STT via gpt-realtime-whisper Replaces REST-based transcription (gpt-4o-transcribe) with WebSocket streaming via gpt-realtime-whisper. Frontend captures PCM16 audio and streams it through the backend to a Realtime transcription session. - Server-side VAD detects utterance boundaries automatically - Word-level transcript deltas stream to the client in real-time - On utterance end, gpt-5.4-nano generates a response - TTS streams back via with_streaming_response - Total pipeline: PCM16 → Realtime WS → LLM → streaming TTS	2026-06-04 14:26:19 -04:00

Author

SHA1

Message

Date

hobokenchicken

188da1d52a

fix(stt): try gpt-4o-realtime-preview as base session model + gpt-realtime-whisper for input_audio_transcription (per OpenAI error guidance)

2026-06-04 15:16:08 -04:00

hobokenchicken

191b7ad9b5

fix(stt): correct Realtime WS model to gpt-realtime-whisper + enhance event handling for deltas/completed

- URL now uses ?model=gpt-realtime-whisper (was invalid gpt-4o-mini-realtime-preview)
- Cleaned session.update (removed modalities that may not apply)
- Expanded _handle to catch input_audio_transcription.delta and .completed events
- on_error now forwards transcription errors to frontend client
- Per AUDIT + PLAN item 1

2026-06-04 15:14:26 -04:00

hobokenchicken

7502f201c7

feat: Realtime WebSocket STT via gpt-realtime-whisper

Replaces REST-based transcription (gpt-4o-transcribe) with WebSocket
streaming via gpt-realtime-whisper. Frontend captures PCM16 audio and
streams it through the backend to a Realtime transcription session.

- Server-side VAD detects utterance boundaries automatically
- Word-level transcript deltas stream to the client in real-time
- On utterance end, gpt-5.4-nano generates a response
- TTS streams back via with_streaming_response
- Total pipeline: PCM16 → Realtime WS → LLM → streaming TTS

2026-06-04 14:26:19 -04:00

3 Commits