Upgrades the routing engine to support tag, token limit, multimodal, reasoning, and tool calling conditions. Adds unit tests for the new routing features.
- Add promo discount system for deepseek-v4-pro (75% off until 2026-05-31)
- Rewrite StreamGemini to handle both SSE and JSON array response formats,
fixing 0-token logging for gemini-3-flash and gemini-3-flash-preview
- Fall back to model group name for cost lookup when concrete model
isnt in the registry (fixes $0 cost on deepseek-auto entries)
- Move registry lock before FindModel call to fix data race
The 40-character truncation of tool call IDs in helper.go caused collisions
when models (like deepseek-v4-flash) generated longer IDs, leading to
"Duplicate value for 'tool_call_id'" errors. Removed the limit to allow
full unique IDs.
DeepSeek: updated reasoning_content injection to use an empty string
instead of a space, better matching provider expectations for history.
Improved API error reporting across all providers by capturing raw body
content when response parsing fails or returns empty strings.
DeepSeek models default to Chinese for some prompts. The ensureEnglish()
function prepends 'Always respond in English' as a system message when
no system prompt is already set. Applied to both ChatCompletion and
ChatCompletionStream paths.
- Add CachedContentTokenCount to UsageMetadata parsing for both
streaming (helpers.go) and non-streaming (gemini.go) requests
- CacheReadTokens now populated from Gemini cachedContentTokenCount
- Add uint32Ptr helper for nil-safe uint32 pointer creation
Improved extraction of reasoning and cached tokens from OpenAI and DeepSeek responses (including streams). Ensured accurate cost calculation using registry metadata.