- Add promo discount system for deepseek-v4-pro (75% off until 2026-05-31)
- Rewrite StreamGemini to handle both SSE and JSON array response formats,
fixing 0-token logging for gemini-3-flash and gemini-3-flash-preview
- Fall back to model group name for cost lookup when concrete model
isn't in the registry (fixes $0 cost on deepseek-auto entries)
- Move registry lock before FindModel call to fix data race
The 40-character truncation of tool call IDs in helper.go caused collisions
when models (like deepseek-v4-flash) generated longer IDs, leading to
"Duplicate value for 'tool_call_id'" errors. Removed the limit to allow
full unique IDs.
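A minimal sketch of why the truncation collided. The truncate helper and the two IDs are hypothetical and only illustrate the failure mode; they are not the code from helper.go.

```go
package main

import "fmt"

// truncate mimics the removed behavior: keep at most n bytes of the ID.
func truncate(id string, n int) string {
	if len(id) > n {
		return id[:n]
	}
	return id
}

func main() {
	// Hypothetical IDs that share their first 40 characters.
	prefix := "call_0123456789abcdef0123456789abcdef012"
	a, b := prefix+"_0", prefix+"_1"

	// After truncation both IDs are identical, so the provider sees a duplicate.
	fmt.Println(truncate(a, 40) == truncate(b, 40))
	// The full IDs remain distinct.
	fmt.Println(a == b)
}
```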
DeepSeek: updated reasoning_content injection to use an empty string
instead of a space, better matching provider expectations for history.
Improved API error reporting across all providers by capturing raw body
content when response parsing fails or returns empty strings.
FindModel iterates providers in random map order, so when deepseek-v4-pro
exists in both 'deepseek' (output=384000) and 'ollama-cloud' (output=1048576),
it sometimes returned the wrong metadata. The proxy then injected
max_tokens=1048576 into DeepSeek's API, which rejected it with 400
(valid range is [1, 393216]).
Fix: define CanonicalProviders list (deepseek, openai, google, xai, etc.)
and search them in priority order before falling back to all providers.
Each of the four lookup strategies (exact key, metadata ID, reverse fuzzy,
forward fuzzy) checks canonical providers first.
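The priority lookup can be sketched roughly as follows, for the exact-key strategy only. The Model type, the registry shape, and the contents of canonicalProviders are assumptions made for illustration; sorting the remaining providers is an extra choice in this sketch to keep the fallback deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Model holds the only metadata field this sketch needs.
type Model struct {
	Output int
}

// canonicalProviders is a hypothetical priority list; the real one may differ.
var canonicalProviders = []string{"deepseek", "openai", "google", "xai"}

// findModel checks canonical providers in priority order, then the rest in
// sorted order so the result never depends on map iteration order.
func findModel(registry map[string]map[string]Model, id string) (string, Model, bool) {
	for _, p := range canonicalProviders {
		if m, ok := registry[p][id]; ok {
			return p, m, true
		}
	}
	rest := make([]string, 0, len(registry))
	for p := range registry {
		rest = append(rest, p)
	}
	sort.Strings(rest)
	for _, p := range rest {
		if m, ok := registry[p][id]; ok {
			return p, m, true
		}
	}
	return "", Model{}, false
}

func main() {
	registry := map[string]map[string]Model{
		"ollama-cloud": {"deepseek-v4-pro": {Output: 1048576}},
		"deepseek":     {"deepseek-v4-pro": {Output: 384000}},
	}
	p, m, ok := findModel(registry, "deepseek-v4-pro")
	fmt.Println(p, m.Output, ok)
}
```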
DeepSeek models default to Chinese for some prompts. The ensureEnglish()
function prepends 'Always respond in English' as a system message when
no system prompt is already set. Applied to both ChatCompletion and
ChatCompletionStream paths.
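A rough sketch of the behavior described above. The Message type is a simplified stand-in, and the body is an assumption about how ensureEnglish() might be written rather than the actual implementation.

```go
package main

import "fmt"

// Message is a simplified stand-in for an OpenAI-style chat message.
type Message struct {
	Role    string
	Content string
}

// ensureEnglish prepends an English-only system message unless the
// conversation already carries a system prompt.
func ensureEnglish(msgs []Message) []Message {
	for _, m := range msgs {
		if m.Role == "system" {
			return msgs
		}
	}
	out := make([]Message, 0, len(msgs)+1)
	out = append(out, Message{Role: "system", Content: "Always respond in English"})
	return append(out, msgs...)
}

func main() {
	msgs := ensureEnglish([]Message{{Role: "user", Content: "hello"}})
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```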
Previously, provider selection happened on the raw client-requested model
name (e.g. 'dispatcher') which defaulted to OpenAI. After routing resolved
it to 'deepseek-v4-flash', the provider was never re-selected.
Now prefix-stripping + routing runs first, then selectProvider() picks
the correct provider based on the resolved concrete model.
Extracted selectProvider() method from handleChatCompletions' inline
logic. The classifier callback now calls selectProvider(selectorModel)
instead of hardcoding openaiProvider.
This fixes the 'circuit breaker is open' error when dispatcher tries
to use deepseek-v4-flash as its selector model.
Classifier: When complexity_threshold is set (e.g. 10), the classifier uses
it as the rating scale and maps ratings proportionally to target buckets
instead of 1:1. Formula (integer division): idx = rating * len(targets) / (threshold + 1).
With threshold=10 and 3 targets: 1-3→target[0], 4-7→target[1], 8-10→target[2].
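The formula can be checked with a few lines of Go. The function name bucketIndex and the clamping of out-of-range ratings are assumptions added for this sketch; the arithmetic itself is the formula quoted above, relying on Go's integer division.

```go
package main

import "fmt"

// bucketIndex maps a rating on a 1..threshold scale to a target index using
// integer division, clamping out-of-range results.
func bucketIndex(rating, threshold, numTargets int) int {
	idx := rating * numTargets / (threshold + 1)
	if idx < 0 {
		return 0
	}
	if idx >= numTargets {
		return numTargets - 1
	}
	return idx
}

func main() {
	for rating := 1; rating <= 10; rating++ {
		fmt.Printf("rating %d -> target[%d]\n", rating, bucketIndex(rating, 10, 3))
	}
}
```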
Seed: Added 'dispatcher' group (classifier, threshold=10, selector=deepseek-v4-flash)
that auto-routes to fast-flow/standard-pro/heavy-logic by complexity score.
Combined with hierarchical routing, this enables two-level dispatch:
dispatcher scores 1-10 → routes to tier group → tier picks concrete model.
RouteToConcrete() recursively resolves group chains until a concrete
model is reached, with cycle detection and max depth (10) guard.
Example: all-purpose -> fast-flow -> deepseek-v4-flash
The dashboard log shows the full chain: 'deepseek-v4-flash (hierarchical:
fast-flow (default (first target)) -> deepseek-v4-flash (default (first target)))'
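A simplified sketch of the chain resolution with both guards. It is written as a loop rather than recursively, always picks the first target, and uses a hypothetical map of group name to targets; the real RouteToConcrete() applies each group's own strategy.

```go
package main

import (
	"errors"
	"fmt"
)

const maxDepth = 10

// routeToConcrete follows group -> target links until it reaches a name that
// is not a group, returning the concrete model and the chain it walked.
func routeToConcrete(groups map[string][]string, name string) (string, []string, error) {
	chain := []string{name}
	seen := map[string]bool{name: true}
	for depth := 0; ; depth++ {
		targets, isGroup := groups[name]
		if !isGroup {
			return name, chain, nil // reached a concrete model
		}
		if len(targets) == 0 {
			return "", chain, errors.New("group has no targets: " + name)
		}
		if depth >= maxDepth {
			return "", chain, errors.New("max routing depth exceeded")
		}
		name = targets[0]
		if seen[name] {
			return "", chain, errors.New("routing cycle at " + name)
		}
		seen[name] = true
		chain = append(chain, name)
	}
}

func main() {
	groups := map[string][]string{
		"all-purpose": {"fast-flow"},
		"fast-flow":   {"deepseek-v4-flash"},
	}
	model, chain, err := routeToConcrete(groups, "all-purpose")
	fmt.Println(model, chain, err)
}
```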
Schema: Added logic_level (INTEGER) and primary_use (TEXT) columns
to model_groups table with auto-migration for existing databases.
Seed: Three new default groups:
heavy-logic (level 9) — Complex Coding, Logic, Agents
standard-pro (level 5) — General Assistant, Long Docs
fast-flow (level 2) — Classification, JSON, Basic Q&A
Admin API: INSERT/UPDATE handlers now accept and persist the new fields.
Dashboard: Table shows Level and Primary Use columns; form includes
both fields with appropriate inputs and placeholders.
Add Groups() method to Router so handleListModels can append model
group IDs (e.g. 'deepseek-auto', 'openai-auto') to the model list,
marked with owned_by: 'gophergate'. This lets clients discover and
use groups via the standard OpenAI /v1/models endpoint.
When using model groups (e.g. 'deepseek-auto'), the dashboard logged the
group name instead of the concrete resolved model (e.g. 'deepseek-reasoner').
Now:
- logRequest passes the resolved modelID (concrete) + modelGroup (group name)
- RequestLog struct has a new ModelGroup field (omitempty)
- Dashboard displays resolved model (via group) when a group was used
Files changed:
internal/server/logging.go - add ModelGroup field
internal/server/server.go - pass resolved modelID, capture modelGroup
static/js/websocket.js - show group annotation in Recent Activity
static/js/pages/overview.js - show group annotation in overview table
static/js/pages/monitoring.js - show group annotation in stream
The 30-second resty client timeout was killing long streaming responses
mid-generation. Models with large output windows (e.g. deepseek-v4-pro
at 384K max_tokens) routinely exceed 30s. Raised all providers to
10 minutes (Ollama already at 15min, unchanged). Circuit breaker
recovery timeout raised from 30s to 5min.
Log the max_tokens value the client sends, whether gophergate injects
one from the registry, and the final value forwarded to the provider.
This helps trace output truncation issues.
Go 1.26 changed NullTime.Scan to delegate to convertAssign,
which has no string->time.Time conversion path. The
modernc.org/sqlite driver returns raw strings for aggregate
expressions like MAX(last_used_at), causing silent scan failures
that made all clients/providers show 'Never' for last used.
Fixed by scanning into a string and parsing with time.Parse.
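The parsing half of that fix might look roughly like this. The helper name parseTimestamp and the two layouts are assumptions; the layouts actually stored in the database may differ. In real code the raw value would first be scanned into a string (or sql.NullString) and then passed to a helper like this.

```go
package main

import (
	"fmt"
	"time"
)

// layouts lists the timestamp formats this sketch tries; both are assumptions.
var layouts = []string{
	time.RFC3339Nano,
	"2006-01-02 15:04:05",
}

// parseTimestamp converts a raw string from the driver into a time.Time.
// It reports false for empty or unparseable input.
func parseTimestamp(raw string) (time.Time, bool) {
	if raw == "" {
		return time.Time{}, false
	}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, raw); err == nil {
			return t, true
		}
	}
	return time.Time{}, false
}

func main() {
	t, ok := parseTimestamp("2024-05-01 12:30:00")
	fmt.Println(t, ok)
}
```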
When a client omits max_tokens, providers (DeepSeek, etc.) apply
a low server-side default output cap. Now gophergate looks up the
model in the models.dev registry and injects the model's output
limit, preventing silent truncation.
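The decision described above reduces to a small function. This is a sketch under assumptions: resolveMaxTokens is a hypothetical name, a nil pointer stands for "client omitted max_tokens", and a non-positive registryLimit stands for "model not found in the registry".

```go
package main

import "fmt"

// resolveMaxTokens returns the value to forward to the provider.
// injected reports whether the value came from the registry; ok is false
// when there is nothing to send and the provider default applies.
func resolveMaxTokens(clientValue *int, registryLimit int) (value int, injected, ok bool) {
	if clientValue != nil {
		return *clientValue, false, true // never override the client
	}
	if registryLimit > 0 {
		return registryLimit, true, true
	}
	return 0, false, false
}

func main() {
	v, injected, ok := resolveMaxTokens(nil, 384000)
	fmt.Println(v, injected, ok)

	requested := 2048
	v, injected, ok = resolveMaxTokens(&requested, 384000)
	fmt.Println(v, injected, ok)
}
```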
- Add CachedContentTokenCount to UsageMetadata parsing for both
streaming (helpers.go) and non-streaming (gemini.go) requests
- CacheReadTokens now populated from Gemini cachedContentTokenCount
- Add uint32Ptr helper for nil-safe uint32 pointer creation
Add a step between the exact ID match and the forward fuzzy match that checks
whether a registry model ID starts with the requested name. Fixes models like
'gpt-5.4-mini' not matching 'gpt-5.4-mini-2026-04-01' in the registry.
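A sketch of the exact-then-prefix ordering. The function name matchModel, the flat slice of registry IDs, and the tie-break between several prefix candidates are assumptions for illustration; the later fuzzy steps are omitted.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchModel tries an exact ID match first, then falls back to registry IDs
// that start with the requested name.
func matchModel(registryIDs []string, requested string) (string, bool) {
	for _, id := range registryIDs {
		if id == requested {
			return id, true
		}
	}
	var candidates []string
	for _, id := range registryIDs {
		if strings.HasPrefix(id, requested) {
			candidates = append(candidates, id)
		}
	}
	if len(candidates) == 0 {
		return "", false
	}
	// Tie-break is an assumption: prefer the shortest ID, then lexical order.
	sort.Slice(candidates, func(i, j int) bool {
		if len(candidates[i]) != len(candidates[j]) {
			return len(candidates[i]) < len(candidates[j])
		}
		return candidates[i] < candidates[j]
	})
	return candidates[0], true
}

func main() {
	ids := []string{"gpt-5.4", "gpt-5.4-mini-2026-04-01"}
	fmt.Println(matchModel(ids, "gpt-5.4-mini"))
}
```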
- Gemini requires function results to immediately follow the model message that called them
- Implemented look-ahead grouping to pair assistant calls with their tool results
- Standardized system and orphaned tool message handling for Gemini compatibility
- Group consecutive 'tool' messages into a single Gemini content message with multiple 'functionResponse' parts
- Ensure assistant tool calls are properly mapped and sent
- Maintain v1beta for preview and newer models
- Added debug logging for API errors
- Support tool definitions in Gemini requests
- Map tool role to 'function' in Gemini content
- Ensure tool results are wrapped in JSON objects for Gemini compatibility
- Parse FunctionCall from Gemini response and map to OpenAI-compatible ToolCalls
- Correctly map finish_reason for tool calls
- Explicitly set tool_choice: auto when tools are present to aid gemma/llama models
- Sync stop sequences into the options map for broader compatibility
- Restore gemma/llama to the high-context (32k) optimization list
- Use case-insensitive matching for model names and routing
- Default max_tokens/num_predict to 8192 for all Ollama models to prevent truncation
- Increase default context window and add more large-context model families
- Ensure DeepSeek routing handles Ollama-hosted variants correctly
- Map MaxTokens to num_predict in options map
- Set default num_ctx to 8192 for common models (gemma, llama, etc.)
- This ensures Ollama doesn't cut off responses early due to default limits
- Increase Ollama timeout to 5m for larger models (e.g. gemma4)
- Set default max_tokens to 4096 for common Ollama models
- Expand stream scanner buffer to 10MB to prevent truncation
- Improve model routing and prefix stripping in server
- Add timeouts (30s) and retries to resty client
- Add debug logging for Ollama requests and responses
- Import time package for timeout configuration
- This should fix 500 errors and provide better error messages
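The stream scanner buffer change listed above relies on bufio.Scanner's Buffer method, which raises the default 64KB token limit. A minimal sketch, assuming newline-delimited stream chunks; readLines and maxLineBytes are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const maxLineBytes = 10 * 1024 * 1024 // 10MB cap per line

// readLines scans newline-delimited input, allowing lines up to maxLineBytes
// instead of bufio.Scanner's 64KB default.
func readLines(input string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Buffer(make([]byte, 0, 64*1024), maxLineBytes)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

func main() {
	big := strings.Repeat("x", 200*1024) // longer than the 64KB default limit
	lines, err := readLines("data: " + big + "\n")
	fmt.Println(len(lines), err)
}
```

Without the Buffer call, a line this long would stop the scan with bufio.ErrTooLong.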