Commit Graph

266 Commits

Author SHA1 Message Date
hobokenchicken eb585c0001 fix: switch dispatcher classifier to gpt-5.4-nano
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
gpt-5.4-nano correctly discriminates complexity (1 vs 10)
while deepseek-v4-flash rated everything as 1/10.
2026-05-07 14:00:19 -04:00
hobokenchicken 4aea7a3b4c fix: select provider AFTER routing resolves model groups
Previously, provider selection happened on the raw client-requested model
name (e.g. 'dispatcher') which defaulted to OpenAI. After routing resolved
it to 'deepseek-v4-flash', the provider was never re-selected.

Now prefix-stripping + routing runs first, then selectProvider() picks
the correct provider based on the resolved concrete model.
2026-05-07 13:54:42 -04:00
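The ordering fix in commit 4aea7a3b4c can be illustrated with a minimal Go sketch. Everything here is an assumption made for illustration: the `routes` map, the prefix rules inside `selectProvider`, and the provider names are hypothetical, and prefix-stripping is left out. Only the order of operations (resolve first, select second) comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// selectProvider picks a provider name from a concrete model ID.
// The prefix rules are hypothetical; unknown names fall back to "openai",
// mirroring the default described in the commit message.
func selectProvider(model string) string {
	switch {
	case strings.HasPrefix(model, "deepseek"):
		return "deepseek"
	case strings.HasPrefix(model, "gemini"):
		return "gemini"
	default:
		return "openai"
	}
}

// resolveThenSelect resolves a requested name (possibly a group) to a
// concrete model first, and only then chooses the provider.
func resolveThenSelect(requested string, routes map[string]string) (model, provider string) {
	model = requested
	if concrete, ok := routes[requested]; ok {
		model = concrete
	}
	return model, selectProvider(model)
}

func main() {
	routes := map[string]string{"dispatcher": "deepseek-v4-flash"}

	// Selecting on the raw name falls through to the default provider.
	fmt.Println("before routing:", selectProvider("dispatcher"))

	model, provider := resolveThenSelect("dispatcher", routes)
	fmt.Println("after routing: ", model, "via", provider)
}
```

Selecting on the raw group name hits the default branch, which is the failure mode the commit describes.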
hobokenchicken 330eaa57d1 fix: update model names to match current models.dev registry
heavy-logic: kimi-k2.5 -> kimi-k2.6
standard-pro: gemini-3-flash -> gemini-3-flash-preview
2026-05-07 13:48:33 -04:00
hobokenchicken 0ae30036f0 fix: classifier selector model now routes to correct provider
Extracted selectProvider() method from handleChatCompletions' inline
logic. The classifier callback now calls selectProvider(selectorModel)
instead of hardcoding openaiProvider.

This fixes the 'circuit breaker is open' error when dispatcher tries
to use deepseek-v4-flash as its selector model.
2026-05-07 13:37:19 -04:00
hobokenchicken 3c0b59622e feat: classifier bucket mapping + dispatcher seed group
Classifier: When complexity_threshold is set (e.g. 10), uses it as the
rating scale and maps ratings proportionally to target buckets instead
of 1:1. Formula: idx = rating * len(targets) / (threshold + 1).

With threshold=10 and 3 targets: 1-3→target[0], 4-7→target[1], 8-10→target[2].

Seed: Added 'dispatcher' group (classifier, threshold=10, selector=deepseek-v4-flash)
that auto-routes to fast-flow/standard-pro/heavy-logic by complexity score.

Combined with hierarchical routing, this enables two-level dispatch:
  dispatcher scores 1-10 → routes to tier group → tier picks concrete model.
2026-05-07 13:18:35 -04:00
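The bucket formula in commit 3c0b59622e can be checked with a short Go sketch. The function name `bucketIndex` and the clamping of out-of-range ratings are assumptions added for illustration; only the formula `idx = rating * len(targets) / (threshold + 1)` and the target group names come from the commit message.

```go
package main

import "fmt"

// bucketIndex maps a rating in 1..threshold onto one of n target buckets
// using idx = rating * n / (threshold + 1). Go's integer division supplies
// the floor. Clamping the rating into range is an assumption of this sketch.
func bucketIndex(rating, threshold, n int) int {
	if n <= 0 || threshold <= 0 {
		return 0
	}
	if rating < 1 {
		rating = 1
	}
	if rating > threshold {
		rating = threshold
	}
	return rating * n / (threshold + 1)
}

func main() {
	targets := []string{"fast-flow", "standard-pro", "heavy-logic"}
	for rating := 1; rating <= 10; rating++ {
		idx := bucketIndex(rating, 10, len(targets))
		fmt.Printf("%2d -> target[%d] %s\n", rating, idx, targets[idx])
	}
}
```

Dividing by `threshold + 1` rather than `threshold` keeps the largest rating inside the last bucket, since `threshold * n / (threshold + 1)` is always less than `n`.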
hobokenchicken 7517307c11 feat: add hierarchical routing — groups can target other groups
RouteToConcrete() recursively resolves group chains until a concrete
model is reached, with cycle detection and max depth (10) guard.

Example: all-purpose -> fast-flow -> deepseek-v4-flash
The dashboard log shows the full chain: 'deepseek-v4-flash (hierarchical:
fast-flow (default (first target)) -> deepseek-v4-flash (default (first target)))'
2026-05-07 12:28:31 -04:00
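A minimal Go sketch of the recursive resolution described in commit 7517307c11 follows. It is not the project's `RouteToConcrete()`: the `groups` map (group name to first target), the function names, and the error values are hypothetical. Only the ideas of recursion, cycle detection, and a max depth of 10 come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

const maxDepth = 10

var (
	errCycle   = errors.New("routing cycle detected")
	errTooDeep = errors.New("routing chain exceeds max depth")
)

// resolveChain follows group -> target links until it reaches a name that
// is not a group, which this sketch treats as a concrete model.
func resolveChain(groups map[string]string, name string) (string, error) {
	return resolve(groups, name, map[string]bool{}, 0)
}

func resolve(groups map[string]string, name string, seen map[string]bool, depth int) (string, error) {
	target, isGroup := groups[name]
	if !isGroup {
		return name, nil
	}
	if seen[name] {
		return "", fmt.Errorf("%w at %q", errCycle, name)
	}
	if depth >= maxDepth {
		return "", errTooDeep
	}
	seen[name] = true
	return resolve(groups, target, seen, depth+1)
}

func main() {
	groups := map[string]string{
		"all-purpose": "fast-flow",
		"fast-flow":   "deepseek-v4-flash",
	}
	model, err := resolveChain(groups, "all-purpose")
	fmt.Println(model, err)

	_, err = resolveChain(map[string]string{"a": "b", "b": "a"}, "a")
	fmt.Println(err)
}
```

The `seen` set catches cycles early, while the depth guard bounds long but acyclic chains.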
hobokenchicken 19517b0847 chore: add deploy.sh for prod restarts
2026-05-07 12:02:28 -04:00
hobokenchicken a3a6f765e7 feat: add logic_level and primary_use metadata to model groups
Schema: Added logic_level (INTEGER) and primary_use (TEXT) columns
to model_groups table with auto-migration for existing databases.

Seed: Three new default groups:
  heavy-logic  (level 9) — Complex Coding, Logic, Agents
  standard-pro (level 5) — General Assistant, Long Docs
  fast-flow    (level 2) — Classification, JSON, Basic Q&A

Admin API: INSERT/UPDATE handlers now accept and persist the new fields.
Dashboard: Table shows Level and Primary Use columns; form includes
both fields with appropriate inputs and placeholders.
2026-05-07 12:01:28 -04:00
hobokenchicken 79dd122b56 feat: expose model groups in /v1/models endpoint
Add Groups() method to Router so handleListModels can append model
group IDs (e.g. 'deepseek-auto', 'openai-auto') to the model list,
marked with owned_by: 'gophergate'. This lets clients discover and
use groups via the standard OpenAI /v1/models endpoint.
2026-05-07 11:26:05 -04:00
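As a rough illustration of commit 79dd122b56, the Go sketch below appends group IDs to a model list. The `ModelEntry` type, the `appendGroups` helper, and the `deepseek` owner of the concrete model are hypothetical. The `id`, `object`, and `owned_by` fields follow the OpenAI model list shape, and `owned_by: 'gophergate'` comes from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelEntry is a trimmed-down entry of an OpenAI-style model list.
type ModelEntry struct {
	ID      string `json:"id"`
	Object  string `json:"object"`
	OwnedBy string `json:"owned_by"`
}

// appendGroups adds one entry per group ID, marked as owned by the gateway
// so clients can tell groups apart from provider models.
func appendGroups(models []ModelEntry, groupIDs []string) []ModelEntry {
	for _, id := range groupIDs {
		models = append(models, ModelEntry{ID: id, Object: "model", OwnedBy: "gophergate"})
	}
	return models
}

func main() {
	models := []ModelEntry{{ID: "deepseek-reasoner", Object: "model", OwnedBy: "deepseek"}}
	models = appendGroups(models, []string{"deepseek-auto", "openai-auto"})

	out, err := json.MarshalIndent(map[string]any{"object": "list", "data": models}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```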
hobokenchicken 3021e4b2b4 fix: log resolved model name instead of group name in Recent Activity
When using model groups (e.g. 'deepseek-auto'), the dashboard logged the
group name instead of the concrete resolved model (e.g. 'deepseek-reasoner').

Now:
- logRequest passes the resolved modelID (concrete) + modelGroup (group name)
- RequestLog struct has a new ModelGroup field (omitempty)
- Dashboard displays resolved model (via group) when a group was used

Files changed:
  internal/server/logging.go  - add ModelGroup field
  internal/server/server.go   - pass resolved modelID, capture modelGroup
  static/js/websocket.js      - show group annotation in Recent Activity
  static/js/pages/overview.js - show group annotation in overview table
  static/js/pages/monitoring.js - show group annotation in stream
2026-05-07 11:16:36 -04:00
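The `omitempty` behaviour mentioned in commit 3021e4b2b4 can be sketched in Go as below. The struct is a stripped-down stand-in, not the project's `RequestLog`: the JSON key names `model` and `model_group` and the `encode` helper are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RequestLog keeps only the two fields relevant to the fix. ModelGroup is
// dropped from the JSON output when no group was involved.
type RequestLog struct {
	Model      string `json:"model"`
	ModelGroup string `json:"model_group,omitempty"`
}

// encode renders a log entry as the JSON a dashboard client might receive.
func encode(l RequestLog) string {
	b, err := json.Marshal(l)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(RequestLog{Model: "deepseek-reasoner"}))
	fmt.Println(encode(RequestLog{Model: "deepseek-reasoner", ModelGroup: "deepseek-auto"}))
}
```

With this shape, a front end can show the resolved model alone, or the resolved model annotated with its group when the field is present.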
hobokenchicken 14de7e9ebf fix: wrap model-groups API responses in SuccessResponse for api.js client
2026-05-05 11:41:23 -04:00
hobokenchicken 4fef201e95 fix: remove /api prefix from model-groups API calls (api.js already prepends it)
2026-05-05 11:33:05 -04:00
hobokenchicken bac03de051 docs: add automatic model routing to README
2026-05-05 11:28:59 -04:00
hobokenchicken 37949e560b feat: add model groups dashboard page with CRUD UI
2026-05-05 10:55:25 -04:00
hobokenchicken f04cb6b8f2 feat: add model groups CRUD admin API endpoints
2026-05-05 10:50:33 -04:00
hobokenchicken 10262c0e5a feat: wire model group router into chat completions handler
2026-05-05 10:47:32 -04:00
hobokenchicken d345f8c41d feat: add classifier routing strategy with LLM complexity rating
2026-05-05 10:40:26 -04:00
hobokenchicken d1f7a57f58 feat: add router package with heuristic strategy
2026-05-05 10:37:36 -04:00
hobokenchicken dc9af4d79c feat: add model_groups table and default seed data
2026-05-05 10:33:35 -04:00
hobokenchicken c009d401fb docs: add Responses API endpoint to README
2026-05-05 09:36:51 -04:00
hobokenchicken e5ef39f327 feat: add OpenAI Responses API support (POST /v1/responses)
Add full Responses API endpoint alongside existing Chat Completions,
with identical logging/tracking/cost pipeline.

New:
- internal/models/responses.go — request/response/stream types + ToUsage() bridge
- internal/providers/openai_responses.go — OpenAI Responses/ResponsesStream

Modified:
- provider.go — Responses()+ResponsesStream() added to Provider interface
- helpers.go — BuildOpenAIResponsesBody, parsers, SSE stream reader
- circuit_breaker.go — CB wraps Responses, passthrough for stream
- server.go — POST /v1/responses route + handleResponses handler
- all non-OpenAI providers — stub methods with clear error messages

Logging: ResponsesUsage.ToUsage() bridges to models.Usage, feeding same
logRequest() -> DB insert -> dashboard WS -> client stats -> cost calc
pipeline. No schema or logger changes needed.
2026-05-02 16:38:17 -04:00
hobokenchicken eb67287b56 fix: raise provider HTTP timeouts from 30s to 10min
30-second resty client timeout was killing long streaming responses
mid-generation. Models with large output windows (e.g. deepseek-v4-pro
at 384K max_tokens) routinely exceed 30s. Raised all providers to
10 minutes (Ollama already at 15min, unchanged). Circuit breaker
recovery timeout raised from 30s to 5min.
2026-04-30 10:17:45 -04:00
hobokenchicken 4aa17b4fd2 debug: add max_tokens trace logging to chat completions handler
Logs what max_tokens the client sends, whether gophergate injects
one from the registry, and the final value forwarded to the provider.
Helps trace output truncation issues.
2026-04-30 10:04:50 -04:00
hobokenchicken 79571c6bdc fix: replace sql.NullTime with string scan for MAX() aggregate queries
Go 1.26 changed NullTime.Scan to delegate to convertAssign,
which has no string->time.Time conversion path. The
modernc.org/sqlite driver returns raw strings for aggregate
expressions like MAX(last_used_at), causing silent scan failures
that made all clients/providers show 'Never' for last used.

Fixes by scanning into a string and parsing with time.Parse.
2026-04-30 09:32:11 -04:00
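The workaround in commit 79571c6bdc (scan into a string, then parse with `time.Parse`) might look roughly like the Go sketch below. The helper name `parseLastUsed` and the list of candidate layouts are assumptions: the commit does not say which timestamp format the database column uses. The database call is left out so the example stays self-contained.

```go
package main

import (
	"fmt"
	"time"
)

// candidateLayouts lists timestamp formats to try in order. Which of these
// the real column uses is not stated in the commit, so this is a guess.
var candidateLayouts = []string{
	time.RFC3339Nano,
	"2006-01-02 15:04:05.999999999-07:00",
	"2006-01-02 15:04:05",
}

// parseLastUsed converts the raw string returned for an aggregate such as
// MAX(last_used_at) into a time.Time. ok is false for empty or
// unparseable input, which a caller could render as "Never".
func parseLastUsed(raw string) (t time.Time, ok bool) {
	if raw == "" {
		return time.Time{}, false
	}
	for _, layout := range candidateLayouts {
		if parsed, err := time.Parse(layout, raw); err == nil {
			return parsed, true
		}
	}
	return time.Time{}, false
}

func main() {
	for _, raw := range []string{"2026-04-30 09:32:11", "2026-04-30T09:32:11Z", "", "garbage"} {
		t, ok := parseLastUsed(raw)
		fmt.Printf("%q -> ok=%v %v\n", raw, ok, t)
	}
}
```

In a real query the raw value would typically be scanned into a `sql.NullString` first, so that a NULL aggregate over an empty table maps to the empty-string case here.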
hobokenchicken d46a333249 feat: inject max_tokens from models.dev registry when not specified in request
When a client omits max_tokens, providers (DeepSeek, etc.) apply
a low server-side default output cap. Now gophergate looks up the
model in the models.dev registry and injects the model's output
limit, preventing silent truncation.
2026-04-28 15:36:06 -04:00
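The injection rule from commit d46a333249 can be sketched in Go as follows. The `ChatRequest` type, the `outputLimits` map standing in for the models.dev registry, and the model name `example-model` with its limit are all hypothetical. Only the rule itself (fill in `max_tokens` when the client omits it) comes from the commit message.

```go
package main

import "fmt"

// ChatRequest keeps only the fields this sketch needs. A nil MaxTokens
// means the client did not send max_tokens at all.
type ChatRequest struct {
	Model     string `json:"model"`
	MaxTokens *int   `json:"max_tokens,omitempty"`
}

// injectMaxTokens fills in the model's output limit when the client left
// max_tokens unset. It reports whether a value was injected.
func injectMaxTokens(req *ChatRequest, outputLimits map[string]int) bool {
	if req.MaxTokens != nil {
		return false // respect the client's explicit choice
	}
	limit, ok := outputLimits[req.Model]
	if !ok || limit <= 0 {
		return false // unknown model: leave the request untouched
	}
	req.MaxTokens = &limit
	return true
}

func main() {
	limits := map[string]int{"example-model": 8192}

	req := &ChatRequest{Model: "example-model"}
	fmt.Println("injected:", injectMaxTokens(req, limits), "max_tokens:", *req.MaxTokens)

	explicit := 256
	req = &ChatRequest{Model: "example-model", MaxTokens: &explicit}
	fmt.Println("injected:", injectMaxTokens(req, limits), "max_tokens:", *req.MaxTokens)
}
```

Using a pointer for `MaxTokens` distinguishes "not sent" from an explicit zero, which a plain `int` field cannot do.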
hobokenchicken 7446f3463d fix: add per-image cost tracking for DALL-E and Imagen
2026-04-27 10:42:29 -04:00
hobokenchicken b1a72f5a10 fix: estimate image gen tokens from prompt length instead of hardcoding
2026-04-27 10:28:39 -04:00
hobokenchicken 5ee539d95c feat: add image generation for OpenAI DALL-E and Gemini Imagen
New `/v1/images/generations` endpoint proxies DALL-E 2/3 (OpenAI)
and Imagen 3 (Gemini). Same auth/logging as chat completions.

- Add ImageGenerationRequest/Response models
- Extend Provider interface with ImageGeneration()
- OpenAI: forward to /v1/images/generations
- Gemini: call /v1beta/models/{model}:predict, map OpenAI params
- Circuit breaker wraps image gen like chat completions
- Model routing: dall-e* -> openai, imagen*/gemini* -> gemini
- Unsupported providers (deepseek/moonshot/grok/ollama) return error
- Fix pre-existing CachedContentTokenCount bug in StreamGemini
2026-04-27 10:06:07 -04:00
hobokenchicken 14e26a4323 feat: capture Gemini cached content tokens in cost tracking
- Add CachedContentTokenCount to UsageMetadata parsing for both
  streaming (helpers.go) and non-streaming (gemini.go) requests
- CacheReadTokens now populated from Gemini cachedContentTokenCount
- Add uint32Ptr helper for nil-safe uint32 pointer creation
2026-04-26 21:14:53 -04:00
hobokenchicken 1c3b1c6fe9 fix: FindModel reverse fuzzy match for date-suffixed model IDs
Add step between exact ID match and forward fuzzy match that checks
if registry model ID starts with the requested name. Fixes models like
'gpt-5.4-mini' not matching 'gpt-5.4-mini-2026-04-01' in registry.
2026-04-26 21:09:56 -04:00
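The lookup order from commit 1c3b1c6fe9 can be sketched in Go as three passes over the registry. This is not the project's `FindModel`: the slice-of-IDs registry is a simplification, and the meaning of "forward fuzzy match" (requested name starts with a registry ID) is an assumption, since the commit does not define it.

```go
package main

import (
	"fmt"
	"strings"
)

// findModel looks up a requested model name in three passes:
//  1. exact ID match
//  2. reverse fuzzy: a registry ID starts with the requested name
//  3. forward fuzzy: the requested name starts with a registry ID (assumed)
func findModel(registry []string, requested string) (string, bool) {
	for _, id := range registry {
		if id == requested {
			return id, true
		}
	}
	for _, id := range registry {
		if strings.HasPrefix(id, requested) {
			return id, true
		}
	}
	for _, id := range registry {
		if strings.HasPrefix(requested, id) {
			return id, true
		}
	}
	return "", false
}

func main() {
	registry := []string{"gpt-5.4-mini-2026-04-01", "deepseek-v4-flash"}
	for _, name := range []string{"deepseek-v4-flash", "gpt-5.4-mini", "deepseek-v4-flash-0501", "unknown"} {
		id, ok := findModel(registry, name)
		fmt.Printf("%-24s -> %q (found=%v)\n", name, id, ok)
	}
}
```

Running the passes in this order keeps exact matches authoritative and lets a short alias such as `gpt-5.4-mini` find its date-suffixed registry entry.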
hobokenchicken 5e0c10db01 fix: goimports — strip unused imports from all server files
2026-04-26 15:00:04 -04:00
hobokenchicken e598150d90 fix: trim imports per file — no unused imports in split files
2026-04-26 14:58:53 -04:00
hobokenchicken 2fa6f0df62 fix: split dashboard.go properly — extract analytics + models_config
- analytics.go: UsagePeriodFilter, UsageSummary, TimeSeries, ProvidersUsage, ClientsUsage, AnalyticsBreakdown, DetailedUsage
- models_config.go: handleGetModels, handleUpdateModel
- Fix all import blocks with missing closing parens
- Remove leftover fmt.Printf warnings in server.go
2026-04-26 14:57:28 -04:00
hobokenchicken db76858072 fix: import block syntax in split dashboard files
- Add missing closing ) in clients.go, providers_admin.go, users.go, system.go
- Add SetTimeout(30s) to OpenAI provider (was resty.New() with no timeout)
2026-04-26 14:55:29 -04:00
hobokenchicken af2c5b95f7 feat: Phase 3 - architecture & maintainability
- Split 1474-line dashboard.go into 5 domain files (clients, providers, users, system)
- Unit tests for ModelRegistry.FindModel and CalculateCost
- go mod tidy + verify (deps clean)
- .gitignore excludes tool cache dirs (.pi-lens/, .opencode/)
2026-04-26 14:52:10 -04:00
hobokenchicken 1f574d8134 feat: Phase 2 - reliability & observability
- Circuit breaker: proper thresholds (3 failures, 30s timeout)
- HTTP timeouts: 30s on all providers (was no timeout)
- Structured logging: slog replaces fmt.Printf throughout
- Stream errors: propagated as SSE error events to client
- Registry fetch: retry with backoff (3 attempts)
- Registry reads in dashboard protected by RWMutex
2026-04-26 14:48:56 -04:00
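The circuit breaker thresholds listed in commit 1f574d8134 (3 failures, 30s timeout) can be illustrated with a minimal Go sketch. This is a generic breaker written for illustration, not the project's implementation: the `Breaker` type, its method names, and the rule of allowing one trial call after the cooldown are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var (
	ErrOpen     = errors.New("circuit breaker is open")
	errUpstream = errors.New("upstream failed") // sample failure for the demo
)

// Breaker rejects calls once `threshold` consecutive failures have been
// seen, until `cooldown` has elapsed since the breaker opened.
type Breaker struct {
	mu        sync.Mutex
	threshold int
	cooldown  time.Duration
	failures  int
	openedAt  time.Time
}

func NewBreaker(threshold int, cooldown time.Duration) *Breaker {
	return &Breaker{threshold: threshold, cooldown: cooldown}
}

// newDefaultBreaker uses the values named in the commit message.
func newDefaultBreaker() *Breaker { return NewBreaker(3, 30*time.Second) }

// Do runs fn unless the breaker is open. A success resets the failure count.
func (b *Breaker) Do(fn func() error) error {
	b.mu.Lock()
	open := b.failures >= b.threshold && time.Since(b.openedAt) < b.cooldown
	b.mu.Unlock()
	if open {
		return ErrOpen
	}

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.threshold {
			b.openedAt = time.Now() // open, or re-open after a failed trial
		}
		return err
	}
	b.failures = 0
	return nil
}

func main() {
	b := newDefaultBreaker()
	for i := 1; i <= 4; i++ {
		err := b.Do(func() error { return errUpstream })
		fmt.Printf("call %d: %v\n", i, err)
	}
}
```

The fourth call is rejected without reaching the upstream, which matches the "circuit breaker is open" error mentioned elsewhere in this log.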
hobokenchicken 8a8d8d1477 fix: Phase 1 - security & stability patches
- AuthMiddleware now requires auth on /v1/* routes (returns 401)
- WebSocket origin check configurable via WSAllowedOrigin
- Removed debug fmt.Printf leaks (config, ollama, server)
- Registry access protected by sync.RWMutex (race condition fix)
- Session cleanup goroutine runs every 15 min
- RevokeSession returns error instead of silent no-op
2026-04-26 14:45:22 -04:00
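The first item of commit 8a8d8d1477 (auth required on /v1/* routes, 401 otherwise) can be sketched with Go's standard `net/http` package. The `requireAuth` function, the single shared bearer token, and the `statusFor` helper are hypothetical: the commit does not describe how the real `AuthMiddleware` validates credentials.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireAuth rejects /v1/* requests that lack a matching bearer token.
// Other paths (for example a dashboard login page) pass through unchanged.
func requireAuth(validToken string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, "/v1/") {
			token, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
			match := subtle.ConstantTimeCompare([]byte(token), []byte(validToken)) == 1
			if !ok || token == "" || !match {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for the real route handlers.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// statusFor sends one in-memory request and returns the response code.
func statusFor(h http.Handler, path, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := requireAuth("secret", okHandler)
	fmt.Println(statusFor(h, "/v1/models", ""))
	fmt.Println(statusFor(h, "/v1/models", "Bearer secret"))
	fmt.Println(statusFor(h, "/login", ""))
}
```

Scoping the check to the /v1/ prefix avoids the situation described in the earlier commit da074f52b4, where a global middleware interfered with web UI login.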
hobokenchicken da074f52b4 fix: remove global auth middleware that was causing web UI login issues
2026-04-09 12:21:02 -04:00
hobokenchicken 9b0aa4dbe8 fix: remove unused fmt import in circuit breaker provider
2026-04-09 12:19:14 -04:00
hobokenchicken 212ac14a1b feat: implement circuit breaker, fix auth vulnerability
2026-04-09 12:17:18 -04:00
hobokenchicken 2929f51556 fixed model visibility
2026-04-09 12:13:53 +00:00
hobokenchicken e12418cc4c fix(gemini): ensure strict 1:1 pairing of model calls and function responses
- Gemini requires function results to immediately follow the model message that called them
- Implemented look-ahead grouping to pair assistant calls with their tool results
- Standardized system and orphaned tool message handling for Gemini compatibility
2026-04-07 18:57:13 +00:00
hobokenchicken be4ec3482a fix(gemini): group adjacent tool messages and ensure correct role sequence
- Group consecutive 'tool' messages into a single Gemini content message with multiple 'functionResponse' parts
- Ensure assistant tool calls are properly mapped and sent
- Maintain v1beta for preview and newer models
- Added debug logging for API errors
2026-04-07 18:50:48 +00:00
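The grouping step in commit be4ec3482a can be sketched in Go as a single pass that merges adjacent `tool` messages. The `Message` and `Content` types are simplified stand-ins: real Gemini content parts are structured `functionResponse` objects rather than plain strings, and role remapping is left out of this sketch.

```go
package main

import "fmt"

// Message is a simplified OpenAI-style chat message.
type Message struct {
	Role    string
	Content string
}

// Content is a simplified Gemini-style content entry with multiple parts.
type Content struct {
	Role  string
	Parts []string
}

// groupToolMessages merges each run of consecutive "tool" messages into one
// Content entry with one part per tool result. Other messages map 1:1.
func groupToolMessages(msgs []Message) []Content {
	var out []Content
	for _, m := range msgs {
		if m.Role == "tool" && len(out) > 0 && out[len(out)-1].Role == "tool" {
			last := &out[len(out)-1]
			last.Parts = append(last.Parts, m.Content)
			continue
		}
		out = append(out, Content{Role: m.Role, Parts: []string{m.Content}})
	}
	return out
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "weather in Paris and Rome?"},
		{Role: "assistant", Content: "<two tool calls>"},
		{Role: "tool", Content: `{"city":"Paris","temp":18}`},
		{Role: "tool", Content: `{"city":"Rome","temp":24}`},
		{Role: "assistant", Content: "Paris is 18C, Rome is 24C."},
	}
	for _, c := range groupToolMessages(msgs) {
		fmt.Printf("%-9s %d part(s)\n", c.Role, len(c.Parts))
	}
}
```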
hobokenchicken e67aafdac1 fix(gemini): improve tool-calling support and handle function_call response
- Support tool definitions in Gemini requests
- Map tool role to 'function' in Gemini content
- Ensure tool results are wrapped in JSON objects for Gemini compatibility
- Parse FunctionCall from Gemini response and map to OpenAI-compatible ToolCalls
- Correctly map finish_reason for tool calls
2026-04-07 18:37:57 +00:00
hobokenchicken 21e5204abd fix(ollama): improve tool-calling support and restore gemma/llama context limits
- Explicitly set tool_choice: auto when tools are present to aid gemma/llama models
- Sync stop sequences into the options map for broader compatibility
- Restore gemma/llama to the high-context (32k) optimization list
2026-04-07 14:24:23 +00:00
hobokenchicken 4095c68822 fix(ollama): improve model detection and ensure robust token/context limits
- Use case-insensitive matching for model names and routing
- Default max_tokens/num_predict to 8192 for all Ollama models to prevent truncation
- Increase default context window and add more large-context model families
- Ensure DeepSeek routing handles Ollama-hosted variants correctly
2026-04-07 14:05:21 +00:00
hobokenchicken ef37dc5af0 fix(ollama): significantly increase context and prediction limits
- Increase timeout to 15m
- Set num_ctx to 32k for common models
- Set default num_predict to 8192 for common models
2026-04-07 13:48:02 +00:00
hobokenchicken fdbb068a6c fix(ollama): map max_tokens to num_predict and increase context window
- Map MaxTokens to num_predict in options map
- Set default num_ctx to 8192 for common models (gemma, llama, etc.)
- This ensures Ollama doesn't cut off responses early due to default limits
2026-04-07 13:44:17 +00:00
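The mapping in commit fdbb068a6c can be sketched in Go as a small options builder. `num_predict` and `num_ctx` are real Ollama option names, and the 8192 default comes from the commit message. The function name, the family list limited to gemma and llama, and the substring-based model check are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// buildOllamaOptions translates an OpenAI-style max_tokens value into
// Ollama's num_predict and sets a larger num_ctx for selected families.
func buildOllamaOptions(model string, maxTokens *int) map[string]any {
	opts := map[string]any{}
	if maxTokens != nil {
		opts["num_predict"] = *maxTokens
	}
	lower := strings.ToLower(model)
	for _, family := range []string{"gemma", "llama"} {
		if strings.Contains(lower, family) {
			opts["num_ctx"] = 8192
			break
		}
	}
	return opts
}

func main() {
	limit := 1024
	fmt.Println(buildOllamaOptions("gemma3:4b", &limit))
	fmt.Println(buildOllamaOptions("mistral", nil))
}
```

Later commits in this log raise these defaults, so the numbers here reflect only this commit.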
hobokenchicken dbbf48cb14 fix(ollama): increase timeout and add default max_tokens for large models
- Increase Ollama timeout to 5m for larger models (e.g. gemma4)
- Set default max_tokens to 4096 for common Ollama models
- Expand stream scanner buffer to 10MB to prevent truncation
- Improve model routing and prefix stripping in server
2026-04-07 13:42:10 +00:00
hobokenchicken 1e13b0376b feat(ollama): improve configuration and dashboard integration
2026-04-07 12:53:17 +00:00