Commit Graph

248 Commits

Author SHA1 Message Date
hobokenchicken dc9af4d79c feat: add model_groups table and default seed data 2026-05-05 10:33:35 -04:00
hobokenchicken c009d401fb docs: add Responses API endpoint to README
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-05-05 09:36:51 -04:00
hobokenchicken e5ef39f327 feat: add OpenAI Responses API support (POST /v1/responses)
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Add full Responses API endpoint alongside existing Chat Completions,
with identical logging/tracking/cost pipeline.

New:
- internal/models/responses.go — request/response/stream types + ToUsage() bridge
- internal/providers/openai_responses.go — OpenAI Responses/ResponsesStream

Modified:
- provider.go — Responses()+ResponsesStream() added to Provider interface
- helpers.go — BuildOpenAIResponsesBody, parsers, SSE stream reader
- circuit_breaker.go — CB wraps Responses, passthrough for stream
- server.go — POST /v1/responses route + handleResponses handler
- all non-OpenAI providers — stub methods with clear error messages

Logging: ResponsesUsage.ToUsage() bridges to models.Usage, feeding same
logRequest() -> DB insert -> dashboard WS -> client stats -> cost calc
pipeline. No schema or logger changes needed.
2026-05-02 16:38:17 -04:00
hobokenchicken eb67287b56 fix: raise provider HTTP timeouts from 30s to 10min
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
30-second resty client timeout was killing long streaming responses
mid-generation. Models with large output windows (e.g. deepseek-v4-pro
at 384K max_tokens) routinely exceed 30s. Raised all providers to
10 minutes (Ollama already at 15min, unchanged). Circuit breaker
recovery timeout raised from 30s to 5min.
2026-04-30 10:17:45 -04:00
hobokenchicken 4aa17b4fd2 debug: add max_tokens trace logging to chat completions handler
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Logs what max_tokens the client sends, whether gophergate injects
one from the registry, and the final value forwarded to the provider.
Helps trace output truncation issues.
2026-04-30 10:04:50 -04:00
hobokenchicken 79571c6bdc fix: replace sql.NullTime with string scan for MAX() aggregate queries
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Go 1.26 changed NullTime.Scan to delegate to convertAssign,
which has no string->time.Time conversion path. The
modernc.org/sqlite driver returns raw strings for aggregate
expressions like MAX(last_used_at), causing silent scan failures
that made all clients/providers show 'Never' for last used.

Fixes by scanning into a string and parsing with time.Parse.
2026-04-30 09:32:11 -04:00
hobokenchicken d46a333249 feat: inject max_tokens from models.dev registry when not specified in request
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
When a client omits max_tokens, providers (DeepSeek, etc.) apply
a low server-side default output cap. Now gophergate looks up the
model in the models.dev registry and injects the model's output
limit, preventing silent truncation.
2026-04-28 15:36:06 -04:00
hobokenchicken 7446f3463d fix: add per-image cost tracking for DALL-E and Imagen
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-27 10:42:29 -04:00
hobokenchicken b1a72f5a10 fix: estimate image gen tokens from prompt length instead of hardcoding
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-27 10:28:39 -04:00
hobokenchicken 5ee539d95c feat: add image generation for OpenAI DALL-E and Gemini Imagen
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
New `/v1/images/generations` endpoint proxies DALL-E 2/3 (OpenAI)
and Imagen 3 (Gemini). Same auth/logging as chat completions.

- Add ImageGenerationRequest/Response models
- Extend Provider interface with ImageGeneration()
- OpenAI: forward to /v1/images/generations
- Gemini: call /v1beta/models/{model}:predict, map OpenAI params
- Circuit breaker wraps image gen like chat completions
- Model routing: dall-e* -> openai, imagen*/gemini* -> gemini
- Unsupported providers (deepseek/moonshot/grok/ollama) return error
- Fix pre-existing CachedContentTokenCount bug in StreamGemini
2026-04-27 10:06:07 -04:00
hobokenchicken 14e26a4323 feat: capture Gemini cached content tokens in cost tracking
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Add CachedContentTokenCount to UsageMetadata parsing for both
  streaming (helpers.go) and non-streaming (gemini.go) requests
- CacheReadTokens now populated from Gemini cachedContentTokenCount
- Add uint32Ptr helper for nil-safe uint32 pointer creation
2026-04-26 21:14:53 -04:00
hobokenchicken 1c3b1c6fe9 fix: FindModel reverse fuzzy match for date-suffixed model IDs
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Add step between exact ID match and forward fuzzy match that checks
if registry model ID starts with the requested name. Fixes models like
'gpt-5.4-mini' not matching 'gpt-5.4-mini-2026-04-01' in registry.
2026-04-26 21:09:56 -04:00
hobokenchicken 5e0c10db01 fix: goimports — strip unused imports from all server files
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-26 15:00:04 -04:00
hobokenchicken e598150d90 fix: trim imports per file — no unused imports in split files
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-26 14:58:53 -04:00
hobokenchicken 2fa6f0df62 fix: split dashboard.go properly — extract analytics + models_config
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- analytics.go: UsagePeriodFilter, UsageSummary, TimeSeries, ProvidersUsage, ClientsUsage, AnalyticsBreakdown, DetailedUsage
- models_config.go: handleGetModels, handleUpdateModel
- Fix all import blocks with missing closing parens
- Remove leftover fmt.Printf warnings in server.go
2026-04-26 14:57:28 -04:00
hobokenchicken db76858072 fix: import block syntax in split dashboard files
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Add missing closing ) in clients.go, providers_admin.go, users.go, system.go
- Add SetTimeout(30s) to OpenAI provider (was resty.New() with no timeout)
2026-04-26 14:55:29 -04:00
hobokenchicken af2c5b95f7 feat: Phase 3 - architecture & maintainability
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Split 1474-line dashboard.go into 5 domain files (clients, providers, users, system)
- Unit tests for ModelRegistry.FindModel and CalculateCost
- go mod tidy + verify (deps clean)
- .gitignore excludes tool cache dirs (.pi-lens/, .opencode/)
2026-04-26 14:52:10 -04:00
hobokenchicken 1f574d8134 feat: Phase 2 - reliability & observability
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Circuit breaker: proper thresholds (3 failures, 30s timeout)
- HTTP timeouts: 30s on all providers (was no timeout)
- Structured logging: slog replaces fmt.Printf throughout
- Stream errors: propagated as SSE error events to client
- Registry fetch: retry with backoff (3 attempts)
- Registry reads in dashboard protected by RWMutex
2026-04-26 14:48:56 -04:00
hobokenchicken 8a8d8d1477 fix: Phase 1 - security & stability patches
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- AuthMiddleware now requires auth on /v1/* routes (returns 401)
- WebSocket origin check configurable via WSAllowedOrigin
- Removed debug fmt.Printf leaks (config, ollama, server)
- Registry access protected by sync.RWMutex (race condition fix)
- Session cleanup goroutine runs every 15 min
- RevokeSession returns error instead of silent no-op
2026-04-26 14:45:22 -04:00
hobokenchicken da074f52b4 fix: remove global auth middleware, causing webui login issues
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-09 12:21:02 -04:00
hobokenchicken 9b0aa4dbe8 fix: remove unused fmt import in circuit breaker provider
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-09 12:19:14 -04:00
hobokenchicken 212ac14a1b feat: implement circuit breaker, fix auth vulnerability
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-09 12:17:18 -04:00
hobokenchicken 2929f51556 fixed model visibility
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-09 12:13:53 +00:00
hobokenchicken e12418cc4c fix(gemini): ensure strict 1:1 pairing of model calls and function responses
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Gemini requires function results to immediately follow the model message that called them
- Implemented look-ahead grouping to pair assistant calls with their tool results
- Standardized system and orphaned tool message handling for Gemini compatibility
2026-04-07 18:57:13 +00:00
hobokenchicken be4ec3482a fix(gemini): group adjacent tool messages and ensure correct role sequence
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Group consecutive 'tool' messages into a single Gemini content message with multiple 'functionResponse' parts
- Ensure assistant tool calls are properly mapped and sent
- Maintain v1beta for preview and newer models
- Added debug logging for API errors
2026-04-07 18:50:48 +00:00
hobokenchicken e67aafdac1 fix(gemini): improve tool-calling support and handle function_call response
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Support tool definitions in Gemini requests
- Map tool role to 'function' in Gemini content
- Ensure tool results are wrapped in JSON objects for Gemini compatibility
- Parse FunctionCall from Gemini response and map to OpenAI-compatible ToolCalls
- Correctly map finish_reason for tool calls
2026-04-07 18:37:57 +00:00
hobokenchicken 21e5204abd fix(ollama): improve tool-calling support and restore gemma/llama context limits
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Explicitly set tool_choice: auto when tools are present to aid gemma/llama models
- Sync stop sequences into the options map for broader compatibility
- Restore gemma/llama to the high-context (32k) optimization list
2026-04-07 14:24:23 +00:00
hobokenchicken 4095c68822 fix(ollama): improve model detection and ensure robust token/context limits
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Use case-insensitive matching for model names and routing
- Default max_tokens/num_predict to 8192 for all Ollama models to prevent truncation
- Increase default context window and add more large-context model families
- Ensure DeepSeek routing handles Ollama-hosted variants correctly
2026-04-07 14:05:21 +00:00
hobokenchicken ef37dc5af0 fix(ollama): significantly increase context and prediction limits
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Increase timeout to 15m
- Set num_ctx to 32k for common models
- Set default num_predict to 8192 for common models
2026-04-07 13:48:02 +00:00
hobokenchicken fdbb068a6c fix(ollama): map max_tokens to num_predict and increase context window
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Map MaxTokens to num_predict in options map
- Set default num_ctx to 8192 for common models (gemma, llama, etc.)
- This ensures Ollama doesn't cut off responses early due to default limits
2026-04-07 13:44:17 +00:00
hobokenchicken dbbf48cb14 fix(ollama): increase timeout and add default max_tokens for large models
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Increase Ollama timeout to 5m for larger models (e.g. gemma4)
- Set default max_tokens to 4096 for common Ollama models
- Expand stream scanner buffer to 10MB to prevent truncation
- Improve model routing and prefix stripping in server
2026-04-07 13:42:10 +00:00
hobokenchicken 1e13b0376b feat(ollama): improve configuration and dashboard integration
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-07 12:53:17 +00:00
hobokenchicken 1b5cd2815e fix(ollama): improve error handling and add timeouts
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Add timeouts (30s) and retries to resty client
- Add debug logging for Ollama requests and responses
- Import time package for timeout configuration
- This should fix 500 errors and provide better error messages
2026-04-06 15:05:31 -04:00
hobokenchicken ba4c4af2f8 docs: update documentation for Ollama provider
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Add Ollama configuration instructions to README.md
- Update API usage section with Ollama examples
- Add Ollama to provider list in BACKEND_ARCHITECTURE.md
- All documentation now reflects complete Ollama support
2026-04-06 15:01:55 -04:00
hobokenchicken e56a284415 docs: update TODO.md with Ollama provider completion
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Mark Ollama as completed provider implementation
- Add Ollama-specific feature checklist
- Update provider list in completed tasks
2026-04-06 14:46:32 -04:00
hobokenchicken cbc9eeb453 fix(server): add Ollama model detection and registry support
- Add Ollama to allowed providers in model list endpoint
- Add model pattern detection for Ollama models (glm-, qwen, gemma, llama, mistral, codellama)
- This fixes 500 errors when using Ollama models via /v1/chat/completions
2026-04-06 14:45:57 -04:00
hobokenchicken 2f6b7deb2c feat(providers): add Ollama provider support
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Implement OllamaProvider with OpenAI-compatible API integration
- Add Ollama to provider initialization in server.go
- Update config.go to handle Ollama (no API key required)
- Configure .env with Ollama server at 172.20.1.222:11434
- Support models: glm-4.7-flash:latest, qwen3-coder:30b, gemma4:26b
2026-04-06 14:38:35 -04:00
hobokenchicken 9375448087 fix(moonshot): resolve 401 Unauthorized errors by trimming API keys and improving request compatibility
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-26 17:09:27 +00:00
hobokenchicken 5be2f6f7aa fix: use Moonshot test model
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-26 10:12:44 -04:00
hobokenchicken eebcadcba1 fix: surface moonshot on providers page
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 09:35:41 -04:00
hobokenchicken 6b2bd13903 chore: remove tracked binary gophergate
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 13:32:51 +00:00
hobokenchicken 5dfda0a10c merge: resolve conflicts in server.go and integrate moonshot support 2026-03-25 13:32:40 +00:00
hobokenchicken a8a02d9e1c feat: add moonshot kimi k2.5 support
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 09:28:52 -04:00
hobokenchicken bd1d17cc4d feat: add moonshot kimi k2.5 support
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 09:27:46 -04:00
hobokenchicken 9207a7231c chore: update all grok-2 references to grok-4
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 13:17:06 +00:00
hobokenchicken c6efff9034 fix: update grok test model to grok-4-1-fast-non-reasoning
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 13:14:31 +00:00
hobokenchicken 27fbd8ed15 chore: cleanup repository and update gitignore
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Removed binary 'gophergate'
- Removed 'data/llm_proxy.db' from source control (kept locally)
- Removed old database backups in 'data/backups/'
- Updated .gitignore to exclude data directory and gophergate binary
2026-03-25 13:08:33 +00:00
hobokenchicken 348341f304 fix: prioritize database provider configs and implement API key encryption
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Added AES-GCM encryption/decryption for provider API keys in the database.
- Implemented RefreshProviders to load provider configs from the database with precedence over environment variables.
- Updated dashboard handlers to encrypt keys on save and trigger in-memory provider refresh.
- Updated Grok test model to grok-3-mini for better compatibility.
2026-03-25 13:04:26 +00:00
hobokenchicken 9380580504 fix: resolve dashboard websocket 'disconnected' status
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Fixed status indicator UI mapping in websocket.js and index.html.
- Added missing CSS for connection status indicator and pulse animation.
- Made initial model registry fetch asynchronous to prevent blocking server startup.
- Improved configuration loading to correctly handle LLM_PROXY__SERVER__PORT from environment.
2026-03-19 14:32:34 -04:00
hobokenchicken 08cf5cc1d9 fix: improve cost tracking accuracy for modern models
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Added support for reasoning tokens in cost calculations.
- Fixed DeepSeek cache-write token mapping (PromptCacheMissTokens).
- Improved CalculateCost debug logging to trace all pricing variables.
2026-03-19 14:14:54 -04:00