Commit Graph

246 Commits

Author SHA1 Message Date
hobokenchicken e5ef39f327 feat: add OpenAI Responses API support (POST /v1/responses)
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Add full Responses API endpoint alongside existing Chat Completions,
with identical logging/tracking/cost pipeline.

New:
- internal/models/responses.go — request/response/stream types + ToUsage() bridge
- internal/providers/openai_responses.go — OpenAI Responses/ResponsesStream

Modified:
- provider.go — Responses()+ResponsesStream() added to Provider interface
- helpers.go — BuildOpenAIResponsesBody, parsers, SSE stream reader
- circuit_breaker.go — CB wraps Responses, passthrough for stream
- server.go — POST /v1/responses route + handleResponses handler
- all non-OpenAI providers — stub methods with clear error messages

Logging: ResponsesUsage.ToUsage() bridges to models.Usage, feeding same
logRequest() -> DB insert -> dashboard WS -> client stats -> cost calc
pipeline. No schema or logger changes needed.
2026-05-02 16:38:17 -04:00
hobokenchicken eb67287b56 fix: raise provider HTTP timeouts from 30s to 10min
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
30-second resty client timeout was killing long streaming responses
mid-generation. Models with large output windows (e.g. deepseek-v4-pro
at 384K max_tokens) routinely exceed 30s. Raised all providers to
10 minutes (Ollama already at 15min, unchanged). Circuit breaker
recovery timeout raised from 30s to 5min.
2026-04-30 10:17:45 -04:00
hobokenchicken 4aa17b4fd2 debug: add max_tokens trace logging to chat completions handler
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Logs what max_tokens the client sends, whether gophergate injects
one from the registry, and the final value forwarded to the provider.
Helps trace output truncation issues.
2026-04-30 10:04:50 -04:00
hobokenchicken 79571c6bdc fix: replace sql.NullTime with string scan for MAX() aggregate queries
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Go 1.26 changed NullTime.Scan to delegate to convertAssign,
which has no string->time.Time conversion path. The
modernc.org/sqlite driver returns raw strings for aggregate
expressions like MAX(last_used_at), causing silent scan failures
that made all clients/providers show 'Never' for last used.

Fixes by scanning into a string and parsing with time.Parse.
2026-04-30 09:32:11 -04:00
hobokenchicken d46a333249 feat: inject max_tokens from models.dev registry when not specified in request
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
When a client omits max_tokens, providers (DeepSeek, etc.) apply
a low server-side default output cap. Now gophergate looks up the
model in the models.dev registry and injects the model's output
limit, preventing silent truncation.
2026-04-28 15:36:06 -04:00
hobokenchicken 7446f3463d fix: add per-image cost tracking for DALL-E and Imagen
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-27 10:42:29 -04:00
hobokenchicken b1a72f5a10 fix: estimate image gen tokens from prompt length instead of hardcoding
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-27 10:28:39 -04:00
hobokenchicken 5ee539d95c feat: add image generation for OpenAI DALL-E and Gemini Imagen
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
New `/v1/images/generations` endpoint proxies DALL-E 2/3 (OpenAI)
and Imagen 3 (Gemini). Same auth/logging as chat completions.

- Add ImageGenerationRequest/Response models
- Extend Provider interface with ImageGeneration()
- OpenAI: forward to /v1/images/generations
- Gemini: call /v1beta/models/{model}:predict, map OpenAI params
- Circuit breaker wraps image gen like chat completions
- Model routing: dall-e* -> openai, imagen*/gemini* -> gemini
- Unsupported providers (deepseek/moonshot/grok/ollama) return error
- Fix pre-existing CachedContentTokenCount bug in StreamGemini
2026-04-27 10:06:07 -04:00
hobokenchicken 14e26a4323 feat: capture Gemini cached content tokens in cost tracking
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Add CachedContentTokenCount to UsageMetadata parsing for both
  streaming (helpers.go) and non-streaming (gemini.go) requests
- CacheReadTokens now populated from Gemini cachedContentTokenCount
- Add uint32Ptr helper for nil-safe uint32 pointer creation
2026-04-26 21:14:53 -04:00
hobokenchicken 1c3b1c6fe9 fix: FindModel reverse fuzzy match for date-suffixed model IDs
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Add step between exact ID match and forward fuzzy match that checks
if registry model ID starts with the requested name. Fixes models like
'gpt-5.4-mini' not matching 'gpt-5.4-mini-2026-04-01' in registry.
2026-04-26 21:09:56 -04:00
hobokenchicken 5e0c10db01 fix: goimports — strip unused imports from all server files
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-26 15:00:04 -04:00
hobokenchicken e598150d90 fix: trim imports per file — no unused imports in split files
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-26 14:58:53 -04:00
hobokenchicken 2fa6f0df62 fix: split dashboard.go properly — extract analytics + models_config
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- analytics.go: UsagePeriodFilter, UsageSummary, TimeSeries, ProvidersUsage, ClientsUsage, AnalyticsBreakdown, DetailedUsage
- models_config.go: handleGetModels, handleUpdateModel
- Fix all import blocks with missing closing parens
- Remove leftover fmt.Printf warnings in server.go
2026-04-26 14:57:28 -04:00
hobokenchicken db76858072 fix: import block syntax in split dashboard files
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Add missing closing ) in clients.go, providers_admin.go, users.go, system.go
- Add SetTimeout(30s) to OpenAI provider (was resty.New() with no timeout)
2026-04-26 14:55:29 -04:00
hobokenchicken af2c5b95f7 feat: Phase 3 - architecture & maintainability
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Split 1474-line dashboard.go into 5 domain files (clients, providers, users, system)
- Unit tests for ModelRegistry.FindModel and CalculateCost
- go mod tidy + verify (deps clean)
- .gitignore excludes tool cache dirs (.pi-lens/, .opencode/)
2026-04-26 14:52:10 -04:00
hobokenchicken 1f574d8134 feat: Phase 2 - reliability & observability
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Circuit breaker: proper thresholds (3 failures, 30s timeout)
- HTTP timeouts: 30s on all providers (was no timeout)
- Structured logging: slog replaces fmt.Printf throughout
- Stream errors: propagated as SSE error events to client
- Registry fetch: retry with backoff (3 attempts)
- Registry reads in dashboard protected by RWMutex
2026-04-26 14:48:56 -04:00
hobokenchicken 8a8d8d1477 fix: Phase 1 - security & stability patches
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- AuthMiddleware now requires auth on /v1/* routes (returns 401)
- WebSocket origin check configurable via WSAllowedOrigin
- Removed debug fmt.Printf leaks (config, ollama, server)
- Registry access protected by sync.RWMutex (race condition fix)
- Session cleanup goroutine runs every 15 min
- RevokeSession returns error instead of silent no-op
2026-04-26 14:45:22 -04:00
hobokenchicken da074f52b4 fix: remove global auth middleware, causing webui login issues
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-09 12:21:02 -04:00
hobokenchicken 9b0aa4dbe8 fix: remove unused fmt import in circuit breaker provider
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-09 12:19:14 -04:00
hobokenchicken 212ac14a1b feat: implement circuit breaker, fix auth vulnerability
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-09 12:17:18 -04:00
hobokenchicken 2929f51556 fixed model visibility
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-09 12:13:53 +00:00
hobokenchicken e12418cc4c fix(gemini): ensure strict 1:1 pairing of model calls and function responses
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Gemini requires function results to immediately follow the model message that called them
- Implemented look-ahead grouping to pair assistant calls with their tool results
- Standardized system and orphaned tool message handling for Gemini compatibility
2026-04-07 18:57:13 +00:00
hobokenchicken be4ec3482a fix(gemini): group adjacent tool messages and ensure correct role sequence
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Group consecutive 'tool' messages into a single Gemini content message with multiple 'functionResponse' parts
- Ensure assistant tool calls are properly mapped and sent
- Maintain v1beta for preview and newer models
- Added debug logging for API errors
2026-04-07 18:50:48 +00:00
hobokenchicken e67aafdac1 fix(gemini): improve tool-calling support and handle function_call response
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Support tool definitions in Gemini requests
- Map tool role to 'function' in Gemini content
- Ensure tool results are wrapped in JSON objects for Gemini compatibility
- Parse FunctionCall from Gemini response and map to OpenAI-compatible ToolCalls
- Correctly map finish_reason for tool calls
2026-04-07 18:37:57 +00:00
hobokenchicken 21e5204abd fix(ollama): improve tool-calling support and restore gemma/llama context limits
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Explicitly set tool_choice: auto when tools are present to aid gemma/llama models
- Sync stop sequences into the options map for broader compatibility
- Restore gemma/llama to the high-context (32k) optimization list
2026-04-07 14:24:23 +00:00
hobokenchicken 4095c68822 fix(ollama): improve model detection and ensure robust token/context limits
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Use case-insensitive matching for model names and routing
- Default max_tokens/num_predict to 8192 for all Ollama models to prevent truncation
- Increase default context window and add more large-context model families
- Ensure DeepSeek routing handles Ollama-hosted variants correctly
2026-04-07 14:05:21 +00:00
hobokenchicken ef37dc5af0 fix(ollama): significantly increase context and prediction limits
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Increase timeout to 15m
- Set num_ctx to 32k for common models
- Set default num_predict to 8192 for common models
2026-04-07 13:48:02 +00:00
hobokenchicken fdbb068a6c fix(ollama): map max_tokens to num_predict and increase context window
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Map MaxTokens to num_predict in options map
- Set default num_ctx to 8192 for common models (gemma, llama, etc.)
- This ensures Ollama doesn't cut off responses early due to default limits
2026-04-07 13:44:17 +00:00
hobokenchicken dbbf48cb14 fix(ollama): increase timeout and add default max_tokens for large models
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Increase Ollama timeout to 5m for larger models (e.g. gemma4)
- Set default max_tokens to 4096 for common Ollama models
- Expand stream scanner buffer to 10MB to prevent truncation
- Improve model routing and prefix stripping in server
2026-04-07 13:42:10 +00:00
hobokenchicken 1e13b0376b feat(ollama): improve configuration and dashboard integration
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-04-07 12:53:17 +00:00
hobokenchicken 1b5cd2815e fix(ollama): improve error handling and add timeouts
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Add timeouts (30s) and retries to resty client
- Add debug logging for Ollama requests and responses
- Import time package for timeout configuration
- This should fix 500 errors and provide better error messages
2026-04-06 15:05:31 -04:00
hobokenchicken ba4c4af2f8 docs: update documentation for Ollama provider
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Add Ollama configuration instructions to README.md
- Update API usage section with Ollama examples
- Add Ollama to provider list in BACKEND_ARCHITECTURE.md
- All documentation now reflects complete Ollama support
2026-04-06 15:01:55 -04:00
hobokenchicken e56a284415 docs: update TODO.md with Ollama provider completion
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Mark Ollama as completed provider implementation
- Add Ollama-specific feature checklist
- Update provider list in completed tasks
2026-04-06 14:46:32 -04:00
hobokenchicken cbc9eeb453 fix(server): add Ollama model detection and registry support
- Add Ollama to allowed providers in model list endpoint
- Add model pattern detection for Ollama models (glm-, qwen, gemma, llama, mistral, codellama)
- This fixes 500 errors when using Ollama models via /v1/chat/completions
2026-04-06 14:45:57 -04:00
hobokenchicken 2f6b7deb2c feat(providers): add Ollama provider support
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Implement OllamaProvider with OpenAI-compatible API integration
- Add Ollama to provider initialization in server.go
- Update config.go to handle Ollama (no API key required)
- Configure .env with Ollama server at 172.20.1.222:11434
- Support models: glm-4.7-flash:latest, qwen3-coder:30b, gemma4:26b
2026-04-06 14:38:35 -04:00
hobokenchicken 9375448087 fix(moonshot): resolve 401 Unauthorized errors by trimming API keys and improving request compatibility
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-26 17:09:27 +00:00
hobokenchicken 5be2f6f7aa fix: use Moonshot test model
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-26 10:12:44 -04:00
hobokenchicken eebcadcba1 fix: surface moonshot on providers page
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 09:35:41 -04:00
hobokenchicken 6b2bd13903 chore: remove tracked binary gophergate
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 13:32:51 +00:00
hobokenchicken 5dfda0a10c merge: resolve conflicts in server.go and integrate moonshot support 2026-03-25 13:32:40 +00:00
hobokenchicken a8a02d9e1c feat: add moonshot kimi k2.5 support
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 09:28:52 -04:00
hobokenchicken bd1d17cc4d feat: add moonshot kimi k2.5 support
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 09:27:46 -04:00
hobokenchicken 9207a7231c chore: update all grok-2 references to grok-4
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 13:17:06 +00:00
hobokenchicken c6efff9034 fix: update grok test model to grok-4-1-fast-non-reasoning
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
2026-03-25 13:14:31 +00:00
hobokenchicken 27fbd8ed15 chore: cleanup repository and update gitignore
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Removed binary 'gophergate'
- Removed 'data/llm_proxy.db' from source control (kept locally)
- Removed old database backups in 'data/backups/'
- Updated .gitignore to exclude data directory and gophergate binary
2026-03-25 13:08:33 +00:00
hobokenchicken 348341f304 fix: prioritize database provider configs and implement API key encryption
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Added AES-GCM encryption/decryption for provider API keys in the database.
- Implemented RefreshProviders to load provider configs from the database with precedence over environment variables.
- Updated dashboard handlers to encrypt keys on save and trigger in-memory provider refresh.
- Updated Grok test model to grok-3-mini for better compatibility.
2026-03-25 13:04:26 +00:00
hobokenchicken 9380580504 fix: resolve dashboard websocket 'disconnected' status
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Fixed status indicator UI mapping in websocket.js and index.html.
- Added missing CSS for connection status indicator and pulse animation.
- Made initial model registry fetch asynchronous to prevent blocking server startup.
- Improved configuration loading to correctly handle LLM_PROXY__SERVER__PORT from environment.
2026-03-19 14:32:34 -04:00
hobokenchicken 08cf5cc1d9 fix: improve cost tracking accuracy for modern models
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
- Added support for reasoning tokens in cost calculations.
- Fixed DeepSeek cache-write token mapping (PromptCacheMissTokens).
- Improved CalculateCost debug logging to trace all pricing variables.
2026-03-19 14:14:54 -04:00
hobokenchicken 0f0486d8d4 fix: resolve user dashboard field mapping and session consistency
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Added JSON tags to the User struct to match frontend expectations and excluded sensitive fields.
Updated session management to include and persist DisplayName.
Unified user field names (using display_name) across backend, sessions, and frontend UI.
2026-03-19 14:01:59 -04:00
hobokenchicken 0ea2a3a985 fix: improve provider and model data accuracy in dashboard
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Updated handleGetProviders to include available models and last-used timestamps. Refined Model Pricing table to strictly filter by core providers and actual usage.
2026-03-19 13:51:46 -04:00