12 Commits

Author SHA1 Message Date
hobokenchicken e5ef39f327 feat: add OpenAI Responses API support (POST /v1/responses)
Add full Responses API endpoint alongside existing Chat Completions,
with identical logging/tracking/cost pipeline.

New:
- internal/models/responses.go — request/response/stream types + ToUsage() bridge
- internal/providers/openai_responses.go — OpenAI Responses/ResponsesStream

Modified:
- provider.go — Responses()+ResponsesStream() added to Provider interface
- helpers.go — BuildOpenAIResponsesBody, parsers, SSE stream reader
- circuit_breaker.go — CB wraps Responses, passthrough for stream
- server.go — POST /v1/responses route + handleResponses handler
- all non-OpenAI providers — stub methods with clear error messages

Logging: ResponsesUsage.ToUsage() bridges to models.Usage, feeding same
logRequest() -> DB insert -> dashboard WS -> client stats -> cost calc
pipeline. No schema or logger changes needed.
2026-05-02 16:38:17 -04:00
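The `ToUsage()` bridge named in e5ef39f327 could look roughly like the sketch below. The field names follow the public Responses API (`input_tokens`, `output_tokens`, `total_tokens`) and Chat Completions usage shapes. The `Usage` struct here is a hypothetical stand-in for `models.Usage`, and the fallback for a missing total is an assumption rather than something the commit states.

```go
package main

import "fmt"

// Usage is a hypothetical stand-in for models.Usage (Chat Completions shape).
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// ResponsesUsage mirrors the usage object returned by the Responses API.
type ResponsesUsage struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
	TotalTokens  int `json:"total_tokens"`
}

// ToUsage maps Responses API token counts onto the Chat Completions shape
// so that one logging and cost pipeline can serve both endpoints.
func (u ResponsesUsage) ToUsage() Usage {
	total := u.TotalTokens
	if total == 0 {
		// Assumption: derive the total when the upstream response omits it.
		total = u.InputTokens + u.OutputTokens
	}
	return Usage{
		PromptTokens:     u.InputTokens,
		CompletionTokens: u.OutputTokens,
		TotalTokens:      total,
	}
}

func main() {
	u := ResponsesUsage{InputTokens: 120, OutputTokens: 30}.ToUsage()
	fmt.Println(u.PromptTokens, u.CompletionTokens, u.TotalTokens) // 120 30 150
}
```

Converting at the model boundary is what would let the logger and schema stay untouched, as the commit message notes.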
hobokenchicken 5ee539d95c feat: add image generation for OpenAI DALL-E and Gemini Imagen
New `/v1/images/generations` endpoint proxies DALL-E 2/3 (OpenAI)
and Imagen 3 (Gemini). Same auth/logging as chat completions.

- Add ImageGenerationRequest/Response models
- Extend Provider interface with ImageGeneration()
- OpenAI: forward to /v1/images/generations
- Gemini: call /v1beta/models/{model}:predict, map OpenAI params
- Circuit breaker wraps image gen like chat completions
- Model routing: dall-e* -> openai, imagen*/gemini* -> gemini
- Unsupported providers (deepseek/moonshot/grok/ollama) return error
- Fix pre-existing CachedContentTokenCount bug in StreamGemini
2026-04-27 10:06:07 -04:00
hobokenchicken af2c5b95f7 feat: Phase 3 - architecture & maintainability
- Split 1474-line dashboard.go into 5 domain files (clients, providers, users, system)
- Unit tests for ModelRegistry.FindModel and CalculateCost
- go mod tidy + verify (deps clean)
- .gitignore excludes tool cache dirs (.pi-lens/, .opencode/)
2026-04-26 14:52:10 -04:00
hobokenchicken 1f574d8134 feat: Phase 2 - reliability & observability
- Circuit breaker: proper thresholds (3 failures, 30s timeout)
- HTTP timeouts: 30s on all providers (was no timeout)
- Structured logging: slog replaces fmt.Printf throughout
- Stream errors: propagated as SSE error events to client
- Registry fetch: retry with backoff (3 attempts)
- Registry reads in dashboard protected by RWMutex
2026-04-26 14:48:56 -04:00
hobokenchicken 8a8d8d1477 fix: Phase 1 - security & stability patches
- AuthMiddleware now requires auth on /v1/* routes (returns 401)
- WebSocket origin check configurable via WSAllowedOrigin
- Removed debug fmt.Printf leaks (config, ollama, server)
- Registry access protected by sync.RWMutex (race condition fix)
- Session cleanup goroutine runs every 15 min
- RevokeSession returns error instead of silent no-op
2026-04-26 14:45:22 -04:00
hobokenchicken 21e5204abd fix(ollama): improve tool-calling support and restore gemma/llama context limits
- Explicitly set tool_choice: auto when tools are present to aid gemma/llama models
- Sync stop sequences into the options map for broader compatibility
- Restore gemma/llama to the high-context (32k) optimization list
2026-04-07 14:24:23 +00:00
hobokenchicken 4095c68822 fix(ollama): improve model detection and ensure robust token/context limits
- Use case-insensitive matching for model names and routing
- Default max_tokens/num_predict to 8192 for all Ollama models to prevent truncation
- Increase default context window and add more large-context model families
- Ensure DeepSeek routing handles Ollama-hosted variants correctly
2026-04-07 14:05:21 +00:00
hobokenchicken ef37dc5af0 fix(ollama): significantly increase context and prediction limits
- Increase timeout to 15m
- Set num_ctx to 32k for common models
- Set default num_predict to 8192 for common models
2026-04-07 13:48:02 +00:00
hobokenchicken fdbb068a6c fix(ollama): map max_tokens to num_predict and increase context window
- Map MaxTokens to num_predict in options map
- Set default num_ctx to 8192 for common models (gemma, llama, etc.)
- This ensures Ollama doesn't cut off responses early due to default limits
2026-04-07 13:44:17 +00:00
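The mapping described in fdbb068a6c can be sketched as a small options builder. `num_predict` and `num_ctx` are real Ollama option keys. The `ChatRequest` type, the helper name, and the substring matching on model families are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ChatRequest is a hypothetical, trimmed-down request type.
type ChatRequest struct {
	Model     string
	MaxTokens *int
}

// commonFamilies lists the model families named in the commit message.
var commonFamilies = []string{"gemma", "llama"}

// ollamaOptions builds the Ollama "options" map for a request:
// max_tokens becomes num_predict, and common families get num_ctx = 8192.
func ollamaOptions(req ChatRequest) map[string]any {
	opts := map[string]any{}
	if req.MaxTokens != nil {
		opts["num_predict"] = *req.MaxTokens
	}
	model := strings.ToLower(req.Model)
	for _, family := range commonFamilies {
		if strings.Contains(model, family) {
			opts["num_ctx"] = 8192
			break
		}
	}
	return opts
}

func main() {
	limit := 512
	opts := ollamaOptions(ChatRequest{Model: "gemma4:26b", MaxTokens: &limit})
	fmt.Println(opts["num_predict"], opts["num_ctx"]) // 512 8192
}
```

Later commits in this log raise these defaults (32k context, 8192 for `num_predict`), so the constants here reflect only this commit.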
hobokenchicken dbbf48cb14 fix(ollama): increase timeout and add default max_tokens for large models
- Increase Ollama timeout to 5m for larger models (e.g. gemma4)
- Set default max_tokens to 4096 for common Ollama models
- Expand stream scanner buffer to 10MB to prevent truncation
- Improve model routing and prefix stripping in server
2026-04-07 13:42:10 +00:00
hobokenchicken 1b5cd2815e fix(ollama): improve error handling and add timeouts
- Add timeouts (30s) and retries to resty client
- Add debug logging for Ollama requests and responses
- Import time package for timeout configuration
- This should fix 500 errors and provide better error messages
2026-04-06 15:05:31 -04:00
hobokenchicken 2f6b7deb2c feat(providers): add Ollama provider support
- Implement OllamaProvider with OpenAI-compatible API integration
- Add Ollama to provider initialization in server.go
- Update config.go to handle Ollama (no API key required)
- Configure .env with Ollama server at 172.20.1.222:11434
- Support models: glm-4.7-flash:latest, qwen3-coder:30b, gemma4:26b
2026-04-06 14:38:35 -04:00