- Filter out empty text parts in Gemini requests to avoid 400 errors.
- Inject 'assistant' role into the first streaming chunk for better compatibility with clients like opencode.
- Fallback to tool_call_id for Gemini function responses when name is missing.
Add client_tokens table with auto-generated sk-{hex} tokens so clients
created in the dashboard get working API keys. Auth flow: DB token lookup
first, then env token fallback, then permissive mode. Includes token
management CRUD endpoints and copy-once reveal modal in the frontend.
Track cache_read_tokens and cache_write_tokens end-to-end: parse from
provider responses (OpenAI, DeepSeek, Grok, Gemini), persist to SQLite,
apply cache-aware pricing from the model registry, and surface in API
responses and the dashboard.
- Add cache fields to ProviderResponse, StreamUsage, RequestLog structs
- Parse cached_tokens (OpenAI/Grok), prompt_cache_hit/miss (DeepSeek),
cachedContentTokenCount (Gemini) from provider responses
- Send stream_options.include_usage for streaming; capture real usage
from final SSE chunk in AggregatingStream
- ALTER TABLE migration for cache_read_tokens/cache_write_tokens columns
- Cache-aware cost formula using registry cache_read/cache_write rates
- Update Provider trait calculate_cost signature across all providers
- Add cache_read_tokens/cache_write_tokens to Usage API response
- Dashboard: cache hit rate card, cache columns in pricing and usage
tables, cache token aggregation in SQL queries
- Remove API debug panel and verbose console logging from api.js
- Bump static asset cache-bust to v5
The model registry from models.dev uses 'google' and 'xai' as provider
IDs, but internal providers use 'gemini' and 'grok'. Added mapping so
all provider models appear in the listing.
Open WebUI and other OpenAI-compatible clients call GET /v1/models to
discover available models. Lists all models from enabled providers via
the model registry, respects disabled models, and handles Ollama models
from TOML config.
- Add in-memory ModelConfigCache (30s refresh, explicit invalidation)
replacing 2 SQLite queries per request (model lookup + cost override)
- Configure all 5 provider HTTP clients with proper timeouts (300s),
connection pooling (4 idle/host, 90s idle timeout), and TCP keepalive
- Move client_usage update to tokio::spawn in non-streaming path
- Use fast chars/4 heuristic for token estimation on large inputs (>1KB)
- Generate single UUID/timestamp per SSE stream instead of per chunk
- Add shared LazyLock<Client> for image fetching in multimodal module
- Add proxy overhead timing instrumentation for both request paths
- Fix test helper to include new model_config_cache field
Implement full OpenAI-compatible tool-calling support across the proxy,
enabling OpenCode to use llm-proxy as its sole LLM backend.
- Add 9 tool-calling types (Tool, FunctionDef, ToolChoice, ToolCall, etc.)
- Update ChatCompletionRequest/ChatMessage/ChatStreamDelta with tool fields
- Update UnifiedRequest/UnifiedMessage to carry tool data through the pipeline
- Shared helpers: messages_to_openai_json handles tool messages, build_openai_body
includes tools/tool_choice, parse/stream extract tool_calls from responses
- Gemini: full OpenAI<->Gemini format translation (functionDeclarations,
functionCall/functionResponse, synthetic call IDs, tool_config mapping)
- Gemini: extract duplicated message-conversion into shared convert_messages()
- Server: SSE streams include tool_calls deltas, finish_reason='tool_calls'
- AggregatingStream: accumulate tool call deltas across stream chunks
- OpenAI provider: add o4- prefix to supports_model()
- Added 'provider_configs' and 'model_configs' tables to database.
- Refactored ProviderManager to support thread-safe dynamic updates and database overrides.
- Implemented 'Models' tab in dashboard to manage model visibility, mapping, and pricing.
- Added provider configuration modal to 'Providers' tab.
- Integrated database overrides into chat completion logic (enabled state, mapping, and cost).
- Added strict token validation against LLM_PROXY__SERVER__AUTH_TOKENS.
- Integrated 'reasoning_content' support into providers and server responses.
- Updated AppState to carry valid auth tokens for request-time validation.