Track cache_read_tokens and cache_write_tokens end-to-end: parse from
provider responses (OpenAI, DeepSeek, Grok, Gemini), persist to SQLite,
apply cache-aware pricing from the model registry, and surface in API
responses and the dashboard.
- Add cache fields to ProviderResponse, StreamUsage, RequestLog structs
- Parse cached_tokens (OpenAI/Grok), prompt_cache_hit/miss (DeepSeek),
cachedContentTokenCount (Gemini) from provider responses
- Send stream_options.include_usage for streaming; capture real usage
from final SSE chunk in AggregatingStream
- ALTER TABLE migration for cache_read_tokens/cache_write_tokens columns
- Cache-aware cost formula using registry cache_read/cache_write rates
- Update Provider trait calculate_cost signature across all providers
- Add cache_read_tokens/cache_write_tokens to Usage API response
- Dashboard: cache hit rate card, cache columns in pricing and usage
tables, cache token aggregation in SQL queries
- Remove API debug panel and verbose console logging from api.js
- Bump static asset cache-bust to v5
- Add in-memory ModelConfigCache (30s refresh, explicit invalidation)
replacing 2 SQLite queries per request (model lookup + cost override)
- Configure all 5 provider HTTP clients with proper timeouts (300s),
connection pooling (4 idle/host, 90s idle timeout), and TCP keepalive
- Move client_usage update to tokio::spawn in non-streaming path
- Use fast chars/4 heuristic for token estimation on large inputs (>1KB)
- Generate single UUID/timestamp per SSE stream instead of per chunk
- Add shared LazyLock<Client> for image fetching in multimodal module
- Add proxy overhead timing instrumentation for both request paths
- Fix test helper to include new model_config_cache field
Implement full OpenAI-compatible tool-calling support across the proxy,
enabling OpenCode to use llm-proxy as its sole LLM backend.
- Add 9 tool-calling types (Tool, FunctionDef, ToolChoice, ToolCall, etc.)
- Update ChatCompletionRequest/ChatMessage/ChatStreamDelta with tool fields
- Update UnifiedRequest/UnifiedMessage to carry tool data through the pipeline
- Shared helpers: messages_to_openai_json handles tool messages, build_openai_body
includes tools/tool_choice, parse/stream extract tool_calls from responses
- Gemini: full OpenAI<->Gemini format translation (functionDeclarations,
functionCall/functionResponse, synthetic call IDs, tool_config mapping)
- Gemini: extract duplicated message-conversion into shared convert_messages()
- Server: SSE streams include tool_calls deltas, finish_reason='tool_calls'
- AggregatingStream: accumulate tool call deltas across stream chunks
- OpenAI provider: add o4- prefix to supports_model()
- Added 'provider_configs' and 'model_configs' tables to database.
- Refactored ProviderManager to support thread-safe dynamic updates and database overrides.
- Implemented 'Models' tab in dashboard to manage model visibility, mapping, and pricing.
- Added provider configuration modal to 'Providers' tab.
- Integrated database overrides into chat completion logic (enabled state, mapping, and cost).