- The native Gemini REST API requires camelCase 'thoughtSignature' as a sibling to functionCall.
- Explicitly rename the field to match this requirement, resolving the 'missing thought_signature' 400 error.
- Only restore thought_signature if the tool call ID doesn't start with 'call_'.
- This ensures proxy-generated UUIDs are never sent back to Gemini as signatures, which was causing base64 decoding failures.
- Extract thought_signature from the Part level during response parsing.
- Provide thought_signature as a sibling to functionCall during request assembly.
- This fully resolves the 'Unknown name thoughtSignature at function_call' error.
- Update get_base_url to only perform replacement if the base_url specifically ends with /v1.
- This prevents malformed URLs like /v1betabeta when the base_url was already configured as v1beta.
- Switch Gemini 3 models to v1beta for both streaming and non-streaming (better reasoning support).
- Increase max_output_tokens cap to 65536 for reasoning-heavy models.
- Elevate API URL and chunk tracing to INFO level for easier production debugging.
- Exclude 'HARM_CATEGORY_CIVIC_INTEGRITY' when using v1 endpoint (v1beta only).
- Filter out empty strings from 'stop_sequences' which are rejected by Gemini.
- Update error probe to use non-streaming endpoint for better JSON error diagnostics.
- Recursively remove '$schema', 'additionalProperties', 'exclusiveMaximum', and 'exclusiveMinimum' from tool definitions.
- These fields are frequently included by clients like opencode but are rejected by the Gemini API with 400 errors.
- Merge all consecutive messages with the same role into a single GeminiContent object.
- Ensure the first message is always 'user' by prepending a placeholder if necessary.
- Add final check for empty contents to prevent sending malformed requests.
- This addresses strict role-sequence requirements in Gemini 2.0/3.0 models.
- Update get_base_url to prefer v1 for Gemini 3.0+ models even if they contain 'preview'.
- Add tracing::debug logs for the final API URLs used in both streaming and non-streaming requests.
- Switch to v1beta endpoint for 'preview' and 'thinking' models.
- Update model version checks to include gemini-3 as a known version.
- Use get_base_url helper to construct dynamic URLs for both streaming and non-streaming requests.
- Map unknown Gemini model names to the configured default model to prevent 400 errors.
- Clamp max_tokens to a safe limit of 8192 for Gemini models.
- Clean up message filtering and role injection for better client compatibility.
- Filter out empty text parts in Gemini requests to avoid 400 errors.
- Inject 'assistant' role into the first streaming chunk for better compatibility with clients like opencode.
- Fallback to tool_call_id for Gemini function responses when name is missing.
Gemini API requires the first message to be from the 'user' role.
This commit ensures that:
- If a conversation starts with a 'model' (assistant) role, a placeholder 'user' message is prepended.
- 'tool' results are correctly mapped to 'user' role parts.
- Sequential messages with the same role are merged.
- Empty content requests are prevented in both sync and stream paths.
This fixes 400 Bad Request errors when clients (like opencode) send
message histories that don't match Gemini's strict role requirements.
Track cache_read_tokens and cache_write_tokens end-to-end: parse from
provider responses (OpenAI, DeepSeek, Grok, Gemini), persist to SQLite,
apply cache-aware pricing from the model registry, and surface in API
responses and the dashboard.
- Add cache fields to ProviderResponse, StreamUsage, RequestLog structs
- Parse cached_tokens (OpenAI/Grok), prompt_cache_hit/miss (DeepSeek),
cachedContentTokenCount (Gemini) from provider responses
- Send stream_options.include_usage for streaming; capture real usage
from final SSE chunk in AggregatingStream
- ALTER TABLE migration for cache_read_tokens/cache_write_tokens columns
- Cache-aware cost formula using registry cache_read/cache_write rates
- Update Provider trait calculate_cost signature across all providers
- Add cache_read_tokens/cache_write_tokens to Usage API response
- Dashboard: cache hit rate card, cache columns in pricing and usage
tables, cache token aggregation in SQL queries
- Remove API debug panel and verbose console logging from api.js
- Bump static asset cache-bust to v5
- Add in-memory ModelConfigCache (30s refresh, explicit invalidation)
replacing 2 SQLite queries per request (model lookup + cost override)
- Configure all 5 provider HTTP clients with proper timeouts (300s),
connection pooling (4 idle/host, 90s idle timeout), and TCP keepalive
- Move client_usage update to tokio::spawn in non-streaming path
- Use fast chars/4 heuristic for token estimation on large inputs (>1KB)
- Generate single UUID/timestamp per SSE stream instead of per chunk
- Add shared LazyLock<Client> for image fetching in multimodal module
- Add proxy overhead timing instrumentation for both request paths
- Fix test helper to include new model_config_cache field
messages_to_openai_json_text_only() was missing tool-calling support,
causing DeepSeek 400 errors when conversations included tool turns.
Now mirrors messages_to_openai_json() logic for tool-role messages
(tool_call_id, name) and assistant tool_calls, with images replaced
by "[Image]" text.
Implement full OpenAI-compatible tool-calling support across the proxy,
enabling OpenCode to use llm-proxy as its sole LLM backend.
- Add 9 tool-calling types (Tool, FunctionDef, ToolChoice, ToolCall, etc.)
- Update ChatCompletionRequest/ChatMessage/ChatStreamDelta with tool fields
- Update UnifiedRequest/UnifiedMessage to carry tool data through the pipeline
- Shared helpers: messages_to_openai_json handles tool messages, build_openai_body
includes tools/tool_choice, parse/stream extract tool_calls from responses
- Gemini: full OpenAI<->Gemini format translation (functionDeclarations,
functionCall/functionResponse, synthetic call IDs, tool_config mapping)
- Gemini: extract duplicated message-conversion into shared convert_messages()
- Server: SSE streams include tool_calls deltas, finish_reason='tool_calls'
- AggregatingStream: accumulate tool call deltas across stream chunks
- OpenAI provider: add o4- prefix to supports_model()
- Added 'provider_configs' and 'model_configs' tables to database.
- Refactored ProviderManager to support thread-safe dynamic updates and database overrides.
- Implemented 'Models' tab in dashboard to manage model visibility, mapping, and pricing.
- Added provider configuration modal to 'Providers' tab.
- Integrated database overrides into chat completion logic (enabled state, mapping, and cost).
- Switched from mock data to real backend APIs.
- Implemented unified ApiClient for consistent frontend data fetching.
- Refactored dashboard structure and styles for a modern SaaS aesthetic.
- Fixed Axum 0.8+ routing and parameter syntax issues.
- Implemented real client creation/deletion and provider health monitoring.
- Synchronized WebSocket event structures between backend and frontend.