- Switch Gemini 3 models to v1beta for both streaming and non-streaming (better reasoning support).
- Increase max_output_tokens cap to 65536 for reasoning-heavy models.
- Elevate API URL and chunk tracing to INFO level for easier production debugging.
- Exclude 'HARM_CATEGORY_CIVIC_INTEGRITY' when using v1 endpoint (v1beta only).
- Filter out empty strings from 'stop_sequences' which are rejected by Gemini.
- Update error probe to use non-streaming endpoint for better JSON error diagnostics.
- Recursively remove '$schema', 'additionalProperties', 'exclusiveMaximum', and 'exclusiveMinimum' from tool definitions.
- These fields are frequently included by clients like opencode but are rejected by the Gemini API with 400 errors.
- Merge all consecutive messages with the same role into a single GeminiContent object.
- Ensure the first message is always 'user' by prepending a placeholder if necessary.
- Add final check for empty contents to prevent sending malformed requests.
- This addresses strict role-sequence requirements in Gemini 2.0/3.0 models.
- Update get_base_url to prefer v1 for Gemini 3.0+ models even if they contain 'preview'.
- Add tracing::debug logs for the final API URLs used in both streaming and non-streaming requests.
- Switch to v1beta endpoint for 'preview' and 'thinking' models.
- Update model version checks to include gemini-3 as a known version.
- Use get_base_url helper to construct dynamic URLs for both streaming and non-streaming requests.
- Map unknown Gemini model names to the configured default model to prevent 400 errors.
- Clamp max_tokens to a safe limit of 8192 for Gemini models.
- Clean up message filtering and role injection for better client compatibility.
- Filter out empty text parts in Gemini requests to avoid 400 errors.
- Inject 'assistant' role into the first streaming chunk for better compatibility with clients like opencode.
- Fallback to tool_call_id for Gemini function responses when name is missing.
Gemini API requires the first message to be from the 'user' role.
This commit ensures that:
- If a conversation starts with a 'model' (assistant) role, a placeholder 'user' message is prepended.
- 'tool' results are correctly mapped to 'user' role parts.
- Sequential messages with the same role are merged.
- Empty content requests are prevented in both sync and stream paths.
This fixes 400 Bad Request errors when clients (like opencode) send
message histories that don't match Gemini's strict role requirements.
The app never read the LLM_PROXY__CONFIG_PATH env var, so the systemd
service couldn't find /etc/llm-proxy/config.toml and fell back to
./data/llm_proxy.db (owned by root, readonly for llmproxy user).
- Add LLM_PROXY__CONFIG_PATH support to config loader (checks env var
before falling back to ./config.toml)
- Add LLM_PROXY__DATABASE__PATH to service env so the DB path always
resolves to /var/lib/llm-proxy/llm_proxy.db regardless of config
- Add GIT_REPO config variable with actual Gitea remote URL
- Auto-detect nologin path for Arch/Debian compatibility
- Rewrite update() to build before stopping service (minimizes downtime)
- Abort update cleanly if build fails without interrupting service
- Fix all placeholder github.com/yourusername URLs
- Add status and logs convenience commands
Replace login-style 'form-group' class (absolute-positioned floating
labels) with dashboard-standard 'form-control' class (block labels,
proper input styling). Add missing 'modal-title' class to headings
and remove incorrect 'form-control' class from individual inputs.
The create, edit, and reset-password modals were using 'modal-overlay'
and 'modal' classes instead of the standard 'modal active' and
'modal-content' pattern, making them invisible due to CSS rules that
hide .modal elements without the .active class.
Add complete multi-user support with role-based access control:
Backend:
- Add users CRUD endpoints (GET/POST/PUT/DELETE /api/users) with admin-only guards
- Add display_name column to users table with ALTER TABLE migration
- Fix auth to use session-based user identity (not hardcoded 'admin')
- Add POST /api/auth/logout to revoke server-side sessions
- Add require_admin() and extract_session() helpers for clean RBAC
- Guard all mutating endpoints (clients, providers, models, settings, backup)
Frontend:
- Add Users management page with create/edit/reset-password/delete modals
- Add role gating: hide edit/delete buttons for viewers on clients, providers, models
- Settings page hides auth tokens and admin actions for viewers
- Logout now revokes server session before clearing localStorage
- Sidebar shows real display_name and formatted role (Administrator/Viewer)
- Fix sidebar header: single logo with onerror fallback, renamed to 'LLM Proxy'
- Add badge and btn-action CSS classes for role pills and action buttons
- Bump cache-bust to v=7
- All usage endpoints now accept ?period=today|24h|7d|30d|all|custom
with optional &from=ISO&to=ISO for custom ranges
- Time-series chart adapts granularity: hourly for today/24h, daily for
7d/30d/all
- Analytics and Costs pages have period selector buttons with custom
date-range picker
- Pricing table on Costs page now only shows models that have actually
been used (GET /models?used_only=true)
- Cache-bust version bumped to v=6
Add client_tokens table with auto-generated sk-{hex} tokens so clients
created in the dashboard get working API keys. Auth flow: DB token lookup
first, then env token fallback, then permissive mode. Includes token
management CRUD endpoints and copy-once reveal modal in the frontend.
Add PUT /api/clients/{id} with dynamic UPDATE for name, description,
is_active, and rate_limit_per_minute. Expose description and
rate_limit_per_minute in client list/detail responses. Replace the
frontend editClient stub with a modal that fetches, edits, and saves
client data.
Track cache_read_tokens and cache_write_tokens end-to-end: parse from
provider responses (OpenAI, DeepSeek, Grok, Gemini), persist to SQLite,
apply cache-aware pricing from the model registry, and surface in API
responses and the dashboard.
- Add cache fields to ProviderResponse, StreamUsage, RequestLog structs
- Parse cached_tokens (OpenAI/Grok), prompt_cache_hit/miss (DeepSeek),
cachedContentTokenCount (Gemini) from provider responses
- Send stream_options.include_usage for streaming; capture real usage
from final SSE chunk in AggregatingStream
- ALTER TABLE migration for cache_read_tokens/cache_write_tokens columns
- Cache-aware cost formula using registry cache_read/cache_write rates
- Update Provider trait calculate_cost signature across all providers
- Add cache_read_tokens/cache_write_tokens to Usage API response
- Dashboard: cache hit rate card, cache columns in pricing and usage
tables, cache token aggregation in SQL queries
- Remove API debug panel and verbose console logging from api.js
- Bump static asset cache-bust to v5
The model registry from models.dev uses 'google' and 'xai' as provider
IDs, but internal providers use 'gemini' and 'grok'. Added mapping so
all provider models appear in the listing.