LLM Proxy Gateway
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.
Features
- Unified API: OpenAI-compatible /v1/chat/completions and /v1/models endpoints.
- Multi-Provider Support:
- OpenAI: GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
- Google Gemini: Gemini 2.0 Flash, Pro, and vision models.
- DeepSeek: DeepSeek Chat and Reasoner models.
- xAI Grok: Grok-beta models.
- Ollama: Local LLMs running on your network.
- Observability & Tracking:
- Real-time Costing: Fetches live pricing and context specs from models.dev on startup.
- Token Counting: Precise estimation using tiktoken-rs.
- Database Logging: Every request logged to SQLite for historical analysis.
- Streaming Support: Full SSE (Server-Sent Events) with [DONE] termination for client compatibility.
- Multimodal (Vision): Image processing (Base64 and remote URLs) across compatible providers.
- Multi-User Access Control:
- Admin Role: Full access to all dashboard features, user management, and system configuration.
- Viewer Role: Read-only access to usage analytics, costs, and monitoring.
- Client API Keys: Create and manage multiple client tokens for external integrations.
- Reliability:
- Circuit Breaking: Automatically stops routing requests to a failing provider and resumes once it recovers.
- Rate Limiting: Per-client and global rate limits.
- Cache-Aware Costing: Tracks cache hit/miss tokens for accurate billing.
Tech Stack
- Runtime: Rust with Tokio.
- Web Framework: Axum.
- Database: SQLx with SQLite.
- Frontend: Vanilla JS/CSS with Chart.js for visualizations.
Getting Started
Prerequisites
- Rust (1.80+)
- SQLite3
Quick Start
- Clone and build:

  git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
  cd llm-proxy
  cargo build --release

- Configure environment:

  cp .env.example .env
  # Edit .env and add your API keys:
  # OPENAI_API_KEY=sk-...
  # GEMINI_API_KEY=AIza...
  # DEEPSEEK_API_KEY=sk-...
  # GROK_API_KEY=gk-... (optional)

- Run the proxy:

  cargo run --release
The server starts on http://localhost:8080 by default.
Configuration
Edit config.toml to customize providers, models, and settings:
[server]
port = 8080
host = "0.0.0.0"
[database]
path = "./data/llm_proxy.db"
[providers.openai]
enabled = true
default_model = "gpt-4o"
[providers.gemini]
enabled = true
default_model = "gemini-2.0-flash"
[providers.deepseek]
enabled = true
default_model = "deepseek-reasoner"
[providers.grok]
enabled = false
default_model = "grok-beta"
[providers.ollama]
enabled = false
base_url = "http://localhost:11434/v1"
Management Dashboard
Access the dashboard at http://localhost:8080:
- Overview: Real-time request counters, system health, provider status.
- Analytics: Time-series charts, filterable by date, client, provider, and model.
- Costs: Budget tracking, cost breakdown by provider/client/model, projections.
- Clients: Create, revoke, and rotate API tokens; per-client usage stats.
- Providers: Enable/disable providers, test connections, configure API keys.
- Monitoring: Live request stream via WebSocket, response times, error rates.
- Users: Admin/user management with role-based access control.
Default Credentials
- Username: admin
- Password: admin123
Change the admin password in the dashboard after first login!
API Usage
The proxy is a drop-in replacement for the OpenAI API. Configure your client:
Python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="YOUR_CLIENT_API_KEY" # Create in dashboard
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}]
)
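Streaming responses use the SSE format noted in the features above, terminated by a [DONE] event. As a sketch, the stream can also be consumed with only the Python standard library; the helper names below are our own, and the base URL and key are the same placeholders as above:

```python
import json
import urllib.request

# Assumptions: default host/port and a client key created in the dashboard.
BASE_URL = "http://localhost:8080/v1"
API_KEY = "YOUR_CLIENT_API_KEY"

def iter_sse_data(lines):
    """Yield the JSON payload of each SSE 'data:' line, stopping at [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # proxy signals end of stream
        yield json.loads(payload)

def stream_chat(prompt, model="gpt-4o"):
    """Print content deltas from a streaming chat completion as they arrive."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for event in iter_sse_data(resp):
            delta = event["choices"][0]["delta"].get("content")
            if delta:
                print(delta, end="", flush=True)
    print()

# stream_chat("Hello!")  # run against a live proxy
```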
Open WebUI
API Base URL: http://your-server:8080/v1
API Key: YOUR_CLIENT_API_KEY
cURL
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": false
}'
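For the multimodal (vision) support listed in the features, image content follows the OpenAI content-parts format. A small sketch of building such a message with an inline base64 image (the helper name is ours; any vision-capable model routed by the proxy should accept the shape):

```python
import base64

def image_message(prompt, image_bytes, mime="image/png"):
    """Build an OpenAI-style multimodal user message with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Remote URLs also work: {"url": "https://example.com/cat.png"}
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Pass the result in the usual messages list, e.g.:
# client.chat.completions.create(model="gpt-4o",
#     messages=[image_message("Describe this image.", png_bytes)])
```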
Model Discovery
The proxy exposes /v1/models for OpenAI-compatible client model discovery:
curl http://localhost:8080/v1/models \
-H "Authorization: Bearer YOUR_CLIENT_API_KEY"
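The same listing can be consumed from Python with only the standard library; this is a sketch with the usual placeholder URL and key:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: default host/port
API_KEY = "YOUR_CLIENT_API_KEY"

def model_ids(listing):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in listing.get("data", [])]

def list_models():
    """Fetch /v1/models from the proxy and return the available model ids."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return model_ids(json.load(resp))

# print("\n".join(list_models()))  # run against a live proxy
```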
Troubleshooting
Streaming Issues
If clients time out or show "TransferEncodingError", ensure:
- Proxy buffering is disabled in nginx: proxy_buffering off;
- Chunked transfer encoding is enabled: chunked_transfer_encoding on;
- Timeouts are long enough for streaming: proxy_read_timeout 7200s;
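Put together, a minimal nginx location block applying those settings might look like this sketch (the upstream address and path are assumptions; adjust to your deployment):

```nginx
location /v1/ {
    proxy_pass http://127.0.0.1:8080;  # assumed upstream address
    proxy_http_version 1.1;            # needed for long-lived streaming connections
    proxy_buffering off;               # deliver SSE chunks immediately
    chunked_transfer_encoding on;
    proxy_read_timeout 7200s;          # allow long streaming responses
}
```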
Provider Errors
- Check API keys are set in .env
- Test the provider in the dashboard (Settings → Providers → Test)
- Review logs: journalctl -u llm-proxy -f
License
MIT OR Apache-2.0