# LLM Proxy Gateway
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.
## Features
- Unified API: OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
- Multi-Provider Support:
  - OpenAI: GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
  - Google Gemini: Gemini 2.0 Flash, Pro, and vision models.
  - DeepSeek: DeepSeek Chat and Reasoner models.
  - xAI Grok: Grok-beta models.
  - Ollama: Local LLMs running on your network.
- Observability & Tracking:
  - Real-time Costing: Fetches live pricing and context specs from `models.dev` on startup.
  - Token Counting: Precise estimation using `tiktoken-rs`.
  - Database Logging: Every request logged to SQLite for historical analysis.
- Streaming Support: Full SSE (Server-Sent Events) with `[DONE]` termination for client compatibility.
- Multimodal (Vision): Image processing (Base64 and remote URLs) across compatible providers.
- Multi-User Access Control:
  - Admin Role: Full access to all dashboard features, user management, and system configuration.
  - Viewer Role: Read-only access to usage analytics, costs, and monitoring.
  - Client API Keys: Create and manage multiple client tokens for external integrations.
- Reliability:
  - Circuit Breaking: Automatically protects when providers are down.
  - Rate Limiting: Per-client and global rate limits.
  - Cache-Aware Costing: Tracks cache hit/miss tokens for accurate billing.
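For the multimodal support above, images travel in the standard OpenAI content-parts format (remote URL or inline Base64 data URI). A small helper for building such a message might look like this (a sketch; the helper name is illustrative, not part of the proxy):

```python
import base64

def vision_message(text: str, image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style multimodal user message with an inline Base64 image."""
    data_uri = f"data:{mime};base64,{base64.b64encode(image_bytes).decode()}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }
```

The resulting dict can be passed in the `messages` list of any chat completion request routed through the proxy, for any provider with vision support.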
## Tech Stack
- Runtime: Rust with Tokio.
- Web Framework: Axum.
- Database: SQLx with SQLite.
- Frontend: Vanilla JS/CSS with Chart.js for visualizations.
## Getting Started
### Prerequisites
- Rust (1.80+)
- SQLite3
### Quick Start
1. Clone and build:

   ```bash
   git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
   cd llm-proxy
   cargo build --release
   ```

2. Configure environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys:
   # OPENAI_API_KEY=sk-...
   # GEMINI_API_KEY=AIza...
   # DEEPSEEK_API_KEY=sk-...
   # GROK_API_KEY=gk-...  (optional)
   ```

3. Run the proxy:

   ```bash
   cargo run --release
   ```
The server starts on http://localhost:8080 by default.
## Configuration
Edit `config.toml` to customize providers, models, and settings:
```toml
[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"

[providers.openai]
enabled = true
default_model = "gpt-4o"

[providers.gemini]
enabled = true
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
default_model = "grok-beta"

[providers.ollama]
enabled = false
base_url = "http://localhost:11434/v1"
```
## Management Dashboard
Access the dashboard at http://localhost:8080:
- Overview: Real-time request counters, system health, provider status.
- Analytics: Time-series charts, filterable by date, client, provider, and model.
- Costs: Budget tracking, cost breakdown by provider/client/model, projections.
- Clients: Create, revoke, and rotate API tokens; per-client usage stats.
- Providers: Enable/disable providers, test connections, configure API keys.
- Monitoring: Live request stream via WebSocket, response times, error rates.
- Users: Admin/user management with role-based access control.
### Default Credentials

- Username: `admin`
- Password: `admin123`

Change the admin password in the dashboard after first login!
## API Usage
The proxy is a drop-in replacement for OpenAI. Configure your client:
### Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY",  # Create in dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
### Open WebUI
- API Base URL: `http://your-server:8080/v1`
- API Key: `YOUR_CLIENT_API_KEY`
### cURL
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```
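The SDKs handle streaming for you, but if you set `"stream": true` on a raw HTTP request like the one above, the proxy replies with OpenAI-style SSE terminated by the `[DONE]` sentinel. A minimal stdlib-only parser for that framing might look like this (a sketch; the chunk shape follows the standard OpenAI streaming format):

```python
import json

def sse_deltas(lines):
    """Yield content deltas from OpenAI-style SSE lines until the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":  # the proxy's explicit stream terminator
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:  # role-only and finish chunks carry no content
            yield delta
```

Feeding the response body's lines through `sse_deltas` and joining the results reconstructs the full completion text.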
## Model Discovery
The proxy exposes `/v1/models` for OpenAI-compatible client model discovery:
```bash
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY"
```
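The response uses the OpenAI list envelope (`{"object": "list", "data": [...]}`), so model ids can be pulled out generically. A stdlib-only client sketch (function names are illustrative):

```python
import json
from urllib.request import Request, urlopen

def model_ids(body: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in body.get("data", [])]

def list_models(base_url: str, api_key: str) -> list[str]:
    """Fetch /v1/models from the proxy and return the available model ids."""
    req = Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(req) as resp:
        return model_ids(json.load(resp))
```

For example, `list_models("http://localhost:8080", "YOUR_CLIENT_API_KEY")` returns the ids of every model enabled across configured providers.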
## Troubleshooting
### Streaming Issues
If clients time out or show "TransferEncodingError", ensure:

- Proxy buffering is disabled in nginx: `proxy_buffering off;`
- Chunked transfer is enabled: `chunked_transfer_encoding on;`
- Timeouts are sufficient: `proxy_read_timeout 7200s;`
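Putting those three directives together, a minimal reverse-proxy block might look like this (a sketch; the server name and upstream address are assumptions for a typical single-host setup):

```nginx
location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;

    # Required for SSE: deliver chunks to the client as they arrive.
    proxy_buffering off;
    chunked_transfer_encoding on;

    # Long-lived streams need a generous read timeout.
    proxy_read_timeout 7200s;
}
```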
### Provider Errors
- Check that API keys are set in `.env`
- Test the provider in the dashboard (Settings → Providers → Test)
- Review logs: `journalctl -u llm-proxy -f`
## License
MIT OR Apache-2.0