# LLM Proxy Gateway
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.
## Features
- Unified API: OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
- Multi-Provider Support:
  - OpenAI: GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
  - Google Gemini: Gemini 2.0 Flash, Pro, and vision models.
  - DeepSeek: DeepSeek Chat and Reasoner models.
  - xAI Grok: Grok-beta models.
  - Ollama: Local LLMs running on your network.
- Observability & Tracking:
  - Real-time Costing: Fetches live pricing and context specs from `models.dev` on startup.
  - Token Counting: Precise estimation using `tiktoken-rs`.
  - Database Logging: Every request logged to SQLite for historical analysis.
- Streaming Support: Full SSE (Server-Sent Events) with `[DONE]` termination for client compatibility (see the streaming sketch after this list).
- Multimodal (Vision): Image processing (Base64 and remote URLs) across compatible providers.
- Multi-User Access Control:
  - Admin Role: Full access to all dashboard features, user management, and system configuration.
  - Viewer Role: Read-only access to usage analytics, costs, and monitoring.
  - Client API Keys: Create and manage multiple client tokens for external integrations.
- Reliability:
  - Circuit Breaking: Automatically stops sending requests to providers that are failing.
  - Rate Limiting: Per-client and global rate limits.
  - Cache-Aware Costing: Tracks cache hit/miss tokens for accurate billing.
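
As a minimal sketch of the streaming support above: because the gateway speaks the OpenAI protocol, any OpenAI SDK can consume its SSE stream. The port and the `YOUR_CLIENT_API_KEY` placeholder follow the defaults shown later in this README; the model name is only an example.

```python
# Minimal streaming sketch against the OpenAI-compatible endpoint.
# Assumes the proxy is running on its default port and that a client
# API key has been created in the management dashboard.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY",
)

# stream=True makes the SDK consume the SSE stream (the [DONE]
# terminator is handled by the client library) and yield chunks.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```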
## Security
LLM Proxy is designed with security in mind:
- HMAC Session Tokens: Management dashboard sessions are secured using HMAC-SHA256 signed tokens.
- Encrypted Provider Keys: Sensitive LLM provider API keys are stored encrypted (AES-256-GCM) in the database.
- Session Refresh: Activity-based session extension keeps idle sessions short-lived, shrinking the window for session hijacking while maintaining user convenience.
- XSS Prevention: Standardized frontend escaping using `window.api.escapeHtml`.
> Note: You must define a `SESSION_SECRET` in your `.env` file for secure session signing.
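
To illustrate the general shape of HMAC-signed session tokens, here is a Python sketch; the gateway's real implementation is in Rust, and its token layout may differ, so treat the payload format below as an assumption.

```python
# Illustrative sketch of HMAC-SHA256 session tokens; the gateway's
# actual token format is not specified here and may differ.
import base64
import hashlib
import hmac
import os

SECRET = os.environ["SESSION_SECRET"].encode()

def sign(payload: bytes) -> str:
    """Produce '<payload>.<tag>' where tag = HMAC-SHA256(SECRET, payload)."""
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(tag).decode())

def verify(token: str) -> bytes | None:
    """Return the payload if the tag checks out, else None."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    # compare_digest avoids timing side channels when checking the tag.
    return payload if hmac.compare_digest(tag, expected) else None
```

A strong secret can be generated with, for example, `python -c "import secrets; print(secrets.token_hex(32))"`.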
## Tech Stack
- Runtime: Rust with Tokio.
- Web Framework: Axum.
- Database: SQLx with SQLite.
- Frontend: Vanilla JS/CSS with Chart.js for visualizations.
## Getting Started

### Prerequisites
- Rust (1.80+)
- SQLite3
- Docker (optional, for containerized deployment)
### Quick Start
1. Clone and build:

   ```bash
   git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
   cd llm-proxy
   cargo build --release
   ```

2. Configure environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys:
   # SESSION_SECRET=...   (generate a strong random secret)
   # OPENAI_API_KEY=sk-...
   # GEMINI_API_KEY=AIza...
   ```

3. Run the proxy:

   ```bash
   cargo run --release
   ```
The server starts on `http://localhost:8080` by default.
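
To confirm the proxy is up, you can list the models it exposes through the OpenAI-compatible `/v1/models` endpoint. This sketch assumes a client API key has already been created in the dashboard.

```python
# Quick smoke test: list models through the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="YOUR_CLIENT_API_KEY")

for model in client.models.list():
    print(model.id)
```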
## Deployment (Docker)
A multi-stage Dockerfile is provided for efficient deployment:
```bash
# Build the container
docker build -t llm-proxy .

# Run the container
docker run -p 8080:8080 \
  -e SESSION_SECRET=your-secure-secret \
  -v ./data:/app/data \
  llm-proxy
```
## Management Dashboard

Access the dashboard at `http://localhost:8080`. The dashboard is organized into modular sub-components for maintainability, each backed by its own API namespace:
- Auth (`/api/auth`): Login, session management, and password changes.
- Usage (`/api/usage`): Summary stats, time-series analytics, and provider breakdown.
- Clients (`/api/clients`): API key management and per-client usage tracking.
- Providers (`/api/providers`): Provider configuration, status monitoring, and connection testing.
- System (`/api/system`): Health metrics, live logs, database backups, and global settings.
- Monitoring: Live request stream via WebSocket (see the sketch below).
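
As an illustration of tailing the live request stream, the sketch below uses the third-party `websockets` package. The path (`/api/monitoring/ws`) and the one-JSON-event-per-request message shape are hypothetical; consult the dashboard source for the actual endpoint and authentication scheme.

```python
# Hypothetical sketch of consuming the live request stream.
# The WebSocket path and message format below are assumptions.
import asyncio
import websockets  # pip install websockets

async def tail_requests() -> None:
    async with websockets.connect("ws://localhost:8080/api/monitoring/ws") as ws:
        async for message in ws:
            print(message)  # assumed: one JSON event per proxied request

asyncio.run(tail_requests())
```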
### Default Credentials

- Username: `admin`
- Password: `admin123`

Change the admin password in the dashboard after first login!
## API Usage

The proxy is a drop-in replacement for the OpenAI API. Point your client at it:

### Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY",  # Create in dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
```
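
Since the gateway advertises multimodal support, a vision request should follow the standard OpenAI content-parts format. Which models accept images depends on the provider, so the model name and image URL below are only examples.

```python
# Vision sketch using the OpenAI content-parts format; Base64 data URLs
# are also supported per the Features list. Model choice is illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="YOUR_CLIENT_API_KEY")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```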
## License
MIT OR Apache-2.0