# LLM Proxy Gateway
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok) with built-in token tracking, real-time cost calculation, and a management dashboard.
## 🚀 Features
- Unified API: Fully OpenAI-compatible `/v1/chat/completions` endpoint.
- Multi-Provider Support:
  - OpenAI: Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
  - Google Gemini: Support for the latest Gemini 2.0 models.
  - DeepSeek: High-performance, low-cost integration.
  - xAI Grok: Integration for Grok-series models.
  - Ollama: Support for local LLMs running on your machine or another host.
- Observability & Tracking:
  - Real-time Costing: Fetches live pricing and context specs from `models.dev` on startup.
  - Token Counting: Precise estimation using `tiktoken-rs`.
  - Database Logging: Every request is logged to SQLite for historical analysis.
- Streaming Support: Full SSE (Server-Sent Events) support with aggregated token tracking.
- Multimodal (Vision): Support for image processing (Base64 and remote URLs) across compatible providers.
- Reliability:
  - Circuit Breaking: Automatically protects your system when providers are down.
  - Rate Limiting: Granular per-client and global rate limits.
- Management Dashboard: A modern, real-time web interface to monitor usage, costs, and system health.
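Because the gateway speaks the OpenAI chat-completions dialect, multimodal requests use the standard content-parts convention. A minimal sketch of such a request body (the model name and image URL below are illustrative placeholders, not values defined by this project):

```python
import json

# Sketch of an OpenAI-style multimodal chat payload, in the shape the
# proxy's /v1/chat/completions endpoint accepts. Vision content is a
# list of typed parts: text plus image_url (remote URL or Base64 data URI).
payload = {
    "model": "gpt-4o",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
}

# The request body is plain JSON, so any HTTP client can send it.
body = json.dumps(payload)
print(len(json.loads(body)["messages"][0]["content"]))  # two content parts
```

A Base64 image is sent the same way, with a `data:image/png;base64,...` URI in place of the remote URL.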
## 🛠️ Tech Stack
- Runtime: Rust (2024 Edition) with Tokio.
- Web Framework: Axum.
- Database: SQLx with SQLite.
- Frontend: Vanilla JS/CSS (no heavyweight framework required).
## 🚦 Getting Started

### Prerequisites
- Rust (1.80+)
- SQLite3
### Installation
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/llm-proxy.git
   cd llm-proxy
   ```

2. Set up your environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys
   ```

3. Configure providers and server: edit `config.toml` to customize models, pricing fallbacks, and port settings.

   Ollama example (`config.toml`):

   ```toml
   [providers.ollama]
   enabled = true
   base_url = "http://192.168.1.50:11434/v1"
   models = ["llama3", "mistral"]
   ```

4. Run the proxy:

   ```bash
   cargo run --release
   ```

The server starts at http://localhost:3000 by default.
## 📊 Management Dashboard
Access the built-in dashboard at http://localhost:3000 to see:
- Usage Summary: Total requests, tokens, and USD spent.
- Trend Charts: 24-hour request and cost distributions.
- Live Logs: Real-time stream of incoming LLM requests via WebSockets.
- Provider Health: Monitor which providers are online or degraded.
## 🔌 API Usage
The proxy is designed to be a drop-in replacement for OpenAI. Simply change your base URL:
Example Request (Python):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-proxy-client-id",  # Hashed sk- keys are managed in the dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from the proxy!"}],
)
```
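Streaming works the same way (`stream=True` on the client), with the proxy relaying OpenAI-style SSE chunks. A client can aggregate the content deltas itself; below is a minimal sketch of that aggregation, where the chunk payloads are illustrative examples of the standard stream shape, not captured proxy output:

```python
import json

def aggregate_sse(lines):
    """Collect assistant text from OpenAI-style SSE 'data:' lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore comments, blank keep-alives, other fields
        data = line[len("data: "):]
        if data.strip() == "[DONE]":  # OpenAI-style stream terminator
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# Illustrative chunks in the shape an OpenAI-compatible stream uses:
# a role-only delta first, then incremental content, then the terminator.
stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
print(aggregate_sse(stream))  # Hello world
```

With the official client this bookkeeping is handled for you; the sketch only shows what the proxy's aggregated token tracking is consuming under the hood.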
## ⚖️ License
MIT OR Apache-2.0