# LLM Proxy Gateway
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok) with built-in token tracking, real-time cost calculation, and a management dashboard.
## 🚀 Features
- Unified API: Fully OpenAI-compatible `/v1/chat/completions` endpoint.
- Multi-Provider Support:
  - OpenAI: Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
  - Google Gemini: Support for the latest Gemini 2.0 models.
  - DeepSeek: High-performance, low-cost integration.
  - xAI Grok: Integration for Grok-series models.
  - Ollama: Support for local LLMs running on your machine or another host.
- Observability & Tracking:
  - Real-time Costing: Fetches live pricing and context specs from [models.dev](https://models.dev) on startup.
  - Token Counting: Precise estimation using `tiktoken-rs`.
  - Database Logging: Every request is logged to SQLite for historical analysis.
- Streaming Support: Full SSE (Server-Sent Events) support with aggregated token tracking.
- Multimodal (Vision): Support for image processing (Base64 and remote URLs) across compatible providers.
- Reliability:
  - Circuit Breaking: Automatically protects your system when providers are down.
  - Rate Limiting: Granular per-client and global rate limits.
- Management Dashboard: A modern, real-time web interface to monitor usage, costs, and system health.
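To make the cost-tracking feature concrete, here is a minimal sketch of the arithmetic involved. Note this is an illustration in Python, not the proxy's actual Rust implementation, and the per-million-token prices below are hypothetical placeholders rather than live models.dev data:

```python
# Illustrative only: the proxy does this in Rust using pricing fetched
# from models.dev at startup. These prices are hypothetical placeholders.
PRICING_PER_MILLION = {
    # model: (input_price_usd, output_price_usd) per 1M tokens
    "gpt-4o": (2.50, 10.00),
    "deepseek-chat": (0.27, 1.10),
}

def request_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost = tokens / 1e6 * price-per-million, summed over input and output."""
    input_price, output_price = PRICING_PER_MILLION[model]
    return (prompt_tokens / 1_000_000) * input_price + \
           (completion_tokens / 1_000_000) * output_price

# 1,000 input tokens at $2.50/M plus 500 output tokens at $10.00/M
cost = request_cost_usd("gpt-4o", prompt_tokens=1_000, completion_tokens=500)
print(f"${cost:.6f}")  # $0.007500
```

The proxy performs this calculation per request (using exact token counts from `tiktoken-rs`) before logging to SQLite, which is what powers the dashboard's USD totals.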
## 🛠️ Tech Stack
- Runtime: Rust (2024 Edition) with Tokio.
- Web Framework: Axum.
- Database: SQLx with SQLite.
- Frontend: Vanilla JS/CSS (no heavyweight framework required).
## 🚦 Getting Started

### Prerequisites
- Rust (1.80+)
- SQLite3
### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/llm-proxy.git
   cd llm-proxy
   ```

2. Set up your environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys
   ```

3. Configure providers and server: Edit `config.toml` to customize models, pricing fallbacks, and port settings.

   Ollama Example (`config.toml`):

   ```toml
   [providers.ollama]
   enabled = true
   base_url = "http://192.168.1.50:11434/v1"
   models = ["llama3", "mistral"]
   ```

4. Run the proxy:

   ```bash
   cargo run --release
   ```

The server will start at http://localhost:3000 by default.
## 📊 Management Dashboard
Access the built-in dashboard at http://localhost:3000 to see:
- Usage Summary: Total requests, tokens, and USD spent.
- Trend Charts: 24-hour request and cost distributions.
- Live Logs: Real-time stream of incoming LLM requests via WebSockets.
- Provider Health: Monitor which providers are online or degraded.
## 🔌 API Usage
The proxy is designed to be a drop-in replacement for OpenAI. Simply change your base URL:
Example Request (Python):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-proxy-client-id"  # Hashed sk- keys are managed in the dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from the proxy!"}]
)
```
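Because the proxy streams responses as standard SSE, clients that don't use the OpenAI SDK can consume the stream directly. The sketch below parses OpenAI-style `data:` chunks; the sample payloads are hand-written in the standard OpenAI streaming format, not captured proxy output:

```python
import json

# Hand-written sample of an OpenAI-style SSE stream, as an
# OpenAI-compatible /v1/chat/completions endpoint emits with "stream": true.
raw_stream = """\
data: {"choices":[{"delta":{"content":"Hello"}}]}

data: {"choices":[{"delta":{"content":" world"}}]}

data: [DONE]
"""

def collect_content(sse_text: str) -> str:
    """Concatenate delta.content from each `data:` line, stopping at [DONE]."""
    parts = []
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # stream terminator
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

print(collect_content(raw_stream))  # Hello world
```

This is also the shape the proxy aggregates server-side to attribute token counts and cost to streamed requests.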
## ⚖️ License
MIT OR Apache-2.0