# LLM Proxy Gateway
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok) with built-in token tracking, real-time cost calculation, and a management dashboard.
## 🚀 Features
- Unified API: Fully OpenAI-compatible `/v1/chat/completions` endpoint.
- Multi-Provider Support:
  - OpenAI: Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
  - Google Gemini: Support for the latest Gemini 2.0 models.
  - DeepSeek: High-performance, low-cost integration.
  - xAI Grok: Integration for Grok-series models.
  - Ollama: Support for local LLMs running on your machine or another host.
- Observability & Tracking:
  - Real-time Costing: Fetches live pricing and context specs from `models.dev` on startup.
  - Token Counting: Precise estimation using `tiktoken-rs`.
  - Database Logging: Every request is logged to SQLite for historical analysis.
- Streaming Support: Full SSE (Server-Sent Events) support with aggregated token tracking.
- Multimodal (Vision): Support for image processing (Base64 and remote URLs) across compatible providers.
- Reliability:
  - Circuit Breaking: Automatically protects your system when providers are down.
  - Rate Limiting: Granular per-client and global rate limits.
- Management Dashboard: A modern, real-time web interface to monitor usage, costs, and system health.
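Because the gateway speaks the OpenAI chat-completions dialect, multimodal requests use the standard content-parts convention. A minimal sketch of such a request body (the model name and image URL below are illustrative placeholders, not values defined by this project):

```python
import json

# Sketch of an OpenAI-style multimodal chat payload, in the shape the
# proxy's /v1/chat/completions endpoint accepts. Vision content is a
# list of typed parts: text plus image_url (remote URL or Base64 data URI).
payload = {
    "model": "gpt-4o",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
}

# The request body is plain JSON, so any HTTP client can send it.
body = json.dumps(payload)
print(len(json.loads(body)["messages"][0]["content"]))  # two content parts
```

A Base64 image is sent the same way, with a `data:image/png;base64,...` URI in place of the remote URL.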
## 🛠️ Tech Stack
- Runtime: Rust (2024 Edition) with Tokio.
- Web Framework: Axum.
- Database: SQLx with SQLite.
- Frontend: Vanilla JS/CSS (no heavyweight framework required).
## 🚦 Getting Started

### Prerequisites
- Rust (1.80+)
- SQLite3
### Installation
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/llm-proxy.git
   cd llm-proxy
   ```

2. Set up your environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys
   ```

3. Configure providers and server: edit `config.toml` to customize models, pricing fallbacks, and port settings.

   Ollama example (`config.toml`):

   ```toml
   [providers.ollama]
   enabled = true
   base_url = "http://192.168.1.50:11434/v1"
   models = ["llama3", "mistral"]
   ```

4. Run the proxy:

   ```bash
   cargo run --release
   ```

The server starts at http://localhost:3000 by default.
## 📊 Management Dashboard
Access the built-in dashboard at http://localhost:3000 to see:
- Usage Summary: Total requests, tokens, and USD spent.
- Trend Charts: 24-hour request and cost distributions.
- Live Logs: Real-time stream of incoming LLM requests via WebSockets.
- Provider Health: Monitor which providers are online or degraded.
## 🔌 API Usage
The proxy is designed to be a drop-in replacement for OpenAI. Simply change your base URL:
Example Request (Python):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-proxy-client-id",  # Hashed sk- keys are managed in the dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from the proxy!"}],
)
```
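Streaming works the same way (`stream=True` on the client), with the proxy relaying OpenAI-style SSE chunks. A client can aggregate the content deltas itself; below is a minimal sketch of that aggregation, where the chunk payloads are illustrative examples of the standard stream shape, not captured proxy output:

```python
import json

def aggregate_sse(lines):
    """Collect assistant text from OpenAI-style SSE 'data:' lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore comments, blank keep-alives, other fields
        data = line[len("data: "):]
        if data.strip() == "[DONE]":  # OpenAI-style stream terminator
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# Illustrative chunks in the shape an OpenAI-compatible stream uses:
# a role-only delta first, then incremental content, then the terminator.
stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
print(aggregate_sse(stream))  # Hello world
```

With the official client this bookkeeping is handled for you; the sketch only shows what the proxy's aggregated token tracking is consuming under the hood.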
## ⚖️ License
MIT OR Apache-2.0