# LLM Proxy Gateway

A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok) with built-in token tracking, real-time cost calculation, and a management dashboard.

## 🚀 Features

- **Unified API:** Fully OpenAI-compatible `/v1/chat/completions` endpoint.
- **Multi-Provider Support:**
  * **OpenAI:** Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
  * **Google Gemini:** Support for the latest Gemini 2.0 models.
  * **DeepSeek:** High-performance, low-cost integration.
  * **xAI Grok:** Integration for Grok-series models.
  * **Ollama:** Support for local LLMs running on your machine or another host.
- **Observability & Tracking:**
  * **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
  * **Token Counting:** Precise estimation using `tiktoken-rs`.
  * **Database Logging:** Every request is logged to SQLite for historical analysis.
  * **Streaming Support:** Full SSE (Server-Sent Events) support with aggregated token tracking.
- **Multimodal (Vision):** Support for image processing (Base64 and remote URLs) across compatible providers.
- **Reliability:**
  * **Circuit Breaking:** Automatically protects your system when providers are down.
  * **Rate Limiting:** Granular per-client and global rate limits.
- **Management Dashboard:** A modern, real-time web interface to monitor usage, costs, and system health.

## 🛠️ Tech Stack

- **Runtime:** Rust (2024 Edition) with [Tokio](https://tokio.rs/).
- **Web Framework:** [Axum](https://github.com/tokio-rs/axum).
- **Database:** [SQLx](https://github.com/launchbadge/sqlx) with SQLite.
- **Frontend:** Vanilla JS/CSS (no heavyweight framework required).

## 🚦 Getting Started

### Prerequisites

- Rust (1.80+)
- SQLite3

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/llm-proxy.git
   cd llm-proxy
   ```

2. Set up your environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys
   ```

3. Configure providers and server:

   Edit `config.toml` to customize models, pricing fallbacks, and port settings.

   **Ollama Example (config.toml):**

   ```toml
   [providers.ollama]
   enabled = true
   base_url = "http://192.168.1.50:11434/v1"
   models = ["llama3", "mistral"]
   ```

4. Run the proxy:

   ```bash
   cargo run --release
   ```

   The server will start at `http://localhost:3000` by default.

## 📊 Management Dashboard

Access the built-in dashboard at `http://localhost:3000` to see:

- **Usage Summary:** Total requests, tokens, and USD spent.
- **Trend Charts:** 24-hour request and cost distributions.
- **Live Logs:** Real-time stream of incoming LLM requests via WebSockets.
- **Provider Health:** Monitor which providers are online or degraded.

## 🔌 API Usage

The proxy is designed to be a drop-in replacement for OpenAI. Simply change your base URL:

**Example Request (Python):**

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-proxy-client-id"  # Hashed sk- keys are managed in the dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from the proxy!"}]
)
```

## ⚖️ License

MIT OR Apache-2.0
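## 💰 How Cost Tracking Works

The real-time costing feature described under Observability comes down to multiplying each request's token counts by the provider's per-token prices. A minimal Python sketch of that arithmetic is below; the model names and USD-per-million-token prices here are illustrative placeholders, not the live figures the proxy fetches from `models.dev`:

```python
# Illustrative per-request cost calculation, similar in spirit to what
# the proxy logs for each request. Prices are hypothetical placeholder
# values (USD per million tokens), not live pricing data.

PRICING = {
    # model: (input $/1M tokens, output $/1M tokens) -- placeholders
    "gpt-4o": (2.50, 10.00),
    "deepseek-chat": (0.27, 1.10),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of a single request for the given model."""
    in_price, out_price = PRICING[model]
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000

cost = request_cost("gpt-4o", prompt_tokens=1_200, completion_tokens=300)
print(f"${cost:.6f}")  # 1200*2.50/1e6 + 300*10.00/1e6 = 0.006 -> "$0.006000"
```

The same calculation applies to streamed responses: the proxy aggregates token usage across SSE chunks first, then prices the totals once the stream completes.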