# LLM Proxy Gateway

A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.

## Features

- **Unified API:** OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
- **Multi-Provider Support:**
  - **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
  - **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models.
  - **DeepSeek:** DeepSeek Chat and Reasoner models.
  - **xAI Grok:** Grok-beta models.
  - **Ollama:** Local LLMs running on your network.
- **Observability & Tracking:**
  - **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
  - **Token Counting:** Precise estimation using `tiktoken-rs`.
  - **Database Logging:** Every request is logged to SQLite for historical analysis.
- **Streaming Support:** Full SSE (Server-Sent Events) with `[DONE]` termination for client compatibility.
- **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
- **Multi-User Access Control:**
  - **Admin Role:** Full access to all dashboard features, user management, and system configuration.
  - **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
  - **Client API Keys:** Create and manage multiple client tokens for external integrations.
- **Reliability:**
  - **Circuit Breaking:** Automatically stops routing requests to providers that are failing.
  - **Rate Limiting:** Per-client and global rate limits.
  - **Cache-Aware Costing:** Tracks cache hit/miss tokens for accurate billing.

## Tech Stack

- **Runtime:** Rust with Tokio.
- **Web Framework:** Axum.
- **Database:** SQLx with SQLite.
- **Frontend:** Vanilla JS/CSS with Chart.js for visualizations.

## Getting Started

### Prerequisites

- Rust (1.80+)
- SQLite3

### Quick Start
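Before cloning, it can help to confirm the prerequisites are on your `PATH` (a quick sketch; `rustc` should report 1.80 or newer, and exact version strings vary by install):

```bash
# Report whether each required tool is available, with its version if found.
for tool in rustc cargo sqlite3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $("$tool" --version | head -n 1)"
  else
    echo "$tool: MISSING"
  fi
done
```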
1. Clone and build:

   ```bash
   git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
   cd llm-proxy
   cargo build --release
   ```

2. Configure environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys:
   # OPENAI_API_KEY=sk-...
   # GEMINI_API_KEY=AIza...
   # DEEPSEEK_API_KEY=sk-...
   # GROK_API_KEY=gk-...  (optional)
   ```

3. Run the proxy:

   ```bash
   cargo run --release
   ```

   The server starts on `http://localhost:8080` by default.

### Configuration

Edit `config.toml` to customize providers, models, and settings:

```toml
[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"

[providers.openai]
enabled = true
default_model = "gpt-4o"

[providers.gemini]
enabled = true
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
default_model = "grok-beta"

[providers.ollama]
enabled = false
base_url = "http://localhost:11434/v1"
```

## Management Dashboard

Access the dashboard at `http://localhost:8080`:

- **Overview:** Real-time request counters, system health, provider status.
- **Analytics:** Time-series charts, filterable by date, client, provider, and model.
- **Costs:** Budget tracking, cost breakdown by provider/client/model, projections.
- **Clients:** Create, revoke, and rotate API tokens; per-client usage stats.
- **Providers:** Enable/disable providers, test connections, configure API keys.
- **Monitoring:** Live request stream via WebSocket, response times, error rates.
- **Users:** Admin/user management with role-based access control.

### Default Credentials

- **Username:** `admin`
- **Password:** `admin123`

Change the admin password in the dashboard after first login!

## API Usage

The proxy is a drop-in replacement for OpenAI.
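Streaming works the same way as against OpenAI directly: send `"stream": true` and chunks arrive as SSE data lines, ending with a `data: [DONE]` sentinel. A minimal client-side sketch of assembling such a stream (sample chunks shown here, not live proxy output; field names follow the OpenAI chunk format):

```python
import json

def collect_stream(sse_lines):
    """Accumulate content deltas from OpenAI-style SSE lines until [DONE]."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # the proxy terminates every stream with [DONE]
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# Sample chunks in the shape the proxy relays:
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    'data: [DONE]',
]
print(collect_stream(sample))  # -> Hello!
```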
Configure your client:

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY",  # Create in dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

### Open WebUI

```
API Base URL: http://your-server:8080/v1
API Key: YOUR_CLIENT_API_KEY
```

### cURL

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```

## Model Discovery

The proxy exposes `/v1/models` so OpenAI-compatible clients can discover the available models:

```bash
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY"
```

## Troubleshooting

### Streaming Issues

If clients time out or report "TransferEncodingError", ensure that:

1. Proxy buffering is disabled in nginx: `proxy_buffering off;`
2. Chunked transfer is enabled: `chunked_transfer_encoding on;`
3. Timeouts are sufficient: `proxy_read_timeout 7200s;`

### Provider Errors

- Check that API keys are set in `.env`.
- Test the provider in the dashboard (Settings → Providers → Test).
- Review logs: `journalctl -u llm-proxy -f`

## License

MIT OR Apache-2.0