diff --git a/README.md b/README.md
new file mode 100644
index 00000000..fbecffb5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,91 @@
+# LLM Proxy Gateway
+
+A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok) with built-in token tracking, real-time cost calculation, and a management dashboard.
+
+## 🚀 Features
+
+- **Unified API:** Fully OpenAI-compatible `/v1/chat/completions` endpoint.
+- **Multi-Provider Support:**
+  * **OpenAI:** Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
+  * **Google Gemini:** Support for the latest Gemini 2.0 models.
+  * **DeepSeek:** High-performance, low-cost integration.
+  * **xAI Grok:** Integration for Grok-series models.
+- **Observability & Tracking:**
+  * **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
+  * **Token Counting:** Accurate token estimation via `tiktoken-rs`.
+  * **Database Logging:** Every request is logged to SQLite for historical analysis.
+  * **Streaming Support:** Full SSE (Server-Sent Events) support with aggregated token tracking.
+- **Multimodal (Vision):** Image inputs (Base64 and remote URLs) across compatible providers.
+- **Reliability:**
+  * **Circuit Breaking:** Automatically stops routing to providers that are down.
+  * **Rate Limiting:** Granular per-client and global rate limits.
+- **Management Dashboard:** A modern, real-time web interface to monitor usage, costs, and system health.
+
+## 🛠️ Tech Stack
+
+- **Runtime:** Rust (2024 edition) with [Tokio](https://tokio.rs/).
+- **Web Framework:** [Axum](https://github.com/tokio-rs/axum).
+- **Database:** [SQLx](https://github.com/launchbadge/sqlx) with SQLite.
+- **Frontend:** Vanilla JS/CSS (no heavyweight framework required).
+
+## 🚦 Getting Started
+
+### Prerequisites
+
+- Rust 1.85+ (required by the 2024 edition)
+- SQLite3
+
+### Installation
+
+1. Clone the repository:
+   ```bash
+   git clone https://github.com/yourusername/llm-proxy.git
+   cd llm-proxy
+   ```
+
+2. Set up your environment:
+   ```bash
+   cp .env.example .env
+   # Edit .env and add your API keys
+   ```
+
+3. Configure providers and server:
+   Edit `config.toml` to customize models, pricing fallbacks, and port settings.
+
+4. Run the proxy:
+   ```bash
+   cargo run --release
+   ```
+
+By default, the server starts at `http://localhost:3000`.
+
+## 📊 Management Dashboard
+
+Access the built-in dashboard at `http://localhost:3000` to see:
+- **Usage Summary:** Total requests, tokens, and USD spent.
+- **Trend Charts:** 24-hour request and cost distributions.
+- **Live Logs:** Real-time stream of incoming LLM requests via WebSockets.
+- **Provider Health:** Monitor which providers are online or degraded.
+
+## 🔌 API Usage
+
+The proxy is a drop-in replacement for the OpenAI API: point your client at the proxy's base URL and leave the rest of your code unchanged.
+
+**Example Request (Python):**
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:3000/v1",
+    api_key="your-proxy-client-id"  # Hashed sk- keys are managed in the dashboard
+)
+
+response = client.chat.completions.create(
+    model="gpt-4o",
+    messages=[{"role": "user", "content": "Hello from the proxy!"}]
+)
+```
+
+## ⚖️ License
+
+MIT OR Apache-2.0
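
The vision support described in the README accepts the standard OpenAI multimodal message shape. A minimal sketch of the request payload (no request is sent; the PNG bytes are a stub, and `gpt-4o` is just the model name from the example above):

```python
import base64
import json

# Stub image bytes (just the PNG magic number) encoded as a data URL --
# this is the "Base64" form of image input; remote URLs go in the same field.
png_stub = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode()

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{png_stub}"},
                },
            ],
        }
    ],
}

print(json.dumps(payload, indent=2))
```

The same dict can be passed through the OpenAI client shown above, since the proxy forwards OpenAI-format bodies unchanged.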