diff --git a/DASHBOARD_README.md b/DASHBOARD_README.md
index f72394fd..cbcf7ebe 100644
--- a/DASHBOARD_README.md
+++ b/DASHBOARD_README.md
@@ -6,37 +6,43 @@ This is a comprehensive admin dashboard for the LLM Proxy Gateway, providing rea
 ## Features
 
-### 1. **Dashboard Overview**
+### 1. Dashboard Overview
 - Real-time request counters and statistics
 - System health indicators
 - Provider status monitoring
 - Recent requests stream
 
-### 2. **Usage Analytics**
+### 2. Usage Analytics
 - Time series charts for requests, tokens, and costs
 - Filter by date range, client, provider, and model
 - Top clients and models analysis
 - Export functionality to CSV/JSON
 
-### 3. **Cost Management**
+### 3. Cost Management
 - Cost breakdown by provider, client, and model
 - Budget tracking with alerts
 - Cost projections
 - Pricing configuration management
 
-### 4. **Client Management**
+### 4. Client Management
 - List, create, revoke, and rotate API tokens
 - Client-specific rate limits
 - Usage statistics per client
 - Token management interface
 
-### 5. **Provider Configuration**
+### 5. Provider Configuration
 - Enable/disable LLM providers
 - Configure API keys (masked display)
 - Test provider connections
 - Model availability management
 
-### 6. **Real-time Monitoring**
+### 6. User Management (RBAC)
+- **Admin Role:** Full access to all dashboard features, user management, system configuration
+- **Viewer Role:** Read-only access to usage analytics, costs, and monitoring
+- Create/manage dashboard users with role assignment
+- Secure password management
+
+### 7. Real-time Monitoring
 - Live request stream via WebSocket
 - System metrics dashboard
 - Response time and error rate tracking
@@ -99,9 +105,16 @@ http://localhost:8080
 ### Clients
 - `GET /api/clients` - List all clients
 - `POST /api/clients` - Create new client
+- `PUT /api/clients/{id}` - Update client
 - `DELETE /api/clients/{id}` - Revoke client
 - `GET /api/clients/{id}/usage` - Client-specific usage
 
+### Users (RBAC)
+- `GET /api/users` - List all dashboard users
+- `POST /api/users` - Create new user
+- `PUT /api/users/{id}` - Update user (admin only)
+- `DELETE /api/users/{id}` - Delete user (admin only)
+
 ### Providers
 - `GET /api/providers` - List providers and status
 - `PUT /api/providers/{name}` - Update provider config
diff --git a/README.md b/README.md
index 299b98e0..32e29c45 100644
--- a/README.md
+++ b/README.md
@@ -1,100 +1,182 @@
 # LLM Proxy Gateway
 
-A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok) with built-in token tracking, real-time cost calculation, and a management dashboard.
+A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.
 
-## 🚀 Features
+## Features
 
-- **Unified API:** Fully OpenAI-compatible `/v1/chat/completions` endpoint.
-- **Multi-Provider Support:**
-  * **OpenAI:** Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
-  * **Google Gemini:** Support for the latest Gemini 2.0 models.
-  * **DeepSeek:** High-performance, low-cost integration.
-  * **xAI Grok:** Integration for Grok-series models.
-  * **Ollama:** Support for local LLMs running on your machine or another host.
-- **Observability & Tracking:**
-  * **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
-  * **Token Counting:** Precise estimation using `tiktoken-rs`.
-  * **Database Logging:** Every request is logged to SQLite for historical analysis.
-  * **Streaming Support:** Full SSE (Server-Sent Events) support with aggregated token tracking.
-- **Multimodal (Vision):** Support for image processing (Base64 and remote URLs) across compatible providers.
-- **Reliability:**
-  * **Circuit Breaking:** Automatically protects your system when providers are down.
-  * **Rate Limiting:** Granular per-client and global rate limits.
-- **Management Dashboard:** A modern, real-time web interface to monitor usage, costs, and system health.
+- **Unified API:** OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
+- **Multi-Provider Support:**
+  - **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
+  - **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models.
+  - **DeepSeek:** DeepSeek Chat and Reasoner models.
+  - **xAI Grok:** Grok-beta models.
+  - **Ollama:** Local LLMs running on your network.
+- **Observability & Tracking:**
+  - **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
+  - **Token Counting:** Precise estimation using `tiktoken-rs`.
+  - **Database Logging:** Every request logged to SQLite for historical analysis.
+  - **Streaming Support:** Full SSE (Server-Sent Events) with `[DONE]` termination for client compatibility.
+- **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
+- **Multi-User Access Control:**
+  - **Admin Role:** Full access to all dashboard features, user management, and system configuration.
+  - **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
+  - **Client API Keys:** Create and manage multiple client tokens for external integrations.
+- **Reliability:**
+  - **Circuit Breaking:** Automatically protects when providers are down.
+  - **Rate Limiting:** Per-client and global rate limits.
+  - **Cache-Aware Costing:** Tracks cache hit/miss tokens for accurate billing.
 
-## 🛠️ Tech Stack
+## Tech Stack
 
-- **Runtime:** Rust (2024 Edition) with [Tokio](https://tokio.rs/).
-- **Web Framework:** [Axum](https://github.com/tokio-rs/axum).
-- **Database:** [SQLx](https://github.com/launchbadge/sqlx) with SQLite.
-- **Frontend:** Vanilla JS/CSS (no heavyweight framework required).
+- **Runtime:** Rust with Tokio.
+- **Web Framework:** Axum.
+- **Database:** SQLx with SQLite.
+- **Frontend:** Vanilla JS/CSS with Chart.js for visualizations.
 
-## 🚦 Getting Started
+## Getting Started
 
 ### Prerequisites
 
-- Rust (1.80+)
-- SQLite3
+- Rust (1.80+)
+- SQLite3
 
-### Installation
+### Quick Start
 
-1. Clone the repository:
-   ```bash
-   git clone https://github.com/yourusername/llm-proxy.git
-   cd llm-proxy
-   ```
+1. Clone and build:
+   ```bash
+   git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
+   cd llm-proxy
+   cargo build --release
+   ```
 
-2. Set up your environment:
-   ```bash
-   cp .env.example .env
-   # Edit .env and add your API keys
-   ```
+2. Configure environment:
+   ```bash
+   cp .env.example .env
+   # Edit .env and add your API keys:
+   # OPENAI_API_KEY=sk-...
+   # GEMINI_API_KEY=AIza...
+   # DEEPSEEK_API_KEY=sk-...
+   # GROK_API_KEY=gk-... (optional)
+   ```
 
-3. Configure providers and server:
-   Edit `config.toml` to customize models, pricing fallbacks, and port settings.
+3. Run the proxy:
+   ```bash
+   cargo run --release
+   ```
 
-   **Ollama Example (config.toml):**
-   ```toml
-   [providers.ollama]
-   enabled = true
-   base_url = "http://192.168.1.50:11434/v1"
-   models = ["llama3", "mistral"]
-   ```
+The server starts on `http://localhost:8080` by default.
 
-4. Run the proxy:
-   ```bash
-   cargo run --release
-   ```
+### Configuration
 
-The server will start at `http://localhost:3000` (by default).
+Edit `config.toml` to customize providers, models, and settings:
 
-## 📊 Management Dashboard
+```toml
+[server]
+port = 8080
+host = "0.0.0.0"
 
-Access the built-in dashboard at `http://localhost:3000` to see:
-- **Usage Summary:** Total requests, tokens, and USD spent.
-- **Trend Charts:** 24-hour request and cost distributions.
-- **Live Logs:** Real-time stream of incoming LLM requests via WebSockets.
-- **Provider Health:** Monitor which providers are online or degraded.
+[database]
+path = "./data/llm_proxy.db"
 
-## 🔌 API Usage
+[providers.openai]
+enabled = true
+default_model = "gpt-4o"
 
-The proxy is designed to be a drop-in replacement for OpenAI. Simply change your base URL:
+[providers.gemini]
+enabled = true
+default_model = "gemini-2.0-flash"
 
-**Example Request (Python):**
+[providers.deepseek]
+enabled = true
+default_model = "deepseek-reasoner"
+
+[providers.grok]
+enabled = false
+default_model = "grok-beta"
+
+[providers.ollama]
+enabled = false
+base_url = "http://localhost:11434/v1"
+```
+
+## Management Dashboard
+
+Access the dashboard at `http://localhost:8080`:
+
+- **Overview:** Real-time request counters, system health, provider status.
+- **Analytics:** Time-series charts, filterable by date, client, provider, and model.
+- **Costs:** Budget tracking, cost breakdown by provider/client/model, projections.
+- **Clients:** Create, revoke, and rotate API tokens; per-client usage stats.
+- **Providers:** Enable/disable providers, test connections, configure API keys.
+- **Monitoring:** Live request stream via WebSocket, response times, error rates.
+- **Users:** Admin/user management with role-based access control.
+
+### Default Credentials
+
+- **Username:** `admin`
+- **Password:** `admin123`
+
+Change the admin password in the dashboard after first login!
+
+## API Usage
+
+The proxy is a drop-in replacement for OpenAI. Configure your client:
+
+### Python
 ```python
 from openai import OpenAI
 
 client = OpenAI(
-    base_url="http://localhost:3000/v1",
-    api_key="your-proxy-client-id"  # Hashed sk- keys are managed in the dashboard
+    base_url="http://localhost:8080/v1",
+    api_key="YOUR_CLIENT_API_KEY"  # Create in dashboard
 )
 
 response = client.chat.completions.create(
     model="gpt-4o",
-    messages=[{"role": "user", "content": "Hello from the proxy!"}]
+    messages=[{"role": "user", "content": "Hello!"}]
 )
 ```
 
-## ⚖️ License
+### Open WebUI
+```
+API Base URL: http://your-server:8080/v1
+API Key: YOUR_CLIENT_API_KEY
+```
+
+### cURL
+```bash
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
+  -d '{
+    "model": "gpt-4o",
+    "messages": [{"role": "user", "content": "Hello!"}],
+    "stream": false
+  }'
+```
+
+## Model Discovery
+
+The proxy exposes `/v1/models` for OpenAI-compatible client model discovery:
+
+```bash
+curl http://localhost:8080/v1/models \
+  -H "Authorization: Bearer YOUR_CLIENT_API_KEY"
+```
+
+## Troubleshooting
+
+### Streaming Issues
+If clients time out or show "TransferEncodingError", ensure:
+1. Proxy buffering is disabled in nginx: `proxy_buffering off;`
+2. Chunked transfer is enabled: `chunked_transfer_encoding on;`
+3. Timeouts are sufficient: `proxy_read_timeout 7200s;`
+
+### Provider Errors
+- Check that API keys are set in `.env`
+- Test each provider in the dashboard (Settings → Providers → Test)
+- Review logs: `journalctl -u llm-proxy -f`
+
+## License
 
 MIT OR Apache-2.0
diff --git a/deployment.md b/deployment.md
index adeb1fea..e8a57711 100644
--- a/deployment.md
+++ b/deployment.md
@@ -118,6 +118,9 @@ default_model = "grok-beta"
 ```
 
 ## Nginx Reverse Proxy Configuration
+
+**Important for SSE/Streaming:** Disable buffering and configure timeouts for proper SSE support.
+
 ```nginx
 server {
     listen 80;
@@ -126,12 +129,22 @@ server {
     location / {
         proxy_pass http://localhost:8080;
         proxy_http_version 1.1;
-        proxy_set_header Upgrade $http_upgrade;
-        proxy_set_header Connection 'upgrade';
-        proxy_set_header Host $host;
-        proxy_cache_bypass $http_upgrade;
-        # WebSocket support
+        # SSE/Streaming support
+        proxy_buffering off;
+        chunked_transfer_encoding on;
+        proxy_set_header Connection '';
+
+        # Timeouts for long-running streams
+        proxy_connect_timeout 7200s;
+        proxy_read_timeout 7200s;
+        proxy_send_timeout 7200s;
+
+        # Disable gzip for streaming
+        gzip off;
+
+        # Headers
+        proxy_set_header Host $host;
         proxy_set_header X-Real-IP $remote_addr;
         proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
         proxy_set_header X-Forwarded-Proto $scheme;
@@ -144,6 +157,21 @@ server {
     }
 }
 ```
 
+### NGINX Proxy Manager
+
+If using NGINX Proxy Manager, add this to **Advanced Settings**:
+
+```nginx
+proxy_buffering off;
+proxy_http_version 1.1;
+proxy_set_header Connection '';
+chunked_transfer_encoding on;
+proxy_connect_timeout 7200s;
+proxy_read_timeout 7200s;
+proxy_send_timeout 7200s;
+gzip off;
+```
+
 ## Security Considerations
 
 ### 1. Authentication
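
Reviewer note on the streaming changes above: the README now documents OpenAI-style SSE output terminated by a `[DONE]` sentinel, and the nginx tweaks exist so those `data:` lines reach clients unbuffered. As a minimal sketch of what a consuming client has to handle — the `parse_sse_chunks` helper and the canned chunks below are illustrative only, not code from this repository; the chunk shape follows the standard OpenAI streaming schema:

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from OpenAI-style SSE lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and ":"-prefixed comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # stream terminator emitted after the final chunk
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Canned chunks in the OpenAI streaming shape:
raw = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(raw)))  # -> Hello
```

In a real client these lines come off the HTTP response body; with `proxy_buffering off;` set as in the nginx configuration above, each `data:` line arrives as soon as the upstream provider emits it rather than after nginx fills a buffer.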