docs: update README, deployment guide, and dashboard docs

2026-03-03 13:06:37 -05:00
parent 2a7a380977
commit adbaa146fb
3 changed files with 199 additions and 76 deletions
@@ -6,37 +6,43 @@ This is a comprehensive admin dashboard for the LLM Proxy Gateway, providing rea

 ## Features

-### 1. **Dashboard Overview**
+### 1. Dashboard Overview
 - Real-time request counters and statistics
 - System health indicators
 - Provider status monitoring
 - Recent requests stream

-### 2. **Usage Analytics**
+### 2. Usage Analytics
 - Time series charts for requests, tokens, and costs
 - Filter by date range, client, provider, and model
 - Top clients and models analysis
 - Export functionality to CSV/JSON

-### 3. **Cost Management**
+### 3. Cost Management
 - Cost breakdown by provider, client, and model
 - Budget tracking with alerts
 - Cost projections
 - Pricing configuration management

-### 4. **Client Management**
+### 4. Client Management
 - List, create, revoke, and rotate API tokens
 - Client-specific rate limits
 - Usage statistics per client
 - Token management interface

-### 5. **Provider Configuration**
+### 5. Provider Configuration
 - Enable/disable LLM providers
 - Configure API keys (masked display)
 - Test provider connections
 - Model availability management

-### 6. **Real-time Monitoring**
+### 6. User Management (RBAC)
+- **Admin Role:** Full access to all dashboard features, user management, and system configuration
+- **Viewer Role:** Read-only access to usage analytics, costs, and monitoring
+- Create/manage dashboard users with role assignment
+- Secure password management
+
+### 7. Real-time Monitoring
 - Live request stream via WebSocket
 - System metrics dashboard
 - Response time and error rate tracking
@@ -99,9 +105,16 @@ http://localhost:8080
 ### Clients
 - `GET /api/clients` - List all clients
 - `POST /api/clients` - Create new client
+- `PUT /api/clients/{id}` - Update client
 - `DELETE /api/clients/{id}` - Revoke client
 - `GET /api/clients/{id}/usage` - Client-specific usage

+### Users (RBAC)
+- `GET /api/users` - List all dashboard users
+- `POST /api/users` - Create new user
+- `PUT /api/users/{id}` - Update user (admin only)
+- `DELETE /api/users/{id}` - Delete user (admin only)
+
 ### Providers
 - `GET /api/providers` - List providers and status
 - `PUT /api/providers/{name}` - Update provider config
@@ -1,100 +1,182 @@
 # LLM Proxy Gateway

-A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok) with built-in token tracking, real-time cost calculation, and a management dashboard.
+A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.

-## 🚀 Features
+## Features

-   **Unified API:** Fully OpenAI-compatible `/v1/chat/completions` endpoint.
+- **Unified API:** OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
 - **Multi-Provider Support:**
-    *   **OpenAI:** Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
-    *   **Google Gemini:** Support for the latest Gemini 2.0 models.
-    *   **DeepSeek:** High-performance, low-cost integration.
-    *   **xAI Grok:** Integration for Grok-series models.
-    *   **Ollama:** Support for local LLMs running on your machine or another host.
+  - **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
+  - **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models.
+  - **DeepSeek:** DeepSeek Chat and Reasoner models.
+  - **xAI Grok:** Grok-beta models.
+  - **Ollama:** Local LLMs running on your network.
 - **Observability & Tracking:**
-    *   **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
-    *   **Token Counting:** Precise estimation using `tiktoken-rs`.
-    *   **Database Logging:** Every request is logged to SQLite for historical analysis.
-    *   **Streaming Support:** Full SSE (Server-Sent Events) support with aggregated token tracking.
-   **Multimodal (Vision):** Support for image processing (Base64 and remote URLs) across compatible providers.
+  - **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
+  - **Token Counting:** Precise estimation using `tiktoken-rs`.
+  - **Database Logging:** Every request logged to SQLite for historical analysis.
+  - **Streaming Support:** Full SSE (Server-Sent Events) with `[DONE]` termination for client compatibility.
+- **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
+- **Multi-User Access Control:**
+  - **Admin Role:** Full access to all dashboard features, user management, and system configuration.
+  - **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
+  - **Client API Keys:** Create and manage multiple client tokens for external integrations.
 - **Reliability:**
-    *   **Circuit Breaking:** Automatically protects your system when providers are down.
-    *   **Rate Limiting:** Granular per-client and global rate limits.
-   **Management Dashboard:** A modern, real-time web interface to monitor usage, costs, and system health.
+  - **Circuit Breaking:** Automatically protects the system when providers are down.
+  - **Rate Limiting:** Per-client and global rate limits.
+  - **Cache-Aware Costing:** Tracks cache hit/miss tokens for accurate billing.
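As a rough illustration of what cache-aware costing involves, the sketch below bills cached prompt tokens at a separate rate from uncached ones. It is not the proxy's implementation: the `Pricing` type, the `estimate_cost` helper, and every price shown are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens. All values used below are hypothetical."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def estimate_cost(prompt_tokens: int, cached_tokens: int,
                  completion_tokens: int, pricing: Pricing) -> float:
    """Return the estimated USD cost of one request.

    cached_tokens is the portion of prompt_tokens served from the
    provider's prompt cache and billed at the cached rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    total = (uncached_tokens * pricing.input_per_m
             + cached_tokens * pricing.cached_input_per_m
             + completion_tokens * pricing.output_per_m)
    return total / 1_000_000


# Hypothetical rates: $2.50 input, $1.25 cached input, $10.00 output per 1M tokens.
pricing = Pricing(input_per_m=2.50, cached_input_per_m=1.25, output_per_m=10.00)
cost = estimate_cost(prompt_tokens=1000, cached_tokens=400,
                     completion_tokens=200, pricing=pricing)
print(f"${cost:.6f}")
```

Ignoring the cache split would bill all 1000 prompt tokens at the full input rate, which overstates the cost of this request by the discount on the 400 cached tokens.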

-## 🛠️ Tech Stack
+## Tech Stack

-   **Runtime:** Rust (2024 Edition) with [Tokio](https://tokio.rs/).
-   **Web Framework:** [Axum](https://github.com/tokio-rs/axum).
-   **Database:** [SQLx](https://github.com/launchbadge/sqlx) with SQLite.
-   **Frontend:** Vanilla JS/CSS (no heavyweight framework required).
+- **Runtime:** Rust with Tokio.
+- **Web Framework:** Axum.
+- **Database:** SQLx with SQLite.
+- **Frontend:** Vanilla JS/CSS with Chart.js for visualizations.

-## 🚦 Getting Started
+## Getting Started

 ### Prerequisites

 - Rust (1.80+)
 - SQLite3

-### Installation
+### Quick Start

-1.  Clone the repository:
+1. Clone and build:
   ```bash
-    git clone https://github.com/yourusername/llm-proxy.git
+   git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
   cd llm-proxy
+   cargo build --release
   ```

-2.  Set up your environment:
+2. Configure environment:
   ```bash
   cp .env.example .env
-    # Edit .env and add your API keys
+   # Edit .env and add your API keys:
+   # OPENAI_API_KEY=sk-...
+   # GEMINI_API_KEY=AIza...
+   # DEEPSEEK_API_KEY=sk-...
+   # GROK_API_KEY=gk-...  (optional)
   ```

-3.  Configure providers and server:
-    Edit `config.toml` to customize models, pricing fallbacks, and port settings.
-
-    **Ollama Example (config.toml):**
-    ```toml
-    [providers.ollama]
-    enabled = true
-    base_url = "http://192.168.1.50:11434/v1"
-    models = ["llama3", "mistral"]
-    ```
-
-4.  Run the proxy:
+3. Run the proxy:
   ```bash
   cargo run --release
   ```

-The server will start at `http://localhost:3000` (by default).
+The server starts on `http://localhost:8080` by default.

-## 📊 Management Dashboard
+### Configuration

-Access the built-in dashboard at `http://localhost:3000` to see:
-   **Usage Summary:** Total requests, tokens, and USD spent.
-   **Trend Charts:** 24-hour request and cost distributions.
-   **Live Logs:** Real-time stream of incoming LLM requests via WebSockets.
-   **Provider Health:** Monitor which providers are online or degraded.
+Edit `config.toml` to customize providers, models, and settings:

-## 🔌 API Usage
+```toml
+[server]
+port = 8080
+host = "0.0.0.0"

-The proxy is designed to be a drop-in replacement for OpenAI. Simply change your base URL:
+[database]
+path = "./data/llm_proxy.db"

-**Example Request (Python):**
+[providers.openai]
+enabled = true
+default_model = "gpt-4o"
+
+[providers.gemini]
+enabled = true
+default_model = "gemini-2.0-flash"
+
+[providers.deepseek]
+enabled = true
+default_model = "deepseek-reasoner"
+
+[providers.grok]
+enabled = false
+default_model = "grok-beta"
+
+[providers.ollama]
+enabled = false
+base_url = "http://localhost:11434/v1"
+```
+
+## Management Dashboard
+
+Access the dashboard at `http://localhost:8080`:
+
+- **Overview:** Real-time request counters, system health, provider status.
+- **Analytics:** Time-series charts, filterable by date, client, provider, and model.
+- **Costs:** Budget tracking, cost breakdown by provider/client/model, projections.
+- **Clients:** Create, revoke, and rotate API tokens; per-client usage stats.
+- **Providers:** Enable/disable providers, test connections, configure API keys.
+- **Monitoring:** Live request stream via WebSocket, response times, error rates.
+- **Users:** Dashboard user management with role-based access control (admin and viewer roles).
+
+### Default Credentials
+
+- **Username:** `admin`
+- **Password:** `admin123`
+
+Change the admin password in the dashboard immediately after first login.
+
+## API Usage
+
+The proxy is a drop-in replacement for the OpenAI API. Point your client at the proxy's base URL:
+
+### Python
 ```python
 from openai import OpenAI

 client = OpenAI(
-    base_url="http://localhost:3000/v1",
-    api_key="your-proxy-client-id" # Hashed sk- keys are managed in the dashboard
+    base_url="http://localhost:8080/v1",
+    api_key="YOUR_CLIENT_API_KEY"  # Create in dashboard
 )

 response = client.chat.completions.create(
    model="gpt-4o",
-    messages=[{"role": "user", "content": "Hello from the proxy!"}]
+    messages=[{"role": "user", "content": "Hello!"}]
 )
 ```

-## ⚖️ License
+### Open WebUI
+```
+API Base URL: http://your-server:8080/v1
+API Key: YOUR_CLIENT_API_KEY
+```
+
+### cURL
+```bash
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
+  -d '{
+    "model": "gpt-4o",
+    "messages": [{"role": "user", "content": "Hello!"}],
+    "stream": false
+  }'
+```
+
+## Model Discovery
+
+The proxy exposes `/v1/models` so that OpenAI-compatible clients can discover the available models:
+
+```bash
+curl http://localhost:8080/v1/models \
+  -H "Authorization: Bearer YOUR_CLIENT_API_KEY"
+```
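The same lookup can be sketched in Python using only the standard library. This assumes the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`), which the docs imply but do not spell out. The `model_ids` and `fetch_model_ids` helpers and the example payload are illustrative, not part of the project.

```python
import json
import urllib.request


def model_ids(payload: dict) -> list[str]:
    """Extract sorted model IDs from an OpenAI-style /v1/models response."""
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_model_ids(base_url: str, api_key: str) -> list[str]:
    """Query {base_url}/models with a client API key (performs a network call)."""
    request = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return model_ids(json.load(response))


# Made-up payload in the assumed response shape; no server is needed to run this.
example = {
    "object": "list",
    "data": [
        {"id": "gpt-4o", "object": "model"},
        {"id": "deepseek-reasoner", "object": "model"},
    ],
}
print(model_ids(example))
```

Against a running proxy, the call would look like `fetch_model_ids("http://localhost:8080/v1", "YOUR_CLIENT_API_KEY")`.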
+
+## Troubleshooting
+
+### Streaming Issues
+If clients time out or show "TransferEncodingError", ensure:
+1. Proxy buffering is disabled in nginx: `proxy_buffering off;`
+2. Chunked transfer is enabled: `chunked_transfer_encoding on;`
+3. Timeouts are sufficient: `proxy_read_timeout 7200s;`
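When checking whether a stream survives the reverse proxy intact, it can help to see what a client does with the raw SSE lines. The sketch below is a minimal parser that assumes the proxy emits OpenAI-style chunks (`choices[].delta.content`) and ends the stream with `data: [DONE]`. The `iter_stream_text` helper and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from an OpenAI-style SSE stream until [DONE]."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, comments (": ..."), and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # Terminator: stop reading even if more lines follow.
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Made-up sample of what a streamed response body might look like.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"delta": {"content": "ignored"}}]}',
]
print("".join(iter_stream_text(sample)))
```

If a buffering proxy holds chunks back, a parser like this receives nothing until the buffer flushes, which is typically what surfaces as a client-side timeout.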
+
+### Provider Errors
+- Check that API keys are set in `.env`
+- Test the provider in the dashboard (Settings → Providers → Test)
+- Review the logs: `journalctl -u llm-proxy -f`
+
+## License

 MIT OR Apache-2.0
@@ -118,6 +118,9 @@ default_model = "grok-beta"
 ```

 ## Nginx Reverse Proxy Configuration
+
+**Important for SSE/Streaming:** Disable buffering and configure timeouts for proper SSE support.
+
 ```nginx
 server {
    listen 80;
@@ -126,12 +129,22 @@ server {
    location / {
        proxy_pass http://localhost:8080;
        proxy_http_version 1.1;
-        proxy_set_header Upgrade $http_upgrade;
-        proxy_set_header Connection 'upgrade';
-        proxy_set_header Host $host;
-        proxy_cache_bypass $http_upgrade;
        
-        # WebSocket support
+        # SSE/Streaming support
+        proxy_buffering off;
+        chunked_transfer_encoding on;
+        proxy_set_header Connection '';
+        
+        # Timeouts for long-running streams
+        proxy_connect_timeout 7200s;
+        proxy_read_timeout 7200s;
+        proxy_send_timeout 7200s;
+        
+        # Disable gzip for streaming
+        gzip off;
+        
+        # Headers
+        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
@@ -144,6 +157,21 @@ server {
 }
 ```

+### NGINX Proxy Manager
+
+If using NGINX Proxy Manager, add this to **Advanced Settings**:
+
+```nginx
+proxy_buffering off;
+proxy_http_version 1.1;
+proxy_set_header Connection '';
+chunked_transfer_encoding on;
+proxy_connect_timeout 7200s;
+proxy_read_timeout 7200s;
+proxy_send_timeout 7200s;
+gzip off;
+```
+
 ## Security Considerations

 ### 1. Authentication