docs: update README, deployment guide, and dashboard docs
@@ -6,37 +6,43 @@ This is a comprehensive admin dashboard for the LLM Proxy Gateway, providing rea
|
|||||||
|
|
||||||
## Features
|
## Features
|
||||||
|
|
||||||
### 1. **Dashboard Overview**
|
### 1. Dashboard Overview
|
||||||
- Real-time request counters and statistics
|
- Real-time request counters and statistics
|
||||||
- System health indicators
|
- System health indicators
|
||||||
- Provider status monitoring
|
- Provider status monitoring
|
||||||
- Recent requests stream
|
- Recent requests stream
|
||||||
|
|
||||||
### 2. **Usage Analytics**
|
### 2. Usage Analytics
|
||||||
- Time series charts for requests, tokens, and costs
|
- Time series charts for requests, tokens, and costs
|
||||||
- Filter by date range, client, provider, and model
|
- Filter by date range, client, provider, and model
|
||||||
- Top clients and models analysis
|
- Top clients and models analysis
|
||||||
- Export functionality to CSV/JSON
|
- Export functionality to CSV/JSON
|
||||||
|
|
||||||
### 3. **Cost Management**
|
### 3. Cost Management
|
||||||
- Cost breakdown by provider, client, and model
|
- Cost breakdown by provider, client, and model
|
||||||
- Budget tracking with alerts
|
- Budget tracking with alerts
|
||||||
- Cost projections
|
- Cost projections
|
||||||
- Pricing configuration management
|
- Pricing configuration management
|
||||||
|
|
||||||
### 4. **Client Management**
|
### 4. Client Management
|
||||||
- List, create, revoke, and rotate API tokens
|
- List, create, revoke, and rotate API tokens
|
||||||
- Client-specific rate limits
|
- Client-specific rate limits
|
||||||
- Usage statistics per client
|
- Usage statistics per client
|
||||||
- Token management interface
|
- Token management interface
|
||||||
|
|
||||||
### 5. **Provider Configuration**
|
### 5. Provider Configuration
|
||||||
- Enable/disable LLM providers
|
- Enable/disable LLM providers
|
||||||
- Configure API keys (masked display)
|
- Configure API keys (masked display)
|
||||||
- Test provider connections
|
- Test provider connections
|
||||||
- Model availability management
|
- Model availability management
|
||||||
|
|
||||||
### 6. **Real-time Monitoring**
|
### 6. User Management (RBAC)
|
||||||
|
- **Admin Role:** Full access to all dashboard features, user management, system configuration
|
||||||
|
- **Viewer Role:** Read-only access to usage analytics, costs, and monitoring
|
||||||
|
- Create/manage dashboard users with role assignment
|
||||||
|
- Secure password management
|
||||||
|
|
||||||
|
### 7. Real-time Monitoring
|
||||||
- Live request stream via WebSocket
|
- Live request stream via WebSocket
|
||||||
- System metrics dashboard
|
- System metrics dashboard
|
||||||
- Response time and error rate tracking
|
- Response time and error rate tracking
|
||||||
@@ -99,9 +105,16 @@ http://localhost:8080
|
|||||||
### Clients
|
### Clients
|
||||||
- `GET /api/clients` - List all clients
|
- `GET /api/clients` - List all clients
|
||||||
- `POST /api/clients` - Create new client
|
- `POST /api/clients` - Create new client
|
||||||
|
- `PUT /api/clients/{id}` - Update client
|
||||||
- `DELETE /api/clients/{id}` - Revoke client
|
- `DELETE /api/clients/{id}` - Revoke client
|
||||||
- `GET /api/clients/{id}/usage` - Client-specific usage
|
- `GET /api/clients/{id}/usage` - Client-specific usage
|
||||||
|
|
||||||
|
### Users (RBAC)
|
||||||
|
- `GET /api/users` - List all dashboard users
|
||||||
|
- `POST /api/users` - Create new user
|
||||||
|
- `PUT /api/users/{id}` - Update user (admin only)
|
||||||
|
- `DELETE /api/users/{id}` - Delete user (admin only)
|
||||||
|
|
||||||
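The two roles described above can be read as a small permission table. The sketch below is only an illustration of that split, not the project's actual authorization code: the `Role` enum, the area names, and the `is_allowed` helper are hypothetical. It also assumes that a Viewer is denied everything outside the three read-only areas, which the docs do not state explicitly.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    VIEWER = "viewer"


# Assumption: these area names mirror the read-only areas the docs list
# for the Viewer role (usage analytics, costs, monitoring).
VIEWER_READ_ONLY_AREAS = {"analytics", "costs", "monitoring"}

# HTTP methods treated as read-only in this sketch.
READ_METHODS = {"GET", "HEAD"}


def is_allowed(role: Role, area: str, method: str) -> bool:
    """Return True if the role may call `method` on the given dashboard area."""
    if role is Role.ADMIN:
        # Admins have full access, including user management.
        return True
    # Viewers may only read from their whitelisted areas.
    return area in VIEWER_READ_ONLY_AREAS and method.upper() in READ_METHODS


if __name__ == "__main__":
    for role, area, method in [
        (Role.ADMIN, "users", "DELETE"),
        (Role.VIEWER, "analytics", "GET"),
        (Role.VIEWER, "users", "PUT"),
    ]:
        print(role.value, area, method, is_allowed(role, area, method))
```

Keeping the rule in one function makes it easy to apply the same check in both the dashboard UI and the API handlers.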
### Providers
|
### Providers
|
||||||
- `GET /api/providers` - List providers and status
|
- `GET /api/providers` - List providers and status
|
||||||
- `PUT /api/providers/{name}` - Update provider config
|
- `PUT /api/providers/{name}` - Update provider config
|
||||||
|
|||||||
188
README.md
@@ -1,100 +1,182 @@
|
|||||||
# LLM Proxy Gateway
|
# LLM Proxy Gateway
|
||||||
|
|
||||||
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok) with built-in token tracking, real-time cost calculation, and a management dashboard.
|
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.
|
||||||
|
|
||||||
## 🚀 Features
|
## Features
|
||||||
|
|
||||||
- **Unified API:** Fully OpenAI-compatible `/v1/chat/completions` endpoint.
|
- **Unified API:** OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
|
||||||
- **Multi-Provider Support:**
|
- **Multi-Provider Support:**
|
||||||
* **OpenAI:** Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
|
- **OpenAI:** GPT-4o, GPT-4o Mini, and the o1 and o3 reasoning models.
|
||||||
* **Google Gemini:** Support for the latest Gemini 2.0 models.
|
- **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models.
|
||||||
* **DeepSeek:** High-performance, low-cost integration.
|
- **DeepSeek:** DeepSeek Chat and Reasoner models.
|
||||||
* **xAI Grok:** Integration for Grok-series models.
|
- **xAI Grok:** Grok-beta models.
|
||||||
* **Ollama:** Support for local LLMs running on your machine or another host.
|
- **Ollama:** Local LLMs running on your network.
|
||||||
- **Observability & Tracking:**
|
- **Observability & Tracking:**
|
||||||
* **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
|
- **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
|
||||||
* **Token Counting:** Precise estimation using `tiktoken-rs`.
|
- **Token Counting:** Precise estimation using `tiktoken-rs`.
|
||||||
* **Database Logging:** Every request is logged to SQLite for historical analysis.
|
- **Database Logging:** Every request logged to SQLite for historical analysis.
|
||||||
* **Streaming Support:** Full SSE (Server-Sent Events) support with aggregated token tracking.
|
- **Streaming Support:** Full SSE (Server-Sent Events) with `[DONE]` termination for client compatibility.
|
||||||
- **Multimodal (Vision):** Support for image processing (Base64 and remote URLs) across compatible providers.
|
- **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
|
||||||
|
- **Multi-User Access Control:**
|
||||||
|
- **Admin Role:** Full access to all dashboard features, user management, and system configuration.
|
||||||
|
- **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
|
||||||
|
- **Client API Keys:** Create and manage multiple client tokens for external integrations.
|
||||||
- **Reliability:**
|
- **Reliability:**
|
||||||
* **Circuit Breaking:** Automatically protects your system when providers are down.
|
- **Circuit Breaking:** Automatically protects the system when providers are down.
|
||||||
* **Rate Limiting:** Granular per-client and global rate limits.
|
- **Rate Limiting:** Per-client and global rate limits.
|
||||||
- **Management Dashboard:** A modern, real-time web interface to monitor usage, costs, and system health.
|
- **Cache-Aware Costing:** Tracks cache hit/miss tokens for accurate billing.
|
||||||
|
|
||||||
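To make the cache-aware costing idea concrete, here is a minimal arithmetic sketch. It assumes cached input tokens are billed at a separate (usually lower) rate than uncached ones. The `Pricing` class, the `request_cost` function, and the prices in the usage section are placeholders for illustration; they are not the proxy's internals or any provider's real rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens. Values are supplied by the caller."""
    input_per_million: float
    cached_input_per_million: float
    output_per_million: float


def request_cost(
    pricing: Pricing,
    cache_hit_tokens: int,
    cache_miss_tokens: int,
    output_tokens: int,
) -> float:
    """Cost of one request, billing cache hits and misses at different rates."""
    total = (
        cache_miss_tokens * pricing.input_per_million
        + cache_hit_tokens * pricing.cached_input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder prices, chosen only to keep the arithmetic easy to follow.
    example = Pricing(
        input_per_million=2.0,
        cached_input_per_million=0.5,
        output_per_million=8.0,
    )
    with_cache = request_cost(example, cache_hit_tokens=1000, cache_miss_tokens=500, output_tokens=250)
    without_cache = request_cost(example, cache_hit_tokens=0, cache_miss_tokens=1500, output_tokens=250)
    print(f"cache-aware: ${with_cache:.6f}  naive: ${without_cache:.6f}")
```

The gap between the two figures is the amount a tracker that ignores cache hits would over-report for the same request.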
## 🛠️ Tech Stack
|
## Tech Stack
|
||||||
|
|
||||||
- **Runtime:** Rust (2024 Edition) with [Tokio](https://tokio.rs/).
|
- **Runtime:** Rust with Tokio.
|
||||||
- **Web Framework:** [Axum](https://github.com/tokio-rs/axum).
|
- **Web Framework:** Axum.
|
||||||
- **Database:** [SQLx](https://github.com/launchbadge/sqlx) with SQLite.
|
- **Database:** SQLx with SQLite.
|
||||||
- **Frontend:** Vanilla JS/CSS (no heavyweight framework required).
|
- **Frontend:** Vanilla JS/CSS with Chart.js for visualizations.
|
||||||
|
|
||||||
## 🚦 Getting Started
|
## Getting Started
|
||||||
|
|
||||||
### Prerequisites
|
### Prerequisites
|
||||||
|
|
||||||
- Rust (1.80+)
|
- Rust (1.80+)
|
||||||
- SQLite3
|
- SQLite3
|
||||||
|
|
||||||
### Installation
|
### Quick Start
|
||||||
|
|
||||||
1. Clone the repository:
|
1. Clone and build:
|
||||||
```bash
|
```bash
|
||||||
git clone https://github.com/yourusername/llm-proxy.git
|
git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
|
||||||
cd llm-proxy
|
cd llm-proxy
|
||||||
|
cargo build --release
|
||||||
```
|
```
|
||||||
|
|
||||||
2. Set up your environment:
|
2. Configure environment:
|
||||||
```bash
|
```bash
|
||||||
cp .env.example .env
|
cp .env.example .env
|
||||||
# Edit .env and add your API keys
|
# Edit .env and add your API keys:
|
||||||
|
# OPENAI_API_KEY=sk-...
|
||||||
|
# GEMINI_API_KEY=AIza...
|
||||||
|
# DEEPSEEK_API_KEY=sk-...
|
||||||
|
# GROK_API_KEY=xai-... (optional)
|
||||||
```
|
```
|
||||||
|
|
||||||
3. Configure providers and server:
|
3. Run the proxy:
|
||||||
Edit `config.toml` to customize models, pricing fallbacks, and port settings.
|
|
||||||
|
|
||||||
**Ollama Example (config.toml):**
|
|
||||||
```toml
|
|
||||||
[providers.ollama]
|
|
||||||
enabled = true
|
|
||||||
base_url = "http://192.168.1.50:11434/v1"
|
|
||||||
models = ["llama3", "mistral"]
|
|
||||||
```
|
|
||||||
|
|
||||||
4. Run the proxy:
|
|
||||||
```bash
|
```bash
|
||||||
cargo run --release
|
cargo run --release
|
||||||
```
|
```
|
||||||
|
|
||||||
The server will start at `http://localhost:3000` (by default).
|
The server starts on `http://localhost:8080` by default.
|
||||||
|
|
||||||
## 📊 Management Dashboard
|
### Configuration
|
||||||
|
|
||||||
Access the built-in dashboard at `http://localhost:3000` to see:
|
Edit `config.toml` to customize providers, models, and settings:
|
||||||
- **Usage Summary:** Total requests, tokens, and USD spent.
|
|
||||||
- **Trend Charts:** 24-hour request and cost distributions.
|
|
||||||
- **Live Logs:** Real-time stream of incoming LLM requests via WebSockets.
|
|
||||||
- **Provider Health:** Monitor which providers are online or degraded.
|
|
||||||
|
|
||||||
## 🔌 API Usage
|
```toml
|
||||||
|
[server]
|
||||||
|
port = 8080
|
||||||
|
host = "0.0.0.0"
|
||||||
|
|
||||||
The proxy is designed to be a drop-in replacement for OpenAI. Simply change your base URL:
|
[database]
|
||||||
|
path = "./data/llm_proxy.db"
|
||||||
|
|
||||||
**Example Request (Python):**
|
[providers.openai]
|
||||||
|
enabled = true
|
||||||
|
default_model = "gpt-4o"
|
||||||
|
|
||||||
|
[providers.gemini]
|
||||||
|
enabled = true
|
||||||
|
default_model = "gemini-2.0-flash"
|
||||||
|
|
||||||
|
[providers.deepseek]
|
||||||
|
enabled = true
|
||||||
|
default_model = "deepseek-reasoner"
|
||||||
|
|
||||||
|
[providers.grok]
|
||||||
|
enabled = false
|
||||||
|
default_model = "grok-beta"
|
||||||
|
|
||||||
|
[providers.ollama]
|
||||||
|
enabled = false
|
||||||
|
base_url = "http://localhost:11434/v1"
|
||||||
|
```
|
||||||
|
|
||||||
|
## Management Dashboard
|
||||||
|
|
||||||
|
Access the dashboard at `http://localhost:8080`:
|
||||||
|
|
||||||
|
- **Overview:** Real-time request counters, system health, provider status.
|
||||||
|
- **Analytics:** Time-series charts, filterable by date, client, provider, and model.
|
||||||
|
- **Costs:** Budget tracking, cost breakdown by provider/client/model, projections.
|
||||||
|
- **Clients:** Create, revoke, and rotate API tokens; per-client usage stats.
|
||||||
|
- **Providers:** Enable/disable providers, test connections, configure API keys.
|
||||||
|
- **Monitoring:** Live request stream via WebSocket, response times, error rates.
|
||||||
|
- **Users:** Admin/user management with role-based access control.
|
||||||
|
|
||||||
|
### Default Credentials
|
||||||
|
|
||||||
|
- **Username:** `admin`
|
||||||
|
- **Password:** `admin123`
|
||||||
|
|
||||||
|
Change the admin password in the dashboard after first login!
|
||||||
|
|
||||||
|
## API Usage
|
||||||
|
|
||||||
|
The proxy is a drop-in replacement for OpenAI. Configure your client:
|
||||||
|
|
||||||
|
### Python
|
||||||
```python
|
```python
|
||||||
from openai import OpenAI
|
from openai import OpenAI
|
||||||
|
|
||||||
client = OpenAI(
|
client = OpenAI(
|
||||||
base_url="http://localhost:3000/v1",
|
base_url="http://localhost:8080/v1",
|
||||||
api_key="your-proxy-client-id" # Hashed sk- keys are managed in the dashboard
|
api_key="YOUR_CLIENT_API_KEY" # Create in dashboard
|
||||||
)
|
)
|
||||||
|
|
||||||
response = client.chat.completions.create(
|
response = client.chat.completions.create(
|
||||||
model="gpt-4o",
|
model="gpt-4o",
|
||||||
messages=[{"role": "user", "content": "Hello from the proxy!"}]
|
messages=[{"role": "user", "content": "Hello!"}]
|
||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
## ⚖️ License
|
### Open WebUI
|
||||||
|
```
|
||||||
|
API Base URL: http://your-server:8080/v1
|
||||||
|
API Key: YOUR_CLIENT_API_KEY
|
||||||
|
```
|
||||||
|
|
||||||
|
### cURL
|
||||||
|
```bash
|
||||||
|
curl -X POST http://localhost:8080/v1/chat/completions \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
|
||||||
|
-d '{
|
||||||
|
"model": "gpt-4o",
|
||||||
|
"messages": [{"role": "user", "content": "Hello!"}],
|
||||||
|
"stream": false
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Model Discovery
|
||||||
|
|
||||||
|
The proxy exposes `/v1/models` for OpenAI-compatible client model discovery:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl http://localhost:8080/v1/models \
|
||||||
|
-H "Authorization: Bearer YOUR_CLIENT_API_KEY"
|
||||||
|
```
|
||||||
|
|
||||||
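The same lookup can be done from Python with only the standard library. This is a sketch under stated assumptions: the proxy returns the usual OpenAI-style list shape (`{"data": [{"id": ...}]}`), and the helper names `build_models_request`, `extract_model_ids`, and `list_models` are illustrative rather than part of the project.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the proxy's /v1/models endpoint."""
    url = base_url.rstrip("/") + "/v1/models"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def extract_model_ids(payload: dict) -> list[str]:
    """Pull model IDs out of an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str, api_key: str) -> list[str]:
    """Fetch and return the model IDs advertised by the proxy."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_model_ids(json.load(response))


if __name__ == "__main__":
    # Requires a running proxy and a client key created in the dashboard.
    print(list_models("http://localhost:8080", "YOUR_CLIENT_API_KEY"))
```

Building the request separately from sending it keeps the URL and header logic testable without a running server.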
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Streaming Issues
|
||||||
|
If clients time out or report a `TransferEncodingError`, ensure:
|
||||||
|
1. Proxy buffering is disabled in nginx: `proxy_buffering off;`
|
||||||
|
2. Chunked transfer is enabled: `chunked_transfer_encoding on;`
|
||||||
|
3. Timeouts are sufficient: `proxy_read_timeout 7200s;`
|
||||||
|
|
||||||
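When debugging a stream that seems to be cut short by a reverse proxy, it can help to check whether the `[DONE]` terminator actually reaches the client. The sketch below parses OpenAI-style SSE lines and assumes the proxy emits `data: {...}` chunks followed by `data: [DONE]`. The function names are illustrative, and the sample lines are hand-written rather than captured output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def stream_completed(lines: Iterable[str]) -> bool:
    """Return True if the stream contains the [DONE] terminator."""
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:") and line[len("data:"):].strip() == "[DONE]":
            return True
    return False


if __name__ == "__main__":
    # Hand-written sample in the shape of a chat completion stream.
    sample = [
        'data: {"choices": [{"delta": {"role": "assistant"}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "Hel"}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "lo"}}]}',
        "",
        "data: [DONE]",
        "",
    ]
    print("".join(iter_sse_deltas(sample)), stream_completed(sample))
```

If `stream_completed` returns `False` for a captured response, the stream probably ended before the proxy finished sending, which points to buffering or timeout settings on an intermediary.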
|
### Provider Errors
|
||||||
|
- Check that API keys are set in `.env`
|
||||||
|
- Test provider in dashboard (Settings → Providers → Test)
|
||||||
|
- Review logs: `journalctl -u llm-proxy -f`
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
MIT OR Apache-2.0
|
MIT OR Apache-2.0
|
||||||
|
|||||||
@@ -118,6 +118,9 @@ default_model = "grok-beta"
|
|||||||
```
|
```
|
||||||
|
|
||||||
## Nginx Reverse Proxy Configuration
|
## Nginx Reverse Proxy Configuration
|
||||||
|
|
||||||
|
**Important for SSE/Streaming:** Disable buffering and configure timeouts for proper SSE support.
|
||||||
|
|
||||||
```nginx
|
```nginx
|
||||||
server {
|
server {
|
||||||
listen 80;
|
listen 80;
|
||||||
@@ -126,12 +129,22 @@ server {
|
|||||||
location / {
|
location / {
|
||||||
proxy_pass http://localhost:8080;
|
proxy_pass http://localhost:8080;
|
||||||
proxy_http_version 1.1;
|
proxy_http_version 1.1;
|
||||||
proxy_set_header Upgrade $http_upgrade;
|
|
||||||
proxy_set_header Connection 'upgrade';
|
|
||||||
proxy_set_header Host $host;
|
|
||||||
proxy_cache_bypass $http_upgrade;
|
|
||||||
|
|
||||||
# WebSocket support
|
# SSE/Streaming support
|
||||||
|
proxy_buffering off;
|
||||||
|
chunked_transfer_encoding on;
|
||||||
|
proxy_set_header Connection '';
|
||||||
|
|
||||||
|
# Timeouts for long-running streams
|
||||||
|
proxy_connect_timeout 7200s;
|
||||||
|
proxy_read_timeout 7200s;
|
||||||
|
proxy_send_timeout 7200s;
|
||||||
|
|
||||||
|
# Disable gzip for streaming
|
||||||
|
gzip off;
|
||||||
|
|
||||||
|
# Headers
|
||||||
|
proxy_set_header Host $host;
|
||||||
proxy_set_header X-Real-IP $remote_addr;
|
proxy_set_header X-Real-IP $remote_addr;
|
||||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
||||||
proxy_set_header X-Forwarded-Proto $scheme;
|
proxy_set_header X-Forwarded-Proto $scheme;
|
||||||
@@ -144,6 +157,21 @@ server {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### NGINX Proxy Manager
|
||||||
|
|
||||||
|
If using NGINX Proxy Manager, add this to **Advanced Settings**:
|
||||||
|
|
||||||
|
```nginx
|
||||||
|
proxy_buffering off;
|
||||||
|
proxy_http_version 1.1;
|
||||||
|
proxy_set_header Connection '';
|
||||||
|
chunked_transfer_encoding on;
|
||||||
|
proxy_connect_timeout 7200s;
|
||||||
|
proxy_read_timeout 7200s;
|
||||||
|
proxy_send_timeout 7200s;
|
||||||
|
gzip off;
|
||||||
|
```
|
||||||
|
|
||||||
## Security Considerations
|
## Security Considerations
|
||||||
|
|
||||||
### 1. Authentication
|
### 1. Authentication
|
||||||
|
|||||||