docs: update README, deployment guide, and dashboard docs

commit adbaa146fb
parent 2a7a380977
2026-03-03 13:06:37 -05:00
3 changed files with 199 additions and 76 deletions

README.md

@@ -1,100 +1,182 @@
# LLM Proxy Gateway
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.
## Features
- **Unified API:** OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
- **Multi-Provider Support:**
- **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
- **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models.
- **DeepSeek:** DeepSeek Chat and Reasoner models.
- **xAI Grok:** Grok-beta models.
- **Ollama:** Local LLMs running on your network.
- **Observability & Tracking:**
- **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
- **Token Counting:** Precise estimation using `tiktoken-rs`.
- **Database Logging:** Every request logged to SQLite for historical analysis.
- **Streaming Support:** Full SSE (Server-Sent Events) with `[DONE]` termination for client compatibility.
- **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
- **Multi-User Access Control:**
- **Admin Role:** Full access to all dashboard features, user management, and system configuration.
- **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
- **Client API Keys:** Create and manage multiple client tokens for external integrations.
- **Reliability:**
- **Circuit Breaking:** Automatically protects your system when providers are down.
- **Rate Limiting:** Per-client and global rate limits.
- **Cache-Aware Costing:** Tracks cache hit/miss tokens for accurate billing.
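The costing pipeline boils down to a simple per-token calculation. The sketch below is illustrative only: the real proxy fetches live prices from `models.dev` at startup, and the dollar figures here are hypothetical.

```python
def estimate_cost_usd(usage: dict, input_per_mtok: float, output_per_mtok: float) -> float:
    """Estimate a request's cost from an OpenAI-style `usage` object.

    Prices are USD per million tokens. The proxy pulls real prices from
    models.dev; the numbers passed in here are placeholders.
    """
    return (usage["prompt_tokens"] * input_per_mtok
            + usage["completion_tokens"] * output_per_mtok) / 1_000_000

# Hypothetical prices of $2.50 (input) / $10.00 (output) per 1M tokens:
cost = estimate_cost_usd({"prompt_tokens": 1000, "completion_tokens": 500}, 2.50, 10.00)
```

Cache-aware costing refines this by pricing cache-hit prompt tokens at a discounted rate, which is why the proxy tracks hit/miss tokens separately.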
## Tech Stack
- **Runtime:** Rust with Tokio.
- **Web Framework:** Axum.
- **Database:** SQLx with SQLite.
- **Frontend:** Vanilla JS/CSS with Chart.js for visualizations.
## Getting Started
### Prerequisites
- Rust (1.80+)
- SQLite3
### Quick Start
1. Clone and build:
```bash
git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
cd llm-proxy
cargo build --release
```
2. Configure environment:
```bash
cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=AIza...
# DEEPSEEK_API_KEY=sk-...
# GROK_API_KEY=gk-... (optional)
```
3. Run the proxy:
```bash
cargo run --release
```
The server starts on `http://localhost:8080` by default.
### Configuration
Edit `config.toml` to customize providers, models, and settings:
```toml
[server]
port = 8080
host = "0.0.0.0"
[database]
path = "./data/llm_proxy.db"
[providers.openai]
enabled = true
default_model = "gpt-4o"
[providers.gemini]
enabled = true
default_model = "gemini-2.0-flash"
[providers.deepseek]
enabled = true
default_model = "deepseek-reasoner"
[providers.grok]
enabled = false
default_model = "grok-beta"
[providers.ollama]
enabled = false
base_url = "http://localhost:11434/v1"
```
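If your Ollama instance runs on another machine on your network, point the provider's `base_url` at that host and list the local models you want exposed. A sketch (the IP address and model names below are placeholders):

```toml
[providers.ollama]
enabled = true
base_url = "http://192.168.1.50:11434/v1"
models = ["llama3", "mistral"]
```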
## Management Dashboard
Access the dashboard at `http://localhost:8080`:
- **Overview:** Real-time request counters, system health, provider status.
- **Analytics:** Time-series charts, filterable by date, client, provider, and model.
- **Costs:** Budget tracking, cost breakdown by provider/client/model, projections.
- **Clients:** Create, revoke, and rotate API tokens; per-client usage stats.
- **Providers:** Enable/disable providers, test connections, configure API keys.
- **Monitoring:** Live request stream via WebSocket, response times, error rates.
- **Users:** Admin/user management with role-based access control.
### Default Credentials
- **Username:** `admin`
- **Password:** `admin123`
Change the admin password in the dashboard after first login!
## API Usage
The proxy is a drop-in replacement for the OpenAI API. Configure your client:
### Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY",  # Create in dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
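Streaming works through the same client by passing `stream=True`. The helper below is a sketch: it accepts any iterable of chunk-shaped objects, so it is exercised here with `SimpleNamespace` stand-ins rather than a live connection.

```python
from types import SimpleNamespace

def collect_stream(stream) -> str:
    """Concatenate the content deltas of a chat-completion stream."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:  # role-only and final chunks carry no content
            parts.append(delta)
    return "".join(parts)

# With the real client: stream = client.chat.completions.create(..., stream=True)
# Dict-shaped stand-ins for streamed chunks, for illustration:
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Hel", "lo", None, "!"]
]
```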
### Open WebUI
```
API Base URL: http://your-server:8080/v1
API Key: YOUR_CLIENT_API_KEY
```
### cURL
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```
## Model Discovery
The proxy exposes `/v1/models` for OpenAI-compatible client model discovery:
```bash
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY"
```
## Troubleshooting
### Streaming Issues
If clients time out or show `TransferEncodingError`, ensure:
1. Proxy buffering is disabled in nginx: `proxy_buffering off;`
2. Chunked transfer is enabled: `chunked_transfer_encoding on;`
3. Timeouts are sufficient: `proxy_read_timeout 7200s;`
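When debugging a raw stream (e.g. from a `curl -N` session), it can help to inspect the SSE frames directly. This minimal parser is an illustrative sketch, not part of the proxy; it decodes one SSE line at a time and recognizes the `[DONE]` sentinel the proxy emits at end of stream:

```python
import json

def parse_sse_line(line: str):
    """Decode one SSE line from a /v1/chat/completions stream.

    Returns a dict for `data: {...}` chunks, the string "DONE" for the
    `data: [DONE]` terminator, and None for blank or comment lines.
    """
    if not line.startswith("data: "):
        return None  # SSE comments, blank keep-alive lines, etc.
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return "DONE"
    return json.loads(payload)
```

Feeding it the lines of a captured response quickly shows whether a buffering reverse proxy is batching or truncating chunks.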
### Provider Errors
- Check API keys are set in `.env`
- Test provider in dashboard (Settings → Providers → Test)
- Review logs: `journalctl -u llm-proxy -f`
## License
MIT OR Apache-2.0