# LLM Proxy Gateway

A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.

## Features

- **Unified API:** OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
- **Multi-Provider Support:**
  - **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
  - **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models.
  - **DeepSeek:** DeepSeek Chat and Reasoner models.
  - **xAI Grok:** Grok-beta models.
  - **Ollama:** Local LLMs running on your network.
- **Observability & Tracking:**
  - **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
  - **Token Counting:** Precise estimation using `tiktoken-rs`.
  - **Database Logging:** Every request is logged to SQLite for historical analysis.
- **Streaming Support:** Full SSE (Server-Sent Events) with `[DONE]` termination for client compatibility.
- **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
- **Multi-User Access Control:**
  - **Admin Role:** Full access to all dashboard features, user management, and system configuration.
  - **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
  - **Client API Keys:** Create and manage multiple client tokens for external integrations.
- **Reliability:**
  - **Circuit Breaking:** Automatically stops routing requests to providers that are failing.
  - **Rate Limiting:** Per-client and global rate limits.
  - **Cache-Aware Costing:** Tracks cache hit/miss tokens for accurate billing.

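Cache-aware costing can be illustrated with a short calculation. The sketch below is illustrative only: the per-million-token prices and the cache discount are hypothetical placeholders, not the live figures the proxy fetches from `models.dev`.

```python
# Illustrative cost calculation for one request. Prices are hypothetical
# placeholders, not live figures from models.dev.
PRICE_PER_M_INPUT = 2.50    # USD per 1M uncached input tokens
PRICE_PER_M_CACHED = 1.25   # USD per 1M cache-hit input tokens
PRICE_PER_M_OUTPUT = 10.00  # USD per 1M output tokens

def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Cost in USD, billing cache hits at the discounted rate."""
    uncached = input_tokens - cached_tokens
    return (
        uncached * PRICE_PER_M_INPUT
        + cached_tokens * PRICE_PER_M_CACHED
        + output_tokens * PRICE_PER_M_OUTPUT
    ) / 1_000_000

# 10k input tokens, 4k of them served from cache, 2k output tokens:
print(round(request_cost(10_000, 4_000, 2_000), 6))  # -> 0.04
```

Tracking cached tokens separately matters because providers that support prompt caching bill hits at a lower rate than fresh input.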
## Tech Stack

- **Runtime:** Rust with Tokio.
- **Web Framework:** Axum.
- **Database:** SQLx with SQLite.
- **Frontend:** Vanilla JS/CSS with Chart.js for visualizations.

## Getting Started

### Prerequisites

- Rust (1.80+)
- SQLite3

### Quick Start

1. Clone and build:

   ```bash
   git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
   cd llm-proxy
   cargo build --release
   ```

2. Configure environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys:
   # OPENAI_API_KEY=sk-...
   # GEMINI_API_KEY=AIza...
   # DEEPSEEK_API_KEY=sk-...
   # GROK_API_KEY=gk-... (optional)
   ```

3. Run the proxy:

   ```bash
   cargo run --release
   ```

The server starts on `http://localhost:8080` by default.

### Configuration

Edit `config.toml` to customize providers, models, and settings:

```toml
[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"

[providers.openai]
enabled = true
default_model = "gpt-4o"

[providers.gemini]
enabled = true
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
default_model = "grok-beta"

[providers.ollama]
enabled = false
base_url = "http://localhost:11434/v1"
```

## Management Dashboard

Access the dashboard at `http://localhost:8080`:

- **Overview:** Real-time request counters, system health, provider status.
- **Analytics:** Time-series charts, filterable by date, client, provider, and model.
- **Costs:** Budget tracking, cost breakdown by provider/client/model, projections.
- **Clients:** Create, revoke, and rotate API tokens; per-client usage stats.
- **Providers:** Enable/disable providers, test connections, configure API keys.
- **Monitoring:** Live request stream via WebSocket, response times, error rates.
- **Users:** Admin/user management with role-based access control.

### Default Credentials

- **Username:** `admin`
- **Password:** `admin123`

Change the admin password in the dashboard after first login!

## API Usage

The proxy is a drop-in replacement for the OpenAI API. Configure your client:

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY",  # Create in dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

### Open WebUI

```
API Base URL: http://your-server:8080/v1
API Key: YOUR_CLIENT_API_KEY
```

### cURL

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```

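With `"stream": true`, the same endpoint returns Server-Sent Events terminated by a `data: [DONE]` sentinel. A minimal Python sketch of consuming such a stream; the chunk payloads here are illustrative, but follow the OpenAI-compatible streaming chunk shape:

```python
import json

# Illustrative SSE lines in the OpenAI-compatible streaming chunk format.
raw_events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

def collect_stream(lines):
    """Accumulate streamed content until the [DONE] sentinel."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip SSE comments / keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

print(collect_stream(raw_events))  # -> Hello!
```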
## Model Discovery

The proxy exposes `/v1/models` so OpenAI-compatible clients can discover the available models:

```bash
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY"
```

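The response uses the OpenAI list shape (`{"object": "list", "data": [...]}`). A sketch extracting model ids from a sample payload; the payload contents here are illustrative:

```python
import json

# Illustrative /v1/models response in the OpenAI list format.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "gpt-4o", "object": "model", "owned_by": "openai"},
    {"id": "deepseek-reasoner", "object": "model", "owned_by": "deepseek"}
  ]
}
""")

model_ids = [model["id"] for model in sample["data"]]
print(model_ids)  # -> ['gpt-4o', 'deepseek-reasoner']
```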
## Troubleshooting

### Streaming Issues

If clients time out or report "TransferEncodingError", check that:

1. Proxy buffering is disabled in nginx: `proxy_buffering off;`
2. Chunked transfer is enabled: `chunked_transfer_encoding on;`
3. Timeouts are long enough for long-running streams: `proxy_read_timeout 7200s;`

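When the proxy sits behind nginx, the directives above combine into a location block like the following sketch; the upstream address is a placeholder, and `proxy_http_version 1.1` is added on the assumption of an HTTP/1.1 upstream, which streaming generally requires:

```nginx
location /v1/ {
    proxy_pass http://127.0.0.1:8080;    # placeholder upstream address
    proxy_http_version 1.1;              # assumption: HTTP/1.1 upstream for streaming
    proxy_buffering off;                 # don't buffer SSE chunks
    chunked_transfer_encoding on;        # pass chunks through as-is
    proxy_read_timeout 7200s;            # allow long-running streams
}
```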
### Provider Errors

- Check that API keys are set in `.env`.
- Test the provider in the dashboard (Settings → Providers → Test).
- Review logs: `journalctl -u llm-proxy -f`

## License

MIT OR Apache-2.0