docs: sync documentation with current implementation and archive stale plan
Some checks failed
CI / Check (push) Has been cancelled
CI / Clippy (push) Has been cancelled
CI / Formatting (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Release Build (push) Has been cancelled

2026-03-06 14:28:04 -05:00
parent 975ae124d1
commit 633b69a07b
8 changed files with 767 additions and 78 deletions

README.md

@@ -26,6 +26,17 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
- **Rate Limiting:** Per-client and global rate limits.
- **Cache-Aware Costing:** Tracks cache hit/miss tokens for accurate billing.
## Security
LLM Proxy is designed with security in mind:
- **HMAC Session Tokens:** Management dashboard sessions are secured using HMAC-SHA256 signed tokens.
- **Encrypted Provider Keys:** Sensitive LLM provider API keys are stored encrypted (AES-256-GCM) in the database.
- **Session Refresh:** Activity-based session extension keeps tokens short-lived, narrowing the window for session hijacking while maintaining user convenience.
- **XSS Prevention:** Standardized frontend escaping using `window.api.escapeHtml`.
**Note:** You must define a `SESSION_SECRET` in your `.env` file for secure session signing.
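The session-token scheme above can be sketched with Python's standard library. This is an illustrative model only, not the proxy's actual Rust implementation, and the `payload.signature` token layout is an assumption:

```python
import hashlib
import hmac
import secrets

# Generate a strong random SESSION_SECRET (64 hex chars = 256 bits).
session_secret = secrets.token_hex(32)

def sign_token(payload: str, secret: str) -> str:
    """Append an HMAC-SHA256 signature: '<payload>.<hex signature>'."""
    sig = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def verify_token(token: str, secret: str) -> bool:
    """Constant-time check that the signature matches the payload."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

token = sign_token("session-id=abc123", session_secret)
assert verify_token(token, session_secret)
assert not verify_token(token + "0", session_secret)  # tampered signature fails
```

`hmac.compare_digest` matters here: a naive `==` comparison can leak timing information an attacker could exploit to forge signatures byte by byte.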
## Tech Stack
- **Runtime:** Rust with Tokio.
@@ -39,6 +50,7 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
- Rust (1.80+)
- SQLite3
- Docker (optional, for containerized deployment)
### Quick Start
@@ -53,10 +65,9 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
```bash
cp .env.example .env
# Edit .env and add your API keys:
# SESSION_SECRET=... (Generate a strong random secret)
# OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=AIza...
# DEEPSEEK_API_KEY=sk-...
# GROK_API_KEY=gk-... (optional)
```
3. Run the proxy:
@@ -66,50 +77,31 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
The server starts on `http://localhost:8080` by default.
### Configuration

Edit `config.toml` to customize providers, models, and settings:

```toml
[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"

[providers.openai]
enabled = true
default_model = "gpt-4o"

[providers.gemini]
enabled = true
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
default_model = "grok-beta"

[providers.ollama]
enabled = false
base_url = "http://localhost:11434/v1"
```

### Deployment (Docker)

A multi-stage `Dockerfile` is provided for efficient deployment:

```bash
# Build the container
docker build -t llm-proxy .

# Run the container
docker run -p 8080:8080 \
  -e SESSION_SECRET=your-secure-secret \
  -v ./data:/app/data \
  llm-proxy
```
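For long-running deployments, the build and run commands can be captured in a Compose file. This sketch assumes the image name, port, environment, and volume layout from the commands above:

```yaml
services:
  llm-proxy:
    build: .
    ports:
      - "8080:8080"
    environment:
      - SESSION_SECRET=your-secure-secret
    volumes:
      - ./data:/app/data
    restart: unless-stopped
```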
## Management Dashboard

Access the dashboard at `http://localhost:8080`:

- **Overview:** Real-time request counters, system health, provider status.
- **Analytics:** Time-series charts, filterable by date, client, provider, and model.
- **Costs:** Budget tracking, cost breakdown by provider/client/model, projections.
- **Clients:** Create, revoke, and rotate API tokens; per-client usage stats.
- **Providers:** Enable/disable providers, test connections, configure API keys.
- **Monitoring:** Live request stream via WebSocket, response times, error rates.
- **Users:** Admin/user management with role-based access control.

The dashboard architecture has been refactored into modular sub-components for better maintainability:

- **Auth (`/api/auth`):** Login, session management, and password changes.
- **Usage (`/api/usage`):** Summary stats, time-series analytics, and provider breakdown.
- **Clients (`/api/clients`):** API key management and per-client usage tracking.
- **Providers (`/api/providers`):** Provider configuration, status monitoring, and connection testing.
- **System (`/api/system`):** Health metrics, live logs, database backups, and global settings.
- **Monitoring:** Live request stream via WebSocket.
### Default Credentials
@@ -137,46 +129,6 @@ response = client.chat.completions.create(
)
```
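The Python snippet above is truncated by the diff view; it points an OpenAI-compatible client at the proxy. A self-contained sketch of the request that call produces (the base URL and API key are placeholders):

```python
import json

# Placeholder: substitute your proxy host.
PROXY_BASE = "http://localhost:8080/v1"

def chat_request(api_key: str, model: str, prompt: str, stream: bool = False):
    """Build (url, headers, body) for the proxy's OpenAI-compatible chat endpoint."""
    url = f"{PROXY_BASE}/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    })
    return url, headers, body

url, headers, body = chat_request("YOUR_CLIENT_API_KEY", "gpt-4o", "Hello!")
```

Any HTTP client can then POST `body` with `headers` to `url`; this mirrors the cURL example below.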
### Open WebUI
```
API Base URL: http://your-server:8080/v1
API Key: YOUR_CLIENT_API_KEY
```
### cURL
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": false
}'
```
## Model Discovery
The proxy exposes `/v1/models` for OpenAI-compatible client model discovery:
```bash
curl http://localhost:8080/v1/models \
-H "Authorization: Bearer YOUR_CLIENT_API_KEY"
```
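Assuming the standard OpenAI list schema (`{"object": "list", "data": [{"id": ...}]}`) for the `/v1/models` response, the model IDs can be extracted like this; the sample payload is illustrative, not actual proxy output:

```python
import json

# Illustrative response in the OpenAI-compatible list schema.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "gpt-4o", "object": "model", "owned_by": "openai"},
    {"id": "gemini-2.0-flash", "object": "model", "owned_by": "gemini"}
  ]
}
""")

def model_ids(models_response: dict) -> list[str]:
    """Return the model IDs from a /v1/models response body."""
    return [m["id"] for m in models_response.get("data", [])]

print(model_ids(sample))  # -> ['gpt-4o', 'gemini-2.0-flash']
```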
## Troubleshooting
### Streaming Issues
If clients time out or report `TransferEncodingError`, ensure:
1. Proxy buffering is disabled in nginx: `proxy_buffering off;`
2. Chunked transfer is enabled: `chunked_transfer_encoding on;`
3. Timeouts are sufficient: `proxy_read_timeout 7200s;`
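A minimal `nginx` location block combining the three settings above. The upstream address is a placeholder, and `proxy_http_version 1.1` is an added assumption commonly required for streamed responses:

```nginx
location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;
    proxy_buffering off;
    chunked_transfer_encoding on;
    proxy_read_timeout 7200s;
}
```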
### Provider Errors
- Check that API keys are set in `.env`
- Test provider in dashboard (Settings → Providers → Test)
- Review logs: `journalctl -u llm-proxy -f`
## License
MIT OR Apache-2.0