docs: sync documentation with current implementation and archive stale plan
README.md (108 lines changed)
@@ -26,6 +26,17 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
 - **Rate Limiting:** Per-client and global rate limits.
 - **Cache-Aware Costing:** Tracks cache hit/miss tokens for accurate billing.
 
+## Security
+
+LLM Proxy is designed with security in mind:
+
+- **HMAC Session Tokens:** Management dashboard sessions are secured using HMAC-SHA256 signed tokens.
+- **Encrypted Provider Keys:** Sensitive LLM provider API keys are stored encrypted (AES-256-GCM) in the database.
+- **Session Refresh:** Activity-based session extension prevents session hijacking while maintaining user convenience.
+- **XSS Prevention:** Standardized frontend escaping using `window.api.escapeHtml`.
+
+**Note:** You must define a `SESSION_SECRET` in your `.env` file for secure session signing.
+
 ## Tech Stack
 
 - **Runtime:** Rust with Tokio.
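The HMAC-signed session tokens described in this hunk can be sketched in a few lines. This is an illustrative Python sketch of the general sign-then-verify technique, not the proxy's actual Rust implementation; the payload fields (`user`, `expires`) are hypothetical:

```python
import hashlib
import hmac
import secrets

# SESSION_SECRET would come from .env; generated here only for illustration.
session_secret = secrets.token_hex(32)

# Hypothetical session payload; the proxy's real token layout may differ.
payload = "user=admin&expires=1735689600"
signature = hmac.new(
    session_secret.encode(), payload.encode(), hashlib.sha256
).hexdigest()
token = f"{payload}.{signature}"

# Verification: recompute the signature and compare in constant time.
body, _, sig = token.rpartition(".")
expected = hmac.new(session_secret.encode(), body.encode(), hashlib.sha256).hexdigest()
print(hmac.compare_digest(sig, expected))  # True
```

Constant-time comparison (`hmac.compare_digest`) matters here: a plain `==` check can leak signature bytes through timing differences.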
@@ -39,6 +50,7 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
 
 - Rust (1.80+)
 - SQLite3
 - Docker (optional, for containerized deployment)
 
 ### Quick Start
 
@@ -53,10 +65,9 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
 ```bash
 cp .env.example .env
 # Edit .env and add your API keys:
 # SESSION_SECRET=... (Generate a strong random secret)
 # OPENAI_API_KEY=sk-...
 # GEMINI_API_KEY=AIza...
 # DEEPSEEK_API_KEY=sk-...
 # GROK_API_KEY=gk-... (optional)
 ```
 
 3. Run the proxy:
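One way to generate a strong value for `SESSION_SECRET` (a sketch using Python's standard library; any sufficiently long random string works):

```python
import secrets

# 32 random bytes, hex-encoded: 64 characters, suitable for SESSION_SECRET.
print(secrets.token_hex(32))
```

`secrets` draws from the OS CSPRNG, unlike the `random` module, which is not suitable for secrets.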
@@ -66,50 +77,31 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
 
 The server starts on `http://localhost:8080` by default.
 
-### Configuration
-
-Edit `config.toml` to customize providers, models, and settings:
-
-```toml
-[server]
-port = 8080
-host = "0.0.0.0"
-
-[database]
-path = "./data/llm_proxy.db"
-
-[providers.openai]
-enabled = true
-default_model = "gpt-4o"
-
-[providers.gemini]
-enabled = true
-default_model = "gemini-2.0-flash"
-
-[providers.deepseek]
-enabled = true
-default_model = "deepseek-reasoner"
-
-[providers.grok]
-enabled = false
-default_model = "grok-beta"
-
-[providers.ollama]
-enabled = false
-base_url = "http://localhost:11434/v1"
+### Deployment (Docker)
+
+A multi-stage `Dockerfile` is provided for efficient deployment:
+
+```bash
+# Build the container
+docker build -t llm-proxy .
+
+# Run the container
+docker run -p 8080:8080 \
+  -e SESSION_SECRET=your-secure-secret \
+  -v ./data:/app/data \
+  llm-proxy
 ```
 
 ## Management Dashboard
 
-Access the dashboard at `http://localhost:8080`:
+Access the dashboard at `http://localhost:8080`. The dashboard architecture has been refactored into modular sub-components for better maintainability:
 
-- **Overview:** Real-time request counters, system health, provider status.
-- **Analytics:** Time-series charts, filterable by date, client, provider, and model.
-- **Costs:** Budget tracking, cost breakdown by provider/client/model, projections.
-- **Clients:** Create, revoke, and rotate API tokens; per-client usage stats.
-- **Providers:** Enable/disable providers, test connections, configure API keys.
-- **Monitoring:** Live request stream via WebSocket, response times, error rates.
-- **Users:** Admin/user management with role-based access control.
+- **Auth (`/api/auth`):** Login, session management, and password changes.
+- **Usage (`/api/usage`):** Summary stats, time-series analytics, and provider breakdown.
+- **Clients (`/api/clients`):** API key management and per-client usage tracking.
+- **Providers (`/api/providers`):** Provider configuration, status monitoring, and connection testing.
+- **System (`/api/system`):** Health metrics, live logs, database backups, and global settings.
+- **Monitoring:** Live request stream via WebSocket.
 
 ### Default Credentials
@@ -137,46 +129,6 @@ response = client.chat.completions.create(
 )
 ```
 
 ### Open WebUI
 ```
 API Base URL: http://your-server:8080/v1
 API Key: YOUR_CLIENT_API_KEY
 ```
 
 ### cURL
 ```bash
 curl -X POST http://localhost:8080/v1/chat/completions \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
   -d '{
     "model": "gpt-4o",
     "messages": [{"role": "user", "content": "Hello!"}],
     "stream": false
   }'
 ```
 
 ## Model Discovery
 
 The proxy exposes `/v1/models` for OpenAI-compatible client model discovery:
 
 ```bash
 curl http://localhost:8080/v1/models \
   -H "Authorization: Bearer YOUR_CLIENT_API_KEY"
 ```
 
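Since the endpoint follows the OpenAI-compatible `list` shape, model IDs live in the `data` array. A minimal Python sketch against an illustrative sample response (the model names are examples, not the proxy's guaranteed catalog):

```python
import json

# A sample /v1/models response in the OpenAI-compatible shape
# (contents here are illustrative).
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "gpt-4o", "object": "model", "owned_by": "openai"},
    {"id": "gemini-2.0-flash", "object": "model", "owned_by": "gemini"}
  ]
}
""")

model_ids = [m["id"] for m in sample["data"]]
print(model_ids)  # ['gpt-4o', 'gemini-2.0-flash']
```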
 ## Troubleshooting
 
 ### Streaming Issues
 If clients time out or show "TransferEncodingError", ensure:
 1. Proxy buffering is disabled in nginx: `proxy_buffering off;`
 2. Chunked transfer is enabled: `chunked_transfer_encoding on;`
 3. Timeouts are sufficient: `proxy_read_timeout 7200s;`
 
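As background for debugging the issues above: OpenAI-compatible streaming delivers the response as server-sent events, one `data:` line per JSON chunk, terminated by a `[DONE]` sentinel. A minimal sketch of reassembling such a stream (the sample lines are illustrative):

```python
import json

# Illustrative SSE lines as an OpenAI-compatible proxy would stream them.
raw_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = ""
for line in raw_stream:
    payload = line.removeprefix("data: ").strip()
    if payload == "[DONE]":  # end-of-stream sentinel, not JSON
        break
    chunk = json.loads(payload)
    text += chunk["choices"][0]["delta"].get("content", "")

print(text)  # Hello!
```

If a buffering proxy sits in front of the server, these chunks arrive all at once (or never), which is exactly the symptom the nginx settings above address.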
 ### Provider Errors
 - Check that API keys are set in `.env`
 - Test the provider in the dashboard (Settings → Providers → Test)
 - Review logs: `journalctl -u llm-proxy -f`
 
 ## License
 
 MIT OR Apache-2.0