# LLM Proxy Gateway - Deployment Guide

## Overview

A unified LLM proxy gateway supporting OpenAI, Google Gemini, DeepSeek, and xAI Grok, with token tracking, cost calculation, and an admin dashboard.
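The dashboard's cost figures are derived from per-token prices. As an illustration only (the prices below are hypothetical placeholders, not the proxy's actual pricing table), a per-request cost can be computed like this:

```bash
# Hypothetical prices in USD per 1M tokens -- placeholders, NOT real provider pricing
prompt_price=2.50
completion_price=10.00

prompt_tokens=1200
completion_tokens=350

# cost = prompt_tokens/1e6 * prompt_price + completion_tokens/1e6 * completion_price
awk -v pt="$prompt_tokens" -v ct="$completion_tokens" \
    -v pp="$prompt_price" -v cp="$completion_price" \
    'BEGIN { printf "%.6f\n", pt/1e6*pp + ct/1e6*cp }'
# prints 0.006500
```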
## System Requirements

- **CPU**: 2 cores minimum
- **RAM**: 512 MB minimum (1 GB recommended)
- **Storage**: 10 GB minimum
- **OS**: Linux (tested on Arch Linux, Ubuntu, Debian)
- **Runtime**: Rust 1.70+ with Cargo

## Deployment Options

### Option 1: Docker (Recommended)

```dockerfile
FROM rust:1.70-alpine AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

FROM alpine:latest
RUN apk add --no-cache libgcc
COPY --from=builder /app/target/release/llm-proxy /usr/local/bin/
COPY --from=builder /app/static /app/static
WORKDIR /app
EXPOSE 8080
CMD ["llm-proxy"]
```
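Assuming the Dockerfile above, a typical build-and-run sequence looks like this (the image tag `llm-proxy` is an example, as is the mounted data directory):

```bash
# Build the image (tag name is an example)
docker build -t llm-proxy .

# Run it, passing provider keys and server config as environment variables,
# and persisting the SQLite database outside the container
docker run -d --name llm-proxy \
  -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -e LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123 \
  -v "$(pwd)/data:/app/data" \
  llm-proxy
```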

### Option 2: Systemd Service (Bare Metal/LXC)

```ini
# /etc/systemd/system/llm-proxy.service
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=always
RestartSec=10
Environment="RUST_LOG=info"
Environment="LLM_PROXY__SERVER__PORT=8080"
Environment="LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456"

[Install]
WantedBy=multi-user.target
```
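With the unit file in place, a typical install sequence is the following sketch (adjust paths and the user name to your layout):

```bash
# Create the dedicated service user, then register and start the unit
sudo useradd --system --home /opt/llm-proxy --shell /usr/sbin/nologin llmproxy
sudo systemctl daemon-reload
sudo systemctl enable --now llm-proxy
sudo systemctl status llm-proxy
```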

### Option 3: LXC Container (Proxmox)

1. Create an Alpine Linux LXC container
2. Install Rust: `apk add rust cargo`
3. Copy the application files
4. Build: `cargo build --release`
5. Run: `./target/release/llm-proxy`
## Configuration

### Environment Variables

```bash
# Required API keys
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
DEEPSEEK_API_KEY=sk-...
GROK_API_KEY=gk-...   # Optional

# Server configuration (with the LLM_PROXY__ prefix)
LLM_PROXY__SERVER__PORT=8080
LLM_PROXY__SERVER__HOST=0.0.0.0
LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456

# Database configuration
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
LLM_PROXY__DATABASE__MAX_CONNECTIONS=10

# Provider configuration
LLM_PROXY__PROVIDERS__OPENAI__ENABLED=true
LLM_PROXY__PROVIDERS__GEMINI__ENABLED=true
LLM_PROXY__PROVIDERS__DEEPSEEK__ENABLED=true
LLM_PROXY__PROVIDERS__GROK__ENABLED=false
```
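Rather than hard-coding secrets into the unit file, they can live in a root-owned environment file referenced via systemd's `EnvironmentFile=` directive (the path here is an example):

```bash
# Create /opt/llm-proxy/.env empty with tight permissions, then add keys to it
sudo install -m 600 -o llmproxy -g llmproxy /dev/null /opt/llm-proxy/.env

# In the [Service] section of the unit file, reference it with:
#   EnvironmentFile=/opt/llm-proxy/.env
```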

### Configuration File (config.toml)

Create `config.toml` in the application directory:

```toml
[server]
port = 8080
host = "0.0.0.0"
auth_tokens = ["sk-test-123", "sk-test-456"]

[database]
path = "./data/llm_proxy.db"
max_connections = 10

[providers.openai]
enabled = true
base_url = "https://api.openai.com/v1"
default_model = "gpt-4o"

[providers.gemini]
enabled = true
base_url = "https://generativelanguage.googleapis.com/v1"
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
base_url = "https://api.deepseek.com"
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
base_url = "https://api.x.ai/v1"
default_model = "grok-beta"
```

## Nginx Reverse Proxy Configuration

```nginx
server {
    listen 80;
    server_name llm-proxy.yourdomain.com;

    # SSL configuration (recommended)
    listen 443 ssl http2;
    ssl_certificate /etc/letsencrypt/live/llm-proxy.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm-proxy.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8080;
        proxy_http_version 1.1;

        # WebSocket support
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_cache_bypass $http_upgrade;

        # Forwarded headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
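Certificates at the Let's Encrypt paths shown above can be obtained with Certbot (the domain is a placeholder):

```bash
# Obtain and install a certificate for the server block, then reload nginx
sudo certbot --nginx -d llm-proxy.yourdomain.com
sudo nginx -t && sudo systemctl reload nginx
```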

## Security Considerations

### 1. Authentication
- Use strong Bearer tokens
- Rotate tokens regularly
- Consider implementing JWT for production
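A strong token can be generated with OpenSSL; the `sk-` prefix simply mirrors the example tokens used elsewhere in this guide:

```bash
# 32 random bytes, hex-encoded: 64 characters of entropy after the prefix
TOKEN="sk-$(openssl rand -hex 32)"
echo "$TOKEN"
```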
### 2. Rate Limiting
- Implement per-client rate limiting
- Consider the `governor` crate for advanced rate limiting

### 3. Network Security
- Run behind a reverse proxy (nginx)
- Enable HTTPS
- Restrict access by IP if needed
- Use firewall rules

### 4. Data Security
- Database encryption (SQLCipher for SQLite)
- Secure API key storage
- Regular backups
## Monitoring & Maintenance

### Logging
- Application logs: `RUST_LOG=info` (or `debug` for troubleshooting)
- Access logs via nginx
- Database logs for the audit trail

### Health Checks

```bash
# Health endpoint
curl http://localhost:8080/health

# Database check
sqlite3 ./data/llm_proxy.db "SELECT COUNT(*) FROM llm_requests;"
```

### Backup Strategy

```bash
#!/bin/bash
# backup.sh
BACKUP_DIR="/backups/llm-proxy"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"

# Backup database
sqlite3 ./data/llm_proxy.db ".backup $BACKUP_DIR/llm_proxy_$DATE.db"

# Backup configuration
cp config.toml "$BACKUP_DIR/config_$DATE.toml"

# Rotate old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.db" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.toml" -mtime +30 -delete
```
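To run the script automatically, a crontab entry such as the following works (paths are examples based on this guide's layout):

```bash
# crontab -e  -- run the backup nightly at 02:30
30 2 * * * /opt/llm-proxy/backup.sh >> /var/log/llm-proxy/backup.log 2>&1
```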

## Performance Tuning

### Database Optimization

```sql
-- Run these SQL commands periodically
VACUUM;
ANALYZE;
```

### Memory Management
- Monitor memory usage with `htop` or `ps aux`
- Adjust `max_connections` based on load
- Consider connection pooling for high traffic

### Scaling
1. **Vertical Scaling**: Increase container resources
2. **Horizontal Scaling**: Deploy multiple instances behind a load balancer
3. **Database**: Migrate to PostgreSQL for high-volume usage
## Troubleshooting

### Common Issues

1. **Port already in use**
   ```bash
   ss -tulpn | grep :8080   # or: netstat -tulpn | grep :8080
   kill <PID>               # or change the port in config
   ```

2. **Database permissions**
   ```bash
   chown -R llmproxy:llmproxy /opt/llm-proxy/data
   chmod 600 /opt/llm-proxy/data/llm_proxy.db
   ```

3. **API key errors**
   - Verify environment variables are set
   - Check provider status (dashboard)
   - Test connectivity: `curl https://api.openai.com/v1/models`

4. **High memory usage**
   - Check for memory leaks
   - Reduce `max_connections`
   - Implement connection timeouts

### Debug Mode

```bash
# Run with debug logging
RUST_LOG=debug ./llm-proxy

# Check system logs
journalctl -u llm-proxy -f
```

## Integration

### Open-WebUI Compatibility
The proxy exposes an OpenAI-compatible API, so point Open-WebUI at it:

```
API Base URL: http://your-proxy-address:8080
API Key: sk-test-123 (or your configured token)
```

### Custom Clients

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-test-123"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```

## Updates & Upgrades

1. **Backup** the current configuration and database
2. **Stop** the service: `systemctl stop llm-proxy`
3. **Update** the code: `git pull` or copy new binaries
4. **Migrate** the database if needed (check `migrations/`)
5. **Restart**: `systemctl start llm-proxy`
6. **Verify**: Check logs and test endpoints
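The steps above can be collected into a small upgrade script. This is a sketch: paths and the `backup.sh` name are assumptions carried over from earlier sections, and the migration step stays manual because it is release-specific:

```bash
#!/bin/bash
set -euo pipefail
cd /opt/llm-proxy

./backup.sh                            # 1. backup config and database
sudo systemctl stop llm-proxy          # 2. stop the service
git pull                               # 3. update the code
cargo build --release                  #    build the new binary
# 4. run database migrations here if the release notes call for it
sudo systemctl start llm-proxy         # 5. restart
curl -fsS http://localhost:8080/health # 6. verify the health endpoint responds
```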
## Support

- Check logs in `/var/log/llm-proxy/`
- Monitor the dashboard at `http://your-server:8080`
- Review database metrics in the dashboard
- Enable debug logging for troubleshooting