# LLM Proxy Gateway - Deployment Guide

## Overview
A unified LLM proxy gateway supporting OpenAI, Google Gemini, DeepSeek, and xAI Grok with token tracking, cost calculation, and admin dashboard.
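The cost calculation the gateway performs boils down to multiplying token counts by per-model prices. A minimal Python sketch of the idea — the price figures and table layout below are illustrative placeholders, not the proxy's actual pricing data:

```python
# Illustrative per-1M-token prices (USD); NOT the proxy's real pricing table.
PRICES_PER_1M = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "deepseek-reasoner": {"input": 0.55, "output": 2.19},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request from its token counts."""
    p = PRICES_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```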
## System Requirements
- CPU: 2 cores minimum
- RAM: 512MB minimum (1GB recommended)
- Storage: 10GB minimum
- OS: Linux (tested on Arch Linux, Ubuntu, Debian)
- Runtime: Rust 1.70+ with Cargo
## Deployment Options

### Option 1: Docker (Recommended)
```dockerfile
FROM rust:1.70-alpine AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

FROM alpine:latest
RUN apk add --no-cache libgcc
COPY --from=builder /app/target/release/llm-proxy /usr/local/bin/
COPY --from=builder /app/static /app/static
WORKDIR /app
EXPOSE 8080
CMD ["llm-proxy"]
```
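A `docker-compose.yml` can wrap the image for day-to-day operation; the service name, volume path, and environment values here are illustrative, chosen to match the settings used elsewhere in this guide:

```yaml
services:
  llm-proxy:
    build: .
    ports:
      - "8080:8080"
    environment:
      - RUST_LOG=info
      - LLM_PROXY__SERVER__PORT=8080
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./data:/app/data     # persist the SQLite database across restarts
    restart: unless-stopped
```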
### Option 2: Systemd Service (Bare Metal/LXC)
```ini
# /etc/systemd/system/llm-proxy.service
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=always
RestartSec=10
Environment="RUST_LOG=info"
Environment="LLM_PROXY__SERVER__PORT=8080"
Environment="LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456"

[Install]
WantedBy=multi-user.target
```
### Option 3: LXC Container (Proxmox)
- Create an Alpine Linux LXC container
- Install Rust: `apk add rust cargo`
- Copy the application files
- Build: `cargo build --release`
- Run: `./target/release/llm-proxy`
## Configuration

### Environment Variables
```bash
# Required API keys
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
DEEPSEEK_API_KEY=sk-...
GROK_API_KEY=gk-...   # optional

# Server configuration (with LLM_PROXY__ prefix)
LLM_PROXY__SERVER__PORT=8080
LLM_PROXY__SERVER__HOST=0.0.0.0
LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456

# Database configuration
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
LLM_PROXY__DATABASE__MAX_CONNECTIONS=10

# Provider configuration
LLM_PROXY__PROVIDERS__OPENAI__ENABLED=true
LLM_PROXY__PROVIDERS__GEMINI__ENABLED=true
LLM_PROXY__PROVIDERS__DEEPSEEK__ENABLED=true
LLM_PROXY__PROVIDERS__GROK__ENABLED=false
```
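The `LLM_PROXY__` double-underscore convention maps environment variables onto nested configuration keys, as Rust's `config` crate does. A rough Python illustration of that mapping — an assumption about how the proxy resolves variables, not its actual code:

```python
def env_to_config_path(var: str, prefix: str = "LLM_PROXY__") -> list[str]:
    """Map e.g. LLM_PROXY__SERVER__PORT to the nested key path ['server', 'port']."""
    if not var.startswith(prefix):
        raise ValueError(f"{var!r} is not a proxy setting")
    # Strip the prefix, split on the double-underscore separator, lowercase each key.
    return [part.lower() for part in var[len(prefix):].split("__")]
```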
### Configuration File (config.toml)
Create config.toml in the application directory:
```toml
[server]
port = 8080
host = "0.0.0.0"
auth_tokens = ["sk-test-123", "sk-test-456"]

[database]
path = "./data/llm_proxy.db"
max_connections = 10

[providers.openai]
enabled = true
base_url = "https://api.openai.com/v1"
default_model = "gpt-4o"

[providers.gemini]
enabled = true
base_url = "https://generativelanguage.googleapis.com/v1"
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
base_url = "https://api.deepseek.com"
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
base_url = "https://api.x.ai/v1"
default_model = "grok-beta"
```
## Nginx Reverse Proxy Configuration
```nginx
server {
    listen 80;
    server_name llm-proxy.yourdomain.com;

    # SSL configuration (recommended)
    listen 443 ssl http2;
    ssl_certificate /etc/letsencrypt/live/llm-proxy.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm-proxy.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8080;
        proxy_http_version 1.1;

        # WebSocket support
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_cache_bypass $http_upgrade;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
## Security Considerations

1. **Authentication**
   - Use strong Bearer tokens
   - Rotate tokens regularly
   - Consider implementing JWT for production
2. **Rate Limiting**
   - Implement per-client rate limiting
   - Consider the `governor` crate for advanced rate limiting
3. **Network Security**
   - Run behind a reverse proxy (nginx)
   - Enable HTTPS
   - Restrict access by IP if needed
   - Use firewall rules
4. **Data Security**
   - Database encryption (SQLCipher for SQLite)
   - Secure API key storage
   - Regular backups
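For generating the "strong Bearer tokens" mentioned above, something like Python's `secrets` module works well; the `sk-` prefix simply mirrors the sample tokens used throughout this guide:

```python
import secrets

def generate_token(prefix: str = "sk-", nbytes: int = 32) -> str:
    """Generate a cryptographically random, URL-safe bearer token."""
    return prefix + secrets.token_urlsafe(nbytes)
```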
## Monitoring & Maintenance

### Logging

- Application logs: `RUST_LOG=info` (or `debug` for troubleshooting)
- Access logs via nginx
- Database logs for the audit trail
### Health Checks

```bash
# Health endpoint
curl http://localhost:8080/health

# Database check
sqlite3 ./data/llm_proxy.db "SELECT COUNT(*) FROM llm_requests;"
```
### Backup Strategy

```bash
#!/bin/bash
# backup.sh
BACKUP_DIR="/backups/llm-proxy"
DATE=$(date +%Y%m%d_%H%M%S)

# Backup database
sqlite3 ./data/llm_proxy.db ".backup $BACKUP_DIR/llm_proxy_$DATE.db"

# Backup configuration
cp config.toml "$BACKUP_DIR/config_$DATE.toml"

# Rotate old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.db" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.toml" -mtime +30 -delete
```
## Performance Tuning

### Database Optimization

```sql
-- Run these commands periodically
VACUUM;
ANALYZE;
```
### Memory Management

- Monitor memory usage with `htop` or `ps aux`
- Adjust `max_connections` based on load
- Consider connection pooling for high traffic
### Scaling
- Vertical Scaling: Increase container resources
- Horizontal Scaling: Deploy multiple instances behind load balancer
- Database: Migrate to PostgreSQL for high-volume usage
## Troubleshooting

### Common Issues

1. **Port already in use**

   ```bash
   netstat -tulpn | grep :8080
   kill <PID>  # or change the port in config
   ```

2. **Database permissions**

   ```bash
   chown -R llmproxy:llmproxy /opt/llm-proxy/data
   chmod 600 /opt/llm-proxy/data/llm_proxy.db
   ```

3. **API key errors**
   - Verify environment variables are set
   - Check provider status (dashboard)
   - Test connectivity: `curl https://api.openai.com/v1/models`

4. **High memory usage**
   - Check for memory leaks
   - Reduce `max_connections`
   - Implement connection timeouts
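The "port already in use" check can also be done programmatically, which is handy in deployment scripts; a minimal sketch:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0
```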
### Debug Mode

```bash
# Run with debug logging
RUST_LOG=debug ./llm-proxy

# Check system logs
journalctl -u llm-proxy -f
```
## Integration

### Open-WebUI Compatibility

The proxy exposes an OpenAI-compatible API, so point Open-WebUI at it:

- API Base URL: `http://your-proxy-address:8080`
- API Key: `sk-test-123` (or your configured token)
### Custom Clients

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-test-123",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
```
## Updates & Upgrades

1. Backup the current configuration and database
2. Stop the service: `systemctl stop llm-proxy`
3. Update the code (`git pull`) or copy in new binaries
4. Migrate the database if needed (check `migrations/`)
5. Restart: `systemctl start llm-proxy`
6. Verify: check the logs and test the endpoints
## Support

- Check logs in `/var/log/llm-proxy/`
- Monitor the dashboard at `http://your-server:8080`
- Review database metrics in the dashboard
- Enable debug logging for troubleshooting