# LLM Proxy Gateway - Deployment Guide
## Overview
A unified LLM proxy gateway supporting OpenAI, Google Gemini, DeepSeek, and xAI Grok, with token tracking, cost calculation, and an admin dashboard.
## System Requirements
- **CPU**: 2 cores minimum
- **RAM**: 512MB minimum (1GB recommended)
- **Storage**: 10GB minimum
- **OS**: Linux (tested on Arch Linux, Ubuntu, Debian)
- **Runtime**: Rust 1.70+ with Cargo
## Deployment Options
### Option 1: Docker (Recommended)
```dockerfile
FROM rust:1.70-alpine AS builder
WORKDIR /app
# musl-dev supplies the C toolchain pieces many crates need on Alpine
RUN apk add --no-cache musl-dev
COPY . .
RUN cargo build --release

FROM alpine:latest
RUN apk add --no-cache libgcc
COPY --from=builder /app/target/release/llm-proxy /usr/local/bin/
COPY --from=builder /app/static /app/static
WORKDIR /app
EXPOSE 8080
CMD ["llm-proxy"]
```
### Option 2: Systemd Service (Bare Metal/LXC)
```ini
# /etc/systemd/system/llm-proxy.service
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=always
RestartSec=10
Environment="RUST_LOG=info"
Environment="LLM_PROXY__SERVER__PORT=8080"
Environment="LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456"

[Install]
WantedBy=multi-user.target
```
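With the unit file in place, enabling the service follows the usual systemd flow (run on the target host as root):

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now llm-proxy
systemctl status llm-proxy --no-pager
```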
### Option 3: LXC Container (Proxmox)
1. Create Alpine Linux LXC container
2. Install Rust: `apk add rust cargo`
3. Copy application files
4. Build: `cargo build --release`
5. Run: `./target/release/llm-proxy`
## Configuration
### Environment Variables
```bash
# Required API Keys
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
DEEPSEEK_API_KEY=sk-...
GROK_API_KEY=gk-... # Optional
# Server Configuration (with LLM_PROXY__ prefix)
LLM_PROXY__SERVER__PORT=8080
LLM_PROXY__SERVER__HOST=0.0.0.0
LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456
# Database Configuration
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
LLM_PROXY__DATABASE__MAX_CONNECTIONS=10
# Provider Configuration
LLM_PROXY__PROVIDERS__OPENAI__ENABLED=true
LLM_PROXY__PROVIDERS__GEMINI__ENABLED=true
LLM_PROXY__PROVIDERS__DEEPSEEK__ENABLED=true
LLM_PROXY__PROVIDERS__GROK__ENABLED=false
```
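Assuming the usual double-underscore convention implied by the variable names above (as used by, e.g., the Rust `config` crate), each `__` maps an environment variable onto a TOML section and key. A POSIX-shell sketch of that mapping:

```shell
# LLM_PROXY__SERVER__PORT maps to the [server] section, key "port"
var="LLM_PROXY__SERVER__PORT"
key="${var#LLM_PROXY__}"                         # strip the app prefix -> SERVER__PORT
section=$(echo "${key%%__*}" | tr 'A-Z' 'a-z')   # before the first "__" -> server
field=$(echo "${key#*__}" | tr 'A-Z' 'a-z')      # after it -> port
echo "[$section] $field"                         # -> [server] port
```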
### Configuration File (config.toml)
Create `config.toml` in the application directory:
```toml
[server]
port = 8080
host = "0.0.0.0"
auth_tokens = ["sk-test-123", "sk-test-456"]

[database]
path = "./data/llm_proxy.db"
max_connections = 10

[providers.openai]
enabled = true
base_url = "https://api.openai.com/v1"
default_model = "gpt-4o"

[providers.gemini]
enabled = true
base_url = "https://generativelanguage.googleapis.com/v1"
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
base_url = "https://api.deepseek.com"
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
base_url = "https://api.x.ai/v1"
default_model = "grok-beta"
```
## Nginx Reverse Proxy Configuration
**Important for SSE/Streaming:** Disable buffering and configure timeouts for proper SSE support.
```nginx
server {
    listen 80;
    listen 443 ssl http2;
    server_name llm-proxy.yourdomain.com;

    # SSL configuration (recommended)
    ssl_certificate /etc/letsencrypt/live/llm-proxy.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm-proxy.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8080;
        proxy_http_version 1.1;

        # SSE/Streaming support
        proxy_buffering off;
        chunked_transfer_encoding on;
        proxy_set_header Connection '';

        # Timeouts for long-running streams
        proxy_connect_timeout 7200s;
        proxy_read_timeout 7200s;
        proxy_send_timeout 7200s;

        # Disable gzip for streaming
        gzip off;

        # Headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
### NGINX Proxy Manager
If using NGINX Proxy Manager, add this to **Advanced Settings**:
```nginx
proxy_buffering off;
proxy_http_version 1.1;
proxy_set_header Connection '';
chunked_transfer_encoding on;
proxy_connect_timeout 7200s;
proxy_read_timeout 7200s;
proxy_send_timeout 7200s;
gzip off;
```
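To verify streaming end-to-end through the proxy, watch for incremental `data:` lines arriving one at a time rather than in a single burst (adjust the host, model, and token to your deployment; `-N` disables curl's own output buffering):

```bash
curl -N http://llm-proxy.yourdomain.com/v1/chat/completions \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
```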
## Security Considerations
### 1. Authentication
- Use strong Bearer tokens
- Rotate tokens regularly
- Consider implementing JWT for production
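Strong tokens can be generated from the system CSPRNG, for example with `openssl` (the `sk-` prefix here just mirrors the example tokens used throughout this guide):

```shell
# 32 random bytes, hex-encoded: 64 characters of token material
TOKEN="sk-$(openssl rand -hex 32)"
echo "$TOKEN"
```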
### 2. Rate Limiting
- Implement per-client rate limiting
- Consider using `governor` crate for advanced rate limiting
### 3. Network Security
- Run behind reverse proxy (nginx)
- Enable HTTPS
- Restrict access by IP if needed
- Use firewall rules
### 4. Data Security
- Database encryption (SQLCipher for SQLite)
- Secure API key storage
- Regular backups
## Monitoring & Maintenance
### Logging
- Application logs: `RUST_LOG=info` (or `debug` for troubleshooting)
- Access logs via nginx
- Database logs for audit trail
### Health Checks
```bash
# Health endpoint
curl http://localhost:8080/health
# Database check
sqlite3 ./data/llm_proxy.db "SELECT COUNT(*) FROM llm_requests;"
```
### Backup Strategy
```bash
#!/bin/bash
# backup.sh
BACKUP_DIR="/backups/llm-proxy"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"

# Backup database (SQLite's online backup, safe while the service is running)
sqlite3 ./data/llm_proxy.db ".backup $BACKUP_DIR/llm_proxy_$DATE.db"

# Backup configuration
cp config.toml "$BACKUP_DIR/config_$DATE.toml"

# Rotate old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.db" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.toml" -mtime +30 -delete
```
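To run `backup.sh` unattended, a crontab entry such as the following (the install path is illustrative) runs it nightly at 02:00:

```cron
0 2 * * * /opt/llm-proxy/backup.sh >> /var/log/llm-proxy-backup.log 2>&1
```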
## Performance Tuning
### Database Optimization
```sql
-- Run these SQL commands periodically
VACUUM;
ANALYZE;
```
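Both statements can be run non-interactively from the `sqlite3` CLI, which makes them easy to schedule alongside the backup job. Demonstrated here against a scratch database; in production, point `DB` at `./data/llm_proxy.db`:

```shell
DB=/tmp/llm_proxy_maint_demo.db
# Scratch table standing in for the real llm_requests table
sqlite3 "$DB" "CREATE TABLE IF NOT EXISTS llm_requests (id INTEGER PRIMARY KEY);"
# Reclaim free pages and refresh the query planner's statistics
sqlite3 "$DB" "VACUUM; ANALYZE;"
sqlite3 "$DB" "SELECT COUNT(*) FROM llm_requests;"
```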
### Memory Management
- Monitor memory usage with `htop` or `ps aux`
- Adjust `max_connections` based on load
- Consider connection pooling for high traffic
### Scaling
1. **Vertical Scaling**: Increase container resources
2. **Horizontal Scaling**: Deploy multiple instances behind load balancer
3. **Database**: Migrate to PostgreSQL for high-volume usage
## Troubleshooting
### Common Issues
1. **Port already in use**
   ```bash
   ss -tulpn | grep :8080   # netstat -tulpn also works on older systems
   kill <PID>               # or change the port in config
   ```
2. **Database permissions**
   ```bash
   chown -R llmproxy:llmproxy /opt/llm-proxy/data
   chmod 600 /opt/llm-proxy/data/llm_proxy.db
   ```
3. **API key errors**
   - Verify environment variables are set
   - Check provider status in the dashboard
   - Test connectivity: `curl https://api.openai.com/v1/models`
4. **High memory usage**
   - Check for memory leaks
   - Reduce `max_connections`
   - Implement connection timeouts
### Debug Mode
```bash
# Run with debug logging
RUST_LOG=debug ./llm-proxy
# Check system logs
journalctl -u llm-proxy -f
```
## Integration
### Open-WebUI Compatibility
The proxy exposes an OpenAI-compatible API, so point Open-WebUI at it directly:
```
API Base URL: http://your-proxy-address:8080
API Key: sk-test-123 (or your configured token)
```
### Custom Clients
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-test-123",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
```
## Updates & Upgrades
1. **Backup** current configuration and database
2. **Stop** the service: `systemctl stop llm-proxy`
3. **Update** code: `git pull` or copy new binaries
4. **Migrate** database if needed (check migrations/)
5. **Restart**: `systemctl start llm-proxy`
6. **Verify**: Check logs and test endpoints
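The steps above can be sketched as one script (paths and the service name follow the systemd example earlier in this guide; check `migrations/` and the release notes before skipping step 4):

```bash
#!/bin/bash
set -euo pipefail

# 1. Backup configuration and database first
/opt/llm-proxy/backup.sh

# 2. Stop the service
systemctl stop llm-proxy

# 3. Update code and rebuild
cd /opt/llm-proxy
git pull
cargo build --release

# 4. Apply database migrations if the release requires them (see migrations/)

# 5. Restart, then 6. verify the health endpoint responds
systemctl start llm-proxy
curl -fsS http://localhost:8080/health
```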
## Support
- Check logs in `/var/log/llm-proxy/`
- Monitor dashboard at `http://your-server:8080`
- Review database metrics in dashboard
- Enable debug logging for troubleshooting