# LLM Proxy Gateway - Deployment Guide
## Overview
A unified LLM proxy gateway supporting OpenAI, Google Gemini, DeepSeek, and xAI Grok, with token tracking, cost calculation, and an admin dashboard.
## System Requirements
- **CPU**: 2 cores minimum
- **RAM**: 512MB minimum (1GB recommended)
- **Storage**: 10GB minimum
- **OS**: Linux (tested on Arch Linux, Ubuntu, Debian)
- **Runtime**: Rust 1.70+ with Cargo
## Deployment Options
### Option 1: Docker (Recommended)
```dockerfile
FROM rust:1.70-alpine AS builder
WORKDIR /app
# musl-dev supplies the C toolchain pieces many crates need on Alpine
RUN apk add --no-cache musl-dev
COPY . .
RUN cargo build --release

FROM alpine:latest
RUN apk add --no-cache libgcc
COPY --from=builder /app/target/release/llm-proxy /usr/local/bin/
COPY --from=builder /app/static /app/static
WORKDIR /app
EXPOSE 8080
CMD ["llm-proxy"]
```
### Option 2: Systemd Service (Bare Metal/LXC)
```ini
# /etc/systemd/system/llm-proxy.service
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=always
RestartSec=10
Environment="RUST_LOG=info"
Environment="LLM_PROXY__SERVER__PORT=8080"
Environment="LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456"

[Install]
WantedBy=multi-user.target
```
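With the unit file in place, enabling the service follows the usual systemd flow (run on the target host as root):

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now llm-proxy
systemctl status llm-proxy --no-pager
```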
### Option 3: LXC Container (Proxmox)
1. Create Alpine Linux LXC container
2. Install Rust: `apk add rust cargo`
3. Copy application files
4. Build: `cargo build --release`
5. Run: `./target/release/llm-proxy`
## Configuration
### Environment Variables
```bash
# Required API Keys
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
DEEPSEEK_API_KEY=sk-...
GROK_API_KEY=gk-... # Optional
# Server Configuration (with LLM_PROXY__ prefix)
LLM_PROXY__SERVER__PORT=8080
LLM_PROXY__SERVER__HOST=0.0.0.0
LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456
# Database Configuration
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
LLM_PROXY__DATABASE__MAX_CONNECTIONS=10
# Provider Configuration
LLM_PROXY__PROVIDERS__OPENAI__ENABLED=true
LLM_PROXY__PROVIDERS__GEMINI__ENABLED=true
LLM_PROXY__PROVIDERS__DEEPSEEK__ENABLED=true
LLM_PROXY__PROVIDERS__GROK__ENABLED=false
```
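Assuming the usual double-underscore convention implied by the variable names above (as used by, e.g., the Rust `config` crate), each `__` maps an environment variable onto a TOML section and key. A POSIX-shell sketch of that mapping:

```shell
# LLM_PROXY__SERVER__PORT maps to the [server] section, key "port"
var="LLM_PROXY__SERVER__PORT"
key="${var#LLM_PROXY__}"                         # strip the app prefix -> SERVER__PORT
section=$(echo "${key%%__*}" | tr 'A-Z' 'a-z')   # before the first "__" -> server
field=$(echo "${key#*__}" | tr 'A-Z' 'a-z')      # after it -> port
echo "[$section] $field"                         # -> [server] port
```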
### Configuration File (config.toml)
Create `config.toml` in the application directory:
```toml
[server]
port = 8080
host = "0.0.0.0"
auth_tokens = ["sk-test-123", "sk-test-456"]

[database]
path = "./data/llm_proxy.db"
max_connections = 10

[providers.openai]
enabled = true
base_url = "https://api.openai.com/v1"
default_model = "gpt-4o"

[providers.gemini]
enabled = true
base_url = "https://generativelanguage.googleapis.com/v1"
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
base_url = "https://api.deepseek.com"
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
base_url = "https://api.x.ai/v1"
default_model = "grok-beta"
```
## Nginx Reverse Proxy Configuration
**Important for SSE/Streaming:** Disable buffering and configure timeouts for proper SSE support.
```nginx
server {
    listen 80;
    listen 443 ssl http2;
    server_name llm-proxy.yourdomain.com;

    # SSL configuration (recommended)
    ssl_certificate /etc/letsencrypt/live/llm-proxy.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm-proxy.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8080;
        proxy_http_version 1.1;

        # SSE/Streaming support
        proxy_buffering off;
        chunked_transfer_encoding on;
        proxy_set_header Connection '';

        # Timeouts for long-running streams
        proxy_connect_timeout 7200s;
        proxy_read_timeout 7200s;
        proxy_send_timeout 7200s;

        # Disable gzip for streaming
        gzip off;

        # Headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
### NGINX Proxy Manager
If using NGINX Proxy Manager, add this to **Advanced Settings**:
```nginx
proxy_buffering off;
proxy_http_version 1.1;
proxy_set_header Connection '';
chunked_transfer_encoding on;
proxy_connect_timeout 7200s;
proxy_read_timeout 7200s;
proxy_send_timeout 7200s;
gzip off;
```
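To verify streaming end-to-end through the proxy, watch for incremental `data:` lines arriving one at a time rather than in a single burst (adjust the host, model, and token to your deployment; `-N` disables curl's own output buffering):

```bash
curl -N http://llm-proxy.yourdomain.com/v1/chat/completions \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
```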
## Security Considerations
### 1. Authentication
- Use strong Bearer tokens
- Rotate tokens regularly
- Consider implementing JWT for production
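Strong tokens can be generated from the system CSPRNG, for example with `openssl` (the `sk-` prefix here just mirrors the example tokens used throughout this guide):

```shell
# 32 random bytes, hex-encoded: 64 characters of token material
TOKEN="sk-$(openssl rand -hex 32)"
echo "$TOKEN"
```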
### 2. Rate Limiting
- Implement per-client rate limiting
- Consider using `governor` crate for advanced rate limiting
### 3. Network Security
- Run behind reverse proxy (nginx)
- Enable HTTPS
- Restrict access by IP if needed
- Use firewall rules
### 4. Data Security
- Database encryption (SQLCipher for SQLite)
- Secure API key storage
- Regular backups
## Monitoring & Maintenance
### Logging
- Application logs: `RUST_LOG=info` (or `debug` for troubleshooting)
- Access logs via nginx
- Database logs for audit trail
### Health Checks
```bash
# Health endpoint
curl http://localhost:8080/health
# Database check
sqlite3 ./data/llm_proxy.db "SELECT COUNT(*) FROM llm_requests;"
```
### Backup Strategy
```bash
#!/bin/bash
# backup.sh
BACKUP_DIR="/backups/llm-proxy"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"

# Backup database (SQLite's online backup, safe while the service is running)
sqlite3 ./data/llm_proxy.db ".backup $BACKUP_DIR/llm_proxy_$DATE.db"

# Backup configuration
cp config.toml "$BACKUP_DIR/config_$DATE.toml"

# Rotate old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.db" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.toml" -mtime +30 -delete
```
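To run `backup.sh` unattended, a crontab entry such as the following (the install path is illustrative) runs it nightly at 02:00:

```cron
0 2 * * * /opt/llm-proxy/backup.sh >> /var/log/llm-proxy-backup.log 2>&1
```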
## Performance Tuning
### Database Optimization
```sql
-- Run these SQL commands periodically
VACUUM;
ANALYZE;
```
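Both statements can be run non-interactively from the `sqlite3` CLI, which makes them easy to schedule alongside the backup job. Demonstrated here against a scratch database; in production, point `DB` at `./data/llm_proxy.db`:

```shell
DB=/tmp/llm_proxy_maint_demo.db
# Scratch table standing in for the real llm_requests table
sqlite3 "$DB" "CREATE TABLE IF NOT EXISTS llm_requests (id INTEGER PRIMARY KEY);"
# Reclaim free pages and refresh the query planner's statistics
sqlite3 "$DB" "VACUUM; ANALYZE;"
sqlite3 "$DB" "SELECT COUNT(*) FROM llm_requests;"
```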
### Memory Management
- Monitor memory usage with `htop` or `ps aux`
- Adjust `max_connections` based on load
- Consider connection pooling for high traffic
### Scaling
1. **Vertical Scaling**: Increase container resources
2. **Horizontal Scaling**: Deploy multiple instances behind load balancer
3. **Database**: Migrate to PostgreSQL for high-volume usage
## Troubleshooting
### Common Issues
1. **Port already in use**
   ```bash
   ss -tulpn | grep :8080   # netstat -tulpn also works on older systems
   kill <PID>               # or change the port in config
   ```
2. **Database permissions**
   ```bash
   chown -R llmproxy:llmproxy /opt/llm-proxy/data
   chmod 600 /opt/llm-proxy/data/llm_proxy.db
   ```
3. **API key errors**
   - Verify environment variables are set
   - Check provider status in the dashboard
   - Test connectivity: `curl https://api.openai.com/v1/models`
4. **High memory usage**
   - Check for memory leaks
   - Reduce `max_connections`
   - Implement connection timeouts
### Debug Mode
```bash
# Run with debug logging
RUST_LOG=debug ./llm-proxy
# Check system logs
journalctl -u llm-proxy -f
```
## Integration
### Open-WebUI Compatibility
The proxy exposes an OpenAI-compatible API, so point Open-WebUI at it directly:
```
API Base URL: http://your-proxy-address:8080
API Key: sk-test-123 (or your configured token)
```
### Custom Clients
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-test-123",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
```
## Updates & Upgrades
1. **Backup** current configuration and database
2. **Stop** the service: `systemctl stop llm-proxy`
3. **Update** code: `git pull` or copy new binaries
4. **Migrate** database if needed (check migrations/)
5. **Restart**: `systemctl start llm-proxy`
6. **Verify**: Check logs and test endpoints
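The steps above can be sketched as one script (paths and the service name follow the systemd example earlier in this guide; check `migrations/` and the release notes before skipping step 4):

```bash
#!/bin/bash
set -euo pipefail

# 1. Backup configuration and database first
/opt/llm-proxy/backup.sh

# 2. Stop the service
systemctl stop llm-proxy

# 3. Update code and rebuild
cd /opt/llm-proxy
git pull
cargo build --release

# 4. Apply database migrations if the release requires them (see migrations/)

# 5. Restart, then 6. verify the health endpoint responds
systemctl start llm-proxy
curl -fsS http://localhost:8080/health
```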
## Support
- Check logs in `/var/log/llm-proxy/`
- Monitor dashboard at `http://your-server:8080`
- Review database metrics in dashboard
- Enable debug logging for troubleshooting