
LLM Proxy Gateway - Deployment Guide

Overview

A unified LLM proxy gateway that fronts OpenAI, Google Gemini, DeepSeek, and xAI Grok, with token tracking, cost calculation, and an admin dashboard.

System Requirements

  • CPU: 2 cores minimum
  • RAM: 512MB minimum (1GB recommended)
  • Storage: 10GB minimum
  • OS: Linux (tested on Arch Linux, Ubuntu, Debian)
  • Runtime: Rust 1.70+ with Cargo

Deployment Options

Option 1: Docker Container

# Dockerfile (multi-stage build)
FROM rust:1.70-alpine AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

FROM alpine:latest
RUN apk add --no-cache libgcc
COPY --from=builder /app/target/release/llm-proxy /usr/local/bin/
COPY --from=builder /app/static /app/static
WORKDIR /app
EXPOSE 8080
CMD ["llm-proxy"]
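With the Dockerfile above in the repository root, a typical build-and-run sequence looks like this. The image and container names, the `.env` file, and the `./data` bind mount are examples; adjust them to your setup:

```shell
# Build the image from the repository root
docker build -t llm-proxy .

# Run detached, passing API keys via an env file and persisting the SQLite database
docker run -d --name llm-proxy \
  -p 8080:8080 \
  --env-file .env \
  -v "$(pwd)/data:/app/data" \
  llm-proxy
```

The bind mount keeps the database outside the container so upgrades do not lose request history.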

Option 2: Systemd Service (Bare Metal/LXC)

# /etc/systemd/system/llm-proxy.service
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=always
RestartSec=10
Environment="RUST_LOG=info"
Environment="LLM_PROXY__SERVER__PORT=8080"
Environment="LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456"

[Install]
WantedBy=multi-user.target
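After installing the unit file, a typical activation sequence looks like this (the user name matches the unit above; paths are the ones from the unit file):

```shell
# Create the unprivileged service user and hand it the install directory
sudo useradd --system --home /opt/llm-proxy --shell /usr/sbin/nologin llmproxy
sudo mkdir -p /opt/llm-proxy/data
sudo chown -R llmproxy:llmproxy /opt/llm-proxy

# Pick up the new unit, then enable and start the service in one step
sudo systemctl daemon-reload
sudo systemctl enable --now llm-proxy
sudo systemctl status llm-proxy
```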

Option 3: LXC Container (Proxmox)

  1. Create Alpine Linux LXC container
  2. Install Rust: apk add rust cargo
  3. Copy application files
  4. Build: cargo build --release
  5. Run: ./target/release/llm-proxy

Configuration

Environment Variables

# Required API Keys
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
DEEPSEEK_API_KEY=sk-...
GROK_API_KEY=xai-...  # Optional

# Server Configuration (with LLM_PROXY__ prefix)
LLM_PROXY__SERVER__PORT=8080
LLM_PROXY__SERVER__HOST=0.0.0.0
LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456

# Database Configuration
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
LLM_PROXY__DATABASE__MAX_CONNECTIONS=10

# Provider Configuration
LLM_PROXY__PROVIDERS__OPENAI__ENABLED=true
LLM_PROXY__PROVIDERS__GEMINI__ENABLED=true
LLM_PROXY__PROVIDERS__DEEPSEEK__ENABLED=true
LLM_PROXY__PROVIDERS__GROK__ENABLED=false
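For local testing, the variables above can be kept in an env file and exported in one step before launching the binary. A minimal sketch, assuming a file named .env in the working directory:

```shell
# Write a minimal env file (values are the test defaults from this guide)
cat > .env <<'EOF'
LLM_PROXY__SERVER__HOST=0.0.0.0
LLM_PROXY__SERVER__PORT=8080
EOF

set -a        # auto-export every variable sourced below
. ./.env
set +a

echo "listening on ${LLM_PROXY__SERVER__HOST}:${LLM_PROXY__SERVER__PORT}"
```

Starting `./llm-proxy` from the same shell then inherits these settings.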

Configuration File (config.toml)

Create config.toml in the application directory:

[server]
port = 8080
host = "0.0.0.0"
auth_tokens = ["sk-test-123", "sk-test-456"]

[database]
path = "./data/llm_proxy.db"
max_connections = 10

[providers.openai]
enabled = true
base_url = "https://api.openai.com/v1"
default_model = "gpt-4o"

[providers.gemini]
enabled = true
base_url = "https://generativelanguage.googleapis.com/v1"
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
base_url = "https://api.deepseek.com"
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
base_url = "https://api.x.ai/v1"
default_model = "grok-beta"

Nginx Reverse Proxy Configuration

server {
    listen 80;
    listen 443 ssl http2;
    server_name llm-proxy.yourdomain.com;

    # SSL configuration (recommended)
    ssl_certificate /etc/letsencrypt/live/llm-proxy.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm-proxy.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8080;

        # WebSocket / streaming support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_cache_bypass $http_upgrade;

        # Forwarded headers so the proxy sees the real client
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
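To obtain the certificates referenced above and apply the configuration (the domain is a placeholder for your own):

```shell
# Issue a Let's Encrypt certificate for the proxy hostname
sudo certbot certonly --nginx -d llm-proxy.yourdomain.com

# Validate the nginx configuration, then reload without dropping connections
sudo nginx -t && sudo systemctl reload nginx
```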

Security Considerations

1. Authentication

  • Use strong Bearer tokens
  • Rotate tokens regularly
  • Consider implementing JWT for production

2. Rate Limiting

  • Implement per-client rate limiting
  • Consider using governor crate for advanced rate limiting

3. Network Security

  • Run behind reverse proxy (nginx)
  • Enable HTTPS
  • Restrict access by IP if needed
  • Use firewall rules

4. Data Security

  • Database encryption (SQLCipher for SQLite)
  • Secure API key storage
  • Regular backups

Monitoring & Maintenance

Logging

  • Application logs: RUST_LOG=info (or debug for troubleshooting)
  • Access logs via nginx
  • Database logs for audit trail

Health Checks

# Health endpoint
curl http://localhost:8080/health

# Database check
sqlite3 ./data/llm_proxy.db "SELECT COUNT(*) FROM llm_requests;"

Backup Strategy

#!/bin/bash
# backup.sh
set -euo pipefail

BACKUP_DIR="/backups/llm-proxy"
DB_PATH="./data/llm_proxy.db"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# Backup database (.backup takes a consistent snapshot even while the service runs)
sqlite3 "$DB_PATH" ".backup $BACKUP_DIR/llm_proxy_$DATE.db"

# Backup configuration
cp config.toml "$BACKUP_DIR/config_$DATE.toml"

# Rotate old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.db" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.toml" -mtime +30 -delete
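A backup is only as good as its restore. This sketch (file names are throwaway examples, not production paths) verifies that a `.backup` snapshot restores cleanly:

```shell
# Create a throwaway database with a few rows standing in for real traffic
sqlite3 demo.db "CREATE TABLE llm_requests(id INTEGER); INSERT INTO llm_requests VALUES (1),(2),(3);"

# Snapshot it the same way backup.sh does, then restore into a fresh file
sqlite3 demo.db ".backup demo_backup.db"
sqlite3 restored.db ".restore demo_backup.db"

# The restored copy should contain the same rows (expect: 3)
sqlite3 restored.db "SELECT COUNT(*) FROM llm_requests;"
```

Running a restore drill like this periodically catches corrupt or truncated backups before you need them.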

Performance Tuning

Database Optimization

-- Run these SQL commands periodically
VACUUM;
ANALYZE;
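These can be run from a shell or a cron job without stopping the service; the path below is the default from config.toml:

```shell
DB="${DB:-./data/llm_proxy.db}"
mkdir -p "$(dirname "$DB")"   # no-op when the data directory already exists

# VACUUM reclaims space from deleted rows; ANALYZE refreshes query-planner statistics
sqlite3 "$DB" "VACUUM; ANALYZE;"
echo "maintenance complete for $DB"
```

A crontab entry such as `0 3 * * 0 sqlite3 /opt/llm-proxy/data/llm_proxy.db "VACUUM; ANALYZE;"` would schedule this weekly.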

Memory Management

  • Monitor memory usage with htop or ps aux
  • Adjust max_connections based on load
  • Consider connection pooling for high traffic

Scaling

  1. Vertical Scaling: Increase container resources
  2. Horizontal Scaling: Deploy multiple instances behind load balancer
  3. Database: Migrate to PostgreSQL for high-volume usage
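For horizontal scaling, nginx itself can act as the load balancer. A minimal sketch, assuming two instances on ports 8080 and 8081 (note that with SQLite each instance would keep its own database, so shared usage stats imply migrating to PostgreSQL first):

```nginx
upstream llm_proxy_pool {
    least_conn;                 # route to the instance with fewest active requests
    server 127.0.0.1:8080;
    server 127.0.0.1:8081;
}

server {
    listen 80;
    server_name llm-proxy.yourdomain.com;

    location / {
        proxy_pass http://llm_proxy_pool;
    }
}
```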

Troubleshooting

Common Issues

  1. Port already in use

    netstat -tulpn | grep :8080
    kill <PID>  # or change port in config
    
  2. Database permissions

    chown -R llmproxy:llmproxy /opt/llm-proxy/data
    chmod 600 /opt/llm-proxy/data/llm_proxy.db
    
  3. API key errors

    • Verify environment variables are set
    • Check provider status in the dashboard
    • Test connectivity: curl https://api.openai.com/v1/models

  4. High memory usage

    • Check for memory leaks
    • Reduce max_connections
    • Implement connection timeouts

Debug Mode

# Run with debug logging
RUST_LOG=debug ./llm-proxy

# Check system logs
journalctl -u llm-proxy -f

Integration

Open-WebUI Compatibility

The proxy exposes an OpenAI-compatible API, so it can be added to Open-WebUI as a standard OpenAI connection:

API Base URL: http://your-proxy-address:8080/v1
API Key: sk-test-123 (or one of your configured auth tokens)

Custom Clients

import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-test-123"
)

response = client.chat.completions.create(
    model="gpt-4o",  # matches the configured OpenAI default_model
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
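The same request without an SDK, using curl against the local endpoint (the token is the test value from this guide):

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```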

Updates & Upgrades

  1. Backup current configuration and database
  2. Stop the service: systemctl stop llm-proxy
  3. Update code: git pull or copy new binaries
  4. Migrate database if needed (check migrations/)
  5. Restart: systemctl start llm-proxy
  6. Verify: Check logs and test endpoints

Support

  • Check logs: journalctl -u llm-proxy (or /var/log/llm-proxy/ if file logging is configured)
  • Monitor dashboard at http://your-server:8080
  • Review database metrics in dashboard
  • Enable debug logging for troubleshooting