
LLM Proxy Gateway - Deployment Guide

Overview

A unified LLM proxy gateway that fronts OpenAI, Google Gemini, DeepSeek, and xAI Grok, with token tracking, cost calculation, and an admin dashboard.

System Requirements

  • CPU: 2 cores minimum
  • RAM: 512MB minimum (1GB recommended)
  • Storage: 10GB minimum
  • OS: Linux (tested on Arch Linux, Ubuntu, Debian)
  • Runtime: Rust 1.70+ with Cargo

Deployment Options

Option 1: Docker Container

# Dockerfile (multi-stage build)
FROM rust:1.70-alpine AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

FROM alpine:latest
RUN apk add --no-cache libgcc
COPY --from=builder /app/target/release/llm-proxy /usr/local/bin/
COPY --from=builder /app/static /app/static
WORKDIR /app
EXPOSE 8080
CMD ["llm-proxy"]
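With the Dockerfile above in the repository root, a typical build-and-run sequence looks like this. The image and container names, the `.env` file, and the `./data` bind mount are examples; adjust them to your setup:

```shell
# Build the image from the repository root
docker build -t llm-proxy .

# Run detached, passing API keys via an env file and persisting the SQLite database
docker run -d --name llm-proxy \
  -p 8080:8080 \
  --env-file .env \
  -v "$(pwd)/data:/app/data" \
  llm-proxy
```

The bind mount keeps the database outside the container so upgrades do not lose request history.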

Option 2: Systemd Service (Bare Metal/LXC)

# /etc/systemd/system/llm-proxy.service
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=always
RestartSec=10
Environment="RUST_LOG=info"
Environment="LLM_PROXY__SERVER__PORT=8080"
Environment="LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456"

[Install]
WantedBy=multi-user.target
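After installing the unit file, a typical activation sequence looks like this (the user name matches the unit above; paths are the ones from the unit file):

```shell
# Create the unprivileged service user and hand it the install directory
sudo useradd --system --home /opt/llm-proxy --shell /usr/sbin/nologin llmproxy
sudo mkdir -p /opt/llm-proxy/data
sudo chown -R llmproxy:llmproxy /opt/llm-proxy

# Pick up the new unit, then enable and start the service in one step
sudo systemctl daemon-reload
sudo systemctl enable --now llm-proxy
sudo systemctl status llm-proxy
```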

Option 3: LXC Container (Proxmox)

  1. Create Alpine Linux LXC container
  2. Install Rust: apk add rust cargo
  3. Copy application files
  4. Build: cargo build --release
  5. Run: ./target/release/llm-proxy

Configuration

Environment Variables

# Required API Keys
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
DEEPSEEK_API_KEY=sk-...
GROK_API_KEY=xai-...  # Optional

# Server Configuration (with LLM_PROXY__ prefix)
LLM_PROXY__SERVER__PORT=8080
LLM_PROXY__SERVER__HOST=0.0.0.0
LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456

# Database Configuration
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
LLM_PROXY__DATABASE__MAX_CONNECTIONS=10

# Provider Configuration
LLM_PROXY__PROVIDERS__OPENAI__ENABLED=true
LLM_PROXY__PROVIDERS__GEMINI__ENABLED=true
LLM_PROXY__PROVIDERS__DEEPSEEK__ENABLED=true
LLM_PROXY__PROVIDERS__GROK__ENABLED=false
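For local testing, the variables above can be kept in an env file and exported in one step before launching the binary. A minimal sketch, assuming a file named .env in the working directory:

```shell
# Write a minimal env file (values are the test defaults from this guide)
cat > .env <<'EOF'
LLM_PROXY__SERVER__HOST=0.0.0.0
LLM_PROXY__SERVER__PORT=8080
EOF

set -a        # auto-export every variable sourced below
. ./.env
set +a

echo "listening on ${LLM_PROXY__SERVER__HOST}:${LLM_PROXY__SERVER__PORT}"
```

Starting `./llm-proxy` from the same shell then inherits these settings.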

Configuration File (config.toml)

Create config.toml in the application directory:

[server]
port = 8080
host = "0.0.0.0"
auth_tokens = ["sk-test-123", "sk-test-456"]

[database]
path = "./data/llm_proxy.db"
max_connections = 10

[providers.openai]
enabled = true
base_url = "https://api.openai.com/v1"
default_model = "gpt-4o"

[providers.gemini]
enabled = true
base_url = "https://generativelanguage.googleapis.com/v1"
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
base_url = "https://api.deepseek.com"
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
base_url = "https://api.x.ai/v1"
default_model = "grok-beta"

Nginx Reverse Proxy Configuration

server {
    listen 80;
    listen 443 ssl http2;
    server_name llm-proxy.yourdomain.com;

    # SSL configuration (recommended)
    ssl_certificate /etc/letsencrypt/live/llm-proxy.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm-proxy.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8080;

        # WebSocket / streaming support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_cache_bypass $http_upgrade;

        # Forwarded headers so the proxy sees the real client
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
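To obtain the certificates referenced above and apply the configuration (the domain is a placeholder for your own):

```shell
# Issue a Let's Encrypt certificate for the proxy hostname
sudo certbot certonly --nginx -d llm-proxy.yourdomain.com

# Validate the nginx configuration, then reload without dropping connections
sudo nginx -t && sudo systemctl reload nginx
```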

Security Considerations

1. Authentication

  • Use strong Bearer tokens
  • Rotate tokens regularly
  • Consider implementing JWT for production

2. Rate Limiting

  • Implement per-client rate limiting
  • Consider using governor crate for advanced rate limiting

3. Network Security

  • Run behind reverse proxy (nginx)
  • Enable HTTPS
  • Restrict access by IP if needed
  • Use firewall rules

4. Data Security

  • Database encryption (SQLCipher for SQLite)
  • Secure API key storage
  • Regular backups

Monitoring & Maintenance

Logging

  • Application logs: RUST_LOG=info (or debug for troubleshooting)
  • Access logs via nginx
  • Database logs for audit trail

Health Checks

# Health endpoint
curl http://localhost:8080/health

# Database check
sqlite3 ./data/llm_proxy.db "SELECT COUNT(*) FROM llm_requests;"

Backup Strategy

#!/bin/bash
# backup.sh
set -euo pipefail

BACKUP_DIR="/backups/llm-proxy"
DB_PATH="./data/llm_proxy.db"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# Backup database (.backup takes a consistent snapshot even while the service runs)
sqlite3 "$DB_PATH" ".backup $BACKUP_DIR/llm_proxy_$DATE.db"

# Backup configuration
cp config.toml "$BACKUP_DIR/config_$DATE.toml"

# Rotate old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.db" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.toml" -mtime +30 -delete
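A backup is only as good as its restore. This sketch (file names are throwaway examples, not production paths) verifies that a `.backup` snapshot restores cleanly:

```shell
# Create a throwaway database with a few rows standing in for real traffic
sqlite3 demo.db "CREATE TABLE llm_requests(id INTEGER); INSERT INTO llm_requests VALUES (1),(2),(3);"

# Snapshot it the same way backup.sh does, then restore into a fresh file
sqlite3 demo.db ".backup demo_backup.db"
sqlite3 restored.db ".restore demo_backup.db"

# The restored copy should contain the same rows (expect: 3)
sqlite3 restored.db "SELECT COUNT(*) FROM llm_requests;"
```

Running a restore drill like this periodically catches corrupt or truncated backups before you need them.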

Performance Tuning

Database Optimization

-- Run these SQL commands periodically
VACUUM;
ANALYZE;
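These can be run from a shell or a cron job without stopping the service; the path below is the default from config.toml:

```shell
DB="${DB:-./data/llm_proxy.db}"
mkdir -p "$(dirname "$DB")"   # no-op when the data directory already exists

# VACUUM reclaims space from deleted rows; ANALYZE refreshes query-planner statistics
sqlite3 "$DB" "VACUUM; ANALYZE;"
echo "maintenance complete for $DB"
```

A crontab entry such as `0 3 * * 0 sqlite3 /opt/llm-proxy/data/llm_proxy.db "VACUUM; ANALYZE;"` would schedule this weekly.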

Memory Management

  • Monitor memory usage with htop or ps aux
  • Adjust max_connections based on load
  • Consider connection pooling for high traffic

Scaling

  1. Vertical Scaling: Increase container resources
  2. Horizontal Scaling: Deploy multiple instances behind load balancer
  3. Database: Migrate to PostgreSQL for high-volume usage
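For horizontal scaling, nginx itself can act as the load balancer. A minimal sketch, assuming two instances on ports 8080 and 8081 (note that with SQLite each instance would keep its own database, so shared usage stats imply migrating to PostgreSQL first):

```nginx
upstream llm_proxy_pool {
    least_conn;                 # route to the instance with fewest active requests
    server 127.0.0.1:8080;
    server 127.0.0.1:8081;
}

server {
    listen 80;
    server_name llm-proxy.yourdomain.com;

    location / {
        proxy_pass http://llm_proxy_pool;
    }
}
```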

Troubleshooting

Common Issues

  1. Port already in use

    netstat -tulpn | grep :8080
    kill <PID>  # or change port in config
    
  2. Database permissions

    chown -R llmproxy:llmproxy /opt/llm-proxy/data
    chmod 600 /opt/llm-proxy/data/llm_proxy.db
    
  3. API key errors

    • Verify environment variables are set
    • Check provider status in the dashboard
    • Test connectivity: curl https://api.openai.com/v1/models

  4. High memory usage

    • Check for memory leaks
    • Reduce max_connections
    • Implement connection timeouts

Debug Mode

# Run with debug logging
RUST_LOG=debug ./llm-proxy

# Check system logs
journalctl -u llm-proxy -f

Integration

Open-WebUI Compatibility

The proxy exposes an OpenAI-compatible API, so it can be added to Open-WebUI as a standard OpenAI connection:

API Base URL: http://your-proxy-address:8080/v1
API Key: sk-test-123 (or one of your configured auth tokens)

Custom Clients

import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-test-123"
)

response = client.chat.completions.create(
    model="gpt-4o",  # matches the configured OpenAI default_model
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
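The same request without an SDK, using curl against the local endpoint (the token is the test value from this guide):

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```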

Updates & Upgrades

  1. Backup current configuration and database
  2. Stop the service: systemctl stop llm-proxy
  3. Update code: git pull or copy new binaries
  4. Migrate database if needed (check migrations/)
  5. Restart: systemctl start llm-proxy
  6. Verify: Check logs and test endpoints

Support

  • Check logs: journalctl -u llm-proxy (or /var/log/llm-proxy/ if file logging is configured)
  • Monitor dashboard at http://your-server:8080
  • Review database metrics in dashboard
  • Enable debug logging for troubleshooting