# LLM Proxy Gateway - Deployment Guide

## Overview

A unified LLM proxy gateway supporting OpenAI, Google Gemini, DeepSeek, and xAI Grok, with token tracking, cost calculation, and an admin dashboard.

## System Requirements

- **CPU**: 2 cores minimum
- **RAM**: 512MB minimum (1GB recommended)
- **Storage**: 10GB minimum
- **OS**: Linux (tested on Arch Linux, Ubuntu, Debian)
- **Runtime**: Rust 1.70+ with Cargo

## Deployment Options

### Option 1: Docker (Recommended)

```dockerfile
FROM rust:1.70-alpine as builder
WORKDIR /app
COPY . .
RUN cargo build --release

FROM alpine:latest
RUN apk add --no-cache libgcc
COPY --from=builder /app/target/release/llm-proxy /usr/local/bin/
COPY --from=builder /app/static /app/static
WORKDIR /app
EXPOSE 8080
CMD ["llm-proxy"]
```

### Option 2: Systemd Service (Bare Metal/LXC)

```ini
# /etc/systemd/system/llm-proxy.service
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=always
RestartSec=10
Environment="RUST_LOG=info"
Environment="LLM_PROXY__SERVER__PORT=8080"
Environment="LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456"

[Install]
WantedBy=multi-user.target
```

### Option 3: LXC Container (Proxmox)

1. Create an Alpine Linux LXC container
2. Install Rust: `apk add rust cargo`
3. Copy the application files
4. Build: `cargo build --release`
5. Run: `./target/release/llm-proxy`

## Configuration

### Environment Variables

```bash
# Required API keys
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
DEEPSEEK_API_KEY=sk-...
GROK_API_KEY=xai-...
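# Tip: keep these keys out of shell profiles and history; load them from an
# environment file with restricted permissions instead, e.g.:
#   chmod 600 /opt/llm-proxy/.env
# (the .env path is an example, not a path the application requires)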
# Optional
# Server configuration (with LLM_PROXY__ prefix)
LLM_PROXY__SERVER__PORT=8080
LLM_PROXY__SERVER__HOST=0.0.0.0
LLM_PROXY__SERVER__AUTH_TOKENS=sk-test-123,sk-test-456

# Database configuration
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
LLM_PROXY__DATABASE__MAX_CONNECTIONS=10

# Provider configuration
LLM_PROXY__PROVIDERS__OPENAI__ENABLED=true
LLM_PROXY__PROVIDERS__GEMINI__ENABLED=true
LLM_PROXY__PROVIDERS__DEEPSEEK__ENABLED=true
LLM_PROXY__PROVIDERS__GROK__ENABLED=false
```

### Configuration File (config.toml)

Create `config.toml` in the application directory:

```toml
[server]
port = 8080
host = "0.0.0.0"
auth_tokens = ["sk-test-123", "sk-test-456"]

[database]
path = "./data/llm_proxy.db"
max_connections = 10

[providers.openai]
enabled = true
base_url = "https://api.openai.com/v1"
default_model = "gpt-4o"

[providers.gemini]
enabled = true
base_url = "https://generativelanguage.googleapis.com/v1"
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
base_url = "https://api.deepseek.com"
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
base_url = "https://api.x.ai/v1"
default_model = "grok-beta"
```

## Nginx Reverse Proxy Configuration

**Important for SSE/Streaming:** Disable buffering and configure timeouts for proper SSE support.
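For context on why buffering must stay off: streamed responses arrive as Server-Sent Events, one `data:` line per chunk, terminated by `data: [DONE]` in the OpenAI-style convention. A minimal sketch of how a client consumes that framing (the JSON payload shape shown is the OpenAI delta format; exact fields depend on the provider):

```python
import json

def iter_sse_data(lines):
    """Yield the payload of each SSE `data:` line, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip comments, event names, and keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield payload

# Canned stream for illustration; a real client reads the HTTP body line by line.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    ': keep-alive',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
chunks = [json.loads(p)["choices"][0]["delta"]["content"]
          for p in iter_sse_data(sample)]
print("".join(chunks))  # → Hello
```

If nginx buffers the response, these `data:` lines arrive in one burst at the end instead of incrementally, which is what the settings in this section prevent.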
```nginx
server {
    listen 80;
    server_name llm-proxy.yourdomain.com;

    location / {
        proxy_pass http://localhost:8080;
        proxy_http_version 1.1;

        # SSE/Streaming support
        proxy_buffering off;
        chunked_transfer_encoding on;
        proxy_set_header Connection '';

        # Timeouts for long-running streams
        proxy_connect_timeout 7200s;
        proxy_read_timeout 7200s;
        proxy_send_timeout 7200s;

        # Disable gzip for streaming
        gzip off;

        # Headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # SSL configuration (recommended)
    listen 443 ssl http2;
    ssl_certificate /etc/letsencrypt/live/llm-proxy.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm-proxy.yourdomain.com/privkey.pem;
}
```

### NGINX Proxy Manager

If using NGINX Proxy Manager, add this to **Advanced Settings**:

```nginx
proxy_buffering off;
proxy_http_version 1.1;
proxy_set_header Connection '';
chunked_transfer_encoding on;
proxy_connect_timeout 7200s;
proxy_read_timeout 7200s;
proxy_send_timeout 7200s;
gzip off;
```

## Security Considerations

### 1. Authentication

- Use strong Bearer tokens
- Rotate tokens regularly
- Consider implementing JWT for production

### 2. Rate Limiting

- Implement per-client rate limiting
- Consider the `governor` crate for advanced rate limiting

### 3. Network Security

- Run behind a reverse proxy (nginx)
- Enable HTTPS
- Restrict access by IP if needed
- Use firewall rules

### 4. Data Security

- Database encryption (SQLCipher for SQLite)
- Secure API key storage
- Regular backups

## Monitoring & Maintenance

### Logging

- Application logs: `RUST_LOG=info` (or `debug` for troubleshooting)
- Access logs via nginx
- Database logs for an audit trail

### Health Checks

```bash
# Health endpoint
curl http://localhost:8080/health

# Database check
sqlite3 ./data/llm_proxy.db "SELECT COUNT(*) FROM llm_requests;"
```

### Backup Strategy

```bash
#!/bin/bash
# backup.sh
BACKUP_DIR="/backups/llm-proxy"
DATE=$(date +%Y%m%d_%H%M%S)

# Backup database
sqlite3 ./data/llm_proxy.db ".backup $BACKUP_DIR/llm_proxy_$DATE.db"

# Backup configuration
cp config.toml "$BACKUP_DIR/config_$DATE.toml"

# Rotate old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.db" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.toml" -mtime +30 -delete
```

## Performance Tuning

### Database Optimization

```sql
-- Run these commands periodically
VACUUM;
ANALYZE;
```

### Memory Management

- Monitor memory usage with `htop` or `ps aux`
- Adjust `max_connections` based on load
- Consider connection pooling for high traffic

### Scaling

1. **Vertical scaling**: increase container resources
2. **Horizontal scaling**: deploy multiple instances behind a load balancer
3. **Database**: migrate to PostgreSQL for high-volume usage

## Troubleshooting

### Common Issues

1. **Port already in use**
   ```bash
   netstat -tulpn | grep :8080
   kill <PID>  # or change the port in config
   ```
2. **Database permissions**
   ```bash
   chown -R llmproxy:llmproxy /opt/llm-proxy/data
   chmod 600 /opt/llm-proxy/data/llm_proxy.db
   ```
3. **API key errors**
   - Verify environment variables are set
   - Check provider status in the dashboard
   - Test connectivity: `curl https://api.openai.com/v1/models`
4. **High memory usage**
   - Check for memory leaks
   - Reduce `max_connections`
   - Implement connection timeouts

### Debug Mode

```bash
# Run with debug logging
RUST_LOG=debug ./llm-proxy

# Check system logs
journalctl -u llm-proxy -f
```

## Integration

### Open-WebUI Compatibility

The proxy exposes an OpenAI-compatible API, so point Open-WebUI at it:

```
API Base URL: http://your-proxy-address:8080
API Key: sk-test-123 (or your configured token)
```

### Custom Clients

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-test-123"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```

## Updates & Upgrades

1. **Backup** the current configuration and database
2. **Stop** the service: `systemctl stop llm-proxy`
3. **Update** the code: `git pull` or copy new binaries
4. **Migrate** the database if needed (check `migrations/`)
5. **Restart**: `systemctl start llm-proxy`
6. **Verify**: check logs and test endpoints

## Support

- Check logs in `/var/log/llm-proxy/`
- Monitor the dashboard at `http://your-server:8080`
- Review database metrics in the dashboard
- Enable debug logging for troubleshooting
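The manual checks above can be scripted for unattended monitoring. A minimal watchdog sketch combining the `/health` endpoint and the `llm_requests` count from the Health Checks section (the assumption here is only that `/health` answers HTTP 200 when the service is up; the response body format is not specified by this guide):

```python
import sqlite3
import urllib.request

def request_count(db_path):
    """Total rows in llm_requests -- same query as the sqlite3 CLI check."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute("SELECT COUNT(*) FROM llm_requests").fetchone()[0]

def is_healthy(base_url="http://localhost:8080", timeout=5):
    """True if the /health endpoint answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, DNS failures, timeouts, and HTTP errors
        return False
```

Run it from cron and alert when `is_healthy()` returns `False` or the request count stops growing.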