
Optimization for 512MB RAM Environment

This document provides guidance for optimizing the LLM Proxy Gateway for deployment in resource-constrained environments (512MB RAM).

Memory Optimization Strategies

1. Build Optimization

The project is already configured with optimized build settings in Cargo.toml:

[profile.release]
opt-level = 3      # Maximum optimization
lto = true         # Link-time optimization
codegen-units = 1  # Single codegen unit for better optimization
strip = true       # Strip debug symbols

Additional optimizations you can apply:

# Build with specific target for better optimization
cargo build --release --target x86_64-unknown-linux-musl

# Or for ARM (Raspberry Pi, etc.)
cargo build --release --target aarch64-unknown-linux-musl

2. Runtime Memory Management

Database Connection Pool

  • Default: 10 connections
  • Recommended for 512MB: 5 connections

Update config.toml:

[database]
max_connections = 5

Rate Limiting Memory Usage

  • Client rate-limit buckets: stored in memory, one per active client
  • Circuit breakers: Minimal memory usage
  • Consider reducing burst capacity if memory is critical
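To see why per-client buckets stay cheap, here is a minimal sketch of the state a token bucket needs. The struct layout is illustrative, not the gateway's actual type:

```rust
use std::mem::size_of;
use std::time::Instant;

// Hypothetical per-client bucket; field names are illustrative only.
struct TokenBucket {
    capacity: u32,       // burst size
    tokens: f64,         // current token count
    refill_per_sec: f64, // steady-state refill rate
    last_refill: Instant,
}

fn main() {
    // A few dozen bytes per tracked client, plus map-entry overhead:
    // even 10,000 active clients stay well under 1 MB of bucket state.
    println!("per-bucket state: {} bytes", size_of::<TokenBucket>());
}
```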

Provider Management

  • Only enable providers you actually use
  • Disable unused providers in configuration

3. Configuration for Low Memory

Create a config-low-memory.toml:

[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"
max_connections = 3  # Reduced from default 10

[providers]
# Only enable providers you need
openai.enabled = true
gemini.enabled = false  # Disable if not used
deepseek.enabled = false  # Disable if not used
grok.enabled = false  # Disable if not used

[rate_limiting]
# Reduce memory usage for rate limiting
client_requests_per_minute = 30  # Reduced from 60
client_burst_size = 5  # Reduced from 10
global_requests_per_minute = 300  # Reduced from 600
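A quick sanity check on the reduced limits above: with every client driving its full per-client rate, the global budget bounds how many clients can saturate at once.

```rust
fn main() {
    let client_rpm = 30; // client_requests_per_minute
    let global_rpm = 300; // global_requests_per_minute
    // Clients that can run at their per-client cap before the
    // global limiter starts rejecting requests:
    let saturating_clients = global_rpm / client_rpm;
    println!("{saturating_clients} clients can saturate simultaneously");
}
```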

4. System-Level Optimizations

Linux Kernel Parameters

Add to /etc/sysctl.conf, then apply with sudo sysctl -p:

# Reduce TCP buffer sizes
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 65536 131072

# Reduce connection tracking
net.netfilter.nf_conntrack_max = 65536
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

# Reduce socket buffer sizes
net.core.rmem_max = 131072
net.core.wmem_max = 131072
net.core.rmem_default = 65536
net.core.wmem_default = 65536

Systemd Service Configuration

Create /etc/systemd/system/llm-proxy.service:

[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=on-failure
RestartSec=5

# Memory limits
MemoryMax=400M
MemorySwapMax=100M

# CPU limits
CPUQuota=50%

# Process limits
LimitNOFILE=65536
LimitNPROC=512

Environment="RUST_LOG=info"
Environment="LLM_PROXY__DATABASE__MAX_CONNECTIONS=3"

[Install]
WantedBy=multi-user.target
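The Environment= lines above rely on the gateway mapping `__`-separated variables onto config keys (a common convention with the Rust `config` crate). A standalone sketch of that lookup, using the variable name from the unit file; the gateway's real config loader may differ:

```rust
use std::env;

// Reads LLM_PROXY__DATABASE__MAX_CONNECTIONS if set, otherwise falls
// back to the documented default of 10. Illustrative only.
fn max_connections() -> u32 {
    env::var("LLM_PROXY__DATABASE__MAX_CONNECTIONS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(10)
}

fn main() {
    println!("max_connections = {}", max_connections());
}
```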

5. Application-Specific Optimizations

Disable Unused Features

  • Multimodal support: If not using images, disable image processing dependencies
  • Dashboard: The dashboard uses WebSockets and additional memory. Consider disabling if not needed.
  • Detailed logging: Reduce log verbosity in production

Memory Pool Sizes

The application uses several memory pools:

  1. Database connection pool: Configured via max_connections
  2. HTTP client pool: Reqwest client pool (defaults to reasonable values)
  3. Async runtime: Tokio worker threads

Reduce Tokio worker threads for low-core systems:

// In main.rs: a single-threaded runtime avoids per-worker thread stacks.
#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<()> {
    // ...
}

// Or keep the multi-threaded runtime but cap the worker count:
// #[tokio::main(flavor = "multi_thread", worker_threads = 2)]

6. Monitoring and Profiling

Memory Usage Monitoring

# Install heaptrack for memory profiling (a system package, not a Cargo crate)
sudo apt install heaptrack  # or your distro's equivalent

# Profile memory usage
heaptrack ./target/release/llm-proxy

# Monitor with ps
ps aux --sort=-%mem | head -10

# Monitor with top
top -p $(pgrep llm-proxy)
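Those snapshots can be turned into a simple threshold check. The helper below is a sketch; the 350 MB threshold is an assumption chosen to leave headroom under the 400 MB MemoryMax set earlier:

```shell
# Warn when a given RSS (in KiB, as reported by /proc/<pid>/status)
# crosses 350 MB (358400 KiB); wire it to cron or a systemd timer.
check_rss() {
    if [ "$1" -gt 358400 ]; then echo WARN; else echo OK; fi
}

# Example: feed it the gateway's current RSS.
# rss_kb=$(awk '/VmRSS/ {print $2}' /proc/"$(pgrep -x llm-proxy)"/status)
check_rss 120000
```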

Performance Benchmarks

Test with different configurations:

# Test with 100 concurrent connections
wrk -t4 -c100 -d30s http://localhost:8080/health

# Test chat completion endpoint
ab -n 1000 -c 10 -p test_request.json -T application/json http://localhost:8080/v1/chat/completions
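The ab command above needs a request body. A minimal test_request.json, assuming the endpoint accepts OpenAI-compatible payloads (the model name is a placeholder, substitute one your configured provider serves):

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "user", "content": "Say hello in one word."}
  ],
  "max_tokens": 16
}
```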

7. Deployment Checklist for 512MB RAM

  • Build with release profile: cargo build --release
  • Configure database with max_connections = 3
  • Disable unused providers in configuration
  • Set appropriate rate limiting limits
  • Configure systemd with memory limits
  • Set up log rotation to prevent disk space issues
  • Monitor memory usage during initial deployment
  • Consider using swap space (512MB-1GB) for safety

8. Troubleshooting High Memory Usage

Common Issues and Solutions:

  1. Database connection leaks: Ensure connections are properly closed
  2. Memory fragmentation: Use jemalloc or mimalloc as the global allocator
  3. Unbounded queues: Check WebSocket message queues
  4. Cache growth: Implement cache limits or TTL

Add to Cargo.toml to switch to an alternative allocator:

[dependencies]
mimalloc = { version = "0.1", default-features = false }

In main.rs:

#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

9. Expected Memory Usage

| Component            | Baseline | With 10 clients | With 100 clients |
|----------------------|----------|-----------------|------------------|
| Base executable      | 15 MB    | 15 MB           | 15 MB            |
| Database connections | 5 MB     | 8 MB            | 15 MB            |
| Rate limiting        | 2 MB     | 5 MB            | 20 MB            |
| HTTP clients         | 3 MB     | 5 MB            | 10 MB            |
| Total                | 25 MB    | 33 MB           | 60 MB            |

Note: These are estimates. Actual usage depends on request volume, payload sizes, and configuration.
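The column totals in the table can be cross-checked mechanically:

```rust
fn main() {
    // Component estimates in MB: [baseline, 10 clients, 100 clients]
    let rows = [
        ("base executable", [15u32, 15, 15]),
        ("database connections", [5, 8, 15]),
        ("rate limiting", [2, 5, 20]),
        ("http clients", [3, 5, 10]),
    ];
    let totals: Vec<u32> = (0..3)
        .map(|i| rows.iter().map(|(_, v)| v[i]).sum())
        .collect();
    println!("totals: {totals:?} MB"); // matches the table: 25, 33, 60
}
```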

10. Further Reading