Optimization for 512MB RAM Environment
This document provides guidance for optimizing the LLM Proxy Gateway for deployment in resource-constrained environments (512MB RAM).
Memory Optimization Strategies
1. Build Optimization
The project is already configured with optimized build settings in Cargo.toml:
[profile.release]
opt-level = 3 # Maximum optimization
lto = true # Link-time optimization
codegen-units = 1 # Single codegen unit for better optimization
strip = true # Strip debug symbols
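If the application does not need to catch panics, one further profile setting removes unwinding tables and shrinks the binary (assumption: nothing in the gateway relies on std::panic::catch_unwind):
[profile.release]
panic = "abort" # Abort on panic instead of unwinding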
Additional optimizations you can apply:
# Build with specific target for better optimization
cargo build --release --target x86_64-unknown-linux-musl
# Or for ARM (Raspberry Pi, etc.)
cargo build --release --target aarch64-unknown-linux-musl
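The musl targets must be installed once via rustup before the commands above will work:
rustup target add x86_64-unknown-linux-musl
rustup target add aarch64-unknown-linux-musl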
2. Runtime Memory Management
Database Connection Pool
- Default: 10 connections
- Recommended for 512MB: 5 connections
Update config.toml:
[database]
max_connections = 5
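For reference, a minimal sketch of how such a cap is typically applied in Rust, assuming the gateway uses sqlx with SQLite (the actual wiring lives in the project's own startup code):
use sqlx::sqlite::{SqlitePool, SqlitePoolOptions};

// Illustrative: cap the pool at the value from config.toml.
async fn make_pool() -> Result<SqlitePool, sqlx::Error> {
    SqlitePoolOptions::new()
        .max_connections(5)
        .connect("sqlite://./data/llm_proxy.db")
        .await
}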
Rate Limiting Memory Usage
- Client rate-limit buckets: stored in memory, one per client, so usage grows with the number of distinct clients
- Circuit breakers: minimal memory usage
- Consider reducing burst capacity if memory is critical (see the sketch below)
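To make the cost concrete, here is a minimal sketch of a per-client token bucket table with a hard cap on tracked clients. All names are illustrative; this is not the gateway's actual implementation:
use std::collections::HashMap;
use std::time::Instant;

// One bucket per client; each entry costs a few dozen bytes plus the key string.
struct Bucket {
    tokens: f64,
    last_refill: Instant,
}

struct ClientLimiter {
    buckets: HashMap<String, Bucket>,
    max_clients: usize, // hard cap so the table cannot grow unboundedly
}

impl ClientLimiter {
    fn allow(&mut self, client: &str, burst: f64, per_sec: f64) -> bool {
        // Refuse new clients beyond the cap instead of allocating more buckets.
        if !self.buckets.contains_key(client) && self.buckets.len() >= self.max_clients {
            return false;
        }
        let now = Instant::now();
        let b = self.buckets.entry(client.to_owned()).or_insert(Bucket {
            tokens: burst,
            last_refill: now,
        });
        // Refill proportionally to elapsed time, capped at the burst size.
        let elapsed = now.duration_since(b.last_refill).as_secs_f64();
        b.tokens = (b.tokens + elapsed * per_sec).min(burst);
        b.last_refill = now;
        if b.tokens >= 1.0 {
            b.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}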
Provider Management
- Only enable providers you actually use
- Disable unused providers in configuration
3. Configuration for Low Memory
Create a config-low-memory.toml:
[server]
port = 8080
host = "0.0.0.0"
[database]
path = "./data/llm_proxy.db"
max_connections = 3 # Reduced from default 10
[providers]
# Only enable providers you need
openai.enabled = true
gemini.enabled = false # Disable if not used
deepseek.enabled = false # Disable if not used
grok.enabled = false # Disable if not used
[rate_limiting]
# Reduce memory usage for rate limiting
client_requests_per_minute = 30 # Reduced from 60
client_burst_size = 5 # Reduced from 10
global_requests_per_minute = 300 # Reduced from 600
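If you prefer not to maintain a second config file, the same values can be overridden with environment variables. The LLM_PROXY__SECTION__KEY naming below follows the convention used in the systemd unit in section 4; whether it covers every key is an assumption, so verify against the project's configuration loader:
LLM_PROXY__DATABASE__MAX_CONNECTIONS=3 \
LLM_PROXY__RATE_LIMITING__CLIENT_REQUESTS_PER_MINUTE=30 \
./llm-proxy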
4. System-Level Optimizations
Linux Kernel Parameters
Add to /etc/sysctl.conf:
# Reduce TCP buffer sizes
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 65536 131072
# Reduce connection tracking
net.netfilter.nf_conntrack_max = 65536
net.netfilter.nf_conntrack_tcp_timeout_established = 1200
# Reduce socket buffer sizes
net.core.rmem_max = 131072
net.core.wmem_max = 131072
net.core.rmem_default = 65536
net.core.wmem_default = 65536
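Apply the new settings without a reboot:
sudo sysctl -p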
Systemd Service Configuration
Create /etc/systemd/system/llm-proxy.service:
[Unit]
Description=LLM Proxy Gateway
After=network.target
[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=on-failure
RestartSec=5
# Memory limits
MemoryMax=400M
MemorySwapMax=100M
# CPU limits
CPUQuota=50%
# Process limits
LimitNOFILE=65536
LimitNPROC=512
Environment="RUST_LOG=info"
Environment="LLM_PROXY__DATABASE__MAX_CONNECTIONS=3"
[Install]
WantedBy=multi-user.target
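Reload systemd and start the service, then confirm the memory cap took effect:
sudo systemctl daemon-reload
sudo systemctl enable --now llm-proxy
systemctl show llm-proxy -p MemoryMax -p MemoryCurrent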
5. Application-Specific Optimizations
Disable Unused Features
- Multimodal support: If not using images, disable image processing dependencies
- Dashboard: keeps WebSocket connections open and uses additional memory. Consider disabling it if not needed.
- Detailed logging: Reduce log verbosity in production
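If these features are gated behind Cargo features, they can be compiled out entirely. The invocation below is a sketch; the actual feature names are hypothetical, so check the project's Cargo.toml for the real flags:
# Feature names are project-specific; list them in Cargo.toml before using this
cargo build --release --no-default-features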
Memory Pool Sizes
The application uses several memory pools:
- Database connection pool: configured via max_connections
- HTTP client pool: the shared reqwest client's connection pool (its defaults are reasonable)
- Async runtime: Tokio worker threads
Reduce Tokio worker threads for low-core systems:
// In main.rs, modify the Tokio runtime creation
#[tokio::main(flavor = "current_thread")] // Single-threaded runtime
async fn main() -> Result<()> {
    // ... application startup ...
    Ok(())
}
// Or multi-threaded with a limited thread count:
// #[tokio::main(worker_threads = 2)]
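For finer control (for example, also capping the blocking-task pool), the runtime can be built explicitly instead of via the macro; the thread counts here are illustrative:
use tokio::runtime::Builder;

fn main() -> std::io::Result<()> {
    let runtime = Builder::new_multi_thread()
        .worker_threads(2)       // cap async worker threads
        .max_blocking_threads(4) // cap the blocking-task pool
        .enable_all()            // enable timer and I/O drivers
        .build()?;
    runtime.block_on(async {
        // ... application startup ...
    });
    Ok(())
}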
6. Monitoring and Profiling
Memory Usage Monitoring
# Install heaptrack (a system package, not a crate); on Debian/Ubuntu:
sudo apt install heaptrack
# Profile memory usage
heaptrack ./target/release/llm-proxy
# Monitor with ps
ps aux --sort=-%mem | head -10
# Monitor with top
top -p $(pgrep llm-proxy)
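For a lightweight continuous view of resident memory:
# VmRSS is the process's resident set size, read from /proc
watch -n 5 'grep VmRSS /proc/$(pgrep llm-proxy)/status'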
Performance Benchmarks
Test with different configurations:
# Test with 100 concurrent connections
wrk -t4 -c100 -d30s http://localhost:8080/health
# Test chat completion endpoint
ab -n 1000 -c 10 -p test_request.json -T application/json http://localhost:8080/v1/chat/completions
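The ab command above reads its request body from test_request.json. A minimal example, assuming the gateway accepts the OpenAI-compatible chat schema (the model name is a placeholder; use one your enabled provider serves):
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "max_tokens": 16
}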
7. Deployment Checklist for 512MB RAM
- Build with the release profile: cargo build --release
- Configure the database with max_connections = 3
- Disable unused providers in configuration
- Set appropriate rate limiting limits
- Configure systemd with memory limits
- Set up log rotation to prevent disk space issues
- Monitor memory usage during initial deployment
- Consider using swap space (512MB-1GB) for safety
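To create the suggested swap file:
sudo fallocate -l 512M /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make it persistent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab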
8. Troubleshooting High Memory Usage
Common Issues and Solutions:
- Database connection leaks: Ensure connections are properly closed
- Memory fragmentation: Use jemalloc or mimalloc as allocator
- Unbounded queues: Check WebSocket message queues
- Cache growth: Implement cache limits or TTL
Add to Cargo.toml for an alternative allocator:
[dependencies]
mimalloc = { version = "0.1", default-features = false }
In main.rs:
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
9. Expected Memory Usage
| Component | Baseline | With 10 clients | With 100 clients |
|---|---|---|---|
| Base executable | 15MB | 15MB | 15MB |
| Database connections | 5MB | 8MB | 15MB |
| Rate limiting | 2MB | 5MB | 20MB |
| HTTP clients | 3MB | 5MB | 10MB |
| Total | 25MB | 33MB | 60MB |
Note: These are estimates. Actual usage depends on request volume, payload sizes, and configuration.