
Optimization for 512MB RAM Environment

This document provides guidance for optimizing the LLM Proxy Gateway for deployment in resource-constrained environments (512MB RAM).

Memory Optimization Strategies

1. Build Optimization

The project is already configured with optimized build settings in Cargo.toml:

[profile.release]
opt-level = 3      # Maximum optimization
lto = true         # Link-time optimization
codegen-units = 1  # Single codegen unit for better optimization
strip = true       # Strip debug symbols

Additional optimizations you can apply:

# Build with specific target for better optimization
cargo build --release --target x86_64-unknown-linux-musl

# Or for ARM (Raspberry Pi, etc.)
cargo build --release --target aarch64-unknown-linux-musl

2. Runtime Memory Management

Database Connection Pool

  • Default: 10 connections
  • Recommended for 512MB: 5 connections

Update config.toml:

[database]
max_connections = 5

Rate Limiting Memory Usage

  • Client rate-limit buckets: stored in memory, one per active client
  • Circuit breakers: Minimal memory usage
  • Consider reducing burst capacity if memory is critical
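To see why per-client buckets stay cheap, here is a minimal sketch of the state a token bucket needs. The struct layout is illustrative, not the gateway's actual type:

```rust
use std::mem::size_of;
use std::time::Instant;

// Hypothetical per-client bucket; field names are illustrative only.
struct TokenBucket {
    capacity: u32,       // burst size
    tokens: f64,         // current token count
    refill_per_sec: f64, // steady-state refill rate
    last_refill: Instant,
}

fn main() {
    // A few dozen bytes per tracked client, plus map-entry overhead:
    // even 10,000 active clients stay well under 1 MB of bucket state.
    println!("per-bucket state: {} bytes", size_of::<TokenBucket>());
}
```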

Provider Management

  • Only enable providers you actually use
  • Disable unused providers in configuration

3. Configuration for Low Memory

Create a config-low-memory.toml:

[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"
max_connections = 3  # Reduced from default 10

[providers]
# Only enable providers you need
openai.enabled = true
gemini.enabled = false  # Disable if not used
deepseek.enabled = false  # Disable if not used
grok.enabled = false  # Disable if not used

[rate_limiting]
# Reduce memory usage for rate limiting
client_requests_per_minute = 30  # Reduced from 60
client_burst_size = 5  # Reduced from 10
global_requests_per_minute = 300  # Reduced from 600
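A quick sanity check on the reduced limits above: with every client driving its full per-client rate, the global budget bounds how many clients can saturate at once.

```rust
fn main() {
    let client_rpm = 30; // client_requests_per_minute
    let global_rpm = 300; // global_requests_per_minute
    // Clients that can run at their per-client cap before the
    // global limiter starts rejecting requests:
    let saturating_clients = global_rpm / client_rpm;
    println!("{saturating_clients} clients can saturate simultaneously");
}
```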

4. System-Level Optimizations

Linux Kernel Parameters

Add to /etc/sysctl.conf, then apply with sudo sysctl -p:

# Reduce TCP buffer sizes
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 65536 131072

# Reduce connection tracking
net.netfilter.nf_conntrack_max = 65536
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

# Reduce socket buffer sizes
net.core.rmem_max = 131072
net.core.wmem_max = 131072
net.core.rmem_default = 65536
net.core.wmem_default = 65536

Systemd Service Configuration

Create /etc/systemd/system/llm-proxy.service:

[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=on-failure
RestartSec=5

# Memory limits
MemoryMax=400M
MemorySwapMax=100M

# CPU limits
CPUQuota=50%

# Process limits
LimitNOFILE=65536
LimitNPROC=512

Environment="RUST_LOG=info"
Environment="LLM_PROXY__DATABASE__MAX_CONNECTIONS=3"

[Install]
WantedBy=multi-user.target
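The Environment= lines above rely on the gateway mapping `__`-separated variables onto config keys (a common convention with the Rust `config` crate). A standalone sketch of that lookup, using the variable name from the unit file; the gateway's real config loader may differ:

```rust
use std::env;

// Reads LLM_PROXY__DATABASE__MAX_CONNECTIONS if set, otherwise falls
// back to the documented default of 10. Illustrative only.
fn max_connections() -> u32 {
    env::var("LLM_PROXY__DATABASE__MAX_CONNECTIONS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(10)
}

fn main() {
    println!("max_connections = {}", max_connections());
}
```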

5. Application-Specific Optimizations

Disable Unused Features

  • Multimodal support: If not using images, disable image processing dependencies
  • Dashboard: The dashboard uses WebSockets and additional memory. Consider disabling if not needed.
  • Detailed logging: Reduce log verbosity in production

Memory Pool Sizes

The application uses several memory pools:

  1. Database connection pool: Configured via max_connections
  2. HTTP client pool: Reqwest client pool (defaults to reasonable values)
  3. Async runtime: Tokio worker threads

Reduce Tokio worker threads for low-core systems:

// In main.rs: a single-threaded runtime avoids per-worker thread stacks.
#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<()> {
    // ...
}

// Or keep the multi-threaded runtime but cap the worker count:
// #[tokio::main(flavor = "multi_thread", worker_threads = 2)]

6. Monitoring and Profiling

Memory Usage Monitoring

# Install heaptrack for memory profiling (a system package, not a Cargo crate)
sudo apt install heaptrack  # or your distro's equivalent

# Profile memory usage
heaptrack ./target/release/llm-proxy

# Monitor with ps
ps aux --sort=-%mem | head -10

# Monitor with top
top -p $(pgrep llm-proxy)
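Those snapshots can be turned into a simple threshold check. The helper below is a sketch; the 350 MB threshold is an assumption chosen to leave headroom under the 400 MB MemoryMax set earlier:

```shell
# Warn when a given RSS (in KiB, as reported by /proc/<pid>/status)
# crosses 350 MB (358400 KiB); wire it to cron or a systemd timer.
check_rss() {
    if [ "$1" -gt 358400 ]; then echo WARN; else echo OK; fi
}

# Example: feed it the gateway's current RSS.
# rss_kb=$(awk '/VmRSS/ {print $2}' /proc/"$(pgrep -x llm-proxy)"/status)
check_rss 120000
```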

Performance Benchmarks

Test with different configurations:

# Test with 100 concurrent connections
wrk -t4 -c100 -d30s http://localhost:8080/health

# Test chat completion endpoint
ab -n 1000 -c 10 -p test_request.json -T application/json http://localhost:8080/v1/chat/completions
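The ab command above needs a request body. A minimal test_request.json, assuming the endpoint accepts OpenAI-compatible payloads (the model name is a placeholder, substitute one your configured provider serves):

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "user", "content": "Say hello in one word."}
  ],
  "max_tokens": 16
}
```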

7. Deployment Checklist for 512MB RAM

  • Build with release profile: cargo build --release
  • Configure database with max_connections = 3
  • Disable unused providers in configuration
  • Set appropriate rate limiting limits
  • Configure systemd with memory limits
  • Set up log rotation to prevent disk space issues
  • Monitor memory usage during initial deployment
  • Consider using swap space (512MB-1GB) for safety

8. Troubleshooting High Memory Usage

Common Issues and Solutions:

  1. Database connection leaks: Ensure connections are properly closed
  2. Memory fragmentation: Use jemalloc or mimalloc as the global allocator
  3. Unbounded queues: Check WebSocket message queues
  4. Cache growth: Implement cache limits or TTL

Add to Cargo.toml to switch to an alternative allocator:

[dependencies]
mimalloc = { version = "0.1", default-features = false }

In main.rs:

#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

9. Expected Memory Usage

| Component            | Baseline | With 10 clients | With 100 clients |
|----------------------|----------|-----------------|------------------|
| Base executable      | 15 MB    | 15 MB           | 15 MB            |
| Database connections | 5 MB     | 8 MB            | 15 MB            |
| Rate limiting        | 2 MB     | 5 MB            | 20 MB            |
| HTTP clients         | 3 MB     | 5 MB            | 10 MB            |
| Total                | 25 MB    | 33 MB           | 60 MB            |

Note: These are estimates. Actual usage depends on request volume, payload sizes, and configuration.
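The column totals in the table can be cross-checked mechanically:

```rust
fn main() {
    // Component estimates in MB: [baseline, 10 clients, 100 clients]
    let rows = [
        ("base executable", [15u32, 15, 15]),
        ("database connections", [5, 8, 15]),
        ("rate limiting", [2, 5, 20]),
        ("http clients", [3, 5, 10]),
    ];
    let totals: Vec<u32> = (0..3)
        .map(|i| rows.iter().map(|(_, v)| v[i]).sum())
        .collect();
    println!("totals: {totals:?} MB"); // matches the table: 25, 33, 60
}
```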

10. Further Reading