# Optimization for 512MB RAM Environment

This document provides guidance for optimizing the LLM Proxy Gateway for deployment in resource-constrained environments (512MB RAM).

## Memory Optimization Strategies

### 1. Build Optimization

The project is already configured with optimized build settings in `Cargo.toml`:

```toml
[profile.release]
opt-level = 3     # Maximum optimization
lto = true        # Link-time optimization
codegen-units = 1 # Single codegen unit for better optimization
strip = true      # Strip debug symbols
```

**Additional optimizations you can apply:**

```bash
# Build with a specific target for better optimization
cargo build --release --target x86_64-unknown-linux-musl

# Or for ARM (Raspberry Pi, etc.)
cargo build --release --target aarch64-unknown-linux-musl
```

### 2. Runtime Memory Management

#### Database Connection Pool

- Default: 10 connections
- Recommended for 512MB: 5 connections

Update `config.toml`:

```toml
[database]
max_connections = 5
```

#### Rate Limiting Memory Usage

- Client rate-limit buckets are stored in memory
- Circuit breakers: minimal memory usage
- Consider reducing burst capacity if memory is critical

#### Provider Management

- Only enable providers you actually use
- Disable unused providers in configuration

### 3. Configuration for Low Memory

Create a `config-low-memory.toml`:

```toml
[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"
max_connections = 3              # Reduced from the default 10

[providers]
# Only enable providers you need
openai.enabled = true
gemini.enabled = false           # Disable if not used
deepseek.enabled = false         # Disable if not used
grok.enabled = false             # Disable if not used

[rate_limiting]
# Reduce memory usage for rate limiting
client_requests_per_minute = 30  # Reduced from 60
client_burst_size = 5            # Reduced from 10
global_requests_per_minute = 300 # Reduced from 600
```
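Putting the low-memory profile into effect might look like the following sketch. It assumes the gateway loads `./config.toml` at startup and honors the `LLM_PROXY__SECTION__KEY` environment-override convention shown in the systemd unit later in this guide — verify both against your build before relying on them:

```shell
# Activate the low-memory profile
# (assumption: the gateway reads ./config.toml from its working directory)
cp config-low-memory.toml config.toml

# Individual settings can also be overridden per run via environment
# variables, using the double-underscore naming convention:
LLM_PROXY__DATABASE__MAX_CONNECTIONS=3 \
RUST_LOG=warn \
./target/release/llm-proxy
```

Environment overrides are handy for one-off experiments; keep the committed `config-low-memory.toml` as the source of truth for permanent settings.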
### 4. System-Level Optimizations

#### Linux Kernel Parameters

Add to `/etc/sysctl.conf`:

```bash
# Reduce TCP buffer sizes
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 65536 131072

# Reduce connection tracking
net.netfilter.nf_conntrack_max = 65536
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

# Reduce socket buffer sizes
net.core.rmem_max = 131072
net.core.wmem_max = 131072
net.core.rmem_default = 65536
net.core.wmem_default = 65536
```

#### Systemd Service Configuration

Create `/etc/systemd/system/llm-proxy.service`:

```ini
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=on-failure
RestartSec=5

# Memory limits
MemoryMax=400M
MemorySwapMax=100M

# CPU limits
CPUQuota=50%

# Process limits
LimitNOFILE=65536
LimitNPROC=512

Environment="RUST_LOG=info"
Environment="LLM_PROXY__DATABASE__MAX_CONNECTIONS=3"

[Install]
WantedBy=multi-user.target
```

### 5. Application-Specific Optimizations

#### Disable Unused Features

- **Multimodal support**: if you are not sending images, disable image-processing dependencies
- **Dashboard**: the dashboard uses WebSockets and additional memory; consider disabling it if not needed
- **Detailed logging**: reduce log verbosity in production

#### Memory Pool Sizes

The application uses several memory pools:

1. **Database connection pool**: configured via `max_connections`
2. **HTTP client pool**: the reqwest client pool (defaults to reasonable values)
3. **Async runtime**: Tokio worker threads

Reduce Tokio worker threads on low-core systems:

```rust
// In main.rs, switch the runtime attribute to a single-threaded runtime:
#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<()> {
    // ...
}

// Or keep the multi-threaded runtime but cap the thread count:
// #[tokio::main(flavor = "multi_thread", worker_threads = 2)]
```
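With the sysctl settings and unit file above in place, activating and verifying them might look like this sketch (requires root; the unit name `llm-proxy` matches the file created above):

```shell
# Load the new kernel parameters without rebooting
sudo sysctl -p

# Register the unit, then start the service under its cgroup limits
sudo systemctl daemon-reload
sudo systemctl enable --now llm-proxy

# Confirm the memory cap took effect:
# 400M = 400 * 1024 * 1024 = 419430400 bytes
systemctl show llm-proxy -p MemoryMax
```

`systemctl show` reports the limit in bytes, so seeing `MemoryMax=419430400` confirms the 400M cap is active.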
### 6. Monitoring and Profiling

#### Memory Usage Monitoring

```bash
# Install heaptrack for memory profiling
# (it is a system package, not a crate; shown for Debian/Ubuntu)
sudo apt install heaptrack

# Profile memory usage
heaptrack ./target/release/llm-proxy

# Monitor with ps
ps aux --sort=-%mem | head -10

# Monitor with top
top -p $(pgrep llm-proxy)
```

#### Performance Benchmarks

Test with different configurations:

```bash
# Test with 100 concurrent connections
wrk -t4 -c100 -d30s http://localhost:8080/health

# Test the chat completion endpoint
ab -n 1000 -c 10 -p test_request.json -T application/json \
  http://localhost:8080/v1/chat/completions
```

### 7. Deployment Checklist for 512MB RAM

- [ ] Build with the release profile: `cargo build --release`
- [ ] Configure the database with `max_connections = 3`
- [ ] Disable unused providers in configuration
- [ ] Set appropriate rate limits
- [ ] Configure systemd with memory limits
- [ ] Set up log rotation to prevent disk-space issues
- [ ] Monitor memory usage during initial deployment
- [ ] Consider adding swap space (512MB-1GB) for safety

### 8. Troubleshooting High Memory Usage

#### Common Issues and Solutions:

1. **Database connection leaks**: ensure connections are properly closed
2. **Memory fragmentation**: use jemalloc or mimalloc as the allocator
3. **Unbounded queues**: check WebSocket message queues
4. **Cache growth**: implement cache limits or TTLs

#### Add to `Cargo.toml` for an alternative allocator:

```toml
[dependencies]
mimalloc = { version = "0.1", default-features = false, optional = true }

[features]
default = ["mimalloc"]
mimalloc = ["dep:mimalloc"]
```

#### In `main.rs`:

```rust
#[cfg(feature = "mimalloc")]
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
```
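Between profiling sessions, a lightweight way to track resident memory against the 400M `MemoryMax` cap is to read the kernel's per-process status file directly. This sketch assumes the process is named `llm-proxy`, matching the unit file earlier in this guide:

```shell
# Report the proxy's resident set size (RSS) in MB.
# VmRSS in /proc/<pid>/status is given in kB, so divide by 1024.
pid=$(pgrep -x llm-proxy)
awk '/VmRSS/ { printf "VmRSS: %.1f MB\n", $2 / 1024 }' "/proc/$pid/status"
```

This is cheap enough to run from a cron job or a watch loop during the initial deployment, per the checklist below.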
### 9. Expected Memory Usage

| Component            | Baseline | With 10 clients | With 100 clients |
|----------------------|----------|-----------------|------------------|
| Base executable      | 15MB     | 15MB            | 15MB             |
| Database connections | 5MB      | 8MB             | 15MB             |
| Rate limiting        | 2MB      | 5MB             | 20MB             |
| HTTP clients         | 3MB      | 5MB             | 10MB             |
| **Total**            | **25MB** | **33MB**        | **60MB**         |

**Note**: These are estimates. Actual usage depends on request volume, payload sizes, and configuration.

### 10. Further Reading

- [Tokio performance guide](https://tokio.rs/tokio/topics/performance)
- [Rust performance book](https://nnethercote.github.io/perf-book/)
- [Linux memory management](https://www.kernel.org/doc/html/latest/admin-guide/mm/)
- [SQLite performance tips](https://www.sqlite.org/faq.html#q19)