# Optimization for 512MB RAM Environment

This document provides guidance for optimizing the LLM Proxy Gateway for deployment in resource-constrained environments (512MB RAM).

## Memory Optimization Strategies

### 1. Build Optimization

The project is already configured with optimized build settings in `Cargo.toml`:

```toml
[profile.release]
opt-level = 3      # Maximum optimization
lto = true         # Link-time optimization
codegen-units = 1  # Single codegen unit for better optimization
strip = true       # Strip debug symbols
```

**Additional optimizations you can apply:**

```bash
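# Build a statically linked binary against musl (easier to deploy on minimal images)
rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl

# Or for 64-bit ARM (Raspberry Pi running a 64-bit OS, etc.)
rustup target add aarch64-unknown-linux-musl
cargo build --release --target aarch64-unknown-linux-musl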
```

### 2. Runtime Memory Management

#### Database Connection Pool
- Default: 10 connections
- Recommended for 512MB: 3 to 5 connections (the low-memory configuration below uses 3)

Update `config.toml`:
```toml
[database]
max_connections = 5
```

#### Rate Limiting Memory Usage
- Client rate limit buckets are held in memory, so usage grows with the number of distinct clients
- Circuit breakers use minimal memory
- Consider reducing burst capacity if memory is critical
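
One way to keep per-client bucket memory bounded is to evict buckets for clients that have gone idle. The following is a minimal sketch using only the standard library; `ClientLimiter`, `Bucket`, and the idle TTL are hypothetical names chosen for illustration and are not taken from the gateway's code. Time is passed in as plain seconds so the behavior is easy to reason about.

```rust
use std::collections::HashMap;

/// Hypothetical per-client token bucket; field names are illustrative.
#[derive(Debug, Clone)]
struct Bucket {
    tokens: f64,
    last_seen_secs: u64,
}

/// In-memory limiter whose map can be pruned so it does not grow without bound.
struct ClientLimiter {
    burst: f64,
    refill_per_sec: f64,
    idle_ttl_secs: u64,
    buckets: HashMap<String, Bucket>,
}

impl ClientLimiter {
    fn new(requests_per_minute: u32, burst: u32, idle_ttl_secs: u64) -> Self {
        Self {
            burst: burst as f64,
            refill_per_sec: requests_per_minute as f64 / 60.0,
            idle_ttl_secs,
            buckets: HashMap::new(),
        }
    }

    /// Returns true if a request from `client` is allowed at time `now_secs`.
    fn allow(&mut self, client: &str, now_secs: u64) -> bool {
        let burst = self.burst;
        let refill = self.refill_per_sec;
        let bucket = self.buckets.entry(client.to_string()).or_insert(Bucket {
            tokens: burst,
            last_seen_secs: now_secs,
        });
        // Refill according to elapsed time, capped at the burst size.
        let elapsed = now_secs.saturating_sub(bucket.last_seen_secs) as f64;
        bucket.tokens = (bucket.tokens + elapsed * refill).min(burst);
        bucket.last_seen_secs = now_secs;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Drops buckets for clients that have been idle longer than the TTL.
    fn evict_idle(&mut self, now_secs: u64) {
        let ttl = self.idle_ttl_secs;
        self.buckets
            .retain(|_, b| now_secs.saturating_sub(b.last_seen_secs) <= ttl);
    }

    /// Number of clients currently tracked in memory.
    fn tracked_clients(&self) -> usize {
        self.buckets.len()
    }
}
```

Calling `evict_idle` periodically (for example from a background task) keeps the map size proportional to the number of recently active clients rather than every client ever seen.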

#### Provider Management
- Only enable providers you actually use
- Disable unused providers in configuration

### 3. Configuration for Low Memory

Create a `config-low-memory.toml`:

```toml
[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"
max_connections = 3  # Reduced from default 10

[providers]
# Only enable providers you need
openai.enabled = true
gemini.enabled = false  # Disable if not used
deepseek.enabled = false  # Disable if not used
grok.enabled = false  # Disable if not used

[rate_limiting]
# Reduce memory usage for rate limiting
client_requests_per_minute = 30  # Reduced from 60
client_burst_size = 5  # Reduced from 10
global_requests_per_minute = 300  # Reduced from 600
```

### 4. System-Level Optimizations

#### Linux Kernel Parameters
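Add to `/etc/sysctl.conf`, then apply with `sudo sysctl -p` (the `nf_conntrack` keys exist only when the `nf_conntrack` module is loaded):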
```bash
# Reduce TCP buffer sizes
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 65536 131072

# Reduce connection tracking
net.netfilter.nf_conntrack_max = 65536
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

# Reduce socket buffer sizes
net.core.rmem_max = 131072
net.core.wmem_max = 131072
net.core.rmem_default = 65536
net.core.wmem_default = 65536
```
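
To get a feel for what the TCP limits above permit, the arithmetic can be sketched as follows. This is a rough ceiling only: it assumes every socket grows both buffers to the configured maximum and ignores kernel bookkeeping overhead. The function names are hypothetical.

```rust
/// Rough upper bound on TCP buffer memory, in bytes, if every socket grew
/// its receive and send buffers to the configured maximums.
fn worst_case_tcp_buffer_bytes(sockets: u64, rmem_max: u64, wmem_max: u64) -> u64 {
    sockets * (rmem_max + wmem_max)
}

/// Converts bytes to mebibytes for display.
fn bytes_to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

With 100 sockets and the `tcp_rmem`/`tcp_wmem` maximums above (174760 and 131072 bytes), the bound is 30,583,200 bytes, or roughly 29 MiB. Real usage is normally far lower because the kernel sizes buffers on demand.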

#### Systemd Service Configuration
Create `/etc/systemd/system/llm-proxy.service`:
```ini
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=on-failure
RestartSec=5

# Memory limits (MemoryMax and MemorySwapMax require cgroup v2)
MemoryMax=400M
MemorySwapMax=100M

# CPU limits
CPUQuota=50%

# Process limits
LimitNOFILE=65536
LimitNPROC=512

Environment="RUST_LOG=info"
Environment="LLM_PROXY__DATABASE__MAX_CONNECTIONS=3"

[Install]
WantedBy=multi-user.target
```
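
The `LLM_PROXY__DATABASE__MAX_CONNECTIONS` variable in the unit file suggests a convention where a prefix and double underscores map onto nested configuration keys. The sketch below illustrates that mapping under that assumption; the function name is hypothetical and the gateway's actual loader may behave differently.

```rust
/// Maps an environment variable name such as
/// `LLM_PROXY__DATABASE__MAX_CONNECTIONS` to a dotted config path such as
/// `database.max_connections`. Returns None for unrelated variables.
/// The `LLM_PROXY__` prefix and `__` separator are assumptions based on
/// the variable name used in the systemd unit.
fn env_key_to_config_path(key: &str) -> Option<String> {
    let rest = key.strip_prefix("LLM_PROXY__")?;
    if rest.is_empty() {
        return None;
    }
    let parts: Vec<String> = rest
        .split("__")
        .map(|part| part.to_ascii_lowercase())
        .collect();
    Some(parts.join("."))
}
```

Under this convention, the environment variable in the unit file would override `max_connections` in the `[database]` table of `config.toml`.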

### 5. Application-Specific Optimizations

#### Disable Unused Features
- **Multimodal support**: If not using images, disable image processing dependencies
- **Dashboard**: The dashboard uses WebSockets and additional memory. Consider disabling if not needed.
- **Detailed logging**: Reduce log verbosity in production

#### Memory Pool Sizes
The application uses several memory pools:
1. **Database connection pool**: Configured via `max_connections`
2. **HTTP client pool**: Reqwest client pool (defaults to reasonable values)
3. **Async runtime**: Tokio worker threads

Reduce Tokio worker threads for low-core systems:
```rust
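// In main.rs, select the runtime flavor on the tokio::main attribute.
// Single-threaded runtime:
#[tokio::main(flavor = "current_thread")]
// Or a multi-threaded runtime with a capped worker count:
// #[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Existing startup code goes here.
    Ok(())
}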
```
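
If the worker count should adapt to the host rather than be hard-coded, it can be derived from the detected core count and capped. This is a minimal standard-library sketch; the function names and the cap are illustrative assumptions, not part of the project.

```rust
use std::thread;

/// Caps the worker count at `max_workers` and never returns less than 1.
fn pick_worker_threads(available_cores: usize, max_workers: usize) -> usize {
    available_cores.clamp(1, max_workers.max(1))
}

/// Detects the core count (falling back to 1 if it cannot be determined)
/// and applies the cap.
fn detected_worker_threads(max_workers: usize) -> usize {
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    pick_worker_threads(cores, max_workers)
}
```

The result could then be passed to `tokio::runtime::Builder::new_multi_thread().worker_threads(n)` when building the runtime manually instead of using the attribute macro.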

### 6. Monitoring and Profiling

#### Memory Usage Monitoring
```bash
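# Install heaptrack for memory profiling (a system package, not a cargo crate)
# Debian/Ubuntu shown; use your distribution's package manager otherwise
sudo apt install heaptrack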

# Profile memory usage
heaptrack ./target/release/llm-proxy

# Monitor with ps
ps aux --sort=-%mem | head -10

# Monitor with top
top -p "$(pgrep -d, -x llm-proxy)"
```
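
On Linux, the process can also report its own resident set size by reading `/proc/self/status`, which is handy for periodic log lines. The following is a small sketch with hypothetical function names; it returns `None` on systems without `/proc`.

```rust
use std::fs;

/// Extracts the `VmRSS` value (in kB) from the text of `/proc/<pid>/status`.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|num| num.parse().ok())
}

/// Reads the resident set size of the current process, if available.
fn current_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kb(&status)
}
```

Comparing this value against the `MemoryMax` limit gives an early warning before the service is killed for exceeding it.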

#### Performance Benchmarks
Test with different configurations:
```bash
# Test with 100 concurrent connections
wrk -t4 -c100 -d30s http://localhost:8080/health

# Test chat completion endpoint
ab -n 1000 -c 10 -p test_request.json -T application/json http://localhost:8080/v1/chat/completions
```

### 7. Deployment Checklist for 512MB RAM

- [ ] Build with release profile: `cargo build --release`
- [ ] Configure database with `max_connections = 3`
- [ ] Disable unused providers in configuration
- [ ] Set appropriate rate limiting limits
- [ ] Configure systemd with memory limits
- [ ] Set up log rotation to prevent disk space issues
- [ ] Monitor memory usage during initial deployment
- [ ] Consider using swap space (512MB-1GB) for safety

### 8. Troubleshooting High Memory Usage

#### Common Issues and Solutions:

1. **Database connection leaks**: Ensure connections are properly closed
2. **Memory fragmentation**: Use jemalloc or mimalloc as allocator
3. **Unbounded queues**: Check WebSocket message queues
4. **Cache growth**: Implement cache limits or TTL
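
For the unbounded queue case, the usual remedy is a bounded channel that rejects or drops messages once full instead of buffering indefinitely. The sketch below shows the idea with the standard library's `sync_channel`; the `Offer` enum and helper names are hypothetical. In an async codebase, `tokio::sync::mpsc::channel` offers a comparable bounded channel with `try_send`.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Outcome of offering a message to a bounded queue.
#[derive(Debug, PartialEq)]
enum Offer {
    Queued,
    Dropped,
    Closed,
}

/// Creates a queue that holds at most `capacity` pending messages.
fn bounded_queue(capacity: usize) -> (SyncSender<String>, Receiver<String>) {
    sync_channel(capacity)
}

/// Tries to enqueue without blocking; a full queue drops the message
/// rather than letting memory grow.
fn offer(tx: &SyncSender<String>, msg: String) -> Offer {
    match tx.try_send(msg) {
        Ok(()) => Offer::Queued,
        Err(TrySendError::Full(_)) => Offer::Dropped,
        Err(TrySendError::Disconnected(_)) => Offer::Closed,
    }
}
```

Dropping is only one policy; disconnecting a slow consumer is another. Either way the queue's memory stays capped at `capacity` messages.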

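#### Add to Cargo.toml for alternative allocator:
```toml
[dependencies]
# `optional = true` is required for the dependency to be usable as a feature
mimalloc = { version = "0.1", default-features = false, optional = true }

[features]
default = ["mimalloc"]
```

#### In main.rs:
```rust
// Use mimalloc as the global allocator when the `mimalloc` feature is enabled.
#[cfg(feature = "mimalloc")]
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
```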

### 9. Expected Memory Usage

| Component | Baseline | With 10 clients | With 100 clients |
|-----------|----------|-----------------|------------------|
| Base executable | 15MB | 15MB | 15MB |
| Database connections | 5MB | 8MB | 15MB |
| Rate limiting | 2MB | 5MB | 20MB |
| HTTP clients | 3MB | 5MB | 10MB |
| **Total** | **25MB** | **33MB** | **60MB** |

**Note**: These are estimates. Actual usage depends on request volume, payload sizes, and configuration.
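
The totals in the table can be checked, and compared against a memory limit, with a few lines of arithmetic. This sketch copies the estimates from the table above; the constant and function names are illustrative.

```rust
/// Estimated memory per component in MB, copied from the table above.
/// Columns: baseline, 10 clients, 100 clients.
const ESTIMATES_MB: [(&str, [u32; 3]); 4] = [
    ("Base executable", [15, 15, 15]),
    ("Database connections", [5, 8, 15]),
    ("Rate limiting", [2, 5, 20]),
    ("HTTP clients", [3, 5, 10]),
];

/// Sums each column of the estimate table.
fn totals_mb() -> [u32; 3] {
    let mut totals = [0u32; 3];
    for (_, row) in ESTIMATES_MB.iter() {
        for (i, mb) in row.iter().enumerate() {
            totals[i] += *mb;
        }
    }
    totals
}

/// Remaining headroom in MB under a limit; negative means over budget.
fn headroom_mb(limit_mb: u32, used_mb: u32) -> i64 {
    limit_mb as i64 - used_mb as i64
}
```

Against the 400M `MemoryMax` used in the systemd unit, the 100-client estimate of 60MB leaves about 340MB of headroom for payload buffers and allocator overhead.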

### 10. Further Reading

- [Tokio performance guide](https://tokio.rs/tokio/topics/performance)
- [Rust performance book](https://nnethercote.github.io/perf-book/)
- [Linux memory management](https://www.kernel.org/doc/html/latest/admin-guide/mm/)
- [SQLite performance tips](https://www.sqlite.org/faq.html#q19)