# Optimization for 512MB RAM Environment

This document provides guidance for optimizing the LLM Proxy Gateway for deployment in resource-constrained environments (512MB RAM).

## Memory Optimization Strategies

### 1. Build Optimization

The project is already configured with optimized build settings in `Cargo.toml`:

```toml
[profile.release]
opt-level = 3       # Maximum optimization
lto = true          # Link-time optimization
codegen-units = 1   # Single codegen unit for better optimization
strip = true        # Strip symbols and debug info from the binary
```

**Additional optimizations you can apply:**

```bash
# Build a statically linked binary against musl (install the target first)
rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl

# Or for 64-bit ARM (Raspberry Pi, etc.)
rustup target add aarch64-unknown-linux-musl
cargo build --release --target aarch64-unknown-linux-musl
```

### 2. Runtime Memory Management

#### Database Connection Pool
- Default: 10 connections
- Recommended for 512MB: 3 to 5 connections (the low-memory configuration in section 3 uses 3)

Update `config.toml`:
```toml
[database]
max_connections = 5
```

#### Rate Limiting Memory Usage
- Client rate limit buckets: stored in memory, so usage grows with the number of active clients (see the estimates in section 9)
- Circuit breakers: minimal memory usage
- Consider reducing burst capacity if memory is critical

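As an illustration of why per-client buckets cost memory and how that cost can be bounded, here is a minimal token-bucket sketch using only the standard library. It is not the gateway's actual rate limiter: `ClientLimiter`, `Bucket`, `allow`, and `evict_idle` are hypothetical names, and the idle-eviction step is an assumption about how one might keep the map from growing without limit.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One token bucket per client (hypothetical type, for illustration only).
struct Bucket {
    tokens: f64,
    last_seen: Instant,
}

struct ClientLimiter {
    buckets: HashMap<String, Bucket>,
    burst: f64,          // maximum tokens a bucket can hold
    refill_per_sec: f64, // tokens added per second
    idle_ttl: Duration,  // buckets idle longer than this are dropped
}

impl ClientLimiter {
    fn new(requests_per_minute: u32, burst: u32, idle_ttl: Duration) -> Self {
        Self {
            buckets: HashMap::new(),
            burst: f64::from(burst),
            refill_per_sec: f64::from(requests_per_minute) / 60.0,
            idle_ttl,
        }
    }

    /// Returns true if a request from `client` is allowed at time `now`.
    fn allow(&mut self, client: &str, now: Instant) -> bool {
        let burst = self.burst;
        let bucket = self.buckets.entry(client.to_owned()).or_insert(Bucket {
            tokens: burst,
            last_seen: now,
        });
        // Refill according to elapsed time, never exceeding the burst size.
        let elapsed = now.duration_since(bucket.last_seen).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * self.refill_per_sec).min(burst);
        bucket.last_seen = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Drops buckets for clients that have been idle longer than `idle_ttl`.
    fn evict_idle(&mut self, now: Instant) {
        let ttl = self.idle_ttl;
        self.buckets.retain(|_, b| now.duration_since(b.last_seen) <= ttl);
    }
}
```

In a sketch like this, each bucket has a fixed size, so memory is driven by the number of tracked clients. Calling `evict_idle` periodically is one way to keep that number in check.
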
#### Provider Management
- Only enable providers you actually use
- Disable unused providers in configuration

### 3. Configuration for Low Memory

Create a `config-low-memory.toml`:

```toml
[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"
max_connections = 3  # Reduced from default 10

[providers]
# Only enable providers you need
openai.enabled = true
gemini.enabled = false    # Disable if not used
deepseek.enabled = false  # Disable if not used
grok.enabled = false      # Disable if not used

[rate_limiting]
# Reduce memory usage for rate limiting
client_requests_per_minute = 30   # Reduced from 60
client_burst_size = 5             # Reduced from 10
global_requests_per_minute = 300  # Reduced from 600
```

### 4. System-Level Optimizations

#### Linux Kernel Parameters
Add to `/etc/sysctl.conf`, then apply with `sudo sysctl -p`:
```bash
# Reduce TCP buffer sizes (min, default, max in bytes)
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 65536 131072

# Cap the connection tracking table and shorten the established-connection timeout
# (these keys exist only when the nf_conntrack module is loaded)
net.netfilter.nf_conntrack_max = 65536
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

# Reduce socket buffer sizes
net.core.rmem_max = 131072
net.core.wmem_max = 131072
net.core.rmem_default = 65536
net.core.wmem_default = 65536
```

#### Systemd Service Configuration
Create `/etc/systemd/system/llm-proxy.service`:
```ini
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=on-failure
RestartSec=5

# Memory limits (MemoryMax and MemorySwapMax require cgroup v2)
MemoryMax=400M
MemorySwapMax=100M

# CPU limits
CPUQuota=50%

# Process limits
LimitNOFILE=65536
LimitNPROC=512

Environment="RUST_LOG=info"
Environment="LLM_PROXY__DATABASE__MAX_CONNECTIONS=3"

[Install]
WantedBy=multi-user.target
```

### 5. Application-Specific Optimizations

#### Disable Unused Features
- **Multimodal support**: If not using images, disable image processing dependencies
- **Dashboard**: The dashboard uses WebSockets and additional memory. Consider disabling if not needed.
- **Detailed logging**: Reduce log verbosity in production

#### Memory Pool Sizes
The application uses several memory pools:
1. **Database connection pool**: Configured via `max_connections`
2. **HTTP client pool**: Reqwest client pool (defaults to reasonable values)
3. **Async runtime**: Tokio worker threads

Reduce Tokio worker threads for low-core systems:
```rust
// In main.rs, modify tokio runtime creation
#[tokio::main(flavor = "current_thread")] // Single-threaded runtime
// Or, for a multi-threaded runtime with limited threads, use instead:
// #[tokio::main(worker_threads = 2)]
async fn main() -> Result<()> {
    // ... existing startup code ...
    Ok(())
}
```

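If the thread count should adapt to the machine instead of being hard-coded, one option is to derive it from the detected core count and cap it. The sketch below uses only the standard library. `worker_threads` and `detected_cores` are hypothetical helper names, and the cap of 2 used in the note after the block is taken from the commented attribute above, not from the project's code.

```rust
use std::thread;

/// Picks a worker-thread count: the number of available cores,
/// capped at `max_workers` and never less than 1.
fn worker_threads(available_cores: usize, max_workers: usize) -> usize {
    available_cores.clamp(1, max_workers.max(1))
}

/// Detects the cores available to this process, falling back to 1 if unknown.
fn detected_cores() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

The result of `worker_threads(detected_cores(), 2)` could then be passed to `tokio::runtime::Builder::new_multi_thread().worker_threads(n)` when building the runtime by hand instead of using the `#[tokio::main]` attribute.
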
### 6. Monitoring and Profiling

#### Memory Usage Monitoring
```bash
# Install heaptrack for memory profiling (a system package; Debian/Ubuntu shown)
sudo apt install heaptrack

# Profile memory usage
heaptrack ./target/release/llm-proxy

# Monitor with ps
ps aux --sort=-%mem | head -10

# Monitor with top (pgrep -d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, llm-proxy)"
```

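The same resident-memory figure that `ps` and `top` report can also be read from inside the process on Linux, for example to log it periodically. The following is a minimal sketch, not part of the gateway: `parse_vm_rss_kib` and `current_rss_kib` are hypothetical helpers that read the `VmRSS` field of `/proc/self/status`.

```rust
use std::fs;

/// Extracts the resident set size in KiB from the text of /proc/<pid>/status.
fn parse_vm_rss_kib(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|value| value.parse().ok())
}

/// Reads the current process's resident set size in KiB (Linux only).
fn current_rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kib(&status)
}
```

Comparing the logged value against the `MemoryMax` setting in the systemd unit gives an early warning before the service is killed for exceeding its limit.
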
#### Performance Benchmarks
Test with different configurations:
```bash
# Test with 100 concurrent connections
wrk -t4 -c100 -d30s http://localhost:8080/health

# Test chat completion endpoint
ab -n 1000 -c 10 -p test_request.json -T application/json http://localhost:8080/v1/chat/completions
```

### 7. Deployment Checklist for 512MB RAM

- [ ] Build with release profile: `cargo build --release`
- [ ] Configure database with `max_connections = 3`
- [ ] Disable unused providers in configuration
- [ ] Set appropriate rate limits
- [ ] Configure systemd with memory limits
- [ ] Set up log rotation to prevent disk space issues
- [ ] Monitor memory usage during initial deployment
- [ ] Consider using swap space (512MB-1GB) for safety

### 8. Troubleshooting High Memory Usage

#### Common Issues and Solutions:

1. **Database connection leaks**: Ensure connections are properly closed
2. **Memory fragmentation**: Use jemalloc or mimalloc as allocator
3. **Unbounded queues**: Check WebSocket message queues
4. **Cache growth**: Implement cache limits or TTL

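To make the "unbounded queues" item concrete, the sketch below shows a bounded queue that rejects new messages when full instead of growing. It uses the standard library's `sync_channel` so it runs without extra dependencies. In the gateway's async code the analogous primitive would be a bounded `tokio::sync::mpsc::channel`. The helper names `bounded_queue` and `try_enqueue` are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Creates a queue that holds at most `capacity` pending messages.
fn bounded_queue(capacity: usize) -> (SyncSender<String>, Receiver<String>) {
    sync_channel(capacity)
}

/// Tries to enqueue without blocking. Returns false (the message is dropped)
/// when the queue is full or the receiver has gone away.
fn try_enqueue(tx: &SyncSender<String>, msg: String) -> bool {
    match tx.try_send(msg) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}
```

Dropping on overflow is only one policy. Blocking the producer or disconnecting a slow consumer are alternatives, and the right choice depends on how the dashboard is expected to behave under load.
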
#### Add to Cargo.toml for alternative allocator:
```toml
[dependencies]
# `optional = true` is required so the dependency can be listed as a feature below
mimalloc = { version = "0.1", default-features = false, optional = true }

[features]
default = ["mimalloc"]
```

#### In main.rs:
```rust
// Only compiled in when the `mimalloc` feature is enabled
#[cfg(feature = "mimalloc")]
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
```

### 9. Expected Memory Usage

| Component | Baseline | With 10 clients | With 100 clients |
|-----------|----------|-----------------|------------------|
| Base executable | 15MB | 15MB | 15MB |
| Database connections | 5MB | 8MB | 15MB |
| Rate limiting | 2MB | 5MB | 20MB |
| HTTP clients | 3MB | 5MB | 10MB |
| **Total** | **25MB** | **33MB** | **60MB** |

**Note**: These are estimates. Actual usage depends on request volume, payload sizes, and configuration.

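The totals in the table are sums of the component rows, and the same arithmetic gives the headroom left under a memory limit such as the `MemoryMax=400M` setting from section 4. The sketch below reproduces that calculation. The figures are copied from the table, while `ESTIMATES_MB`, `total_mb`, and `headroom_mb` are hypothetical names introduced for illustration.

```rust
/// Estimated memory per component in MB, copied from the table.
/// Columns: baseline, 10 clients, 100 clients.
const ESTIMATES_MB: [(&str, [u32; 3]); 4] = [
    ("Base executable", [15, 15, 15]),
    ("Database connections", [5, 8, 15]),
    ("Rate limiting", [2, 5, 20]),
    ("HTTP clients", [3, 5, 10]),
];

/// Sums one column of the table (0 = baseline, 1 = 10 clients, 2 = 100 clients).
fn total_mb(column: usize) -> u32 {
    ESTIMATES_MB.iter().map(|(_, cols)| cols[column]).sum()
}

/// Headroom left under a memory limit, or None if the estimate exceeds it.
fn headroom_mb(limit_mb: u32, column: usize) -> Option<u32> {
    limit_mb.checked_sub(total_mb(column))
}
```

With these estimates, `headroom_mb(400, 2)` yields 340 MB of headroom at 100 clients, which is the margin available for payload buffers and allocator overhead that the table does not itemize.
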
### 10. Further Reading

- [Tokio performance guide](https://tokio.rs/tokio/topics/performance)
- [Rust performance book](https://nnethercote.github.io/perf-book/)
- [Linux memory management](https://www.kernel.org/doc/html/latest/admin-guide/mm/)
- [SQLite performance tips](https://www.sqlite.org/faq.html#q19)