chore: initial clean commit
OPTIMIZATION.md
# Optimization for 512MB RAM Environment

This document provides guidance for optimizing the LLM Proxy Gateway for deployment in resource-constrained environments (512MB RAM).

## Memory Optimization Strategies

### 1. Build Optimization

The project is already configured with optimized build settings in `Cargo.toml`:

```toml
[profile.release]
opt-level = 3     # Maximum optimization
lto = true        # Link-time optimization
codegen-units = 1 # Single codegen unit for better optimization
strip = true      # Strip symbols (including debug info) from the binary
```

**Additional optimizations you can apply:**

```bash
# Build a statically linked binary for a musl target (install the target first)
rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl

# Or for 64-bit ARM (Raspberry Pi, etc.)
rustup target add aarch64-unknown-linux-musl
cargo build --release --target aarch64-unknown-linux-musl
```

### 2. Runtime Memory Management

#### Database Connection Pool
- Default: 10 connections
- Recommended for 512MB: 5 connections (3 if memory is especially tight)

Update `config.toml`:
```toml
[database]
max_connections = 5
```

#### Rate Limiting Memory Usage
- Client rate limit buckets: stored in memory, so usage grows with the number of active clients
- Circuit breakers: minimal memory usage
- Consider reducing burst capacity if memory is critical
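
To make the per-client cost concrete, here is a minimal token-bucket sketch. It is not the gateway's actual limiter: `RateLimiter`, `Bucket`, and `evict_idle` are hypothetical names, and the idea of evicting idle buckets is an assumption about how memory could be bounded.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One bucket per client; this is the per-client state that lives in memory.
#[derive(Debug, Clone, Copy)]
struct Bucket {
    tokens: f64,
    last_seen: Instant,
}

/// Hypothetical in-memory limiter, not the gateway's real implementation.
struct RateLimiter {
    per_minute: f64,
    burst: f64,
    buckets: HashMap<String, Bucket>,
}

impl RateLimiter {
    fn new(per_minute: u32, burst: u32) -> Self {
        Self {
            per_minute: per_minute as f64,
            burst: burst as f64,
            buckets: HashMap::new(),
        }
    }

    /// Returns true if a request from `client` is allowed at time `now`.
    fn allow(&mut self, client: &str, now: Instant) -> bool {
        let burst = self.burst;
        let bucket = self
            .buckets
            .entry(client.to_string())
            .or_insert(Bucket { tokens: burst, last_seen: now });
        // Refill in proportion to elapsed time, capped at the burst size.
        let elapsed = now.saturating_duration_since(bucket.last_seen).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * self.per_minute / 60.0).min(burst);
        bucket.last_seen = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Drops buckets idle for longer than `max_idle` so the map cannot grow forever.
    fn evict_idle(&mut self, now: Instant, max_idle: Duration) {
        self.buckets
            .retain(|_, b| now.saturating_duration_since(b.last_seen) <= max_idle);
    }
}
```

Each tracked client costs one map entry, so the map size (not the burst value itself) drives memory; calling something like `evict_idle` periodically keeps the entry count proportional to recently active clients.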

#### Provider Management
- Only enable providers you actually use
- Disable unused providers in configuration

### 3. Configuration for Low Memory

Create a `config-low-memory.toml`:

```toml
[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"
max_connections = 3 # Reduced from default 10

[providers]
# Only enable providers you need
openai.enabled = true
gemini.enabled = false # Disable if not used
deepseek.enabled = false # Disable if not used
grok.enabled = false # Disable if not used

[rate_limiting]
# Reduce memory usage for rate limiting
client_requests_per_minute = 30 # Reduced from 60
client_burst_size = 5 # Reduced from 10
global_requests_per_minute = 300 # Reduced from 600
```

### 4. System-Level Optimizations

#### Linux Kernel Parameters
Add to `/etc/sysctl.conf`, then apply with `sudo sysctl -p`:
```bash
# Reduce TCP buffer sizes (min, default, max in bytes)
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 65536 131072

# Limit connection tracking (requires the nf_conntrack module to be loaded)
net.netfilter.nf_conntrack_max = 65536
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

# Reduce socket buffer sizes
net.core.rmem_max = 131072
net.core.wmem_max = 131072
net.core.rmem_default = 65536
net.core.wmem_default = 65536
```

#### Systemd Service Configuration
Create `/etc/systemd/system/llm-proxy.service`:
```ini
[Unit]
Description=LLM Proxy Gateway
After=network.target

[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=on-failure
RestartSec=5

# Memory limits (MemoryMax and MemorySwapMax require cgroup v2)
MemoryMax=400M
MemorySwapMax=100M

# CPU limits
CPUQuota=50%

# Process limits
LimitNOFILE=65536
LimitNPROC=512

Environment="RUST_LOG=info"
Environment="LLM_PROXY__DATABASE__MAX_CONNECTIONS=3"

[Install]
WantedBy=multi-user.target
```
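
The `LLM_PROXY__DATABASE__MAX_CONNECTIONS` variable in the unit file suggests a prefix plus double-underscore nesting convention for overriding `[database] max_connections`. The sketch below illustrates that mapping under this assumption; `env_key_to_path` is a hypothetical helper, and the gateway's real configuration loader may resolve overrides differently.

```rust
/// Maps an override such as `LLM_PROXY__DATABASE__MAX_CONNECTIONS` to the
/// config path `["database", "max_connections"]`. Returns `None` when the
/// variable does not carry the expected prefix or has an empty segment.
fn env_key_to_path(var: &str, prefix: &str) -> Option<Vec<String>> {
    // Strip the prefix and the first `__` separator that follows it.
    let rest = var.strip_prefix(prefix)?.strip_prefix("__")?;
    let parts: Vec<String> = rest
        .split("__")
        .map(|segment| segment.to_ascii_lowercase())
        .collect();
    if parts.iter().any(|segment| segment.is_empty()) {
        None
    } else {
        Some(parts)
    }
}
```

Under this reading, unrelated variables such as `RUST_LOG` are ignored because they lack the `LLM_PROXY` prefix.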

### 5. Application-Specific Optimizations

#### Disable Unused Features
- **Multimodal support**: If not using images, disable image processing dependencies
- **Dashboard**: The dashboard uses WebSockets and additional memory. Consider disabling if not needed.
- **Detailed logging**: Reduce log verbosity in production

#### Memory Pool Sizes
The application uses several memory pools:
1. **Database connection pool**: Configured via `max_connections`
2. **HTTP client pool**: Reqwest client pool (defaults to reasonable values)
3. **Async runtime**: Tokio worker threads

Reduce Tokio worker threads for low-core systems:
```rust
// In main.rs, modify the Tokio runtime attribute.
// `Result` is assumed to be an alias such as `anyhow::Result`.
#[tokio::main(flavor = "current_thread")] // Single-threaded runtime
// Or, for a multi-threaded runtime with limited threads:
// #[tokio::main(worker_threads = 2)]
async fn main() -> Result<()> {
    Ok(()) // existing startup code goes here
}
```
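
If the thread count should adapt to the machine rather than be hard-coded, one option is to derive it from the detected core count and cap it. This is a std-only sketch; `worker_threads` and its `cap` parameter are hypothetical names, not existing gateway settings.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Picks a worker-thread count: the number of available cores, capped at
/// `cap`, and never less than one. `cap` is a hypothetical tuning knob.
fn worker_threads(available: usize, cap: usize) -> usize {
    available.clamp(1, cap.max(1))
}

/// Queries the OS for available parallelism, falling back to one core.
fn detected_cores() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}
```

The result of `worker_threads(detected_cores(), 2)` could then be passed to `tokio::runtime::Builder::new_multi_thread().worker_threads(n)` when building the runtime manually instead of using the attribute macro.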

### 6. Monitoring and Profiling

#### Memory Usage Monitoring
```bash
# Install heaptrack for memory profiling (a system package, not a Cargo crate)
sudo apt install heaptrack # Debian/Ubuntu; use your distro's package manager otherwise

# Profile memory usage
heaptrack ./target/release/llm-proxy

# Monitor with ps
ps aux --sort=-%mem | head -10

# Monitor with top (comma-separated PIDs in case several processes match)
top -p "$(pgrep -d, llm-proxy)"
```
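
The same resident-memory figure that `ps` and `top` report can also be read from inside the process on Linux, which is handy for periodic logging. The sketch below parses the `VmRSS` line of `/proc/self/status`; the function names are hypothetical and nothing here is taken from the gateway's code.

```rust
use std::fs;

/// Extracts the resident set size in kB from the text of /proc/<pid>/status.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|kb| kb.parse().ok())
}

/// Reads the current process's RSS; returns None off Linux or on I/O error.
fn current_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kb(&status)
}
```

Logging `current_rss_kb()` at a fixed interval during the initial deployment gives a trend line to compare against the systemd `MemoryMax` limit.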

#### Performance Benchmarks
Test with different configurations:
```bash
# Test with 100 concurrent connections
wrk -t4 -c100 -d30s http://localhost:8080/health

# Test chat completion endpoint
ab -n 1000 -c 10 -p test_request.json -T application/json http://localhost:8080/v1/chat/completions
```

### 7. Deployment Checklist for 512MB RAM

- [ ] Build with release profile: `cargo build --release`
- [ ] Configure database with `max_connections = 3`
- [ ] Disable unused providers in configuration
- [ ] Set appropriate rate limits
- [ ] Configure systemd with memory limits
- [ ] Set up log rotation to prevent disk space issues
- [ ] Monitor memory usage during initial deployment
- [ ] Consider using swap space (512MB-1GB) for safety

### 8. Troubleshooting High Memory Usage

#### Common Issues and Solutions:

1. **Database connection leaks**: Ensure connections are properly closed
2. **Memory fragmentation**: Use jemalloc or mimalloc as allocator
3. **Unbounded queues**: Check WebSocket message queues
4. **Cache growth**: Implement cache limits or TTL
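
For the unbounded-queue case, the usual remedy is a channel with a fixed capacity that rejects or drops messages when full. The sketch below uses the standard library's `sync_channel` to show the idea; the gateway's WebSocket code presumably uses an async channel instead, and `bounded_queue`, `try_enqueue`, and `Enqueue` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Outcome of trying to enqueue a message without blocking.
#[derive(Debug, PartialEq)]
enum Enqueue {
    Queued,
    DroppedFull,
    Closed,
}

/// Creates a queue that buffers at most `capacity` pending messages.
fn bounded_queue(capacity: usize) -> (SyncSender<String>, Receiver<String>) {
    sync_channel(capacity)
}

/// Enqueues `msg`, dropping it instead of growing the queue when full.
fn try_enqueue(tx: &SyncSender<String>, msg: String) -> Enqueue {
    match tx.try_send(msg) {
        Ok(()) => Enqueue::Queued,
        Err(TrySendError::Full(_)) => Enqueue::DroppedFull,
        Err(TrySendError::Disconnected(_)) => Enqueue::Closed,
    }
}
```

With a fixed capacity, a slow consumer costs dropped messages rather than memory; whether dropping is acceptable depends on what the queue carries.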

#### Add to Cargo.toml for alternative allocator:
```toml
[dependencies]
# `optional = true` lets the `mimalloc` feature below switch the dependency on
mimalloc = { version = "0.1", default-features = false, optional = true }

[features]
default = ["mimalloc"]
```

#### In main.rs:
```rust
// Only compiled in when the `mimalloc` feature is enabled
#[cfg(feature = "mimalloc")]
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
```
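
Before swapping allocators, it can help to know how much heap the process actually holds. The same `#[global_allocator]` hook can host a small counting wrapper around the system allocator. This is a diagnostic sketch only: `CountingAlloc` and `allocated_bytes` are hypothetical names, and because a binary can have just one global allocator, it would replace the mimalloc declaration in a diagnostic build rather than sit alongside it.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes currently allocated through the global allocator.
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical diagnostic allocator: forwards to the system allocator
/// and keeps a running total of live heap bytes.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static COUNTING: CountingAlloc = CountingAlloc;

/// Returns the number of heap bytes currently live.
fn allocated_bytes() -> usize {
    ALLOCATED.load(Ordering::Relaxed)
}
```

The counter tracks requested bytes, not allocator overhead or fragmentation, so it will read lower than the RSS reported by the operating system.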

### 9. Expected Memory Usage

| Component | Baseline | With 10 clients | With 100 clients |
|-----------|----------|-----------------|------------------|
| Base executable | 15MB | 15MB | 15MB |
| Database connections | 5MB | 8MB | 15MB |
| Rate limiting | 2MB | 5MB | 20MB |
| HTTP clients | 3MB | 5MB | 10MB |
| **Total** | **25MB** | **33MB** | **60MB** |

**Note**: These are estimates. Actual usage depends on request volume, payload sizes, and configuration.
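
As a quick sanity check, the totals in the table can be recomputed and compared with a memory limit such as the `MemoryMax=400M` used in the systemd unit. The figures below are copied from the table; the function names are hypothetical.

```rust
/// Estimated memory per component in MB: [baseline, 10 clients, 100 clients].
/// Values are copied from the table above.
const ESTIMATES_MB: [(&str, [u32; 3]); 4] = [
    ("Base executable", [15, 15, 15]),
    ("Database connections", [5, 8, 15]),
    ("Rate limiting", [2, 5, 20]),
    ("HTTP clients", [3, 5, 10]),
];

/// Sums each load column across all components.
fn totals_mb() -> [u32; 3] {
    let mut totals = [0u32; 3];
    for (_, row) in ESTIMATES_MB.iter() {
        for (total, value) in totals.iter_mut().zip(row.iter()) {
            *total += value;
        }
    }
    totals
}

/// Headroom left under a memory limit, saturating at zero.
fn headroom_mb(limit_mb: u32, used_mb: u32) -> u32 {
    limit_mb.saturating_sub(used_mb)
}
```

On these estimates, even the 100-client column leaves most of a 400MB limit unused, which is why the remaining budget is better spent on request payloads and bursts than on larger pools.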

### 10. Further Reading

- [Tokio performance guide](https://tokio.rs/tokio/topics/performance)
- [Rust performance book](https://nnethercote.github.io/perf-book/)
- [Linux memory management](https://www.kernel.org/doc/html/latest/admin-guide/mm/)
- [SQLite performance tips](https://www.sqlite.org/faq.html#q19)