chore: initial clean commit

Commit 1755075657, 2026-02-26 13:56:21 -05:00
53 changed files with 18068 additions and 0 deletions

OPTIMIZATION.md
# Optimization for 512MB RAM Environment
This document provides guidance for optimizing the LLM Proxy Gateway for deployment in resource-constrained environments (512MB RAM).
## Memory Optimization Strategies
### 1. Build Optimization
The project is already configured with optimized build settings in `Cargo.toml`:
```toml
[profile.release]
opt-level = 3 # Maximum optimization
lto = true # Link-time optimization
codegen-units = 1 # Single codegen unit for better optimization
strip = true # Strip debug symbols
```
**Additional optimizations you can apply:**
```bash
# Static musl builds avoid glibc and can shrink the runtime footprint;
# install the target first
rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl

# Or for ARM (Raspberry Pi, etc.)
rustup target add aarch64-unknown-linux-musl
cargo build --release --target aarch64-unknown-linux-musl
```
### 2. Runtime Memory Management
#### Database Connection Pool
- Default: 10 connections
- Recommended for 512MB: 3–5 connections (the low-memory config in section 3 uses 3)
Update `config.toml`:
```toml
[database]
max_connections = 5
```
#### Rate Limiting Memory Usage
- Client rate-limit buckets: stored in memory, one bucket per distinct client
- Circuit breakers: minimal memory usage
- Consider reducing burst capacity if memory is critical
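
Why burst size and client count matter: each distinct client key gets its own bucket, so memory grows linearly with active clients. A minimal token-bucket sketch (illustrative only, not the gateway's actual implementation) makes this concrete:

```rust
use std::collections::HashMap;

/// Illustrative token bucket: one entry is allocated per client key,
/// so memory scales with the number of distinct clients seen.
struct Bucket {
    tokens: f64,
    capacity: f64,      // burst size
    refill_per_sec: f64,
}

impl Bucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Bucket { tokens: capacity, capacity, refill_per_sec }
    }

    /// Advance the bucket by `elapsed_secs`, then try to take one token.
    fn try_acquire(&mut self, elapsed_secs: f64) -> bool {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut buckets: HashMap<String, Bucket> = HashMap::new();
    // burst 5, 30 requests/minute = 0.5 tokens/sec
    let b = buckets
        .entry("client-1".to_string())
        .or_insert_with(|| Bucket::new(5.0, 0.5));
    let mut granted = 0;
    for _ in 0..10 {
        if b.try_acquire(0.0) {
            granted += 1;
        }
    }
    println!("granted {granted} of 10 burst requests"); // only the burst of 5 passes
}
```

With burst 5 and 30 requests/minute, ten back-to-back requests admit only the first five; the rest wait for refill. Lowering `capacity` shrinks both the admitted burst and the per-entry state.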
#### Provider Management
- Only enable providers you actually use
- Disable unused providers in configuration
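
Illustratively (the types and names here are hypothetical, not the gateway's actual API), gating construction on the `enabled` flag means disabled providers never allocate HTTP clients or background state:

```rust
use std::collections::HashMap;

/// Hypothetical config shape for illustration only.
struct ProviderConfig {
    enabled: bool,
    base_url: String,
}

/// Return only the providers that should be constructed; everything
/// else is skipped entirely, so it costs no memory at runtime.
fn active_providers(config: &HashMap<String, ProviderConfig>) -> Vec<String> {
    let mut names: Vec<String> = config
        .iter()
        .filter(|(_, c)| c.enabled)
        .map(|(name, _)| name.clone())
        .collect();
    names.sort();
    names
}

fn main() {
    let mut config = HashMap::new();
    config.insert(
        "openai".to_string(),
        ProviderConfig { enabled: true, base_url: "https://api.openai.com".to_string() },
    );
    config.insert(
        "gemini".to_string(),
        ProviderConfig { enabled: false, base_url: "https://example.invalid".to_string() },
    );
    println!("active: {:?}", active_providers(&config)); // active: ["openai"]
}
```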
### 3. Configuration for Low Memory
Create a `config-low-memory.toml`:
```toml
[server]
port = 8080
host = "0.0.0.0"
[database]
path = "./data/llm_proxy.db"
max_connections = 3 # Reduced from default 10
[providers]
# Only enable providers you need
openai.enabled = true
gemini.enabled = false # Disable if not used
deepseek.enabled = false # Disable if not used
grok.enabled = false # Disable if not used
[rate_limiting]
# Reduce memory usage for rate limiting
client_requests_per_minute = 30 # Reduced from 60
client_burst_size = 5 # Reduced from 10
global_requests_per_minute = 300 # Reduced from 600
```
### 4. System-Level Optimizations
#### Linux Kernel Parameters
Add to `/etc/sysctl.conf`:
```bash
# Reduce TCP buffer sizes
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 65536 131072
# Reduce connection tracking
net.netfilter.nf_conntrack_max = 65536
net.netfilter.nf_conntrack_tcp_timeout_established = 1200
# Reduce socket buffer sizes
net.core.rmem_max = 131072
net.core.wmem_max = 131072
net.core.rmem_default = 65536
net.core.wmem_default = 65536
```
#### Systemd Service Configuration
Create `/etc/systemd/system/llm-proxy.service`:
```ini
[Unit]
Description=LLM Proxy Gateway
After=network.target
[Service]
Type=simple
User=llmproxy
Group=llmproxy
WorkingDirectory=/opt/llm-proxy
ExecStart=/opt/llm-proxy/llm-proxy
Restart=on-failure
RestartSec=5
# Memory limits
MemoryMax=400M
MemorySwapMax=100M
# CPU limits
CPUQuota=50%
# Process limits
LimitNOFILE=65536
LimitNPROC=512
Environment="RUST_LOG=info"
Environment="LLM_PROXY__DATABASE__MAX_CONNECTIONS=3"
[Install]
WantedBy=multi-user.target
```
### 5. Application-Specific Optimizations
#### Disable Unused Features
- **Multimodal support**: If not using images, disable image processing dependencies
- **Dashboard**: The dashboard uses WebSockets and additional memory. Consider disabling if not needed.
- **Detailed logging**: Reduce log verbosity in production
#### Memory Pool Sizes
The application uses several memory pools:
1. **Database connection pool**: Configured via `max_connections`
2. **HTTP client pool**: reqwest keeps idle connections per host; the defaults are modest, and the pool can be tightened via `ClientBuilder::pool_max_idle_per_host` if needed
3. **Async runtime**: Tokio worker threads
Reduce Tokio worker threads for low-core systems:
```rust
// In main.rs, switch the runtime flavor on the entry point.
#[tokio::main(flavor = "current_thread")] // Single-threaded runtime
async fn main() -> Result<()> {
    // ... application startup ...
}

// Or keep the multi-threaded runtime but cap its worker threads:
// #[tokio::main(worker_threads = 2)]
```
### 6. Monitoring and Profiling
#### Memory Usage Monitoring
```bash
# Install heaptrack for memory profiling
# (a system package, e.g. via apt; it is not a cargo crate)
sudo apt install heaptrack

# Profile memory usage
heaptrack ./target/release/llm-proxy

# Monitor with ps
ps aux --sort=-%mem | head -10

# Monitor a running instance with top
top -p $(pgrep llm-proxy)
```
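
Alongside external tools, resident memory can be checked from inside the process (a std-only sketch; the `VmRSS` field in `/proc/self/status` is Linux-specific):

```rust
use std::fs;

/// Read this process's resident set size (VmRSS) in kilobytes from
/// /proc/self/status. Linux-only; returns None if the field is missing.
fn resident_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    for line in status.lines() {
        if let Some(rest) = line.strip_prefix("VmRSS:") {
            // The line looks like "VmRSS:     1234 kB"
            return rest.trim().split_whitespace().next()?.parse().ok();
        }
    }
    None
}

fn main() {
    match resident_kb() {
        Some(kb) => println!("resident memory: {} kB", kb),
        None => println!("VmRSS not available on this platform"),
    }
}
```

Logging this periodically (or exposing it on a metrics endpoint) makes it easy to confirm the process stays within the systemd `MemoryMax` limit.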
#### Performance Benchmarks
Test with different configurations:
```bash
# Test with 100 concurrent connections
wrk -t4 -c100 -d30s http://localhost:8080/health
# Test chat completion endpoint
ab -n 1000 -c 10 -p test_request.json -T application/json http://localhost:8080/v1/chat/completions
```
### 7. Deployment Checklist for 512MB RAM
- [ ] Build with release profile: `cargo build --release`
- [ ] Configure database with `max_connections = 3`
- [ ] Disable unused providers in configuration
- [ ] Lower the rate-limiting thresholds (requests per minute, burst size)
- [ ] Configure systemd with memory limits
- [ ] Set up log rotation to prevent disk space issues
- [ ] Monitor memory usage during initial deployment
- [ ] Consider using swap space (512MB-1GB) for safety
### 8. Troubleshooting High Memory Usage
#### Common Issues and Solutions:
1. **Database connection leaks**: Ensure connections are properly closed
2. **Memory fragmentation**: Use jemalloc or mimalloc as allocator
3. **Unbounded queues**: Check WebSocket message queues
4. **Cache growth**: Implement cache limits or TTL
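
A minimal sketch of the last two points (illustrative only, not the gateway's implementation): a cache with both a TTL and a hard entry cap stays bounded even when the key space is unbounded.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Entry {
    value: String,
    inserted_at: Instant,
    seq: u64, // insertion order, used for deterministic eviction
}

/// Bounded cache: entries whose age reaches `ttl` are expired on access,
/// and inserts beyond `max_entries` evict the oldest entry first.
struct BoundedCache {
    entries: HashMap<String, Entry>,
    ttl: Duration,
    max_entries: usize,
    next_seq: u64,
}

impl BoundedCache {
    fn new(ttl: Duration, max_entries: usize) -> Self {
        BoundedCache { entries: HashMap::new(), ttl, max_entries, next_seq: 0 }
    }

    fn insert(&mut self, key: String, value: String) {
        if self.entries.len() >= self.max_entries && !self.entries.contains_key(&key) {
            // Enforce the size cap by evicting the earliest-inserted entry.
            if let Some(oldest) = self
                .entries
                .iter()
                .min_by_key(|(_, e)| e.seq)
                .map(|(k, _)| k.clone())
            {
                self.entries.remove(&oldest);
            }
        }
        let entry = Entry { value, inserted_at: Instant::now(), seq: self.next_seq };
        self.next_seq += 1;
        self.entries.insert(key, entry);
    }

    fn get(&mut self, key: &str) -> Option<String> {
        let expired = match self.entries.get(key) {
            Some(e) => e.inserted_at.elapsed() >= self.ttl,
            None => return None,
        };
        if expired {
            // Dropping expired entries on access keeps the map from growing.
            self.entries.remove(key);
            None
        } else {
            self.entries.get(key).map(|e| e.value.clone())
        }
    }
}

fn main() {
    let mut cache = BoundedCache::new(Duration::from_secs(60), 2);
    cache.insert("a".to_string(), "1".to_string());
    cache.insert("b".to_string(), "2".to_string());
    cache.insert("c".to_string(), "3".to_string()); // cap is 2, so "a" is evicted
    println!("a = {:?}, c = {:?}", cache.get("a"), cache.get("c")); // a = None, c = Some("3")
}
```

The same two knobs (`ttl`, `max_entries`) apply to WebSocket message queues: a bounded queue that drops or rejects when full trades completeness for a hard memory ceiling.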
#### Add to Cargo.toml for alternative allocator:
```toml
[dependencies]
mimalloc = { version = "0.1", default-features = false, optional = true }

[features]
default = ["mimalloc"]
```
#### In main.rs:
```rust
// Opt in to the alternative allocator only when the feature is enabled.
#[cfg(feature = "mimalloc")]
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
```
### 9. Expected Memory Usage
| Component | Baseline | With 10 clients | With 100 clients |
|-----------|----------|-----------------|------------------|
| Base executable | 15MB | 15MB | 15MB |
| Database connections | 5MB | 8MB | 15MB |
| Rate limiting | 2MB | 5MB | 20MB |
| HTTP clients | 3MB | 5MB | 10MB |
| **Total** | **25MB** | **33MB** | **60MB** |
**Note**: These are estimates. Actual usage depends on request volume, payload sizes, and configuration.
### 10. Further Reading
- [Tokio performance guide](https://tokio.rs/tokio/topics/performance)
- [Rust performance book](https://nnethercote.github.io/perf-book/)
- [Linux memory management](https://www.kernel.org/doc/html/latest/admin-guide/mm/)
- [SQLite performance tips](https://www.sqlite.org/faq.html#q19)