chore: initial clean commit

2026-02-26 13:56:21 -05:00
commit 1755075657
53 changed files with 18068 additions and 0 deletions
--- a/OPTIMIZATION.md
+++ b/OPTIMIZATION.md
@@ -0,0 +1,232 @@
+# Optimization for 512MB RAM Environment
+
+This document provides guidance for optimizing the LLM Proxy Gateway for deployment in resource-constrained environments (512MB RAM).
+
+## Memory Optimization Strategies
+
+### 1. Build Optimization
+
+The project is already configured with optimized build settings in `Cargo.toml`:
+
+```toml
+[profile.release]
+opt-level = 3      # Maximum optimization
+lto = true         # Link-time optimization
+codegen-units = 1  # Single codegen unit for better optimization
+strip = true       # Strip debug symbols
+```
+
+**Additional optimizations you can apply:**
+
+```bash
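+# Static musl build: a self-contained binary that is easy to ship to minimal hosts
+# (install the target first with `rustup target add x86_64-unknown-linux-musl`)
+cargo build --release --target x86_64-unknown-linux-musl
+
+# Or for 64-bit ARM (Raspberry Pi, etc.)
+cargo build --release --target aarch64-unknown-linux-musl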
+```
+
+### 2. Runtime Memory Management
+
+#### Database Connection Pool
+- Default: 10 connections
+- Recommended for 512MB: 3 to 5 connections (the low-memory profile below uses 3)
+
+Update `config.toml`:
+```toml
+[database]
+max_connections = 5
+```
+
+#### Rate Limiting Memory Usage
+- Client rate limit buckets are held in memory, so usage grows with the number of distinct clients
+- Circuit breakers use minimal memory
+- Consider reducing burst capacity if memory is critical
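+
+Because each client's bucket lives in memory, usage can keep growing unless idle buckets are evicted. The following is a minimal sketch of that idea using only the standard library. `RateLimiter`, `Bucket`, and `idle_ttl` are hypothetical names chosen for illustration, not the gateway's actual types, and the numbers mirror the low-memory values used in this guide.
+
+```rust
+use std::collections::HashMap;
+use std::time::{Duration, Instant};
+
+/// One token bucket per client (hypothetical type, for illustration only).
+struct Bucket {
+    tokens: f64,
+    last_seen: Instant,
+}
+
+struct RateLimiter {
+    buckets: HashMap<String, Bucket>,
+    burst: f64,          // maximum tokens a bucket can hold
+    refill_per_sec: f64, // tokens added per second
+    idle_ttl: Duration,  // buckets idle longer than this are evicted
+}
+
+impl RateLimiter {
+    fn new(burst: u32, per_minute: u32, idle_ttl: Duration) -> Self {
+        Self {
+            buckets: HashMap::new(),
+            burst: burst as f64,
+            refill_per_sec: per_minute as f64 / 60.0,
+            idle_ttl,
+        }
+    }
+
+    /// Returns true if a request from `client` is allowed at time `now`.
+    fn allow(&mut self, client: &str, now: Instant) -> bool {
+        let (burst, rate) = (self.burst, self.refill_per_sec);
+        let bucket = self
+            .buckets
+            .entry(client.to_string())
+            .or_insert(Bucket { tokens: burst, last_seen: now });
+        // Refill according to elapsed time, capped at the burst size.
+        let elapsed = now.saturating_duration_since(bucket.last_seen).as_secs_f64();
+        bucket.tokens = (bucket.tokens + elapsed * rate).min(burst);
+        bucket.last_seen = now;
+        if bucket.tokens >= 1.0 {
+            bucket.tokens -= 1.0;
+            true
+        } else {
+            false
+        }
+    }
+
+    /// Drops buckets idle for longer than `idle_ttl`, which bounds memory use.
+    fn evict_idle(&mut self, now: Instant) {
+        let ttl = self.idle_ttl;
+        self.buckets
+            .retain(|_, b| now.saturating_duration_since(b.last_seen) <= ttl);
+    }
+}
+
+fn main() {
+    // burst = 5, 30 requests per minute, evict after 10 idle minutes
+    let mut limiter = RateLimiter::new(5, 30, Duration::from_secs(600));
+    let t0 = Instant::now();
+    let allowed = (0..7).filter(|_| limiter.allow("client-a", t0)).count();
+    println!("allowed {allowed} of 7 burst requests");
+    limiter.evict_idle(t0 + Duration::from_secs(601));
+    println!("buckets after eviction: {}", limiter.buckets.len());
+}
+```
+
+In a long-running service, `evict_idle` would typically be called from a periodic background task. The eviction interval trades memory for the cost of re-creating buckets for returning clients.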
+
+#### Provider Management
+- Only enable providers you actually use
+- Disable unused providers in configuration
+
+### 3. Configuration for Low Memory
+
+Create a `config-low-memory.toml`:
+
+```toml
+[server]
+port = 8080
+host = "0.0.0.0"
+
+[database]
+path = "./data/llm_proxy.db"
+max_connections = 3  # Reduced from default 10
+
+[providers]
+# Only enable providers you need
+openai.enabled = true
+gemini.enabled = false  # Disable if not used
+deepseek.enabled = false  # Disable if not used
+grok.enabled = false  # Disable if not used
+
+[rate_limiting]
+# Reduce memory usage for rate limiting
+client_requests_per_minute = 30  # Reduced from 60
+client_burst_size = 5  # Reduced from 10
+global_requests_per_minute = 300  # Reduced from 600
+```
+
+### 4. System-Level Optimizations
+
+#### Linux Kernel Parameters
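+Add to `/etc/sysctl.conf`, then apply with `sudo sysctl -p` (the `nf_conntrack` keys exist only when the conntrack module is loaded):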
+```bash
+# Reduce TCP buffer sizes
+net.ipv4.tcp_rmem = 4096 87380 174760
+net.ipv4.tcp_wmem = 4096 65536 131072
+
+# Reduce connection tracking
+net.netfilter.nf_conntrack_max = 65536
+net.netfilter.nf_conntrack_tcp_timeout_established = 1200
+
+# Reduce socket buffer sizes
+net.core.rmem_max = 131072
+net.core.wmem_max = 131072
+net.core.rmem_default = 65536
+net.core.wmem_default = 65536
+```
+
+#### Systemd Service Configuration
+Create `/etc/systemd/system/llm-proxy.service`:
+```ini
+[Unit]
+Description=LLM Proxy Gateway
+After=network.target
+
+[Service]
+Type=simple
+User=llmproxy
+Group=llmproxy
+WorkingDirectory=/opt/llm-proxy
+ExecStart=/opt/llm-proxy/llm-proxy
+Restart=on-failure
+RestartSec=5
+
+# Memory limits
+MemoryMax=400M
+MemorySwapMax=100M
+
+# CPU limits
+CPUQuota=50%
+
+# Process limits
+LimitNOFILE=65536
+LimitNPROC=512
+
+Environment="RUST_LOG=info"
+Environment="LLM_PROXY__DATABASE__MAX_CONNECTIONS=3"
+
+[Install]
+WantedBy=multi-user.target
+```
+
+### 5. Application-Specific Optimizations
+
+#### Disable Unused Features
+- **Multimodal support**: If not using images, disable image processing dependencies
+- **Dashboard**: The dashboard uses WebSockets and additional memory. Consider disabling if not needed.
+- **Detailed logging**: Reduce log verbosity in production
+
+#### Memory Pool Sizes
+The application keeps several resource pools whose sizes affect memory use:
+1. **Database connection pool**: Configured via `max_connections`
+2. **HTTP client pool**: Reqwest client pool (defaults to reasonable values)
+3. **Async runtime**: Tokio worker threads
+
+Reduce Tokio worker threads for low-core systems:
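+```rust
+// In main.rs, select the runtime flavor on the #[tokio::main] attribute.
+// Single-threaded runtime:
+#[tokio::main(flavor = "current_thread")]
+// Or, for a multi-threaded runtime limited to two workers, use this attribute instead:
+// #[tokio::main(flavor = "multi_thread", worker_threads = 2)]
+async fn main() -> Result<()> {
+    // ... existing startup code ...
+    Ok(())
+}
+```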
+
+### 6. Monitoring and Profiling
+
+#### Memory Usage Monitoring
+```bash
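+# Install heaptrack for memory profiling
+# (a system package, not a Cargo crate; Debian/Ubuntu shown)
+sudo apt install heaptrack
+
+# Profile memory usage (build with `strip = false` to get readable backtraces)
+heaptrack ./target/release/llm-proxy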
+
+# Monitor with ps
+ps aux --sort=-%mem | head -10
+
+# Monitor with top (pgrep -d, joins multiple PIDs with commas, as top -p expects)
+top -p "$(pgrep -d, -x llm-proxy)"
+```
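+
+For in-process monitoring on Linux, the resident set size can also be read from `/proc/self/status`. The sketch below assumes a Linux host and uses only the standard library. `parse_vm_rss_kb` and `current_rss_kb` are hypothetical helper names, not functions that exist in the gateway.
+
+```rust
+use std::fs;
+
+/// Extracts the `VmRSS` value (in kB) from the text of /proc/<pid>/status.
+fn parse_vm_rss_kb(status: &str) -> Option<u64> {
+    status
+        .lines()
+        .find_map(|line| line.strip_prefix("VmRSS:"))
+        .and_then(|rest| rest.split_whitespace().next())
+        .and_then(|value| value.parse().ok())
+}
+
+/// Reads the resident set size of the current process (Linux only).
+fn current_rss_kb() -> Option<u64> {
+    let status = fs::read_to_string("/proc/self/status").ok()?;
+    parse_vm_rss_kb(&status)
+}
+
+fn main() {
+    match current_rss_kb() {
+        Some(kb) => println!("resident memory: {} MB", kb / 1024),
+        None => println!("VmRSS not available on this platform"),
+    }
+}
+```
+
+A value like this could be logged periodically or exposed on a health endpoint, so that growth is visible before the systemd `MemoryMax` limit is reached.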
+
+#### Performance Benchmarks
+Test with different configurations:
+```bash
+# Test with 100 concurrent connections
+wrk -t4 -c100 -d30s http://localhost:8080/health
+
+# Test chat completion endpoint
+ab -n 1000 -c 10 -p test_request.json -T application/json http://localhost:8080/v1/chat/completions
+```
+
+### 7. Deployment Checklist for 512MB RAM
+
+- [ ] Build with release profile: `cargo build --release`
+- [ ] Configure database with `max_connections = 3`
+- [ ] Disable unused providers in configuration
+- [ ] Set appropriate rate limiting limits
+- [ ] Configure systemd with memory limits
+- [ ] Set up log rotation to prevent disk space issues
+- [ ] Monitor memory usage during initial deployment
+- [ ] Consider using swap space (512MB-1GB) for safety
+
+### 8. Troubleshooting High Memory Usage
+
+#### Common Issues and Solutions:
+
+1. **Database connection leaks**: Ensure connections are properly closed
+2. **Memory fragmentation**: Use jemalloc or mimalloc as allocator
+3. **Unbounded queues**: Check WebSocket message queues
+4. **Cache growth**: Implement cache limits or TTL
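+
+For the unbounded-queue case, one common remedy is a bounded channel that rejects new messages when full instead of buffering them without limit. The sketch below shows the idea with the standard library's `sync_channel`. The `offer` helper and the capacity of 2 are illustrative assumptions. Tokio's bounded `tokio::sync::mpsc::channel` offers a similar `try_send`.
+
+```rust
+use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
+
+/// Tries to enqueue `msg` without blocking. Returns false if the message was
+/// dropped because the queue is full or the receiver is gone.
+fn offer<T>(tx: &SyncSender<T>, msg: T) -> bool {
+    match tx.try_send(msg) {
+        Ok(()) => true,
+        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
+    }
+}
+
+fn main() {
+    // Capacity of 2 is an illustrative value, not a recommendation.
+    let (tx, rx) = sync_channel::<String>(2);
+    let accepted = (0..5).filter(|i| offer(&tx, format!("event {i}"))).count();
+    println!("accepted {accepted}, dropped {}", 5 - accepted);
+
+    // Close the sending side so the receive loop terminates.
+    drop(tx);
+    for msg in rx {
+        println!("delivered: {msg}");
+    }
+}
+```
+
+Dropping on overflow keeps memory bounded at the cost of losing messages for slow consumers. Whether that trade-off is acceptable depends on what the queue carries.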
+
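+#### Add to Cargo.toml for alternative allocator:
+```toml
+[dependencies]
+# `optional = true` is required for the `mimalloc` feature below to exist
+mimalloc = { version = "0.1", default-features = false, optional = true }
+
+[features]
+default = ["mimalloc"]
+```
+
+#### In main.rs:
+```rust
+// Only swap the allocator when the `mimalloc` feature is enabled
+#[cfg(feature = "mimalloc")]
+#[global_allocator]
+static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
+```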
+
+### 9. Expected Memory Usage
+
+| Component | Baseline | With 10 clients | With 100 clients |
+|-----------|----------|-----------------|------------------|
+| Base executable | 15MB | 15MB | 15MB |
+| Database connections | 5MB | 8MB | 15MB |
+| Rate limiting | 2MB | 5MB | 20MB |
+| HTTP clients | 3MB | 5MB | 10MB |
+| **Total** | **25MB** | **33MB** | **60MB** |
+
+**Note**: These are estimates. Actual usage depends on request volume, payload sizes, and configuration.
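+
+As a worked check of the table, the sketch below sums each column and compares it with the `MemoryMax=400M` limit from the systemd unit. The figures are copied from the table and remain estimates. `COMPONENTS` and `total_mb` are hypothetical names used only for this example.
+
+```rust
+/// Estimated per-component usage in MB, copied from the table
+/// (columns: baseline, 10 clients, 100 clients).
+const COMPONENTS: [(&str, [u32; 3]); 4] = [
+    ("Base executable", [15, 15, 15]),
+    ("Database connections", [5, 8, 15]),
+    ("Rate limiting", [2, 5, 20]),
+    ("HTTP clients", [3, 5, 10]),
+];
+
+/// Sums one column of the table (0 = baseline, 1 = 10 clients, 2 = 100 clients).
+fn total_mb(column: usize) -> u32 {
+    COMPONENTS.iter().map(|(_, mb)| mb[column]).sum()
+}
+
+fn main() {
+    let memory_max_mb = 400; // MemoryMax from the systemd unit
+    for (column, label) in ["baseline", "10 clients", "100 clients"].iter().enumerate() {
+        let total = total_mb(column);
+        println!("{label}: {total} MB used, {} MB headroom", memory_max_mb - total);
+    }
+}
+```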
+
+### 10. Further Reading
+
+- [Tokio performance guide](https://tokio.rs/tokio/topics/performance)
+- [Rust performance book](https://nnethercote.github.io/perf-book/)
+- [Linux memory management](https://www.kernel.org/doc/html/latest/admin-guide/mm/)
+- [SQLite performance tips](https://www.sqlite.org/faq.html#q19)