docs: sync documentation with current implementation and archive stale plan
2026-03-06 14:28:04 -05:00
parent 975ae124d1
commit 633b69a07b
8 changed files with 767 additions and 78 deletions

CODE_REVIEW_PLAN.md Normal file

@@ -0,0 +1,65 @@
# LLM Proxy Code Review Plan
## Overview
The **LLM Proxy** project is a Rust-based middleware designed to provide a unified interface for multiple Large Language Models (LLMs). Based on the repository structure, the project aims to implement a high-performance proxy server (`src/`) that handles request routing, usage tracking, and billing logic. A static dashboard (`static/`) provides a management interface for monitoring consumption and managing API keys. The architecture leverages Rust's async capabilities for efficient request handling and SQLite for persistent state management.
## Review Phases
### Phase 1: Backend Architecture & Rust Logic (@code-reviewer)
- **Focus on:**
  - **Core Proxy Logic:** Efficiency of the request/response pipeline and streaming support.
  - **State Management:** Thread-safety and shared state patterns using `Arc` and `Mutex`/`RwLock`.
  - **Error Handling:** Use of idiomatic Rust error types and propagation.
  - **Async Performance:** Proper use of `tokio` or similar runtimes to avoid blocking the executor.
  - **Rust Idioms:** Adherence to Clippy suggestions and standard Rust naming conventions.
### Phase 2: Security & Authentication Audit (@security-auditor)
- **Focus on:**
  - **API Key Management:** Secure storage, masking in logs, and rotation mechanisms.
  - **JWT Handling:** Validation logic, signature verification, and expiration checks.
  - **Input Validation:** Sanitization of prompts and configuration parameters to prevent injection.
  - **Dependency Audit:** Scanning `Cargo.lock` for known vulnerabilities using `cargo-audit`.
### Phase 3: Database & Data Integrity Review (@database-optimizer)
- **Focus on:**
  - **Schema Design:** Efficiency of the SQLite schema for usage tracking and billing.
  - **Migration Strategy:** Robustness of the migration scripts to prevent data loss.
  - **Usage Tracking:** Accuracy of token counting and concurrency handling during increments.
  - **Query Optimization:** Identifying potential bottlenecks in reporting queries.
### Phase 4: Frontend & Dashboard Review (@frontend-developer)
- **Focus on:**
  - **Vanilla JS Patterns:** Review of Web Components and modular JS in `static/js`.
  - **Security:** Protection against XSS in the dashboard and secure handling of local storage.
  - **UI/UX Consistency:** Ensuring the management interface is intuitive and responsive.
  - **API Integration:** Robustness of the frontend's communication with the Rust backend.
### Phase 5: Infrastructure & Deployment Review (@devops-engineer)
- **Focus on:**
  - **Dockerfile Optimization:** Multi-stage builds to minimize image size and attack surface.
  - **Resource Limits:** Configuration of CPU/Memory limits for the proxy container.
  - **Deployment Docs:** Clarity of the setup process and environment variable documentation.
## Timeline (Gantt)
```mermaid
gantt
title LLM Proxy Code Review Timeline (March 2026)
dateFormat YYYY-MM-DD
section Backend & Security
Architecture & Rust Logic (Phase 1) :active, p1, 2026-03-06, 1d
Security & Auth Audit (Phase 2) :p2, 2026-03-07, 1d
section Data & Frontend
Database & Integrity (Phase 3) :p3, 2026-03-07, 1d
Frontend & Dashboard (Phase 4) :p4, 2026-03-08, 1d
section DevOps
Infra & Deployment (Phase 5) :p5, 2026-03-08, 1d
Final Review & Sign-off :2026-03-08, 4h
```
## Success Criteria
- **Security:** Zero high-priority vulnerabilities identified; all API keys masked in logs.
- **Performance:** Proxy overhead is minimal (<10ms latency addition); queries are indexed.
- **Maintainability:** Code passes all linting (`cargo clippy`) and formatting (`cargo fmt`) checks.
- **Documentation:** README and deployment guides are up-to-date and accurate.
- **Reliability:** Usage tracking matches actual API consumption with 99.9% accuracy.

DATABASE_REVIEW.md Normal file

@@ -0,0 +1,480 @@
# Database Review Report for LLM-Proxy Repository
**Review Date:** 2026-03-06
**Reviewer:** Database Optimization Expert
**Repository:** llm-proxy
**Focus Areas:** Schema Design, Query Optimization, Migration Strategy, Data Integrity, Usage Tracking Accuracy
## Executive Summary
The llm-proxy database implementation demonstrates a solid foundation with appropriate table structures and clear separation of concerns. However, several areas require improvement to ensure scalability, data consistency, and performance as usage grows. Key findings include:
1. **Schema Design**: Generally normalized but missing foreign key enforcement and some critical indexes.
2. **Query Optimization**: Well-optimized for most queries but missing composite indexes for common filtering patterns.
3. **Migration Strategy**: Ad-hoc migration approach that may cause issues with schema evolution.
4. **Data Integrity**: Potential race conditions in usage tracking and missing transaction boundaries.
5. **Usage Tracking**: Generally accurate, but at risk of inconsistent state between related tables.
This report provides detailed analysis and actionable recommendations for each area.
## 1. Schema Design Review
### Tables Overview
The database consists of 6 main tables:
1. **clients**: Client management with usage aggregates
2. **llm_requests**: Request logging with token counts and costs
3. **provider_configs**: Provider configuration and credit balances
4. **model_configs**: Model-specific configuration and cost overrides
5. **users**: Dashboard user authentication
6. **client_tokens**: API token storage for client authentication
### Normalization Assessment
**Strengths:**
- Tables follow 3rd Normal Form (3NF) with appropriate separation
- Foreign key relationships properly defined
- No obvious data duplication across tables
**Areas for Improvement:**
- **Denormalized aggregates**: `clients.total_requests`, `total_tokens`, `total_cost` are derived from `llm_requests`. This introduces risk of inconsistency.
- **Provider credit balance**: Stored in `provider_configs` but also updated based on `llm_requests`. No audit trail for balance changes.
### Data Type Analysis
**Appropriate Choices:**
- INTEGER for token counts (cast from u32 to i64)
- REAL for monetary values
- DATETIME for timestamps using SQLite's CURRENT_TIMESTAMP
- TEXT for identifiers (SQLite does not enforce declared lengths, so any length limits must be validated in the application)
**Potential Issues:**
- `llm_requests.request_body` and `response_body` are defined as TEXT but always set to NULL; consider removing them.
- `provider_configs.billing_mode` added via migration but default value not consistently applied to existing rows.
### Constraints and Foreign Keys
**Current Constraints:**
- Primary keys defined for all tables
- UNIQUE constraints on `clients.client_id`, `users.username`, `client_tokens.token`
- Foreign key definitions present but **not enforced** (SQLite default)
**Missing Constraints:**
- NOT NULL constraints missing on several columns where nullability is not intended
- CHECK constraints for non-negative values (`credit_balance >= 0`)
- Foreign key enforcement not enabled
## 2. Query Optimization Analysis
### Indexing Strategy
**Existing Indexes:**
- `idx_clients_client_id` - Essential for client lookups
- `idx_clients_created_at` - Useful for chronological listing
- `idx_llm_requests_timestamp` - Critical for time-based queries
- `idx_llm_requests_client_id` - Supports client-specific queries
- `idx_llm_requests_provider` - Good for provider breakdowns
- `idx_llm_requests_status` - Low cardinality but acceptable
- `idx_client_tokens_token` UNIQUE - Essential for authentication
- `idx_client_tokens_client_id` - Supports token management
**Missing Critical Indexes:**
1. `model_configs.provider_id` - Foreign key column used in JOINs
2. `llm_requests(client_id, timestamp)` - Composite index for client time-series queries
3. `llm_requests(provider, timestamp)` - For provider performance analysis
4. `llm_requests(status, timestamp)` - For error trend analysis
### N+1 Query Detection
**Well-Optimized Areas:**
- Model configuration caching prevents repeated database hits
- Provider configs loaded in batch for dashboard display
- Client listing uses single efficient query
**Potential N+1 Patterns:**
- `list_models` in `server/mod.rs` performs one cache lookup per model, but the cache is in-memory, so no extra database queries result
- No significant database N+1 issues identified
### Inefficient Query Patterns
**Query 1: Time-series aggregation with strftime()**
```sql
SELECT strftime('%Y-%m-%d', timestamp) as date, ...
FROM llm_requests
WHERE 1=1 {}
GROUP BY date, client_id, provider, model
ORDER BY date DESC
LIMIT 200
```
**Issue:** Applying `strftime()` to `timestamp` makes any date filter written against that expression non-sargable, so the `timestamp` index cannot narrow the scan when filtering by date range.
**Recommendation:** Store a computed date column, or filter with range comparisons on `timestamp` directly.
**Query 2: Today's stats using strftime()**
```sql
WHERE strftime('%Y-%m-%d', timestamp) = ?
```
**Issue:** Non-sargable query prevents index usage.
**Recommendation:** Use range query:
```sql
WHERE timestamp >= date(?) AND timestamp < date(?, '+1 day')
```
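The half-open range works on SQLite's text timestamps because `'YYYY-MM-DD HH:MM:SS'` strings sort lexicographically in time order. The std-only Rust sketch below assumes timestamps are stored in that `CURRENT_TIMESTAMP` format; it computes the same `[date(?), date(?, '+1 day'))` bounds in application code and shows which values fall inside. `day_bounds` and `in_range` are hypothetical helper names, not part of the project.

```rust
/// Gregorian leap-year rule.
fn is_leap(year: u32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Days in `month` (1-12) of `year`; 0 for an invalid month.
fn days_in_month(year: u32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap(year) => 29,
        2 => 28,
        _ => 0,
    }
}

/// Parses "YYYY-MM-DD" and returns the half-open bounds [day, next day),
/// mirroring `date(?)` and `date(?, '+1 day')`. Returns None for bad input.
fn day_bounds(day: &str) -> Option<(String, String)> {
    let mut parts = day.split('-');
    let y: u32 = parts.next()?.parse().ok()?;
    let m: u32 = parts.next()?.parse().ok()?;
    let d: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&m) || d == 0 || d > days_in_month(y, m) {
        return None;
    }
    // Advance one calendar day, rolling over month and year boundaries.
    let (ny, nm, nd) = if d < days_in_month(y, m) {
        (y, m, d + 1)
    } else if m < 12 {
        (y, m + 1, 1)
    } else {
        (y + 1, 1, 1)
    };
    Some((
        format!("{:04}-{:02}-{:02}", y, m, d),
        format!("{:04}-{:02}-{:02}", ny, nm, nd),
    ))
}

/// Same test as `timestamp >= start AND timestamp < end` on a TEXT column.
fn in_range(timestamp: &str, bounds: &(String, String)) -> bool {
    timestamp >= bounds.0.as_str() && timestamp < bounds.1.as_str()
}
```

The two strings could then be bound directly as `timestamp >= ? AND timestamp < ?`, which keeps the predicate on the raw indexed column.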
### Recommended Index Additions
```sql
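-- Composite indexes for common query patterns.
-- Each also serves lookups on its leading column alone, so the existing
-- single-column idx_llm_requests_client_id, _provider and _status indexes
-- become redundant once these exist.
CREATE INDEX idx_llm_requests_client_timestamp ON llm_requests(client_id, timestamp);
CREATE INDEX idx_llm_requests_provider_timestamp ON llm_requests(provider, timestamp);
CREATE INDEX idx_llm_requests_status_timestamp ON llm_requests(status, timestamp);
-- Foreign key index
CREATE INDEX idx_model_configs_provider_id ON model_configs(provider_id);
-- Optional: Covering index for client usage queries. These columns change on
-- every request, so this index adds write overhead to each usage update.
CREATE INDEX idx_clients_usage ON clients(client_id, total_requests, total_tokens, total_cost);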
```
## 3. Migration Strategy Assessment
### Current Approach
The migration system uses a hybrid approach:
1. **Schema synchronization**: `CREATE TABLE IF NOT EXISTS` on startup
2. **Ad-hoc migrations**: `ALTER TABLE` statements with error suppression
3. **Single migration file**: `migrations/001-add-billing-mode.sql` with transaction wrapper
**Pros:**
- Simple to understand and maintain
- Automatic schema creation for new deployments
- Error suppression prevents startup crashes when a column already exists
**Cons:**
- No version tracking of applied migrations
- Potential for inconsistent schema across deployments
- `ALTER TABLE` error suppression hides genuine schema issues
- No rollback capability
### Risks and Limitations
1. **Schema Drift**: Different instances may have different schemas if migrations are applied out of order
2. **Data Loss Risk**: No backup/verification before schema changes
3. **Production Issues**: Error suppression could mask migration failures until runtime
### Recommendations
1. **Implement Proper Migration Tooling**: Use `sqlx migrate` or a similar versioned migration system
2. **Add Migration Version Table**: Track applied migrations with checksums so that edited migrations are detected
3. **Separate Migration Scripts**: One file per migration with up/down directions
4. **Pre-deployment Validation**: Schema checks in CI/CD pipeline
5. **Backup Strategy**: Automatic backups before migration execution
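Recommendations 1 and 2 can be illustrated without a database. The std-only sketch below is a stand-in for a migrations table (all names are hypothetical): it records each applied version with a checksum, skips unchanged migrations, and refuses to continue if an applied migration was edited or an older version appears late. It uses FNV-1a only because `DefaultHasher` output is not guaranteed stable across Rust releases; a production tool such as `sqlx migrate` would use a cryptographic checksum.

```rust
use std::collections::BTreeMap;

/// 64-bit FNV-1a: a tiny, toolchain-independent checksum for this sketch.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

#[derive(Debug, PartialEq)]
enum MigrationError {
    /// An already-applied migration's SQL was edited afterwards.
    ChecksumMismatch(u32),
    /// An unapplied migration is older than the newest applied one.
    OutOfOrder(u32),
}

/// In-memory stand-in for a `(version, checksum)` migrations table.
#[derive(Default)]
struct MigrationLedger {
    applied: BTreeMap<u32, u64>,
}

impl MigrationLedger {
    /// Validates `migrations` (sorted by version) against the ledger and
    /// records new ones; returns the versions applied by this call.
    fn apply(&mut self, migrations: &[(u32, &str)]) -> Result<Vec<u32>, MigrationError> {
        let mut newly_applied = Vec::new();
        for &(version, sql) in migrations {
            let checksum = fnv1a64(sql.as_bytes());
            match self.applied.get(&version).copied() {
                Some(recorded) if recorded != checksum => {
                    return Err(MigrationError::ChecksumMismatch(version));
                }
                Some(_) => {} // already applied and unchanged: skip
                None => {
                    let newest = self.applied.keys().next_back().copied().unwrap_or(0);
                    if version < newest {
                        return Err(MigrationError::OutOfOrder(version));
                    }
                    // A real runner would execute `sql` in a transaction here
                    // and insert the ledger row in that same transaction.
                    self.applied.insert(version, checksum);
                    newly_applied.push(version);
                }
            }
        }
        Ok(newly_applied)
    }
}
```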
## 4. Data Integrity Evaluation
### Foreign Key Enforcement
**Critical Issue:** Foreign key constraints are defined but **not enforced** in SQLite.
**Impact:** Orphaned records, inconsistent referential integrity.
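**Solution:** Enable foreign key support in the connection options. Recent SQLx versions already turn `foreign_keys` on by default in `SqliteConnectOptions`, so confirm the live setting with `PRAGMA foreign_keys;` before treating this as a defect, and set it explicitly so the guarantee does not depend on a library default:
```rust
use std::str::FromStr;

use sqlx::sqlite::SqliteConnectOptions;

let options = SqliteConnectOptions::from_str(&format!("sqlite:{}", database_path))?
    .create_if_missing(true)
    // Enforce FOREIGN KEY constraints on every connection opened with these options.
    .foreign_keys(true);
```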
### Transaction Usage
**Good Patterns:**
- Request logging uses transactions for insert + provider balance update
- Atomic UPDATE for client usage statistics
**Problematic Areas:**
1. **Split Transactions**: Client usage update and request logging are in separate transactions
   - In `logging/mod.rs`: the `insert_log` transaction includes the provider balance update
   - In `utils/streaming.rs`: client usage is updated separately after logging
   - **Risk**: Partial updates if one transaction fails
2. **No Transaction for Client Creation**: Client and token creation are not atomic
**Recommendations:**
- Wrap client usage update within the same transaction as request logging
- Use transaction for client + token creation
- Consider using savepoints for complex operations
### Race Conditions and Consistency
**Potential Race Conditions:**
1. **Provider credit balance**: Concurrent requests debit the same balance
   - Current: `UPDATE provider_configs SET credit_balance = credit_balance - ?`
   - SQLite serializes writers, so this relative update does not lose decrements, but negative balances are not prevented
2. **Client usage aggregates**: Concurrent updates to `total_requests`, `total_tokens`, `total_cost`
   - Same relative UPDATE pattern: generally safe, but a retried request is counted twice unless logging is idempotent
**Recommendations:**
- Add check constraint: `CHECK (credit_balance >= 0)`
- Implement idempotent request logging with unique request IDs
- Consider optimistic concurrency control for critical balances
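The idempotency and non-negative balance recommendations amount to one rule: check for a duplicate request ID, check the balance, and debit, all inside one atomic unit. The std-only sketch below models that with a `Mutex` standing in for the SQL transaction, a `HashSet` standing in for a `UNIQUE` request-ID column, and integer micro-dollar amounts. `ProviderLedger` and its methods are hypothetical illustrations, not the project's code.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, PoisonError};

#[derive(Debug, PartialEq)]
enum LogOutcome {
    Recorded,
    Duplicate,          // what a UNIQUE request_id would reject
    InsufficientCredit, // what CHECK (credit_balance >= 0) would reject
}

struct LedgerState {
    credit_micros: i64,
    seen_request_ids: HashSet<String>,
}

/// Hypothetical per-provider ledger; one lock plays the role of one transaction.
struct ProviderLedger {
    state: Mutex<LedgerState>,
}

impl ProviderLedger {
    fn new(credit_micros: i64) -> Self {
        ProviderLedger {
            state: Mutex::new(LedgerState { credit_micros, seen_request_ids: HashSet::new() }),
        }
    }

    /// Duplicate check, balance check and debit all happen under one lock.
    fn log_request(&self, request_id: &str, cost_micros: i64) -> LogOutcome {
        let mut state = self.state.lock().unwrap_or_else(PoisonError::into_inner);
        if state.seen_request_ids.contains(request_id) {
            return LogOutcome::Duplicate; // a retried delivery is not charged twice
        }
        if state.credit_micros < cost_micros {
            return LogOutcome::InsufficientCredit;
        }
        state.credit_micros -= cost_micros;
        state.seen_request_ids.insert(request_id.to_owned());
        LogOutcome::Recorded
    }

    fn credit_micros(&self) -> i64 {
        self.state.lock().unwrap_or_else(PoisonError::into_inner).credit_micros
    }
}
```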
## 5. Usage Tracking Accuracy
### Token Counting Methodology
**Current Approach:**
- Prompt tokens: Estimated using provider-specific estimators
- Completion tokens: Estimated or from provider real usage data
- Cache tokens: Separately tracked for cache-aware pricing
**Strengths:**
- Fallback to estimation when provider doesn't report usage
- Cache token differentiation for accurate pricing
**Weaknesses:**
- Estimation may differ from actual provider counts
- No validation of provider-reported token counts
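One way to address both weaknesses is to prefer provider-reported usage, fall back to the estimate only when nothing is reported, and flag large disagreements instead of billing them silently. The sketch below assumes a hypothetical `TokenUsage` type and an arbitrary relative `tolerance`; it illustrates the idea and is not the proxy's current logic.

```rust
/// Token counts for one request (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct TokenUsage {
    prompt: u32,
    completion: u32,
}

impl TokenUsage {
    fn total(&self) -> u64 {
        u64::from(self.prompt) + u64::from(self.completion)
    }
}

#[derive(Debug, PartialEq)]
enum UsageSource {
    Provider,
    Estimated,
}

/// Chooses the usage to bill and whether it needs review. Provider counts
/// win; the estimate is used only when none were reported. A report whose
/// total differs from the estimate by more than `tolerance` (e.g. 0.25 for
/// 25%) is flagged rather than silently trusted or discarded.
fn reconcile_usage(
    reported: Option<TokenUsage>,
    estimated: TokenUsage,
    tolerance: f64,
) -> (TokenUsage, UsageSource, bool) {
    match reported {
        None => (estimated, UsageSource::Estimated, false),
        Some(actual) => {
            let est = estimated.total() as f64;
            let diff = (actual.total() as f64 - est).abs();
            let flagged = est > 0.0 && diff / est > tolerance;
            (actual, UsageSource::Provider, flagged)
        }
    }
}
```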
### Cost Calculation
**Well Implemented:**
- Model-specific cost overrides via `model_configs`
- Cache-aware pricing when supported by registry
- Provider fallback calculations
**Potential Issues:**
- Floating-point precision for monetary calculations
- No rounding strategy for fractional cents
### Update Consistency
**Inconsistency Risk:** Client aggregates updated separately from request logging.
**Example Flow:**
1. Request log inserted and provider balance updated (transaction)
2. Client usage updated (separate operation)
3. If step 2 fails, client stats undercount usage
**Solution:** Include client update in the same transaction:
```sql
-- In the insert_log function, add to the same transaction:
UPDATE clients
SET total_requests = total_requests + 1,
    total_tokens = total_tokens + ?,
    total_cost = total_cost + ?
WHERE client_id = ?;
```
### Financial Accuracy
**Good Practices:**
- Token-level granularity for cost calculation
- Separation of prompt/completion/cache pricing
- Database persistence for audit trail
**Recommendations:**
1. **Audit Trail**: Add `balance_transactions` table for provider credit changes
2. **Rounding Policy**: Define rounding strategy (e.g., to 6 decimal places)
3. **Validation**: Periodic reconciliation of aggregates vs. detail records
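For the rounding policy, a common alternative to `REAL` is storing integer micro-dollars (1e-6 USD, matching the six-decimal suggestion above) and rounding once per request. Summing `f64` costs accumulates binary rounding error (ten additions of `0.1` do not give exactly `1.0`), while integer totals stay exact. The helpers below are a hypothetical sketch that assumes prices are configured in USD per million tokens; the half-up rounding is only correct for non-negative amounts.

```rust
/// Amounts are stored as integer micro-dollars (1e-6 USD).
const MICROS_PER_USD: i64 = 1_000_000;
/// Prices are quoted per million tokens.
const TOKENS_PER_MTOK: i128 = 1_000_000;

/// Converts a configured price in USD per million tokens into integer
/// micro-dollars per million tokens, rounding once at load time.
fn price_micros_per_mtok(usd_per_mtok: f64) -> i64 {
    (usd_per_mtok * MICROS_PER_USD as f64).round() as i64
}

/// Cost of `tokens` in micro-dollars, rounded half-up.
/// Assumes a non-negative price; i128 avoids overflow in the product.
fn cost_micros(tokens: u64, price_micros_per_mtok: i64) -> i64 {
    let raw = i128::from(tokens) * i128::from(price_micros_per_mtok);
    ((raw + TOKENS_PER_MTOK / 2) / TOKENS_PER_MTOK) as i64
}

/// Formats a non-negative micro-dollar amount with six decimal places.
fn format_usd(micros: i64) -> String {
    format!("{}.{:06}", micros / MICROS_PER_USD, micros % MICROS_PER_USD)
}
```

For example, 1,234 tokens at $2.50 per million tokens cost 3,085 micro-dollars, displayed as `0.003085`.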
## 6. Performance Recommendations
### Schema Improvements
1. **Partitioning Strategy**: For high-volume `llm_requests`, consider:
   - Monthly partitioning by timestamp (SQLite has no native partitioning, so this means per-period tables)
   - Archive old data to separate tables
2. **Data Retention Policy**: Implement automatic cleanup of old request logs
```sql
DELETE FROM llm_requests WHERE timestamp < date('now', '-90 days');
```
3. **Column Optimization**: Remove unused `request_body`, `response_body` columns or implement compression
### Query Optimizations
1. **Avoid Functions on Indexed Columns**: Rewrite date queries as range queries
2. **Batch Updates**: Consider batch updates for client usage instead of per-request
3. **Separate Read Pool**: SQLite has no read replicas; for dashboard queries, consider a separate read-only connection pool (in WAL mode, readers do not block the writer)
### Connection Pooling
**Current:** SQLx connection pool with default settings
**Recommendations:**
- Configure pool size based on expected concurrency
- Implement connection health checks
- Monitor pool utilization metrics
### Monitoring Setup
**Essential Metrics:**
- Query execution times (slow query logging)
- Index usage statistics
- Table growth trends
- Connection pool utilization
**Implementation:**
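- Enable SQLx slow-statement logging (`ConnectOptions::log_slow_statements`) and export pool metrics such as `Pool::size()` and `Pool::num_idle()`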
- Regular `ANALYZE` execution for query planner
- Dashboard for database health monitoring
## 7. Security Considerations
### Data Protection
**Sensitive Data:**
- `provider_configs.api_key` - Should be encrypted at rest
- `users.password_hash` - Already hashed with bcrypt
- `client_tokens.token` - Plain text storage
**Recommendations:**
- Encrypt API keys using libsodium or similar
- Implement token hashing (similar to password hashing)
- Regular security audits of authentication flows
### SQL Injection Prevention
**Good Practices:**
- Use sqlx query builder with parameter binding
- No raw SQL concatenation observed in code review
**Verification Needed:** Ensure all dynamic SQL uses parameterized queries
### Access Controls
**Database Level:**
- SQLite lacks built-in user management
- Consider file system permissions for database file
- Application-level authentication is primary control
## 8. Summary of Critical Issues
**Priority 1 (Critical):**
1. Foreign key constraints not enabled
2. Split transactions risking data inconsistency
3. Missing composite indexes for common queries
**Priority 2 (High):**
1. No proper migration versioning system
2. Potential race conditions in balance updates
3. Non-sargable date queries impacting performance
**Priority 3 (Medium):**
1. Denormalized aggregates without consistency guarantees
2. No data retention policy for request logs
3. Missing check constraints for data validation
## 9. Recommended Action Plan
### Phase 1: Immediate Fixes (1-2 weeks)
1. Enable foreign key constraints in database connection
2. Add composite indexes for common query patterns
3. Fix transaction boundaries for client usage updates
4. Rewrite non-sargable date queries
### Phase 2: Short-term Improvements (3-4 weeks)
1. Implement proper migration system with version tracking
2. Add check constraints for data validation
3. Implement connection pooling configuration
4. Create database monitoring dashboard
### Phase 3: Long-term Enhancements (2-3 months)
1. Implement data retention and archiving strategy
2. Add audit trail for provider balance changes
3. Consider partitioning for high-volume tables
4. Implement encryption for sensitive data
### Phase 4: Ongoing Maintenance
1. Regular index maintenance and query plan analysis
2. Periodic reconciliation of aggregate vs. detail data
3. Security audits and dependency updates
4. Performance benchmarking and optimization
---
## Appendices
### A. Sample Migration Implementation
```sql
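-- migrations/002-enable-foreign-keys.sql: not viable as a migration.
-- PRAGMA foreign_keys is per-connection and is a no-op inside a transaction,
-- so a migration cannot turn enforcement on persistently. Enable it in the
-- connection options instead (see Section 4).
-- migrations/003-add-composite-indexes.sql
CREATE INDEX IF NOT EXISTS idx_llm_requests_client_timestamp ON llm_requests(client_id, timestamp);
CREATE INDEX IF NOT EXISTS idx_llm_requests_provider_timestamp ON llm_requests(provider, timestamp);
CREATE INDEX IF NOT EXISTS idx_model_configs_provider_id ON model_configs(provider_id);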
```
### B. Transaction Fix Example
```rust
use sqlx::SqlitePool;

// `RequestLog` is the application's log record type (fields as used below).
async fn insert_log(pool: &SqlitePool, log: RequestLog) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?;

    // Insert or ignore client (the client ID doubles as its display name)
    sqlx::query("INSERT OR IGNORE INTO clients (client_id, name, description) VALUES (?, ?, 'Auto-created from request')")
        .bind(&log.client_id)
        .bind(&log.client_id)
        .execute(&mut *tx)
        .await?;

    // Insert request log (column list and binds elided)
    sqlx::query("INSERT INTO llm_requests ...")
        .execute(&mut *tx)
        .await?;

    // Update provider balance (postpaid providers are not debited)
    if log.cost > 0.0 {
        sqlx::query("UPDATE provider_configs SET credit_balance = credit_balance - ? WHERE id = ? AND (billing_mode IS NULL OR billing_mode != 'postpaid')")
            .bind(log.cost)
            .bind(&log.provider)
            .execute(&mut *tx)
            .await?;
    }

    // Update client aggregates within the same transaction
    sqlx::query("UPDATE clients SET total_requests = total_requests + 1, total_tokens = total_tokens + ?, total_cost = total_cost + ? WHERE client_id = ?")
        .bind(log.total_tokens as i64)
        .bind(log.cost)
        .bind(&log.client_id)
        .execute(&mut *tx)
        .await?;

    // Nothing is visible to other connections until this commit succeeds;
    // if any `?` above returns early, dropping `tx` rolls everything back.
    tx.commit().await?;
    Ok(())
}
```
### C. Monitoring Query Examples
```sql
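-- Indexes on llm_requests that have no ANALYZE statistics yet.
-- sqlite_stat1 holds planner statistics (created by ANALYZE), not usage
-- counts, so this finds un-analyzed indexes rather than unused ones.
SELECT name FROM sqlite_master
WHERE type = 'index'
  AND tbl_name = 'llm_requests'
  AND name NOT IN (
    SELECT idx
    FROM sqlite_stat1
    WHERE tbl = 'llm_requests' AND idx IS NOT NULL
  );
-- Table size analysis (requires SQLite built with SQLITE_ENABLE_DBSTAT_VTAB)
SELECT name, SUM(pgsize) / 1024.0 / 1024.0 AS size_mb
FROM dbstat
WHERE name = 'llm_requests'
GROUP BY name;
-- Check that the composite (client_id, timestamp) index is used
EXPLAIN QUERY PLAN
SELECT * FROM llm_requests
WHERE client_id = ? AND timestamp >= ?;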
```
---
*This report provides a comprehensive analysis of the current database implementation and actionable recommendations for improvement. Regular review and iteration will ensure the database continues to meet performance, consistency, and scalability requirements as the application grows.*

PLAN.md Normal file

@@ -0,0 +1,62 @@
# Project Plan: LLM Proxy Enhancements & Security Upgrade
This document outlines the roadmap for standardizing frontend security, cleaning up the codebase, upgrading session management to HMAC-signed tokens, and extending integration testing.
## Phase 1: Frontend Security Standardization
**Primary Agent:** `frontend-developer`
- [x] Audit `static/js/pages/users.js` for manual HTML string concatenation.
- [x] Replace custom escaping or unescaped injections with `window.api.escapeHtml`.
- [x] Verify user list and user detail rendering for XSS vulnerabilities.
## Phase 2: Codebase Cleanup
**Primary Agent:** `backend-developer`
- [x] Identify and remove unused imports in `src/config/mod.rs`.
- [x] Identify and remove unused imports in `src/providers/mod.rs`.
- [x] Run `cargo clippy` and `cargo fmt` to ensure adherence to standards.
## Phase 3: HMAC Architectural Upgrade
**Primary Agents:** `fullstack-developer`, `security-auditor`, `backend-developer`
### 3.1 Design (Security Auditor)
- [x] Define Token Structure: `base64(payload).signature`.
  - Payload: `{ "session_id": "...", "username": "...", "role": "...", "exp": ... }`
- [x] Select HMAC algorithm (HMAC-SHA256).
- [x] Define environment variable for secret key: `SESSION_SECRET`.
### 3.2 Implementation (Backend Developer)
- [x] Refactor `src/dashboard/sessions.rs`:
  - Integrate `hmac` and `sha2` crates (or similar).
  - Update `create_session` to return signed tokens.
  - Update `validate_session` to verify the signature before checking the session store.
- [x] Implement activity-based session refresh:
  - If the session is valid and >50% through its TTL, extend `expires_at` and issue a new signed token.
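The token shape and refresh rule above can be sketched in std-only Rust. This assumes payload and signature are separated by the last `.` (standard and URL-safe base64 never contain `.`) and that the server knows the configured TTL, since the listed payload carries only `exp`. Signature verification (HMAC-SHA256 keyed with `SESSION_SECRET`) is deliberately left out; `split_token` and `should_refresh` are hypothetical names.

```rust
/// Splits `base64(payload).signature` at the last '.'. Neither standard nor
/// URL-safe base64 uses '.', so the split is unambiguous.
fn split_token(token: &str) -> Option<(&str, &str)> {
    let (payload, signature) = token.rsplit_once('.')?;
    if payload.is_empty() || signature.is_empty() {
        return None;
    }
    Some((payload, signature))
}

/// True when a still-valid session is more than 50% through its TTL,
/// i.e. less than half of the TTL remains. Times are Unix seconds and
/// `ttl_secs` is the configured session lifetime (an assumption).
fn should_refresh(expires_at: u64, ttl_secs: u64, now: u64) -> bool {
    if now >= expires_at {
        return false; // expired: the user must log in again
    }
    let remaining = expires_at - now;
    remaining.saturating_mul(2) < ttl_secs
}
```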
### 3.3 Integration (Fullstack Developer)
- [x] Update dashboard API handlers to handle new token format.
- [x] Update frontend session storage/retrieval if necessary.
## Phase 4: Extended Integration Testing
**Primary Agent:** `qa-automation`
- [ ] Setup test environment with encrypted key storage enabled.
- [ ] Implement end-to-end flow:
1. Store encrypted provider key via API.
2. Authenticate through Proxy.
3. Make proxied LLM request (verifying decryption and usage).
- [ ] Validate HMAC token expiration and refresh logic in automated tests.
## Phase 5: Code Quality & Refactoring
**Primary Agent:** `fullstack-developer`
- [x] Refactor dashboard monolith into modular sub-modules (`auth.rs`, `usage.rs`, etc.).
- [x] Standardize error handling and remove `unwrap()` in production paths.
- [x] Implement system health metrics and backup functionality.
---
## Technical Standards
- **Rust:** No `unwrap()` in production code; use proper error handling (`Result`).
- **Frontend:** Always use `window.api` wrappers for sensitive operations.
- **Security:** Secrets must never be logged or hardcoded.

README.md

@@ -26,6 +26,17 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
- **Rate Limiting:** Per-client and global rate limits.
- **Cache-Aware Costing:** Tracks cache hit/miss tokens for accurate billing.
## Security
LLM Proxy is designed with security in mind:
- **HMAC Session Tokens:** Management dashboard sessions are secured using HMAC-SHA256 signed tokens.
- **Encrypted Provider Keys:** Sensitive LLM provider API keys are stored encrypted (AES-256-GCM) in the database.
- **Session Refresh:** Activity-based session extension lets idle sessions expire after a short TTL without logging out active users.
- **XSS Prevention:** Standardized frontend escaping using `window.api.escapeHtml`.
**Note:** You must define a `SESSION_SECRET` in your `.env` file for secure session signing.
## Tech Stack
- **Runtime:** Rust with Tokio.
@@ -39,6 +50,7 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
- Rust (1.80+)
- SQLite3
- Docker (optional, for containerized deployment)
### Quick Start
@@ -53,10 +65,9 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
```bash
cp .env.example .env
# Edit .env and add your API keys:
# SESSION_SECRET=... (Generate a strong random secret)
# OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=AIza...
# DEEPSEEK_API_KEY=sk-...
# GROK_API_KEY=gk-... (optional)
```
3. Run the proxy:
@@ -66,50 +77,31 @@ A unified, high-performance LLM proxy gateway built in Rust. It provides a singl
The server starts on `http://localhost:8080` by default.
````diff
-### Configuration
+### Deployment (Docker)
-Edit `config.toml` to customize providers, models, and settings:
+A multi-stage `Dockerfile` is provided for efficient deployment:
-```toml
-[server]
-port = 8080
-host = "0.0.0.0"
+```bash
+# Build the container
+docker build -t llm-proxy .
-[database]
-path = "./data/llm_proxy.db"
-[providers.openai]
-enabled = true
-default_model = "gpt-4o"
-[providers.gemini]
-enabled = true
-default_model = "gemini-2.0-flash"
-[providers.deepseek]
-enabled = true
-default_model = "deepseek-reasoner"
-[providers.grok]
-enabled = false
-default_model = "grok-beta"
-[providers.ollama]
-enabled = false
-base_url = "http://localhost:11434/v1"
+# Run the container
+docker run -p 8080:8080 \
+  -e SESSION_SECRET=your-secure-secret \
+  -v ./data:/app/data \
+  llm-proxy
 ```
````
## Management Dashboard
````diff
-Access the dashboard at `http://localhost:8080`:
+Access the dashboard at `http://localhost:8080`. The dashboard architecture has been refactored into modular sub-components for better maintainability:
-- **Overview:** Real-time request counters, system health, provider status.
-- **Analytics:** Time-series charts, filterable by date, client, provider, and model.
-- **Costs:** Budget tracking, cost breakdown by provider/client/model, projections.
-- **Clients:** Create, revoke, and rotate API tokens; per-client usage stats.
-- **Providers:** Enable/disable providers, test connections, configure API keys.
-- **Monitoring:** Live request stream via WebSocket, response times, error rates.
-- **Users:** Admin/user management with role-based access control.
+- **Auth (`/api/auth`):** Login, session management, and password changes.
+- **Usage (`/api/usage`):** Summary stats, time-series analytics, and provider breakdown.
+- **Clients (`/api/clients`):** API key management and per-client usage tracking.
+- **Providers (`/api/providers`):** Provider configuration, status monitoring, and connection testing.
+- **System (`/api/system`):** Health metrics, live logs, database backups, and global settings.
+- **Monitoring:** Live request stream via WebSocket.
````
### Default Credentials
@@ -137,46 +129,6 @@ response = client.chat.completions.create(
)
```
### Open WebUI
```
API Base URL: http://your-server:8080/v1
API Key: YOUR_CLIENT_API_KEY
```
### cURL
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": false
}'
```
## Model Discovery
The proxy exposes `/v1/models` for OpenAI-compatible client model discovery:
```bash
curl http://localhost:8080/v1/models \
-H "Authorization: Bearer YOUR_CLIENT_API_KEY"
```
## Troubleshooting
### Streaming Issues
If clients time out or show "TransferEncodingError", ensure:
1. Proxy buffering is disabled in nginx: `proxy_buffering off;`
2. Chunked transfer is enabled: `chunked_transfer_encoding on;`
3. Timeouts are sufficient: `proxy_read_timeout 7200s;`
### Provider Errors
- Check API keys are set in `.env`
- Test provider in dashboard (Settings → Providers → Test)
- Review logs: `journalctl -u llm-proxy -f`
## License
MIT OR Apache-2.0

RUST_BACKEND_REVIEW.md Normal file

@@ -0,0 +1,58 @@
# LLM Proxy Rust Backend Code Review Report
## Executive Summary
This code review examines the `llm-proxy` Rust backend, focusing on architectural soundness, performance characteristics, and adherence to Rust idioms. The codebase demonstrates solid engineering with well-structured modular design, comprehensive error handling, and thoughtful async patterns. However, several areas require attention for production readiness, particularly around thread safety, memory efficiency, and error recovery.
## 1. Core Proxy Logic Review
### Strengths
- Clean provider abstraction (`Provider` trait).
- Streaming support with `AggregatingStream` for token counting.
- Model mapping and caching system.
### Issues Found
- **Provider Manager Lookup Cost:** Providers are held in a `Vec` behind an `RwLock`, so each lookup is an O(n) scan. Use a keyed map such as `DashMap` instead.
- **Streaming Memory Inefficiency:** The complete response content is accumulated in memory.
- **Model Registry Cache Invalidation:** No invalidation strategy when model config changes via the dashboard.
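Keying providers by name is what removes the linear scan; `DashMap` additionally shards the lock to reduce contention. A std-only sketch of the keyed lookup with `RwLock<HashMap<..>>` is below. The `Provider` trait here is a simplified stand-in for the project's trait, and `ProviderRegistry` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::sync::{Arc, PoisonError, RwLock};

/// Simplified stand-in for the project's `Provider` trait.
trait Provider: Send + Sync {
    fn name(&self) -> &str;
}

/// Providers keyed by name: average O(1) lookup instead of a `Vec` scan.
/// Readers share the lock; writers take it only when the provider set
/// changes (for example, when a provider is toggled from the dashboard).
#[derive(Default)]
struct ProviderRegistry {
    providers: RwLock<HashMap<String, Arc<dyn Provider>>>,
}

impl ProviderRegistry {
    fn register(&self, provider: Arc<dyn Provider>) {
        let mut map = self.providers.write().unwrap_or_else(PoisonError::into_inner);
        map.insert(provider.name().to_owned(), provider);
    }

    /// Clones the `Arc` out so the read lock is released before any I/O.
    fn get(&self, name: &str) -> Option<Arc<dyn Provider>> {
        let map = self.providers.read().unwrap_or_else(PoisonError::into_inner);
        map.get(name).cloned()
    }
}
```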
## 2. State Management Review
- **Token Bucket Algorithm Flaw:** The custom implementation does not synchronize refill with consumption. Use the `governor` crate.
- **Broadcast Channel Message Loss:** The channel has a fixed capacity (100), so slow subscribers can lag and miss messages.
- **Database Connection Pool Contention:** SQLite connections shared without differentiation.
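For reference, the property the custom bucket reportedly lacks is that refill and consumption happen atomically. A minimal std-only sketch keeps both inside one `Mutex`; the clock is passed in as milliseconds to keep it testable (an assumption, not the project's API). The recommendation to adopt `governor` still stands; this only illustrates the invariant.

```rust
use std::sync::{Mutex, PoisonError};

/// Token bucket whose refill and consumption happen under one lock, so two
/// callers can never both spend the same pre-refill tokens.
struct TokenBucket {
    capacity: f64,
    refill_per_ms: f64,
    /// (available tokens, timestamp of last refill in ms)
    state: Mutex<(f64, u64)>,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64, now_ms: u64) -> Self {
        TokenBucket {
            capacity,
            refill_per_ms: refill_per_sec / 1000.0,
            state: Mutex::new((capacity, now_ms)),
        }
    }

    /// Refills for the elapsed time, then tries to take one token.
    fn try_acquire(&self, now_ms: u64) -> bool {
        let mut guard = self.state.lock().unwrap_or_else(PoisonError::into_inner);
        let (tokens, last) = &mut *guard;
        let elapsed = now_ms.saturating_sub(*last) as f64;
        *tokens = (*tokens + elapsed * self.refill_per_ms).min(self.capacity);
        *last = (*last).max(now_ms);
        if *tokens >= 1.0 {
            *tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```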
## 3. Error Handling Review
- **Error Recovery Missing:** No circuit breaker or retry logic for provider calls.
- **Stream Error Logging Gap:** Stream errors swallowed without logging partial usage.
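A circuit breaker plus capped exponential backoff is one way to add the missing recovery. The sketch below is hypothetical: the thresholds, the millisecond clock parameter, and all names are assumptions. It also shows `CircuitState` deriving `Clone` and `Copy`, as suggested in section 4.

```rust
/// Fieldless and tiny: deriving `Copy` (which requires `Clone`) lets callers
/// read the state by value instead of cloning or borrowing it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Consecutive-failure circuit breaker for one provider (hypothetical).
struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    failure_threshold: u32,
    cooldown_ms: u64,
    open_until_ms: u64,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, cooldown_ms: u64) -> Self {
        CircuitBreaker {
            state: CircuitState::Closed,
            consecutive_failures: 0,
            failure_threshold,
            cooldown_ms,
            open_until_ms: 0,
        }
    }

    fn state(&self) -> CircuitState {
        self.state
    }

    /// Whether a call may be attempted at `now_ms`. After the cooldown an
    /// open breaker becomes half-open and lets trial calls through.
    fn allow(&mut self, now_ms: u64) -> bool {
        if self.state == CircuitState::Open && now_ms >= self.open_until_ms {
            self.state = CircuitState::HalfOpen;
        }
        self.state != CircuitState::Open
    }

    /// Records a call result. A failure while half-open, or too many in a
    /// row, (re)opens the breaker for `cooldown_ms`.
    fn record(&mut self, success: bool, now_ms: u64) {
        if success {
            self.state = CircuitState::Closed;
            self.consecutive_failures = 0;
            return;
        }
        self.consecutive_failures += 1;
        if self.state == CircuitState::HalfOpen || self.consecutive_failures >= self.failure_threshold {
            self.state = CircuitState::Open;
            self.open_until_ms = now_ms + self.cooldown_ms;
        }
    }
}

/// Capped exponential backoff before retry number `attempt` (0-based).
fn backoff_ms(attempt: u32, base_ms: u64, max_ms: u64) -> u64 {
    base_ms.saturating_mul(1u64 << attempt.min(20)).min(max_ms)
}
```

In practice a random jitter is usually added to the backoff so that many clients do not retry in lockstep.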
## 4. Rust Idioms and Performance
- **Unnecessary String Cloning:** Frequent cloning in authentication hot paths.
- **JSON Parsing Inefficiency:** Multiple passes with `serde_json::Value`. Use typed structs.
- **Missing `#[derive(Clone, Copy)]`:** Small fieldless enums like `CircuitState` should derive `Copy` (which also requires `Clone`).
## 5. Async Performance
- **Blocking Calls:** Token estimation may block the async runtime.
- **Missing Connection Timeouts:** Only an overall request timeout is set; there are no separate read/write timeouts.
- **Unbounded Task Spawns:** A task is spawned per client usage update, with no limit under load.
## 6. Security Considerations
- **Token Leakage:** Redact tokens in `Debug` and `Display` impls.
- **No Request Size Limits:** Vulnerable to memory exhaustion.
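For the token-leakage item, a newtype whose `Debug` and `Display` impls never print the secret keeps it hidden even when it is embedded in a struct that derives `Debug`. `SecretToken` is a hypothetical name; crates such as `secrecy` offer a maintained version of this pattern.

```rust
use std::fmt;

/// Newtype for a client or provider token whose formatting never reveals
/// the secret value.
struct SecretToken(String);

impl SecretToken {
    fn new(raw: impl Into<String>) -> Self {
        SecretToken(raw.into())
    }

    /// The only way to reach the raw value, so every use is easy to audit.
    fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for SecretToken {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("SecretToken([redacted])")
    }
}

impl fmt::Display for SecretToken {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[redacted]")
    }
}
```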
## 7. Testability
- **Mocking Difficulty:** Tight coupling to concrete provider implementations.
- **Missing Integration Tests:** No E2E tests for streaming.
## 8. Summary of Critical Actions
### High Priority
1. Replace custom token bucket with `governor` crate.
2. Fix provider lookup O(n) scaling issue.
3. Implement proper error recovery with retries.
4. Add request size limits and timeout configurations.
### Medium Priority
1. Reduce string cloning in hot paths.
2. Implement cache invalidation for model configs.
3. Add connection pooling separation.
4. Improve streaming memory efficiency.
---
*Review conducted by: Senior Principal Engineer (Code Reviewer)*

SECURITY_AUDIT.md Normal file

@@ -0,0 +1,58 @@
# LLM Proxy Security Audit Report
## Executive Summary
A comprehensive security audit of the `llm-proxy` repository was conducted. The audit identified **2 critical vulnerabilities**, **3 high-risk issues**, **4 medium-risk issues**, and **3 low-risk issues**. The most severe findings include Cross-Site Scripting (XSS) in the dashboard interface and insecure storage of provider API keys in the database.
## Detailed Findings
### Critical Risk Vulnerabilities
#### **CRITICAL-01: Cross-Site Scripting (XSS) in Dashboard Interface**
- **Location**: `static/js/pages/clients.js` (multiple locations).
- **Description**: User-controlled data (e.g., `client.id`) is inserted directly into HTML or `onclick` handlers without escaping.
- **Impact**: Arbitrary JavaScript execution in admin context, potentially stealing session tokens.
#### **CRITICAL-02: Insecure API Key Storage in Database**
- **Location**: `src/database/mod.rs`, `src/providers/mod.rs`, `src/dashboard/providers.rs`.
- **Description**: Provider API keys are stored in **plaintext** in the SQLite database.
- **Impact**: Compromised database file exposes all provider API keys.
### High Risk Vulnerabilities
#### **HIGH-01: Missing Input Validation and Size Limits**
- **Location**: `src/server/mod.rs`, `src/models/mod.rs`.
- **Impact**: Denial of Service via large payloads.
#### **HIGH-02: Sensitive Data Logging Without Encryption**
- **Location**: `src/database/mod.rs`, `src/logging/mod.rs`.
- **Description**: Full request and response bodies stored in `llm_requests` table without encryption or redaction.
#### **HIGH-03: Weak Default Credentials and Password Policy**
- **Description**: The default admin password is 'admin', and the minimum password length is only 4 characters.
### Medium Risk Vulnerabilities
#### **MEDIUM-01: Missing CSRF Protection**
- No CSRF tokens or SameSite cookie attributes for state-changing dashboard endpoints.
#### **MEDIUM-02: Insecure Session Management**
- Session tokens stored in localStorage without HttpOnly flag.
- Tokens use simple `session-{uuid}` format.
#### **MEDIUM-03: Error Information Leakage**
- Internal error details exposed to clients in some cases.
#### **MEDIUM-04: Outdated Dependencies**
- Outdated versions of `chrono`, `tokio`, and `reqwest`.
### Low Risk Vulnerabilities
- Missing security headers (CSP, HSTS, X-Frame-Options).
- Insufficient rate limiting on dashboard authentication.
- No database encryption at rest.
## Recommendations
### Immediate Actions
1. **Fix XSS Vulnerabilities:** Implement proper HTML escaping for all user-controlled data.
2. **Secure API Key Storage:** Encrypt API keys in the database using a library like `ring`.
3. **Implement Input Validation:** Add maximum payload size limits (e.g., 10MB).
4. **Improve Data Protection:** Add option to disable request/response body logging.
---
*Report generated by Security Auditor Agent on March 6, 2026*

timeline.mmd Normal file

@@ -0,0 +1,14 @@
gantt
title LLM Proxy Project Timeline
dateFormat YYYY-MM-DD
section Frontend
Standardize Escaping (users.js) :a1, 2026-03-06, 1d
section Backend Cleanup
Remove Unused Imports :b1, 2026-03-06, 1d
section HMAC Migration
Architecture Design :c1, 2026-03-07, 1d
Backend Implementation :c2, after c1, 2d
Session Refresh Logic :c3, after c2, 1d
section Testing
Integration Test (Encrypted Keys) :d1, 2026-03-09, 2d
HMAC Verification Tests :d2, after c3, 1d