From 0ce5f4f4908c581eab60cd27a9c8b015150ffa2f Mon Sep 17 00:00:00 2001
From: hobokenchicken
Date: Thu, 19 Mar 2026 13:26:31 -0400
Subject: [PATCH] docs: finalize documentation for Go migration

Updated README, architecture, and TODO to reflect full feature parity, system metrics, and registry integration.
---
 BACKEND_ARCHITECTURE.md | 37 +++++++++++++++++++------------------
 README.md               |  8 +++++++-
 TODO.md                 | 16 +++++++++++-----
 3 files changed, 37 insertions(+), 24 deletions(-)

diff --git a/BACKEND_ARCHITECTURE.md b/BACKEND_ARCHITECTURE.md
index 0ee3e0b9..4be40dfd 100644
--- a/BACKEND_ARCHITECTURE.md
+++ b/BACKEND_ARCHITECTURE.md
@@ -9,6 +9,7 @@ The LLM Proxy backend is implemented in Go, focusing on high performance, clear
 - **Database:** [sqlx](https://github.com/jmoiron/sqlx) - Lightweight wrapper for standard `database/sql`.
 - **SQLite Driver:** [modernc.org/sqlite](https://modernc.org/sqlite) - CGO-free SQLite implementation for ease of cross-compilation.
 - **Config:** [Viper](https://github.com/spf13/viper) - Robust configuration management supporting environment variables and files.
+- **Metrics:** [gopsutil](https://github.com/shirou/gopsutil) - System-level resource monitoring.
 
 ## Project Structure
 
@@ -22,40 +23,40 @@ The LLM Proxy backend is implemented in Go, focusing on high performance, clear
 │   ├── models/      # Unified request/response structs
 │   ├── providers/   # LLM provider implementations (OpenAI, Gemini, etc.)
 │   ├── server/      # HTTP server, dashboard handlers, and WebSocket hub
-│   └── utils/       # Common utilities (multimodal, etc.)
+│   └── utils/       # Common utilities (registry, pricing, etc.)
 └── static/          # Frontend assets (served by the backend)
 ```
 
 ## Key Components
 
 ### 1. Provider Interface (`internal/providers/provider.go`)
-Standardized interface for all LLM backends:
-```go
-type Provider interface {
-    Name() string
-    ChatCompletion(ctx context.Context, req *models.UnifiedRequest) (*models.ChatCompletionResponse, error)
-    ChatCompletionStream(ctx context.Context, req *models.UnifiedRequest) (<-chan *models.ChatCompletionStreamResponse, error)
-}
-```
+Standardized interface for all LLM backends. Implementations handle mapping between the unified format and provider-specific APIs (OpenAI, Gemini, DeepSeek, Grok).
 
-### 2. Asynchronous Logging (`internal/server/logging.go`)
+### 2. Model Registry & Pricing (`internal/utils/registry.go`)
+Integrates with `models.dev/api.json` to provide real-time model metadata and pricing.
+- **Fuzzy Matching:** Supports matching versioned model IDs (e.g., `gpt-4o-2024-08-06`) to base registry entries.
+- **Automatic Refreshes:** The registry is fetched at startup and refreshed every 24 hours via a background goroutine.
+
+### 3. Asynchronous Logging (`internal/server/logging.go`)
 Uses a buffered channel and background worker to log every request to SQLite without blocking the client response. It also broadcasts logs to the WebSocket hub for real-time dashboard updates.
 
-### 3. Session Management (`internal/server/sessions.go`)
-Implements HMAC-SHA256 signed tokens for dashboard authentication. Sessions are stored in-memory with configurable TTL.
+### 4. Session Management (`internal/server/sessions.go`)
+Implements HMAC-SHA256 signed tokens for dashboard authentication. Tokens secure the management interface while standard Bearer tokens are used for LLM API access.
 
-### 4. WebSocket Hub (`internal/server/websocket.go`)
-A centralized hub for managing WebSocket connections, allowing real-time broadcast of system events and request logs to the dashboard.
+### 5. WebSocket Hub (`internal/server/websocket.go`)
+A centralized hub for managing WebSocket connections, allowing real-time broadcast of system events, system metrics, and request logs to the dashboard.
 
 ## Concurrency Model
 
 Go's goroutines and channels are used extensively:
-- **Streaming:** Each streaming request uses a goroutine to read and parse the provider's response, feeding chunks into a channel.
-- **Logging:** A single background worker processes the `logChan` to perform database writes.
+- **Streaming:** Each streaming request uses a goroutine to read and parse the provider's response, feeding chunks into a channel for SSE delivery.
+- **Logging:** A single background worker processes the `logChan` to perform serial database writes.
 - **WebSocket:** The `Hub` runs in a dedicated goroutine, handling registration and broadcasting.
+- **Maintenance:** Background tasks handle registry refreshes and status monitoring.
 
 ## Security
 
-- **Encryption Key:** A mandatory 32-byte key is used for both session signing and encryption of sensitive data in the database.
-- **Auth Middleware:** Verifies client API keys against the database before proxying requests to LLM providers.
+- **Encryption Key:** A mandatory 32-byte key is used for both session signing and encryption of sensitive data.
+- **Auth Middleware:** Scoped to `/v1` routes to verify client API keys against the database.
 - **Bcrypt:** Passwords for dashboard users are hashed using Bcrypt with a work factor of 12.
+- **Database Hardening:** Automatic migrations ensure the schema is always current with the code.
diff --git a/README.md b/README.md
index 2a3176f8..a4ad34a0 100644
--- a/README.md
+++ b/README.md
@@ -102,7 +102,13 @@ Access the dashboard at `http://localhost:8080`.
 ### Default Credentials
 
 - **Username:** `admin`
-- **Password:** `admin` (You will be prompted to change this or should change it manually in the dashboard)
+- **Password:** `admin123` (You will be prompted to change this on first login)
+
+**Forgot Password?**
+You can reset the admin password to default by running:
+```bash
+./llm-proxy -reset-admin
+```
 
 ## API Usage
 
diff --git a/TODO.md b/TODO.md
index 879a6b43..bb1df460 100644
--- a/TODO.md
+++ b/TODO.md
@@ -2,9 +2,9 @@
 
 ## Completed Tasks
 - [x] Initial Go project setup
-- [x] Database schema & migrations
+- [x] Database schema & migrations (hardcoded in `db.go`)
 - [x] Configuration loader (Viper)
-- [x] Auth Middleware
+- [x] Auth Middleware (scoped to `/v1`)
 - [x] Basic Provider implementations (OpenAI, Gemini, DeepSeek, Grok)
 - [x] Streaming Support (SSE & Gemini custom streaming)
 - [x] Archive Rust files to `rust` branch
@@ -12,16 +12,21 @@
 - [x] Enhanced `helpers.go` for Multimodal & Tool Calling (OpenAI compatible)
 - [x] Enhanced `server.go` for robust request conversion
 - [x] Dashboard Management APIs (Clients, Tokens, Users, Providers)
-- [x] Dashboard Analytics & Usage Summary
-- [x] WebSocket for real-time dashboard updates
+- [x] Dashboard Analytics & Usage Summary (Fixed SQL robustness)
+- [x] WebSocket for real-time dashboard updates (Hub with client counting)
 - [x] Asynchronous Request Logging to SQLite
 - [x] Update documentation (README, deployment, architecture)
+- [x] Cost Tracking accuracy (Registry integration with `models.dev`)
+- [x] Model Listing endpoint (`/v1/models`) with provider filtering
+- [x] System Metrics endpoint (`/api/system/metrics` using `gopsutil`)
+- [x] Fixed dashboard 404s and 500s
 
 ## Feature Parity Checklist (High Priority)
 
 ### OpenAI Provider
 - [x] Tool Calling
 - [x] Multimodal (Images) support
+- [x] Accurate usage parsing (cached & reasoning tokens)
 - [ ] Reasoning Content (CoT) support for `o1`, `o3` (need to ensure it's parsed in responses)
 - [ ] Support for `/v1/responses` API (required for some gpt-5/o1 models)
 
@@ -35,15 +40,16 @@
 - [x] Reasoning Content (CoT) support
 - [x] Parameter sanitization for `deepseek-reasoner`
 - [x] Tool Calling support
+- [x] Accurate usage parsing (cache hits & reasoning)
 
 ### Grok Provider
 - [x] Tool Calling support
 - [x] Multimodal support
+- [x] Accurate usage parsing (via OpenAI helper)
 
 ## Infrastructure & Middleware
 - [ ] Implement Rate Limiting (`golang.org/x/time/rate`)
 - [ ] Implement Circuit Breaker (`github.com/sony/gobreaker`)
-- [ ] Implement Model Cost Calculation logic (needs registry/pricing integration)
 
 ## Verification
 - [ ] Unit tests for feature-specific mapping (CoT, Tools, Images)