Backend Architecture (Go)
The LLM Proxy backend is implemented in Go, focusing on high performance, clear concurrency patterns, and maintainability.
Core Technologies
- Runtime: Go 1.22+
- Web Framework: Gin Gonic - Fast and lightweight HTTP routing.
- Database: sqlx - Lightweight wrapper for the standard database/sql package.
- SQLite Driver: modernc.org/sqlite - CGO-free SQLite implementation for ease of cross-compilation.
- Config: Viper - Robust configuration management supporting environment variables and files.
- Metrics: gopsutil - System-level resource monitoring.
Project Structure
├── cmd/
│ └── llm-proxy/ # Entry point (main.go)
├── internal/
│ ├── config/ # Configuration loading and validation
│ ├── db/ # Database schema, migrations, and models
│ ├── middleware/ # Auth and logging middleware
│ ├── models/ # Unified request/response structs
│ ├── providers/ # LLM provider implementations (OpenAI, Gemini, etc.)
│ ├── server/ # HTTP server, dashboard handlers, and WebSocket hub
│ └── utils/ # Common utilities (registry, pricing, etc.)
└── static/ # Frontend assets (served by the backend)
Key Components
1. Provider Interface (internal/providers/provider.go)
Standardized interface for all LLM backends. Implementations handle mapping between the unified format and provider-specific APIs (OpenAI, Gemini, DeepSeek, Grok).
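As a rough illustration of the idea, the sketch below shows one shape such an interface could take. All names here (Provider, ChatRequest, ChatResponse, echoProvider, dispatch, demoChat) are hypothetical and not taken from internal/providers/provider.go; the stand-in provider makes no network calls.

```go
package main

import (
	"context"
	"fmt"
)

// ChatRequest and ChatResponse are hypothetical unified structs,
// standing in for whatever internal/models actually defines.
type ChatRequest struct {
	Model  string
	Prompt string
}

type ChatResponse struct {
	Provider string
	Text     string
}

// Provider is an assumed shape for the common backend interface.
type Provider interface {
	Name() string
	Chat(ctx context.Context, req ChatRequest) (ChatResponse, error)
}

// echoProvider is a stand-in backend that simply echoes the prompt.
type echoProvider struct{ name string }

func (p echoProvider) Name() string { return p.name }

func (p echoProvider) Chat(ctx context.Context, req ChatRequest) (ChatResponse, error) {
	if err := ctx.Err(); err != nil {
		return ChatResponse{}, err
	}
	return ChatResponse{Provider: p.name, Text: "echo: " + req.Prompt}, nil
}

// dispatch looks up a provider by name and forwards the unified request.
func dispatch(ctx context.Context, providers map[string]Provider, name string, req ChatRequest) (ChatResponse, error) {
	p, ok := providers[name]
	if !ok {
		return ChatResponse{}, fmt.Errorf("unknown provider %q", name)
	}
	return p.Chat(ctx, req)
}

// demoChat wires one stand-in provider and sends a single request through it.
func demoChat(providerName, prompt string) (string, error) {
	providers := map[string]Provider{"echo": echoProvider{name: "echo"}}
	resp, err := dispatch(context.Background(), providers, providerName, ChatRequest{Model: "demo", Prompt: prompt})
	return resp.Text, err
}

func main() {
	text, err := demoChat("echo", "hi")
	fmt.Println(text, err)
}
```

Callers depend only on the interface, so adding a backend under this design means adding one more implementation to the map.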
2. Model Registry & Pricing (internal/utils/registry.go)
Integrates with models.dev/api.json to provide current model metadata and pricing.
- Fuzzy Matching: Supports matching versioned model IDs (e.g., gpt-4o-2024-08-06) to base registry entries.
- Automatic Refreshes: The registry is fetched at startup and refreshed every 24 hours via a background goroutine.
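One plausible way to resolve a versioned ID to a base entry is a longest-prefix match, sketched below. This is an assumption about the matching rule, not a description of registry.go; ModelInfo, lookupModel, and demoRegistry are hypothetical names, and the registry contents are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// ModelInfo is a hypothetical registry entry; the real one would also carry pricing.
type ModelInfo struct {
	ID string
}

// lookupModel tries an exact match first. Otherwise it returns the longest
// registry ID that is a prefix of id followed by "-", so a versioned ID such
// as "gpt-4o-2024-08-06" resolves to the base entry "gpt-4o".
func lookupModel(registry map[string]ModelInfo, id string) (ModelInfo, bool) {
	if m, ok := registry[id]; ok {
		return m, true
	}
	best := ""
	for key := range registry {
		if strings.HasPrefix(id, key+"-") && len(key) > len(best) {
			best = key
		}
	}
	if best == "" {
		return ModelInfo{}, false
	}
	return registry[best], true
}

// demoRegistry returns a tiny placeholder registry for the example.
func demoRegistry() map[string]ModelInfo {
	return map[string]ModelInfo{
		"gpt-4o":      {ID: "gpt-4o"},
		"gpt-4o-mini": {ID: "gpt-4o-mini"},
	}
}

func main() {
	for _, id := range []string{"gpt-4o-2024-08-06", "gpt-4o-mini-2024-07-18", "unknown-model"} {
		m, ok := lookupModel(demoRegistry(), id)
		fmt.Println(id, "->", m.ID, ok)
	}
}
```

Preferring the longest prefix keeps a versioned gpt-4o-mini ID from collapsing onto the shorter gpt-4o entry.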
3. Asynchronous Logging (internal/server/logging.go)
Uses a buffered channel and background worker to log every request to SQLite without blocking the client response. It also broadcasts logs to the WebSocket hub for real-time dashboard updates.
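The pattern described above can be sketched with the standard library alone. The names (AsyncLogger, LogEntry, runLoggerDemo) are hypothetical, the sink callback stands in for the SQLite insert and WebSocket broadcast, and dropping entries when the buffer is full is an assumed policy rather than documented behavior.

```go
package main

import (
	"fmt"
	"sync"
)

// LogEntry is a hypothetical, trimmed-down request log record.
type LogEntry struct {
	Model  string
	Status int
}

// AsyncLogger pairs a buffered channel with one background worker.
type AsyncLogger struct {
	ch chan LogEntry
	wg sync.WaitGroup
}

// NewAsyncLogger starts the worker. sink stands in for the database write
// (and dashboard broadcast); it is only ever called from the worker goroutine,
// so writes are serialized.
func NewAsyncLogger(buffer int, sink func(LogEntry)) *AsyncLogger {
	l := &AsyncLogger{ch: make(chan LogEntry, buffer)}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		for e := range l.ch {
			sink(e)
		}
	}()
	return l
}

// Log enqueues an entry without blocking the request path. It reports false
// when the buffer is full and the entry was dropped (assumed policy).
func (l *AsyncLogger) Log(e LogEntry) bool {
	select {
	case l.ch <- e:
		return true
	default:
		return false
	}
}

// Close stops accepting entries and waits for the worker to drain the buffer.
func (l *AsyncLogger) Close() {
	close(l.ch)
	l.wg.Wait()
}

// runLoggerDemo logs n entries through a buffer of size n and returns how many
// reached the sink.
func runLoggerDemo(n int) int {
	written := 0
	logger := NewAsyncLogger(n, func(LogEntry) { written++ })
	for i := 0; i < n; i++ {
		logger.Log(LogEntry{Model: "demo-model", Status: 200})
	}
	logger.Close()
	return written
}

func main() {
	fmt.Println("entries written:", runLoggerDemo(5))
}
```

A non-blocking send trades completeness for latency; a blocking send would keep every entry but could stall responses when the database is slow.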
4. Session Management (internal/server/sessions.go)
Implements HMAC-SHA256 signed tokens for dashboard authentication. These session tokens protect the management interface, while LLM API access uses standard Bearer tokens.
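A minimal sketch of HMAC-SHA256 token signing and verification follows. The token layout (base64url payload, a dot, base64url signature) and the function names are assumptions for illustration; a real session token would normally also carry an expiry, which is omitted here.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// signToken returns "<payload>.<signature>", both base64url-encoded.
// This layout is an assumption, not the project's documented format.
func signToken(key []byte, payload string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(payload))
	enc := base64.RawURLEncoding
	return enc.EncodeToString([]byte(payload)) + "." + enc.EncodeToString(mac.Sum(nil))
}

// verifyToken recomputes the signature and compares it in constant time.
// It returns the payload only when the signature is valid.
func verifyToken(key []byte, token string) (string, bool) {
	parts := strings.Split(token, ".")
	if len(parts) != 2 {
		return "", false
	}
	enc := base64.RawURLEncoding
	payload, err := enc.DecodeString(parts[0])
	if err != nil {
		return "", false
	}
	sig, err := enc.DecodeString(parts[1])
	if err != nil {
		return "", false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	if !hmac.Equal(sig, mac.Sum(nil)) {
		return "", false
	}
	return string(payload), true
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // 32-byte demo key, not a real secret
	token := signToken(key, "admin")
	user, ok := verifyToken(key, token)
	fmt.Println(user, ok)
}
```

Using hmac.Equal rather than a plain byte comparison avoids leaking signature information through timing.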
5. WebSocket Hub (internal/server/websocket.go)
A centralized hub for managing WebSocket connections, allowing real-time broadcast of system events, system metrics, and request logs to the dashboard.
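The hub pattern can be illustrated without a WebSocket library by treating each client as a channel. The sketch below is an assumption about the general design, not the contents of websocket.go; Hub, its fields, and hubDemo are hypothetical names, and disconnecting slow clients is an assumed policy.

```go
package main

import "fmt"

// Hub owns the client set. Only the Run goroutine touches the map,
// so no mutex is needed.
type Hub struct {
	register   chan chan string
	unregister chan chan string
	broadcast  chan string
	done       chan struct{}
}

func NewHub() *Hub {
	return &Hub{
		register:   make(chan chan string),
		unregister: make(chan chan string),
		broadcast:  make(chan string),
		done:       make(chan struct{}),
	}
}

// Run processes registrations and broadcasts until done is closed.
func (h *Hub) Run() {
	clients := make(map[chan string]struct{})
	for {
		select {
		case c := <-h.register:
			clients[c] = struct{}{}
		case c := <-h.unregister:
			if _, ok := clients[c]; ok {
				delete(clients, c)
				close(c)
			}
		case msg := <-h.broadcast:
			for c := range clients {
				select {
				case c <- msg:
				default:
					// Client buffer is full: disconnect it rather than block the hub.
					delete(clients, c)
					close(c)
				}
			}
		case <-h.done:
			for c := range clients {
				close(c)
			}
			return
		}
	}
}

// hubDemo registers one client, broadcasts msg, and returns what the client received.
func hubDemo(msg string) string {
	h := NewHub()
	go h.Run()
	client := make(chan string, 1)
	h.register <- client
	h.broadcast <- msg
	got := <-client
	close(h.done)
	return got
}

func main() {
	fmt.Println(hubDemo("request logged"))
}
```

In a real server each client channel would be drained by a per-connection goroutine that writes frames to the socket.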
Concurrency Model
Go's goroutines and channels are used extensively:
- Streaming: Each streaming request uses a goroutine to read and parse the provider's response, feeding chunks into a channel for SSE delivery.
- Logging: A single background worker processes the logChan to perform serial database writes.
- WebSocket: The Hub runs in a dedicated goroutine, handling registration and broadcasting.
- Maintenance: Background tasks handle registry refreshes and status monitoring.
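The streaming item above can be sketched as a goroutine that parses a provider's event stream and feeds a channel. The function names are hypothetical, and the input format (lines prefixed with "data: " and a "[DONE]" sentinel, as in OpenAI-style streams) is assumed for the example.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"strings"
)

// streamChunks reads "data: ..." lines from r in its own goroutine and sends
// each payload on the returned channel. The channel is closed at EOF, at a
// "[DONE]" sentinel, or when ctx is cancelled (for example, client disconnect).
func streamChunks(ctx context.Context, r io.Reader) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			data, ok := strings.CutPrefix(sc.Text(), "data: ")
			if !ok {
				continue // blank separator lines and non-data fields are skipped
			}
			if data == "[DONE]" {
				return
			}
			select {
			case out <- data:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// collectStream drains the channel and joins the chunks with "|" for display.
func collectStream(input string) string {
	var chunks []string
	for c := range streamChunks(context.Background(), strings.NewReader(input)) {
		chunks = append(chunks, c)
	}
	return strings.Join(chunks, "|")
}

func main() {
	fmt.Println(collectStream("data: Hel\n\ndata: lo\n\ndata: [DONE]\n"))
}
```

In the handler, the consumer loop would write each chunk to the response as an SSE event and flush, instead of collecting them.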
Security
- Encryption Key: A mandatory 32-byte key is used for both session signing and encryption of sensitive data.
- Auth Middleware: Scoped to /v1 routes to verify client API keys against the database.
- Bcrypt: Passwords for dashboard users are hashed using Bcrypt with a work factor of 12.
- Database Hardening: Automatic migrations ensure the schema is always current with the code.
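To make the 32-byte key requirement concrete, the sketch below validates the key length and encrypts a value with AES-256-GCM from the standard library. The choice of AES-GCM is an assumption (the source only states that a 32-byte key is used), and newGCM, encrypt, decrypt, and roundTrip are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM rejects keys that are not exactly 32 bytes, then builds an
// AES-256-GCM AEAD. AES-GCM is an assumed cipher choice.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("encryption key must be exactly 32 bytes")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt returns nonce || ciphertext, using a fresh random nonce per call.
func encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits off the nonce and authenticates the ciphertext.
func decrypt(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

// roundTrip encrypts and then decrypts secret with key.
func roundTrip(key, secret string) (string, error) {
	sealed, err := encrypt([]byte(key), []byte(secret))
	if err != nil {
		return "", err
	}
	plain, err := decrypt([]byte(key), sealed)
	return string(plain), err
}

func main() {
	out, err := roundTrip("0123456789abcdef0123456789abcdef", "provider-api-key")
	fmt.Println(out, err)
}
```

Failing fast on a wrong-length key at startup matches the "mandatory" wording above and avoids silently running with weak or missing encryption.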