
LLM Proxy Gateway

A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok) with built-in token tracking, real-time cost calculation, and a management dashboard.

🚀 Features

  • Unified API: Fully OpenAI-compatible /v1/chat/completions endpoint.
  • Multi-Provider Support:
    • OpenAI: Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
    • Google Gemini: Support for the latest Gemini 2.0 models.
    • DeepSeek: High-performance, low-cost integration.
    • xAI Grok: Integration for Grok-series models.
    • Ollama: Support for local LLMs running on your machine or another host.
  • Observability & Tracking:
    • Real-time Costing: Fetches live pricing and context specs from models.dev on startup.
    • Token Counting: Accurate token estimation with tiktoken-rs.
    • Database Logging: Every request is logged to SQLite for historical analysis.
    • Streaming Support: Full SSE (Server-Sent Events) support with aggregated token tracking.
  • Multimodal (Vision): Support for image processing (Base64 and remote URLs) across compatible providers.
  • Reliability:
    • Circuit Breaking: Automatically stops routing to a provider that is failing, protecting your system from cascading errors.
    • Rate Limiting: Granular per-client and global rate limits.
  • Management Dashboard: A modern, real-time web interface to monitor usage, costs, and system health.
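
The real-time costing above boils down to simple per-token arithmetic. A minimal sketch in Python with hypothetical per-million-token prices (the proxy fetches live prices from models.dev at startup):

```python
# Hypothetical per-1M-token prices in USD (real values come from models.dev).
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """USD cost of one request: tokens / 1M, times the per-1M-token price."""
    p = PRICES[model]
    return (prompt_tokens / 1_000_000) * p["input"] + \
           (completion_tokens / 1_000_000) * p["output"]

# 1,000 prompt tokens and 500 completion tokens on gpt-4o:
print(round(request_cost("gpt-4o", 1_000, 500), 6))  # → 0.0075
```

The dashboard aggregates these per-request figures into the totals and trend charts described below.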

🛠️ Tech Stack

  • Runtime: Rust (2024 Edition) with Tokio.
  • Web Framework: Axum.
  • Database: SQLx with SQLite.
  • Frontend: Vanilla JS/CSS (no heavyweight framework required).

🚦 Getting Started

Prerequisites

  • Rust (1.85+, required by the 2024 edition)
  • SQLite3

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/llm-proxy.git
    cd llm-proxy
    
  2. Set up your environment:

    cp .env.example .env
    # Edit .env and add your API keys
    
  3. Configure providers and server: Edit config.toml to customize models, pricing fallbacks, and port settings.

    Ollama Example (config.toml):

    [providers.ollama]
    enabled = true
    base_url = "http://192.168.1.50:11434/v1"
    models = ["llama3", "mistral"]
    
  4. Run the proxy:

    cargo run --release
    

The server will start at http://localhost:3000 (by default).
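
A fuller config.toml might enable several providers at once. This is a sketch only: the [providers.ollama] table mirrors the example above, while the [server] table and the other field names are assumptions extrapolated from it, not confirmed keys.

```toml
# Hypothetical layout extrapolated from the Ollama example; check config.toml
# in the repository for the authoritative keys.
[server]
port = 3000

[providers.openai]
enabled = true
models = ["gpt-4o", "gpt-4o-mini"]

[providers.ollama]
enabled = true
base_url = "http://192.168.1.50:11434/v1"
models = ["llama3", "mistral"]
```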

📊 Management Dashboard

Access the built-in dashboard at http://localhost:3000 to see:

  • Usage Summary: Total requests, tokens, and USD spent.
  • Trend Charts: 24-hour request and cost distributions.
  • Live Logs: Real-time stream of incoming LLM requests via WebSockets.
  • Provider Health: Monitor which providers are online or degraded.

🔌 API Usage

The proxy is designed to be a drop-in replacement for OpenAI. Simply change your base URL:

Example Request (Python):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-proxy-client-id" # Hashed sk- keys are managed in the dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from the proxy!"}]
)
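
The same endpoint accepts multimodal content on vision-capable models. A sketch of the OpenAI-style message shape for a Base64 image (payload construction only; the model name and client setup are as in the example above):

```python
import base64

# Stand-in for real image bytes; in practice, open(path, "rb").read().
image_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder, not a full image
image_b64 = base64.b64encode(image_bytes).decode("ascii")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ],
}]

# client.chat.completions.create(model="gpt-4o", messages=messages)
```

Remote images work the same way: put the http(s) URL directly in the image_url field. For streaming, pass stream=True to create() and iterate over the returned chunks; the proxy relays the SSE stream and tracks tokens on your behalf.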

⚖️ License

MIT OR Apache-2.0
