LLM Proxy Gateway

A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, and local Ollama) with built-in token tracking, real-time cost calculation, and a management dashboard.

🚀 Features

  • Unified API: Fully OpenAI-compatible /v1/chat/completions endpoint.
  • Multi-Provider Support:
    • OpenAI: Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
    • Google Gemini: Support for the latest Gemini 2.0 models.
    • DeepSeek: High-performance, low-cost integration.
    • xAI Grok: Integration for Grok-series models.
    • Ollama: Support for local LLMs running on your machine or another host.
  • Observability & Tracking:
    • Real-time Costing: Fetches live pricing and context specs from models.dev on startup.
    • Token Counting: Accurate estimation of prompt and completion tokens using tiktoken-rs.
    • Database Logging: Every request is logged to SQLite for historical analysis.
    • Streaming Support: Full SSE (Server-Sent Events) support with aggregated token tracking.
  • Multimodal (Vision): Support for image processing (Base64 and remote URLs) across compatible providers.
  • Reliability:
    • Circuit Breaking: Automatically stops routing requests to unhealthy providers so failures don't cascade.
    • Rate Limiting: Granular per-client and global rate limits.
  • Management Dashboard: A modern, real-time web interface to monitor usage, costs, and system health.
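
Because the gateway exposes the standard OpenAI chat format, the multimodal support above can be exercised with a plain HTTP request. Below is a minimal sketch using only the Python standard library; the port, model name, and your-proxy-client-id key are assumptions mirroring the defaults shown later in this README:

```python
import json
import urllib.request

def build_vision_message(prompt: str, image_url: str) -> dict:
    """Pair a text prompt with an image reference in one user message.
    The URL may be a remote https:// link or a base64 data: URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

def send_chat(messages: list, model: str = "gpt-4o",
              api_key: str = "your-proxy-client-id") -> dict:
    """POST a chat completion to the proxy's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        "http://localhost:3000/v1/chat/completions",  # default proxy port
        data=json.dumps({"model": model, "messages": messages}).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the proxy to be running):
#   reply = send_chat([build_vision_message("Describe this image.",
#                                           "https://example.com/photo.jpg")])
#   print(reply["choices"][0]["message"]["content"])
```

The same message shape works against any of the vision-capable providers the proxy routes to, since the gateway translates the OpenAI format for each backend.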

🛠️ Tech Stack

  • Runtime: Rust (2024 Edition) with Tokio.
  • Web Framework: Axum.
  • Database: SQLx with SQLite.
  • Frontend: Vanilla JS/CSS (no heavyweight framework required).

🚦 Getting Started

Prerequisites

  • Rust 1.85+ (the 2024 edition requires rustc 1.85 or newer)
  • SQLite3

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/llm-proxy.git
    cd llm-proxy
    
  2. Set up your environment:

    cp .env.example .env
    # Edit .env and add your API keys
    
  3. Configure providers and server: Edit config.toml to customize models, pricing fallbacks, and port settings.

    Ollama Example (config.toml):

    [providers.ollama]
    enabled = true
    base_url = "http://192.168.1.50:11434/v1"
    models = ["llama3", "mistral"]
    
  4. Run the proxy:

    cargo run --release
    

The server will start at http://localhost:3000 (by default).

📊 Management Dashboard

Access the built-in dashboard at http://localhost:3000 to see:

  • Usage Summary: Total requests, tokens, and USD spent.
  • Trend Charts: 24-hour request and cost distributions.
  • Live Logs: Real-time stream of incoming LLM requests via WebSockets.
  • Provider Health: Monitor which providers are online or degraded.

🔌 API Usage

The proxy is designed to be a drop-in replacement for OpenAI. Simply change your base URL:

Example Request (Python):

    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:3000/v1",
        api_key="your-proxy-client-id"  # Hashed sk- keys are managed in the dashboard
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello from the proxy!"}]
    )

    print(response.choices[0].message.content)
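
Streaming goes through the same endpoint with "stream": true; the proxy relays the SSE chunks and aggregates token usage server-side. Here is a stdlib-only sketch of consuming the event stream — the endpoint, default port, and placeholder key are the same assumptions as in the example above, and the chunk shape follows the standard OpenAI streaming format:

```python
import json
import urllib.request

def extract_delta(sse_line: str) -> str:
    """Pull the text delta out of one 'data: {...}' SSE line.
    Returns '' for blank keep-alive lines, the final [DONE] marker,
    and chunks that carry no content (e.g. usage-only chunks)."""
    line = sse_line.strip()
    if not line.startswith("data:"):
        return ""
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return ""
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("delta", {}).get("content") or ""

def stream_chat(prompt: str, model: str = "gpt-4o",
                api_key: str = "your-proxy-client-id") -> str:
    """Send a streaming request and print deltas as they arrive."""
    req = urllib.request.Request(
        "http://localhost:3000/v1/chat/completions",
        data=json.dumps({"model": model, "stream": True,
                         "messages": [{"role": "user", "content": prompt}]}
                        ).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    parts = []
    with urllib.request.urlopen(req) as resp:
        for raw_line in resp:  # SSE frames arrive line by line
            piece = extract_delta(raw_line.decode("utf-8"))
            if piece:
                print(piece, end="", flush=True)
                parts.append(piece)
    print()
    return "".join(parts)

# Example (requires the proxy to be running):
#   stream_chat("Hello from the proxy!")
```

The official OpenAI SDK works too (pass stream=True to chat.completions.create and iterate the chunks); the sketch above just shows what travels over the wire.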

⚖️ License

MIT OR Apache-2.0
