# LLM Proxy Gateway
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok) with built-in token tracking, real-time cost calculation, and a management dashboard.
## 🚀 Features
- Unified API: Fully OpenAI-compatible `/v1/chat/completions` endpoint.
- Multi-Provider Support:
  - OpenAI: Standard models (GPT-4o, GPT-3.5, etc.) and reasoning models (o1, o3).
  - Google Gemini: Support for the latest Gemini 2.0 models.
  - DeepSeek: High-performance, low-cost integration.
  - xAI Grok: Integration for Grok-series models.
  - Ollama: Support for local LLMs running on your machine or another host.
- Observability & Tracking:
  - Real-time Costing: Fetches live pricing and context specs from models.dev on startup.
  - Token Counting: Precise estimation using `tiktoken-rs`.
  - Database Logging: Every request is logged to SQLite for historical analysis.
  - Streaming Support: Full SSE (Server-Sent Events) support with aggregated token tracking.
- Multimodal (Vision): Support for image processing (Base64 and remote URLs) across compatible providers.
- Reliability:
  - Circuit Breaking: Automatically protects your system when providers are down.
  - Rate Limiting: Granular per-client and global rate limits.
- Management Dashboard: A modern, real-time web interface to monitor usage, costs, and system health.
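The cost tracking above comes down to simple per-token arithmetic. A minimal sketch in Python (the rates below are invented for illustration; the proxy itself loads real per-million-token pricing from models.dev at startup):

```python
# Illustrative per-request cost calculation. The rates here are hypothetical
# placeholders, NOT real pricing; the proxy fetches live rates on startup.
PRICING_PER_MILLION = {  # USD per 1M tokens (made-up example values)
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request given token counts and rates."""
    rates = PRICING_PER_MILLION[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

cost = request_cost("gpt-4o", input_tokens=1_200, output_tokens=300)
print(f"${cost:.6f}")  # → $0.006000
```

The same formula is applied per request before the result is persisted to SQLite and surfaced in the dashboard totals.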
## 🛠️ Tech Stack
- Runtime: Rust (2024 Edition) with Tokio.
- Web Framework: Axum.
- Database: SQLx with SQLite.
- Frontend: Vanilla JS/CSS (no heavyweight framework required).
## 🚦 Getting Started
### Prerequisites
- Rust (1.80+)
- SQLite3
### Installation
1. Clone the repository:

   ```shell
   git clone https://github.com/yourusername/llm-proxy.git
   cd llm-proxy
   ```

2. Set up your environment:

   ```shell
   cp .env.example .env
   # Edit .env and add your API keys
   ```

3. Configure providers and server: Edit `config.toml` to customize models, pricing fallbacks, and port settings.

   Ollama Example (`config.toml`):

   ```toml
   [providers.ollama]
   enabled = true
   base_url = "http://192.168.1.50:11434/v1"
   models = ["llama3", "mistral"]
   ```

4. Run the proxy:

   ```shell
   cargo run --release
   ```
The server starts at `http://localhost:3000` by default.
## 📊 Management Dashboard
Access the built-in dashboard at http://localhost:3000 to see:
- Usage Summary: Total requests, tokens, and USD spent.
- Trend Charts: 24-hour request and cost distributions.
- Live Logs: Real-time stream of incoming LLM requests via WebSockets.
- Provider Health: Monitor which providers are online or degraded.
## 🔌 API Usage
The proxy is designed to be a drop-in replacement for OpenAI. Simply change your base URL:
Example Request (Python):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-proxy-client-id"  # Hashed sk- keys are managed in the dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from the proxy!"}]
)
```
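Streaming works through the same SDK. A sketch, assuming the client configured above: passing `stream_options={"include_usage": True}` asks for a final SSE chunk carrying token usage, which the proxy relies on for aggregated streaming token tracking.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="your-proxy-client-id")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Stream a haiku about proxies."}],
    stream=True,
    stream_options={"include_usage": True},  # final chunk carries token counts
)

for chunk in stream:
    # Content deltas arrive chunk by chunk; the usage-only final chunk
    # has an empty choices list and a populated usage object.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\n[tokens used: {chunk.usage.total_tokens}]")
```

This example requires the proxy to be running locally, so it is not self-verifying; the field names follow the standard OpenAI streaming response shape.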
## ⚖️ License
MIT OR Apache-2.0