# LLM Proxy Gateway
A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.
## Features
- Unified API: OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
- Multi-Provider Support:
  - OpenAI: GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
  - Google Gemini: Gemini 2.0 Flash, Pro, and vision models.
  - DeepSeek: DeepSeek Chat and Reasoner models.
  - xAI Grok: Grok-beta models.
  - Ollama: Local LLMs running on your network.
- Observability & Tracking:
  - Real-time Costing: Fetches live pricing and context specs from `models.dev` on startup.
  - Token Counting: Precise estimation using `tiktoken-rs`.
  - Database Logging: Every request logged to SQLite for historical analysis.
- Streaming Support: Full SSE (Server-Sent Events) with `[DONE]` termination for client compatibility.
- Multimodal (Vision): Image processing (Base64 and remote URLs) across compatible providers.
- Multi-User Access Control:
  - Admin Role: Full access to all dashboard features, user management, and system configuration.
  - Viewer Role: Read-only access to usage analytics, costs, and monitoring.
  - Client API Keys: Create and manage multiple client tokens for external integrations.
- Reliability:
  - Circuit Breaking: Automatically protects when providers are down.
  - Rate Limiting: Per-client and global rate limits.
  - Cache-Aware Costing: Tracks cache hit/miss tokens for accurate billing.
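For the multimodal support above, images travel in the standard OpenAI content-parts format (remote URL or inline Base64 data URI). A small helper for building such a message might look like this (a sketch; the helper name is illustrative, not part of the proxy):

```python
import base64

def vision_message(text: str, image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style multimodal user message with an inline Base64 image."""
    data_uri = f"data:{mime};base64,{base64.b64encode(image_bytes).decode()}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }
```

The resulting dict can be passed in the `messages` list of any chat completion request routed through the proxy, for any provider with vision support.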
## Tech Stack
- Runtime: Rust with Tokio.
- Web Framework: Axum.
- Database: SQLx with SQLite.
- Frontend: Vanilla JS/CSS with Chart.js for visualizations.
## Getting Started
### Prerequisites
- Rust (1.80+)
- SQLite3
### Quick Start
1. Clone and build:

   ```bash
   git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
   cd llm-proxy
   cargo build --release
   ```

2. Configure environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys:
   # OPENAI_API_KEY=sk-...
   # GEMINI_API_KEY=AIza...
   # DEEPSEEK_API_KEY=sk-...
   # GROK_API_KEY=gk-...  (optional)
   ```

3. Run the proxy:

   ```bash
   cargo run --release
   ```
The server starts on http://localhost:8080 by default.
## Configuration
Edit `config.toml` to customize providers, models, and settings:
```toml
[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"

[providers.openai]
enabled = true
default_model = "gpt-4o"

[providers.gemini]
enabled = true
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
default_model = "grok-beta"

[providers.ollama]
enabled = false
base_url = "http://localhost:11434/v1"
```
## Management Dashboard
Access the dashboard at http://localhost:8080:
- Overview: Real-time request counters, system health, provider status.
- Analytics: Time-series charts, filterable by date, client, provider, and model.
- Costs: Budget tracking, cost breakdown by provider/client/model, projections.
- Clients: Create, revoke, and rotate API tokens; per-client usage stats.
- Providers: Enable/disable providers, test connections, configure API keys.
- Monitoring: Live request stream via WebSocket, response times, error rates.
- Users: Admin/user management with role-based access control.
### Default Credentials

- Username: `admin`
- Password: `admin123`

Change the admin password in the dashboard after first login!
## API Usage
The proxy is a drop-in replacement for OpenAI. Configure your client:
### Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY",  # Create in dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
### Open WebUI
- API Base URL: `http://your-server:8080/v1`
- API Key: `YOUR_CLIENT_API_KEY`
### cURL
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```
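The SDKs handle streaming for you, but if you set `"stream": true` on a raw HTTP request like the one above, the proxy replies with OpenAI-style SSE terminated by the `[DONE]` sentinel. A minimal stdlib-only parser for that framing might look like this (a sketch; the chunk shape follows the standard OpenAI streaming format):

```python
import json

def sse_deltas(lines):
    """Yield content deltas from OpenAI-style SSE lines until the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":  # the proxy's explicit stream terminator
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:  # role-only and finish chunks carry no content
            yield delta
```

Feeding the response body's lines through `sse_deltas` and joining the results reconstructs the full completion text.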
## Model Discovery
The proxy exposes `/v1/models` for OpenAI-compatible client model discovery:
```bash
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY"
```
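The response uses the OpenAI list envelope (`{"object": "list", "data": [...]}`), so model ids can be pulled out generically. A stdlib-only client sketch (function names are illustrative):

```python
import json
from urllib.request import Request, urlopen

def model_ids(body: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in body.get("data", [])]

def list_models(base_url: str, api_key: str) -> list[str]:
    """Fetch /v1/models from the proxy and return the available model ids."""
    req = Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(req) as resp:
        return model_ids(json.load(resp))
```

For example, `list_models("http://localhost:8080", "YOUR_CLIENT_API_KEY")` returns the ids of every model enabled across configured providers.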
## Troubleshooting
### Streaming Issues
If clients time out or show "TransferEncodingError", ensure:

- Proxy buffering is disabled in nginx: `proxy_buffering off;`
- Chunked transfer is enabled: `chunked_transfer_encoding on;`
- Timeouts are sufficient: `proxy_read_timeout 7200s;`
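Putting those three directives together, a minimal reverse-proxy block might look like this (a sketch; the server name and upstream address are assumptions for a typical single-host setup):

```nginx
location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;

    # Required for SSE: deliver chunks to the client as they arrive.
    proxy_buffering off;
    chunked_transfer_encoding on;

    # Long-lived streams need a generous read timeout.
    proxy_read_timeout 7200s;
}
```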
### Provider Errors
- Check that API keys are set in `.env`
- Test the provider in the dashboard (Settings → Providers → Test)
- Review logs: `journalctl -u llm-proxy -f`
## License
MIT OR Apache-2.0