
LLM Proxy Gateway

A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.

Features

  • Unified API: OpenAI-compatible /v1/chat/completions and /v1/models endpoints.
  • Multi-Provider Support:
    • OpenAI: GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
    • Google Gemini: Gemini 2.0 Flash, Pro, and vision models.
    • DeepSeek: DeepSeek Chat and Reasoner models.
    • xAI Grok: Grok-beta models.
    • Ollama: Local LLMs running on your network.
  • Observability & Tracking:
    • Real-time Costing: Fetches live pricing and context specs from models.dev on startup.
    • Token Counting: Accurate token estimates via tiktoken-rs.
    • Database Logging: Every request logged to SQLite for historical analysis.
    • Streaming Support: Full SSE (Server-Sent Events) with [DONE] termination for client compatibility.
  • Multimodal (Vision): Image processing (Base64 and remote URLs) across compatible providers.
  • Multi-User Access Control:
    • Admin Role: Full access to all dashboard features, user management, and system configuration.
    • Viewer Role: Read-only access to usage analytics, costs, and monitoring.
    • Client API Keys: Create and manage multiple client tokens for external integrations.
  • Reliability:
    • Circuit Breaking: Automatically stops routing requests to failing providers until they recover.
    • Rate Limiting: Per-client and global rate limits.
    • Cache-Aware Costing: Tracks cache hit/miss tokens for accurate billing.
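For the multimodal support above, requests use the standard OpenAI vision message shape. A minimal sketch of building such a payload (the image bytes and model name here are placeholders, not values from this project):

```python
import base64
import json

# Placeholder image bytes; in practice read a real file:
# image_bytes = open("photo.png", "rb").read()
image_bytes = b"\x89PNG-placeholder"

# Encode the image as a base64 data URL, as the OpenAI vision format expects.
data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()

payload = {
    "model": "gpt-4o",  # any vision-capable model routed by the proxy
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ],
}

print(json.dumps(payload)[:80])
```

A remote image works the same way: put an `https://` URL in the `image_url` field instead of a data URL.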

Tech Stack

  • Runtime: Rust with Tokio.
  • Web Framework: Axum.
  • Database: SQLx with SQLite.
  • Frontend: Vanilla JS/CSS with Chart.js for visualizations.

Getting Started

Prerequisites

  • Rust (1.80+)
  • SQLite3

Quick Start

  1. Clone and build:

    git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
    cd llm-proxy
    cargo build --release
    
  2. Configure environment:

    cp .env.example .env
    # Edit .env and add your API keys:
    # OPENAI_API_KEY=sk-...
    # GEMINI_API_KEY=AIza...
    # DEEPSEEK_API_KEY=sk-...
    # GROK_API_KEY=gk-...  (optional)
    
  3. Run the proxy:

    cargo run --release
    

The server starts on http://localhost:8080 by default.

Configuration

Edit config.toml to customize providers, models, and settings:

[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"

[providers.openai]
enabled = true
default_model = "gpt-4o"

[providers.gemini]
enabled = true
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
default_model = "grok-beta"

[providers.ollama]
enabled = false
base_url = "http://localhost:11434/v1"

Management Dashboard

Access the dashboard at http://localhost:8080:

  • Overview: Real-time request counters, system health, provider status.
  • Analytics: Time-series charts, filterable by date, client, provider, and model.
  • Costs: Budget tracking, cost breakdown by provider/client/model, projections.
  • Clients: Create, revoke, and rotate API tokens; per-client usage stats.
  • Providers: Enable/disable providers, test connections, configure API keys.
  • Monitoring: Live request stream via WebSocket, response times, error rates.
  • Users: Admin/user management with role-based access control.

Default Credentials

  • Username: admin
  • Password: admin123

Change the admin password in the dashboard after first login!

API Usage

The proxy is a drop-in replacement for the OpenAI API. Point your client at it:

Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY"  # Create in dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
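With stream=True, responses arrive as SSE "data:" lines terminated by [DONE]. A minimal sketch of assembling the streamed text from raw event lines (the sample chunks below are illustrative, shaped like OpenAI streaming chunks, not captured proxy output):

```python
import json

def collect_stream(lines):
    """Concatenate content deltas from SSE 'data:' lines, stopping at [DONE]."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive/comment lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # stream terminator sent by the proxy
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# Illustrative SSE lines:
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # -> Hello!
```

In practice the official openai client handles this parsing for you; iterating over `client.chat.completions.create(..., stream=True)` yields the same deltas.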

Open WebUI

API Base URL: http://your-server:8080/v1
API Key: YOUR_CLIENT_API_KEY

cURL

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

Model Discovery

The proxy exposes /v1/models so OpenAI-compatible clients can discover the available models:

curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY"

Troubleshooting

Streaming Issues

If clients time out or report "TransferEncodingError", ensure:

  1. Proxy buffering is disabled in nginx: proxy_buffering off;
  2. Chunked transfer is enabled: chunked_transfer_encoding on;
  3. Timeouts are sufficient: proxy_read_timeout 7200s;
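Put together, a reverse-proxy block for the gateway might look like this (the location path and upstream address are assumptions for illustration, not part of this project's config):

```nginx
# Hypothetical nginx reverse-proxy block for the gateway.
location /v1/ {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;
    proxy_buffering off;               # don't buffer SSE chunks
    chunked_transfer_encoding on;      # forward chunks as they arrive
    proxy_read_timeout 7200s;          # allow long-running completions
}
```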

Provider Errors

  • Check API keys are set in .env
  • Test provider in dashboard (Settings → Providers → Test)
  • Review logs: journalctl -u llm-proxy -f

License

MIT OR Apache-2.0
