GopherGate/README.md
hobokenchicken b7df3108fa
docs: update README, TODO, and deployment docs
README: Added hierarchical routing, classifier bucket mapping, two-level
dispatch, model groups table, DeepSeek language note, deploy script, and
updated model names to match current models.dev registry.

TODO: Added 15 completed items covering model groups, routing, dispatch,
and provider fixes from May 7 session.

deployment.md: Added deploy.sh instructions.
2026-05-07 14:07:52 -04:00


# GopherGate
A unified, high-performance LLM proxy gateway built in Go. It provides OpenAI-compatible `/v1/chat/completions`, `/v1/images/generations`, `/v1/responses`, and `/v1/models` endpoints to access multiple providers (OpenAI, Gemini, DeepSeek, Moonshot, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.
## Features
- **Unified API:** OpenAI-compatible `/v1/chat/completions`, `/v1/images/generations`, `/v1/responses`, and `/v1/models` endpoints.
  - The `/v1/responses` endpoint (OpenAI Responses API) is currently supported for OpenAI models only. Non-OpenAI providers (Gemini, DeepSeek, Moonshot, Grok, Ollama) return a "not supported" response.
- **Multi-Provider Support:**
  - **OpenAI:** GPT-4o, GPT-4o Mini, GPT-5, GPT-5.4, o1/o3/o4 reasoning models, DALL-E 2/3 image generation.
  - **Google Gemini:** Gemini 2.5 Flash/Pro, Gemini 3 Flash/Pro previews, Imagen 3 image generation.
  - **DeepSeek:** DeepSeek Chat, Reasoner, V4 Flash, V4 Pro.
  - **Moonshot:** Kimi K2.5, K2.6 reasoning models.
  - **xAI Grok:** Grok-3, Grok-4, Grok-4.3 reasoning models.
  - **Ollama:** Local LLMs running on your network.
- **Observability & Tracking:**
  - **Asynchronous Logging:** Non-blocking request logging to SQLite using background workers.
  - **Token Counting:** Precise estimation and tracking of prompt, completion, and reasoning tokens.
  - **Database Persistence:** Every request logged to SQLite for historical analysis and dashboard analytics.
- **Streaming Support:** Full SSE (Server-Sent Events) support for all providers.
- **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
- **Image Generation:** DALL-E 2/3 (OpenAI) and Imagen 3 (Gemini) via OpenAI-compatible `/v1/images/generations` endpoint.
- **Automatic Model Routing:**
  - **Hierarchical Routing:** Groups can target other groups, cascading through multiple levels until a concrete model is reached. Cycle detection and depth limiting (max 10) prevent infinite loops.
  - **Heuristic strategy:** Free, zero-latency keyword matching (e.g. "debug" or "step by step" routes to the reasoning model).
  - **Classifier strategy:** Uses a cheap LLM to rate task complexity on a configurable scale (1-10), then selects the appropriate model. Bucket mapping distributes ratings proportionally across targets.
  - **Two-Level Dispatch:** A `dispatcher` group (classifier, threshold=10) auto-routes to tier groups by complexity score, which then apply their own internal strategies.
  - **Metadata:** Groups support `logic_level` (1-10 complexity scale) and `primary_use` (description) fields for organizational clarity.
  - Pre-seeded with provider groups, tier groups (heavy-logic / standard-pro / fast-flow), and a dispatcher. Model groups are exposed in `/v1/models` so clients can discover them.
- **Multi-User Access Control:**
  - **Admin Role:** Full access to all dashboard features, user management, and system configuration.
  - **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
  - **Client API Keys:** Create and manage multiple client tokens for external integrations.
- **Reliability:**
  - **Circuit Breaking:** Protects providers when they are down, auto-recovers after timeout.
  - **Provider-Aware Classification:** Classifier selector models are routed to the correct provider automatically.
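
To illustrate the circuit-breaking idea from the Reliability bullet, here is a minimal sketch in Python. It is not GopherGate's Go implementation: the `CircuitBreaker` class, the failure threshold, and the reset timeout are all assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    """Toy circuit breaker: opens after repeated failures, retries after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value, not from GopherGate
        self.reset_timeout = reset_timeout          # assumed value, in seconds
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def allow_request(self):
        """Return True if a request may be sent to the provider."""
        if self.opened_at is None:
            return True
        # Once the timeout has elapsed, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        """A successful call closes the circuit and clears the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        """Count a failure and open the circuit once the threshold is reached."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A caller would check `allow_request()` before contacting a provider, then report the outcome with `record_success()` or `record_failure()`.
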
## DeepSeek Language Note
DeepSeek models default to Chinese for some prompts. GopherGate automatically injects an English system prompt ("Always respond in English.") when no system message is present. If the client provides its own system prompt, it is left untouched.
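
The injection rule described above can be sketched as follows. This is an illustration only: `ensure_english_system_prompt` is a hypothetical helper name, and the real logic lives in GopherGate's Go code.

```python
ENGLISH_PROMPT = "Always respond in English."  # prompt text quoted in this README


def ensure_english_system_prompt(messages):
    """Return a copy of messages with the English system prompt prepended if none exists."""
    if any(m.get("role") == "system" for m in messages):
        # A client-supplied system prompt is left untouched.
        return list(messages)
    return [{"role": "system", "content": ENGLISH_PROMPT}] + list(messages)
```
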
## Security
GopherGate is designed with security in mind:
- **Signed Session Tokens:** Management dashboard sessions are secured using HMAC-SHA256 signed tokens.
- **Encrypted Storage:** Support for encrypted provider API keys in the database.
- **Auth Middleware:** Secure client authentication via database-backed API keys.
**Note:** You must define an `LLM_PROXY__ENCRYPTION_KEY` in your `.env` file for secure session signing and encryption.
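
For readers unfamiliar with HMAC-signed tokens, the sketch below shows the general technique using Python's standard library. The token layout (`payload.signature`, both base64url-encoded) and the function names are assumptions for illustration; GopherGate's actual session token format may differ.

```python
import base64
import hashlib
import hmac


def sign_token(payload: bytes, key: bytes) -> str:
    """Return 'payload.signature', both base64url-encoded (hypothetical layout)."""
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def verify_token(token: str, key: bytes):
    """Return the payload if the signature is valid, otherwise None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    return payload if hmac.compare_digest(sig, expected) else None
```

Because the signature depends on the secret key, a token that was tampered with or signed under a different key fails verification.
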
## Tech Stack
- **Runtime:** Go 1.22+
- **Web Framework:** Gin Gonic
- **Database:** sqlx with SQLite (CGO-free via `modernc.org/sqlite`)
- **Frontend:** Vanilla JS/CSS with Chart.js for visualizations
## Getting Started
### Prerequisites
- Go (1.22+)
- SQLite3 (optional, driver is built-in)
- Docker (optional, for containerized deployment)
### Quick Start
1. Clone and build:
```bash
git clone <repository-url>
cd gophergate
go build -o gophergate ./cmd/gophergate
```
2. Configure environment:
```bash
cp .env.example .env
# Edit .env and add your configuration:
# LLM_PROXY__ENCRYPTION_KEY=... (32-byte hex or base64 string)
# OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=AIza...
# DEEPSEEK_API_KEY=sk-...
# MOONSHOT_API_KEY=...
# GROK_API_KEY=xai-...
# For Ollama (optional): Set base URL and enable
# LLM_PROXY__PROVIDERS__OLLAMA__BASE_URL=http://localhost:11434/v1
# LLM_PROXY__PROVIDERS__OLLAMA__ENABLED=true
# LLM_PROXY__PROVIDERS__OLLAMA__MODELS=llama3,gemma2,mistral
```
3. Run the proxy:
```bash
./gophergate
```
The server starts on `http://0.0.0.0:8080` by default. Configure `LLM_PROXY__SERVER__PORT` in `.env` to change it.
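
One way to produce a value for `LLM_PROXY__ENCRYPTION_KEY` is shown below. It uses the 32-byte hex form mentioned in the `.env` comments above; `generate_encryption_key` is a hypothetical helper and not part of GopherGate.

```python
import secrets


def generate_encryption_key() -> str:
    """Return 32 random bytes encoded as a 64-character hex string."""
    return secrets.token_hex(32)


if __name__ == "__main__":
    # Paste the printed line into your .env file.
    print(f"LLM_PROXY__ENCRYPTION_KEY={generate_encryption_key()}")
```
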
### Quick Deploy Script
A `deploy.sh` script is included for production restarts:
```bash
./deploy.sh
# git pull -> go build -> stop old process -> start new process
```
### Deployment (Docker)
```bash
# Build the container
docker build -t gophergate .

# Run the container (absolute host path works across Docker versions)
docker run -p 8080:8080 \
  -e LLM_PROXY__ENCRYPTION_KEY=your-secure-key \
  -v "$(pwd)/data:/app/data" \
  gophergate
```
## Management Dashboard
Access the dashboard at `http://localhost:8080`.
- **Auth:** Login, session management, and status tracking.
- **Usage:** Summary stats, time-series analytics, and provider breakdown.
- **Clients:** API key management and per-client usage tracking.
- **Providers:** Provider configuration and status monitoring.
- **Model Groups:** Define auto-routing groups with heuristic or classifier strategies. Supports logic level and primary use metadata.
- **Models:** Model enable/disable and cost configuration.
- **Users:** Admin-only user management for dashboard access.
- **Monitoring:** Live request stream via WebSocket.
### Default Credentials
- **Username:** `admin`
- **Password:** `admin123` (You will be prompted to change this on first login)
**Forgot Password?**
You can reset the admin password to default by running:
```bash
./gophergate -reset-admin
```
## API Usage
The proxy is a drop-in replacement for the OpenAI API. Point your client at it:
### Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY"  # Create in dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
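
The feature list mentions SSE streaming for all providers. With the `openai` client this is normally just a matter of passing `stream=True`. The sketch below shows what arrives on the wire, assuming GopherGate emits the same `data: {...}` chunk lines and `data: [DONE]` terminator as OpenAI's streaming chat completions. `iter_sse_deltas` is a hypothetical helper, not part of GopherGate.

```python
import json


def iter_sse_deltas(lines):
    """Yield content fragments from OpenAI-style chat completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments reconstructs the full assistant message.
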
### Responses API
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY"
)

# OpenAI Responses API (supported for OpenAI models only)
response = client.responses.create(
    model="gpt-4o",
    input="Explain quantum computing in one paragraph.",
    instructions="You are a helpful assistant.",
    temperature=0.7,
    max_output_tokens=500
)
print(response.output_text)
```
**Note:** The `/v1/responses` endpoint is currently supported for OpenAI models only.
### Automatic Model Routing
Use a model group name to let GopherGate pick the best model automatically:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY"
)

# Simple query -- routes to the cheap/fast model
response = client.chat.completions.create(
    model="fast-flow",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

# Complex query -- routes to the reasoning model automatically
response = client.chat.completions.create(
    model="heavy-logic",
    messages=[{"role": "user", "content": "Write a Python red-black tree implementation."}]
)
```
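
Inside a heuristic group, the choice between targets is described as keyword matching. A rough sketch of that idea follows. The keyword list contains only the two examples this README gives ("debug" and "step by step"), and `pick_target` is a hypothetical function; the real rule set is likely larger.

```python
# Only the two example keywords named in this README; the real list is an unknown.
REASONING_KEYWORDS = ("debug", "step by step")


def pick_target(prompt, fast_model, reasoning_model, keywords=REASONING_KEYWORDS):
    """Return the reasoning model if the prompt contains a keyword, else the fast model."""
    text = prompt.lower()
    if any(keyword in text for keyword in keywords):
        return reasoning_model
    return fast_model
```

For the `deepseek-auto` group this would mean `pick_target(prompt, "deepseek-chat", "deepseek-reasoner")`.
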
### Two-Level Dispatch
The `dispatcher` group uses a classifier to score prompts 1-10, then routes to the appropriate tier group:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY"
)

# Automatically routed based on complexity:
#   1-3  -> fast-flow (classification, basic Q&A)
#   4-7  -> standard-pro (general assistant, long docs)
#   8-10 -> heavy-logic (complex coding, logic, agents)
response = client.chat.completions.create(
    model="dispatcher",
    messages=[{"role": "user", "content": "Debug this race condition in my Go code."}]
)
# This goes: dispatcher -> heavy-logic -> deepseek-v4-pro
```
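
The score ranges in the comments above amount to a small lookup table. The sketch below reproduces exactly those three ranges. `tier_for_score` and the two constants are hypothetical names, and the way GopherGate computes its buckets internally is not shown in this README.

```python
import bisect

# Upper bound of each score range from the comments above: 1-3, 4-7, 8-10.
TIER_UPPER_BOUNDS = [3, 7, 10]
TIER_GROUPS = ["fast-flow", "standard-pro", "heavy-logic"]


def tier_for_score(score):
    """Map a classifier score (1-10) to the tier group that handles it."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    # bisect_left finds the first range whose upper bound is >= score.
    return TIER_GROUPS[bisect.bisect_left(TIER_UPPER_BOUNDS, score)]
```
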
Pre-seeded groups:
| Group | Level | Strategy | Targets | Primary Use |
|-------|-------|----------|---------|-------------|
| `fast-flow` | 2 | heuristic | deepseek-v4-flash, gpt-5.4-nano | Classification, JSON, Basic Q&A |
| `standard-pro` | 5 | heuristic | gpt-5.4-mini, gemini-3-flash-preview | General Assistant, Long Docs |
| `heavy-logic` | 9 | heuristic | grok-4.3, kimi-k2.6, deepseek-v4-pro | Complex Coding, Logic, Agents |
| `dispatcher` | - | classifier | fast-flow, standard-pro, heavy-logic | Auto-dispatches by complexity |
| `deepseek-auto` | - | heuristic | deepseek-chat, deepseek-reasoner | Legacy provider group |
| `openai-auto` | - | heuristic | gpt-4o-mini, gpt-4o | Legacy provider group |
| `gemini-auto` | - | heuristic | gemini-2.0-flash, gemini-2.5-pro | Legacy provider group |
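
Hierarchical routing over the groups in this table can be pictured as a loop that follows targets until it reaches a name that is not a group, with cycle detection and the depth limit of 10 mentioned in the feature list. The sketch below is an illustration under those assumptions. `resolve` and its `choose` callback are hypothetical and stand in for whatever strategy (heuristic or classifier) each group applies.

```python
MAX_DEPTH = 10  # depth limit stated in the feature list

# Group -> targets, copied from the tier and dispatcher rows of the table above.
GROUPS = {
    "dispatcher": ["fast-flow", "standard-pro", "heavy-logic"],
    "fast-flow": ["deepseek-v4-flash", "gpt-5.4-nano"],
    "standard-pro": ["gpt-5.4-mini", "gemini-3-flash-preview"],
    "heavy-logic": ["grok-4.3", "kimi-k2.6", "deepseek-v4-pro"],
}


def resolve(name, groups, choose, max_depth=MAX_DEPTH):
    """Follow group targets until a concrete model is reached; return the path taken."""
    seen = set()
    path = [name]
    while name in groups:
        if name in seen:
            raise ValueError(f"cycle detected at group {name!r}")
        if len(seen) >= max_depth:
            raise ValueError(f"routing depth exceeds {max_depth}")
        seen.add(name)
        # choose() stands in for the group's own strategy (hypothetical callback).
        name = choose(name, groups[name])
        path.append(name)
    return path
```

With a `choose` callback that always picks the last target, `resolve("dispatcher", GROUPS, choose)` follows the same path as the example above: `dispatcher`, then `heavy-logic`, then `deepseek-v4-pro`.
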
### Image Generation (DALL-E / Imagen)
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY"
)

# DALL-E 3 (OpenAI)
resp = client.images.generate(
    model="dall-e-3",
    prompt="A cute gopher wearing a top hat",
    n=1,
    size="1024x1024"
)
print(resp.data[0].url)

# Imagen 3 (Gemini) -- uses same endpoint
resp = client.images.generate(
    model="imagen-3.0-generate-001",
    prompt="A gopher coding in Go",
    n=1,
    size="1024x1024"
)
print(resp.data[0].url)  # Returns data URI (Gemini returns base64)
```
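
Because the Imagen path returns a data URI rather than a downloadable link, a client needs to decode it before saving the image. The helper below is a small sketch assuming the standard `data:<mime>;base64,<payload>` layout. `decode_data_uri` is a hypothetical name, not part of the `openai` package or GopherGate.

```python
import base64


def decode_data_uri(uri):
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    # An empty media type defaults to text/plain per the data URI scheme.
    mime_type = header[: -len(";base64")] or "text/plain"
    return mime_type, base64.b64decode(payload)
```

For example, `mime_type, data = decode_data_uri(resp.data[0].url)` followed by writing `data` to a file opened in binary mode.
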
## License
MIT