hobokenchicken/GopherGate

hobokenchicken 4aea7a3b4c

fix: select provider AFTER routing resolves model groups

Previously, provider selection happened on the raw client-requested model
name (e.g. 'dispatcher') which defaulted to OpenAI. After routing resolved
it to 'deepseek-v4-flash', the provider was never re-selected.

Now prefix-stripping + routing runs first, then selectProvider() picks
the correct provider based on the resolved concrete model.

2026-05-07 13:54:42 -04:00

.github/workflows

chore: rebrand project to GopherGate

2026-03-19 13:37:05 -04:00

.pi-lens

fix: add per-image cost tracking for DALL-E and Imagen

2026-04-27 10:42:29 -04:00

cmd

debug: add max_tokens trace logging to chat completions handler

2026-04-30 10:04:50 -04:00

data/backups

chore: cleanup repository and update gitignore

2026-03-25 13:08:33 +00:00

internal

fix: select provider AFTER routing resolves model groups

2026-05-07 13:54:42 -04:00

static

feat: add logic_level and primary_use metadata to model groups

2026-05-07 12:01:28 -04:00

.env.example

feat: add moonshot kimi k2.5 support

2026-03-25 09:27:46 -04:00

.gitignore

feat: Phase 2 - reliability & observability

2026-04-26 14:48:56 -04:00

BACKEND_ARCHITECTURE.md

docs: update documentation for Ollama provider

2026-04-06 15:01:55 -04:00

deploy.sh

chore: add deploy.sh for prod restarts

2026-05-07 12:02:28 -04:00

deployment.md

chore: rebrand project to GopherGate

2026-03-19 13:37:05 -04:00

Dockerfile

chore: rebrand project to GopherGate

2026-03-19 13:37:05 -04:00

go.mod

fix: Phase 1 - security & stability patches

2026-04-26 14:45:22 -04:00

go.sum

fix: Phase 1 - security & stability patches

2026-04-26 14:45:22 -04:00

PLAN.md

fix: Phase 1 - security & stability patches

2026-04-26 14:45:22 -04:00

README.md

docs: add automatic model routing to README

2026-05-05 11:28:59 -04:00

TODO.md

feat: implement circuit breaker, fix auth vulnerability

2026-04-09 12:17:18 -04:00

README.md

GopherGate

A unified, high-performance LLM proxy gateway built in Go. It provides OpenAI-compatible /v1/chat/completions, /v1/images/generations, /v1/responses, and /v1/models endpoints to access multiple providers (OpenAI, Gemini, DeepSeek, Moonshot, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.

Features

Unified API: OpenAI-compatible /v1/chat/completions, /v1/images/generations, /v1/responses, and /v1/models endpoints.
- The /v1/responses endpoint (OpenAI Responses API) is currently supported for OpenAI models only. Non-OpenAI providers (Gemini, DeepSeek, Moonshot, Grok, Ollama) return a "not supported" response.
Multi-Provider Support:
- OpenAI: GPT-4o, GPT-4o Mini, o1, o3 reasoning models, DALL-E 2/3 image generation. Group: openai-auto.
- Google Gemini: Gemini 2.0 Flash, Pro, and vision models (with native CoT support), Imagen 3 image generation. Group: gemini-auto.
- DeepSeek: DeepSeek Chat and Reasoner (R1) models. Group: deepseek-auto.
- Moonshot: Kimi K2.5 and other Kimi models.
- xAI Grok: Grok-4 models.
- Ollama: Local LLMs running on your network.
Observability & Tracking:
- Asynchronous Logging: Non-blocking request logging to SQLite using background workers.
- Token Counting: Precise estimation and tracking of prompt, completion, and reasoning tokens.
- Database Persistence: Every request logged to SQLite for historical analysis and dashboard analytics.
- Streaming Support: Full SSE (Server-Sent Events) support for all providers.
Multimodal (Vision): Image processing (Base64 and remote URLs) across compatible providers.
Image Generation: DALL-E 2/3 (OpenAI) and Imagen 3 (Gemini) via OpenAI-compatible /v1/images/generations endpoint.
Automatic Model Routing: Define model groups (e.g. deepseek-auto) that route to the best concrete model based on the task.
- Heuristic strategy: Free, zero-latency keyword matching (e.g. "debug" or "step by step" routes to the reasoning model).
- Classifier strategy: Uses a cheap LLM to rate task complexity, then selects the appropriate model.
- Pre-seeded with openai-auto, deepseek-auto, and gemini-auto groups. Managed via the dashboard or API.
Multi-User Access Control:
- Admin Role: Full access to all dashboard features, user management, and system configuration.
- Viewer Role: Read-only access to usage analytics, costs, and monitoring.
- Client API Keys: Create and manage multiple client tokens for external integrations.
Reliability:
- Circuit Breaking: Automatically stops routing requests to providers that are down (coming soon).
- Rate Limiting: Per-client and global rate limits (coming soon).
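
The asynchronous logging item above describes non-blocking writes handled by background workers. The following is a minimal sketch of that pattern in Python, not GopherGate's actual Go implementation. The class name, table schema, and buffer size are assumptions made for illustration.

```python
import queue
import sqlite3
import threading


class AsyncRequestLogger:
    """Hypothetical sketch: handlers enqueue rows, one worker thread writes them."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, db_path: str, max_pending: int = 1000):
        self._db_path = db_path
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, model: str, prompt_tokens: int, completion_tokens: int) -> bool:
        """Enqueue a row without blocking; return False if the buffer is full."""
        try:
            self._queue.put_nowait((model, prompt_tokens, completion_tokens))
            return True
        except queue.Full:
            return False

    def close(self) -> None:
        """Let the worker drain pending rows, then stop it."""
        self._queue.put(self._STOP)
        self._worker.join()

    def _run(self) -> None:
        # Open the connection inside the worker thread: sqlite3 connections
        # are bound to the thread that created them by default.
        conn = sqlite3.connect(self._db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS requests "
            "(model TEXT, prompt_tokens INTEGER, completion_tokens INTEGER)"
        )
        while True:
            row = self._queue.get()
            if row is self._STOP:
                break
            conn.execute("INSERT INTO requests VALUES (?, ?, ?)", row)
            conn.commit()
        conn.close()
```

Dropping a row when the buffer is full keeps the request path fast at the cost of an occasional missing log entry. A real gateway might prefer to block briefly or count the drops instead.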

Security

GopherGate is designed with security in mind:

Signed Session Tokens: Management dashboard sessions are secured using HMAC-SHA256 signed tokens.
Encrypted Storage: Support for encrypted provider API keys in the database.
Auth Middleware: Secure client authentication via database-backed API keys.

Note: You must define an LLM_PROXY__ENCRYPTION_KEY in your .env file for secure session signing and encryption.
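
To illustrate what an HMAC-SHA256 signed session token involves, here is a minimal Python sketch. The token layout (base64 payload, a dot, base64 signature), the `exp` field, and the function names are assumptions for illustration; GopherGate's actual token format is not documented here.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def sign_session(payload: dict, key: bytes) -> str:
    """Return '<base64 payload>.<base64 HMAC-SHA256 signature>' (hypothetical layout)."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(key, body, hashlib.sha256).digest()
    return body.decode() + "." + base64.urlsafe_b64encode(sig).decode()


def verify_session(token: str, key: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature is valid and 'exp' is in the future."""
    try:
        body_b64, sig_b64 = token.split(".")
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or malformed base64
        return None
    expected = hmac.new(key, body_b64.encode(), hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(sig, expected):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body_b64))
    current = time.time() if now is None else now
    if payload.get("exp", 0) < current:
        return None
    return payload
```

The signature is checked before the payload is parsed, so a tampered token is rejected without its contents ever being trusted.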

Tech Stack

Runtime: Go 1.22+
Web Framework: Gin Gonic
Database: sqlx with SQLite (CGO-free via modernc.org/sqlite)
Frontend: Vanilla JS/CSS with Chart.js for visualizations

Getting Started

Prerequisites

Go (1.22+)
SQLite3 (optional, driver is built-in)
Docker (optional, for containerized deployment)

Quick Start

Clone and build:

```
git clone <repository-url>
cd gophergate
go build -o gophergate ./cmd/gophergate
```

Configure environment:

```
cp .env.example .env
# Edit .env and add your configuration:
# LLM_PROXY__ENCRYPTION_KEY=... (32-byte hex or base64 string)
# OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=AIza...
# MOONSHOT_API_KEY=...
# For Ollama (optional): Set base URL and enable
# LLM_PROXY__PROVIDERS__OLLAMA__BASE_URL=http://localhost:11434/v1
# LLM_PROXY__PROVIDERS__OLLAMA__ENABLED=true
# LLM_PROXY__PROVIDERS__OLLAMA__MODELS=llama3,gemma2,mistral
```
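
One way to produce a value for LLM_PROXY__ENCRYPTION_KEY is sketched below. It assumes that "32-byte hex or base64 string" means 32 random bytes encoded as hex or as standard base64; the function name is hypothetical.

```python
import base64
import secrets
from typing import Tuple


def generate_encryption_key() -> Tuple[str, str]:
    """Return the same 32 random bytes encoded as (hex, base64)."""
    raw = secrets.token_bytes(32)  # cryptographically secure random bytes
    return raw.hex(), base64.b64encode(raw).decode("ascii")
```

Call `generate_encryption_key()` once and paste either encoding into .env. Both strings represent the same key.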

Run the proxy:
```
./gophergate
```

The server starts on http://0.0.0.0:8080 by default.

Deployment (Docker)

```
# Build the container
docker build -t gophergate .

# Run the container (an absolute host path for -v works on all Docker versions)
docker run -p 8080:8080 \
  -e LLM_PROXY__ENCRYPTION_KEY=your-secure-key \
  -v "$(pwd)/data:/app/data" \
  gophergate
```

Management Dashboard

Access the dashboard at http://localhost:8080.

Auth: Login, session management, and status tracking.
Usage: Summary stats, time-series analytics, and provider breakdown.
Clients: API key management and per-client usage tracking.
Providers: Provider configuration and status monitoring.
Model Groups: Define auto-routing groups with heuristic or classifier strategies.
Models: Model enable/disable and cost configuration.
Users: Admin-only user management for dashboard access.
Monitoring: Live request stream via WebSocket.
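
The per-model cost configuration on the Models page is what makes per-request cost tracking possible. As a rough sketch of the arithmetic, assuming prices are stored per million tokens (the field names and the prices in the usage note are hypothetical, not GopherGate's schema):

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical pricing record: USD per one million tokens."""
    input_per_million: Decimal
    output_per_million: Decimal


def request_cost(pricing: ModelPricing, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Cost of one request given its prompt and completion token counts."""
    total = (
        pricing.input_per_million * prompt_tokens
        + pricing.output_per_million * completion_tokens
    )
    return total / Decimal(1_000_000)
```

For example, with illustrative prices of 2.50 (input) and 10.00 (output) per million tokens, a request with 1000 prompt tokens and 500 completion tokens costs (2500 + 5000) / 1,000,000 = 0.0075. `Decimal` is used so that many small costs can be summed without floating-point drift.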

Default Credentials

Username: admin
Password: admin123 (You will be prompted to change this on first login)

Forgot Password? You can reset the admin password to the default by running:

```
./gophergate -reset-admin
```

API Usage

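The proxy is a drop-in replacement for the OpenAI API. Point any OpenAI-compatible client at http://localhost:8080/v1 and authenticate with a client API key created in the dashboard.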

Moonshot models are available through the same OpenAI-compatible endpoint. For example, use kimi-k2.5 as the model name after setting MOONSHOT_API_KEY in your environment.

Ollama models (like llama3, gemma2, mistral) are also available through the same endpoint after enabling Ollama in configuration and setting the base URL to your Ollama server (default: http://localhost:11434/v1).

Python

```
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY"  # Create in dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
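
Because the endpoint is OpenAI-compatible, the same call can be made without the SDK. The sketch below builds the equivalent HTTP request with the standard library. The helper name is hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to <base_url>/chat/completions in the OpenAI wire format."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            # The client API key travels as a bearer token, as the OpenAI SDK sends it.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With the proxy running, pass the request to `urllib.request.urlopen` and parse the JSON body. In the OpenAI response format, the reply text is at `choices[0].message.content`.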

Responses API

```
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY"
)

# OpenAI Responses API (supported for OpenAI models only)
response = client.responses.create(
    model="gpt-4o",
    input="Explain quantum computing in one paragraph.",
    instructions="You are a helpful assistant.",
    temperature=0.7,
    max_output_tokens=500
)
print(response.output_text)
```

Note: The /v1/responses endpoint is currently supported for OpenAI models only. Requests routed to Gemini, DeepSeek, Moonshot, Grok, or Ollama models return a "not supported" error.

Automatic Model Routing

Use a model group name to let GopherGate pick the best model automatically:

```
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY"
)

# Simple query - routes to the cheap/fast model in the group
response = client.chat.completions.create(
    model="deepseek-auto",
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

# Complex query - routes to the reasoning model automatically
response = client.chat.completions.create(
    model="openai-auto",
    messages=[{"role": "user", "content": "Debug this Python code and explain step by step..."}]
)
```

Pre-seeded groups: openai-auto (gpt-4o-mini / gpt-4o), deepseek-auto (deepseek-chat / deepseek-reasoner), gemini-auto (gemini-2.0-flash / gemini-2.5-pro). By default these use heuristic keyword matching. Switch any group to classifier mode in the dashboard to use LLM-based complexity rating instead.
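
The heuristic strategy can be pictured roughly as follows. This is a Python sketch under stated assumptions, not the project's Go code: the keyword list contains only the two examples this README mentions, and the prefix table and the OpenAI fallback are hypothetical. It resolves the group to a concrete model first, then picks the provider from that concrete name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ModelGroup:
    fast_model: str
    reasoning_model: str


# Mirrors the pre-seeded groups listed above.
GROUPS = {
    "openai-auto": ModelGroup("gpt-4o-mini", "gpt-4o"),
    "deepseek-auto": ModelGroup("deepseek-chat", "deepseek-reasoner"),
    "gemini-auto": ModelGroup("gemini-2.0-flash", "gemini-2.5-pro"),
}

# Only these two keywords come from the README; a real list would be longer.
REASONING_KEYWORDS = ("debug", "step by step")

# Hypothetical model-name prefixes used to pick a provider.
PROVIDER_PREFIXES = (("gpt-", "openai"), ("deepseek-", "deepseek"), ("gemini-", "gemini"))


def resolve_model(requested: str, prompt: str) -> str:
    """Map a group name to a concrete model; pass concrete names through."""
    group = GROUPS.get(requested)
    if group is None:
        return requested
    text = prompt.lower()
    if any(keyword in text for keyword in REASONING_KEYWORDS):
        return group.reasoning_model
    return group.fast_model


def select_provider(concrete_model: str) -> str:
    """Pick a provider from the concrete model name (assumed fallback: openai)."""
    for prefix, provider in PROVIDER_PREFIXES:
        if concrete_model.startswith(prefix):
            return provider
    return "openai"


def route(requested: str, prompt: str) -> Tuple[str, str]:
    """Resolve the model first, then select the provider from the result."""
    model = resolve_model(requested, prompt)
    return model, select_provider(model)
```

Selecting the provider only after resolution matters: a group name such as deepseek-auto matches no provider prefix, so choosing a provider from the raw name would fall through to the default.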

Image Generation (DALL-E / Imagen)

```
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY"
)

# DALL-E 3 (OpenAI)
resp = client.images.generate(
    model="dall-e-3",
    prompt="A cute gopher wearing a top hat",
    n=1,
    size="1024x1024"
)
print(resp.data[0].url)

# Imagen 3 (Gemini) - uses the same endpoint
resp = client.images.generate(
    model="imagen-3.0-generate-001",
    prompt="A gopher coding in Go",
    n=1,
    size="1024x1024"
)
print(resp.data[0].url)  # Returns data URI (Gemini returns base64)
```
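
Since the Imagen result arrives as a data URI rather than a downloadable link, a client has to decode it before saving the image. A small sketch, assuming the URI has the usual `data:<mime>;base64,<payload>` shape (the helper name is hypothetical):

```python
import base64
from typing import Tuple


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI into (MIME type, raw bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, _, payload = uri[len("data:"):].partition(",")
    if not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    mime_type = header[: -len(";base64")] or "text/plain"
    return mime_type, base64.b64decode(payload)
```

The returned bytes can be written straight to a file opened in binary mode. Checking the `data:` prefix first lets the same client code handle DALL-E results, which are regular URLs, separately.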

License

MIT

Languages

Go 49.1%

JavaScript 42.8%

CSS 6.1%

HTML 1.7%

Shell 0.2%

Other 0.1%