From b7df3108faacddfb2231570d8b385a3fe84ba28e Mon Sep 17 00:00:00 2001
From: hobokenchicken
Date: Thu, 7 May 2026 14:07:52 -0400
Subject: [PATCH] docs: update README, TODO, and deployment docs

README: Added hierarchical routing, classifier bucket mapping, two-level
dispatch, model groups table, DeepSeek language note, deploy script, and
updated model names to match current models.dev registry.

TODO: Added 14 completed items covering model groups, routing, dispatch,
and provider fixes from May 7 session.

deployment.md: Added deploy.sh instructions.
---
 README.md     | 92 +++++++++++++++++++++++++++++++++++----------------
 TODO.md       | 15 ++++++++-
 deployment.md | 17 ++++++++++
 3 files changed, 95 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
index 050d91f4..3b37846c 100644
--- a/README.md
+++ b/README.md
@@ -7,11 +7,11 @@ A unified, high-performance LLM proxy gateway built in Go. It provides OpenAI-co
 - **Unified API:** OpenAI-compatible `/v1/chat/completions`, `/v1/images/generations`, `/v1/responses`, and `/v1/models` endpoints.
   - The `/v1/responses` endpoint (OpenAI Responses API) is currently supported for OpenAI models only. Non-OpenAI providers (Gemini, DeepSeek, Moonshot, Grok, Ollama) return a "not supported" response.
 - **Multi-Provider Support:**
-  - **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models, DALL-E 2/3 image generation. Group: `openai-auto`.
-  - **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models (with native CoT support), Imagen 3 image generation. Group: `gemini-auto`.
-  - **DeepSeek:** DeepSeek Chat and Reasoner (R1) models. Group: `deepseek-auto`.
-  - **Moonshot:** Kimi K2.5 and other Kimi models.
-  - **xAI Grok:** Grok-4 models.
+  - **OpenAI:** GPT-4o, GPT-4o Mini, GPT-5, GPT-5.4, o1/o3/o4 reasoning models, DALL-E 2/3 image generation.
+  - **Google Gemini:** Gemini 2.5 Flash/Pro, Gemini 3 Flash/Pro previews, Imagen 3 image generation.
+  - **DeepSeek:** DeepSeek Chat, Reasoner, V4 Flash, V4 Pro.
+  - **Moonshot:** Kimi K2.5, K2.6 reasoning models.
+  - **xAI Grok:** Grok-3, Grok-4, Grok-4.3 reasoning models.
   - **Ollama:** Local LLMs running on your network.
 - **Observability & Tracking:**
   - **Asynchronous Logging:** Non-blocking request logging to SQLite using background workers.
@@ -20,17 +20,24 @@ A unified, high-performance LLM proxy gateway built in Go. It provides OpenAI-co
 - **Streaming Support:** Full SSE (Server-Sent Events) support for all providers.
 - **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
 - **Image Generation:** DALL-E 2/3 (OpenAI) and Imagen 3 (Gemini) via OpenAI-compatible `/v1/images/generations` endpoint.
-- **Automatic Model Routing:** Define model groups (e.g. `deepseek-auto`) that route to the best concrete model based on the task.
+- **Automatic Model Routing:**
+  - **Hierarchical Routing:** Groups can target other groups, cascading through multiple levels until a concrete model is reached. Cycle detection and depth limiting (max 10) prevent infinite loops.
   - **Heuristic strategy:** Free, zero-latency keyword matching (e.g. "debug" or "step by step" routes to the reasoning model).
-  - **Classifier strategy:** Uses a cheap LLM to rate task complexity, then selects the appropriate model.
-  - Pre-seeded with `openai-auto`, `deepseek-auto`, and `gemini-auto` groups. Managed via the dashboard or API.
+  - **Classifier strategy:** Uses a cheap LLM to rate task complexity on a configurable scale (1-10), then selects the appropriate model. Bucket mapping distributes ratings proportionally across targets.
+  - **Two-Level Dispatch:** A `dispatcher` group (classifier, threshold=10) auto-routes to tier groups by complexity score, which then apply their own internal strategies.
+  - **Metadata:** Groups support `logic_level` (1-10 complexity scale) and `primary_use` (description) fields for organizational clarity.
+  - Pre-seeded with provider groups, tier groups (heavy-logic / standard-pro / fast-flow), and a dispatcher. Model groups are exposed in `/v1/models` so clients can discover them.
 - **Multi-User Access Control:**
   - **Admin Role:** Full access to all dashboard features, user management, and system configuration.
   - **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
   - **Client API Keys:** Create and manage multiple client tokens for external integrations.
 - **Reliability:**
-  - **Circuit Breaking:** Automatically protects when providers are down (coming soon).
-  - **Rate Limiting:** Per-client and global rate limits (coming soon).
+  - **Circuit Breaking:** Protects providers when they are down, auto-recovers after timeout.
+  - **Provider-Aware Classification:** Classifier selector models are routed to the correct provider automatically.
+
+## DeepSeek Language Note
+
+DeepSeek models default to Chinese for some prompts. GopherGate automatically injects an English system prompt ("Always respond in English.") when no system message is present. If the client provides its own system prompt, it is left untouched.
 
 ## Security
 
@@ -75,7 +82,9 @@ GopherGate is designed with security in mind:
    # LLM_PROXY__ENCRYPTION_KEY=... (32-byte hex or base64 string)
    # OPENAI_API_KEY=sk-...
    # GEMINI_API_KEY=AIza...
+   # DEEPSEEK_API_KEY=sk-...
    # MOONSHOT_API_KEY=...
+   # GROK_API_KEY=xai-...
    # For Ollama (optional): Set base URL and enable
    # LLM_PROXY__PROVIDERS__OLLAMA__BASE_URL=http://localhost:11434/v1
    # LLM_PROXY__PROVIDERS__OLLAMA__ENABLED=true
@@ -87,7 +96,16 @@ GopherGate is designed with security in mind:
    ./gophergate
    ```
 
-The server starts on `http://0.0.0.0:8080` by default.
+The server starts on `http://0.0.0.0:8080` by default. Configure `LLM_PROXY__SERVER__PORT` in `.env` to change it.
+
+### Quick Deploy Script
+
+A `deploy.sh` script is included for production restarts:
+
+```bash
+./deploy.sh
+# git pull -> go build -> stop old process -> start new process
+```
 
 ### Deployment (Docker)
 
@@ -110,7 +128,7 @@ Access the dashboard at `http://localhost:8080`.
 - **Usage:** Summary stats, time-series analytics, and provider breakdown.
 - **Clients:** API key management and per-client usage tracking.
 - **Providers:** Provider configuration and status monitoring.
-- **Model Groups:** Define auto-routing groups with heuristic or classifier strategies.
+- **Model Groups:** Define auto-routing groups with heuristic or classifier strategies. Supports logic level and primary use metadata.
 - **Models:** Model enable/disable and cost configuration.
 - **Users:** Admin-only user management for dashboard access.
 - **Monitoring:** Live request stream via WebSocket.
@@ -131,14 +149,6 @@ You can reset the admin password to default by running:
 
 The proxy is a drop-in replacement for OpenAI. Configure your client:
 
-Moonshot models are available through the same OpenAI-compatible endpoint. For
-example, use `kimi-k2.5` as the model name after setting `MOONSHOT_API_KEY` in
-your environment.
-
-Ollama models (like `llama3`, `gemma2`, `mistral`) are also available through the same
-endpoint after enabling Ollama in configuration and setting the base URL to your
-Ollama server (default: `http://localhost:11434/v1`).
-
 ### Python
 
 ```python
@@ -176,7 +186,7 @@ response = client.responses.create(
 print(response.output_text)
 ```
 
-**Note:** The `/v1/responses` endpoint is currently supported for OpenAI models only. Requests routed to Gemini, DeepSeek, Moonshot, Grok, or Ollama models return a "not supported" error.
+**Note:** The `/v1/responses` endpoint is currently supported for OpenAI models only.
 
 ### Automatic Model Routing
 
@@ -190,20 +200,46 @@ client = OpenAI(
     api_key="YOUR_CLIENT_API_KEY"
 )
 
-# Simple query - routes to the cheap/fast model in the group
+# Simple query -- routes to the cheap/fast model
 response = client.chat.completions.create(
-    model="deepseek-auto",
+    model="fast-flow",
     messages=[{"role": "user", "content": "What is 2+2?"}]
 )
 
-# Complex query - routes to the reasoning model automatically
+# Complex query -- routes to the reasoning model automatically
 response = client.chat.completions.create(
-    model="openai-auto",
-    messages=[{"role": "user", "content": "Debug this Python code and explain step by step..."}]
+    model="heavy-logic",
+    messages=[{"role": "user", "content": "Write a Python red-black tree implementation."}]
 )
 ```
 
-Pre-seeded groups: `openai-auto` (gpt-4o-mini / gpt-4o), `deepseek-auto` (deepseek-chat / deepseek-reasoner), `gemini-auto` (gemini-2.0-flash / gemini-2.5-pro). By default these use heuristic keyword matching. Switch any group to classifier mode in the dashboard to use LLM-based complexity rating instead.
+### Two-Level Dispatch
+
+The `dispatcher` group uses a classifier to score prompts 1-10, then routes to the appropriate tier group:
+
+```python
+# Automatically routed based on complexity:
+# 1-3 -> fast-flow (classification, basic Q&A)
+# 4-7 -> standard-pro (general assistant, long docs)
+# 8-10 -> heavy-logic (complex coding, logic, agents)
+response = client.chat.completions.create(
+    model="dispatcher",
+    messages=[{"role": "user", "content": "Debug this race condition in my Go code."}]
+)
+# This goes: dispatcher -> heavy-logic -> deepseek-v4-pro
+```
+
+Pre-seeded groups:
+
+| Group | Level | Strategy | Targets | Primary Use |
+|-------|-------|----------|---------|-------------|
+| `fast-flow` | 2 | heuristic | deepseek-v4-flash, gpt-5.4-nano | Classification, JSON, Basic Q&A |
+| `standard-pro` | 5 | heuristic | gpt-5.4-mini, gemini-3-flash-preview | General Assistant, Long Docs |
+| `heavy-logic` | 9 | heuristic | grok-4.3, kimi-k2.6, deepseek-v4-pro | Complex Coding, Logic, Agents |
+| `dispatcher` | - | classifier | fast-flow, standard-pro, heavy-logic | Auto-dispatches by complexity |
+| `deepseek-auto` | - | heuristic | deepseek-chat, deepseek-reasoner | Legacy provider group |
+| `openai-auto` | - | heuristic | gpt-4o-mini, gpt-4o | Legacy provider group |
+| `gemini-auto` | - | heuristic | gemini-2.0-flash, gemini-2.5-pro | Legacy provider group |
 
 ### Image Generation (DALL-E / Imagen)
 
@@ -224,7 +260,7 @@ resp = client.images.generate(
 )
 print(resp.data[0].url)
 
-# Imagen 3 (Gemini) — uses same endpoint
+# Imagen 3 (Gemini) -- uses same endpoint
 resp = client.images.generate(
     model="imagen-3.0-generate-001",
     prompt="A gopher coding in Go",
diff --git a/TODO.md b/TODO.md
index d64704cd..bea429ed 100644
--- a/TODO.md
+++ b/TODO.md
@@ -15,11 +15,24 @@
 - [x] Dashboard Analytics & Usage Summary (Fixed SQL robustness)
 - [x] WebSocket for real-time dashboard updates (Hub with client counting)
 - [x] Asynchronous Request Logging to SQLite
-- [x] Update documentation (README, deployment, architecture)
 - [x] Cost Tracking accuracy (Registry integration with `models.dev`)
 - [x] Model Listing endpoint (`/v1/models`) with provider filtering
 - [x] System Metrics endpoint (`/api/system/metrics` using `gopsutil`)
 - [x] Fixed dashboard 404s and 500s
+- [x] Model groups with heuristic and classifier routing strategies
+- [x] Hierarchical routing — groups can target other groups with cycle detection
+- [x] Classifier bucket mapping via complexity_threshold (1-10 scale -> N targets)
+- [x] Two-level dispatch — classifier router delegates to tier groups
+- [x] Model groups exposed in /v1/models endpoint (owned_by: gophergate)
+- [x] logic_level and primary_use metadata on model groups
+- [x] Model group CRUD dashboard page
+- [x] dispatcher, heavy-logic, standard-pro, fast-flow seed groups
+- [x] Provider selection moved after routing resolution (fixes group routing)
+- [x] Classifier selector model routed to correct provider (selectProvider)
+- [x] DeepSeek English system prompt injection (ensureEnglish)
+- [x] Deploy script (deploy.sh)
+- [x] Recent Activity pane shows resolved model + group annotation
+- [x] Model names aligned with models.dev registry
 
 ## Planned Resolutions (High Priority)
 
diff --git a/deployment.md b/deployment.md
index 0e128253..e204ef7e 100644
--- a/deployment.md
+++ b/deployment.md
@@ -26,6 +26,22 @@ go build -o gophergate ./cmd/gophergate
 ./gophergate
 ```
 
+### Quick Deploy Script
+
+A `deploy.sh` script is provided for production restarts:
+
+```bash
+./deploy.sh
+```
+
+This script will:
+1. Stop any running gophergate process
+2. Pull latest changes from git
+3. Build the application
+4. Start it in the background (logs to `gophergate.log`)
+
+If the build fails, the previous binary is left untouched and the script exits.
+
 ## Docker Deployment
 
 The project includes a multi-stage `Dockerfile` for minimal image size.
@@ -50,3 +66,4 @@ docker run -d \
 - **SSL/TLS:** It is recommended to run the proxy behind a reverse proxy like Nginx or Caddy for SSL termination.
 - **Backups:** Regularly backup the `data/llm_proxy.db` file.
 - **Monitoring:** Monitor the `/health` endpoint for system status.
+- **Logs:** When started with `deploy.sh` or `nohup`, logs are written to `gophergate.log`.
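
Reviewer note on the routing text this patch adds to README.md: the hierarchical resolution (cycle detection, depth limit of 10) and the classifier bucket mapping can be illustrated in a few lines. The following is a minimal Python sketch, not GopherGate's Go implementation. The group names and targets come from the pre-seeded table in the patch; the `GROUPS` dict layout, the `bucket_index` formula, and the `resolve` function are hypothetical names and choices made for illustration. The formula was picked only because it reproduces the documented 1-3 / 4-7 / 8-10 split for three targets, and the sketch reuses the classifier score at every level, whereas the README says the tier groups apply their own heuristic strategy.

```python
MAX_DEPTH = 10  # depth limit stated in the README text

# Hypothetical in-memory group table. Names and targets are copied from the
# README's pre-seeded groups; the dict layout is an assumption of this sketch.
GROUPS = {
    "dispatcher": ["fast-flow", "standard-pro", "heavy-logic"],
    "fast-flow": ["deepseek-v4-flash", "gpt-5.4-nano"],
    "standard-pro": ["gpt-5.4-mini", "gemini-3-flash-preview"],
    "heavy-logic": ["grok-4.3", "kimi-k2.6", "deepseek-v4-pro"],
}


def bucket_index(score, n_targets, scale=10):
    """Map a rating in 1..scale onto a target index in 0..n_targets-1.

    Assumed formula: it yields 1-3 -> 0, 4-7 -> 1, 8-10 -> 2 for three
    targets, matching the split documented for the dispatcher group.
    """
    if not 1 <= score <= scale:
        raise ValueError(f"score {score} outside 1..{scale}")
    return score * n_targets // (scale + 1)


def resolve(name, score, groups=GROUPS):
    """Follow group targets until a name that is not a group is reached.

    Returns the concrete model name and the list of groups visited.
    Raises ValueError on a cycle or when more than MAX_DEPTH groups are chained.
    """
    visited = []
    current = name
    while current in groups:
        if current in visited:
            raise ValueError(f"cycle detected at group {current!r}")
        if len(visited) >= MAX_DEPTH:
            raise ValueError(f"routing depth exceeds {MAX_DEPTH}")
        visited.append(current)
        targets = groups[current]
        current = targets[bucket_index(score, len(targets))]
    return current, visited


if __name__ == "__main__":
    model, path = resolve("dispatcher", 9)
    print(" -> ".join(path + [model]))
```

Under these assumptions, `resolve("dispatcher", 9)` follows `dispatcher -> heavy-logic -> deepseek-v4-pro`, the same path as the README example, and a group that targets itself raises `ValueError` instead of looping.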