From bac03de0516a40edb14074bbae30c14860ce2c33 Mon Sep 17 00:00:00 2001
From: hobokenchicken
Date: Tue, 5 May 2026 11:28:59 -0400
Subject: [PATCH] docs: add automatic model routing to README

---
 README.md | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 832b7de8..050d91f4 100644
--- a/README.md
+++ b/README.md
@@ -7,9 +7,9 @@ A unified, high-performance LLM proxy gateway built in Go. It provides OpenAI-co
 - **Unified API:** OpenAI-compatible `/v1/chat/completions`, `/v1/images/generations`, `/v1/responses`, and `/v1/models` endpoints.
   - The `/v1/responses` endpoint (OpenAI Responses API) is currently supported for OpenAI models only. Non-OpenAI providers (Gemini, DeepSeek, Moonshot, Grok, Ollama) return a "not supported" response.
 - **Multi-Provider Support:**
-  - **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models, DALL-E 2/3 image generation.
-  - **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models (with native CoT support), Imagen 3 image generation.
-  - **DeepSeek:** DeepSeek Chat and Reasoner (R1) models.
+  - **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models, DALL-E 2/3 image generation. Group: `openai-auto`.
+  - **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models (with native CoT support), Imagen 3 image generation. Group: `gemini-auto`.
+  - **DeepSeek:** DeepSeek Chat and Reasoner (R1) models. Group: `deepseek-auto`.
   - **Moonshot:** Kimi K2.5 and other Kimi models.
   - **xAI Grok:** Grok-4 models.
   - **Ollama:** Local LLMs running on your network.
@@ -20,6 +20,10 @@ A unified, high-performance LLM proxy gateway built in Go. It provides OpenAI-co
 - **Streaming Support:** Full SSE (Server-Sent Events) support for all providers.
 - **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
 - **Image Generation:** DALL-E 2/3 (OpenAI) and Imagen 3 (Gemini) via OpenAI-compatible `/v1/images/generations` endpoint.
+- **Automatic Model Routing:** Define model groups (e.g. `deepseek-auto`) that route each request to the best concrete model for the task.
+  - **Heuristic strategy:** Keyword matching with no extra model call, so no added cost and negligible latency (e.g. "debug" or "step by step" routes to the reasoning model).
+  - **Classifier strategy:** Uses a cheap LLM to rate task complexity, then selects the appropriate model.
+  - Pre-seeded with `openai-auto`, `deepseek-auto`, and `gemini-auto` groups. Groups are managed via the dashboard or API.
 - **Multi-User Access Control:**
   - **Admin Role:** Full access to all dashboard features, user management, and system configuration.
   - **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
@@ -106,6 +110,8 @@ Access the dashboard at `http://localhost:8080`.
 - **Usage:** Summary stats, time-series analytics, and provider breakdown.
 - **Clients:** API key management and per-client usage tracking.
 - **Providers:** Provider configuration and status monitoring.
+- **Model Groups:** Define auto-routing groups with heuristic or classifier strategies.
+- **Models:** Model enable/disable and cost configuration.
 - **Users:** Admin-only user management for dashboard access.
 - **Monitoring:** Live request stream via WebSocket.
 
@@ -172,6 +178,33 @@ print(response.output_text)
 
 **Note:** The `/v1/responses` endpoint is currently supported for OpenAI models only. Requests routed to Gemini, DeepSeek, Moonshot, Grok, or Ollama models return a "not supported" error.
 
+### Automatic Model Routing
+
+Pass a model group name as `model` to let gophergate pick the concrete model automatically:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/v1",
+    api_key="YOUR_CLIENT_API_KEY"
+)
+
+# Simple query: routes to the cheap/fast model in the group (deepseek-chat)
+response = client.chat.completions.create(
+    model="deepseek-auto",
+    messages=[{"role": "user", "content": "What is 2+2?"}]
+)
+
+# Complex query: routes to the stronger model in the group (gpt-4o)
+response = client.chat.completions.create(
+    model="openai-auto",
+    messages=[{"role": "user", "content": "Debug this Python code and explain step by step..."}]
+)
+```
+
+Pre-seeded groups (fast model / stronger model): `openai-auto` (gpt-4o-mini / gpt-4o), `deepseek-auto` (deepseek-chat / deepseek-reasoner), `gemini-auto` (gemini-2.0-flash / gemini-2.5-pro). By default these use heuristic keyword matching. Switch any group to classifier mode in the dashboard to use LLM-based complexity rating instead.
+
 ### Image Generation (DALL-E / Imagen)
 
 ```python
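The two routing strategies described in this patch can be sketched in a few lines. This is a hypothetical Python illustration, not gophergate's Go implementation: the `ModelGroup` table mirrors the pre-seeded groups, `COMPLEX_KEYWORDS` holds only the two phrases the README names ("debug", "step by step"), and `rate` is an assumed stand-in for the cheap classifier LLM, returning a complexity score compared against an assumed threshold of 3.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelGroup:
    """Hypothetical group: a cheap/fast model and a stronger model."""
    fast: str
    strong: str


# Mirrors the pre-seeded groups listed in the README.
GROUPS = {
    "openai-auto": ModelGroup(fast="gpt-4o-mini", strong="gpt-4o"),
    "deepseek-auto": ModelGroup(fast="deepseek-chat", strong="deepseek-reasoner"),
    "gemini-auto": ModelGroup(fast="gemini-2.0-flash", strong="gemini-2.5-pro"),
}

# Assumption: only the two trigger phrases named in the README.
COMPLEX_KEYWORDS = ("debug", "step by step")


def last_user_text(messages: list[dict]) -> str:
    """Return the most recent plain-text user message, lowercased.

    Multimodal (list-valued) content is skipped in this sketch.
    """
    for msg in reversed(messages):
        content = msg.get("content")
        if msg.get("role") == "user" and isinstance(content, str):
            return content.lower()
    return ""


def route_heuristic(group: str, messages: list[dict]) -> str:
    """Keyword matching: picks a model without any extra model call."""
    g = GROUPS[group]
    text = last_user_text(messages)
    return g.strong if any(kw in text for kw in COMPLEX_KEYWORDS) else g.fast


def route_classifier(
    group: str,
    messages: list[dict],
    rate: Callable[[str], int],
    threshold: int = 3,
) -> str:
    """Ask a (hypothetical) cheap rater for a complexity score, then pick a model."""
    g = GROUPS[group]
    score = rate(last_user_text(messages))
    return g.strong if score >= threshold else g.fast


if __name__ == "__main__":
    simple = [{"role": "user", "content": "What is 2+2?"}]
    hard = [{"role": "user", "content": "Debug this Python code and explain step by step..."}]
    print(route_heuristic("deepseek-auto", simple))  # deepseek-chat
    print(route_heuristic("openai-auto", hard))  # gpt-4o
    # Stub rater standing in for the classifier LLM.
    print(route_classifier("gemini-auto", hard, rate=lambda text: 4))  # gemini-2.5-pro
```

Plain substring matching is crude (a prompt mentioning "debugger" also matches "debug"). The classifier strategy trades one extra call to a cheap model for a judgment that does not depend on a fixed phrase list.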