docs: add automatic model routing to README
@@ -7,9 +7,9 @@ A unified, high-performance LLM proxy gateway built in Go. It provides OpenAI-co
 - **Unified API:** OpenAI-compatible `/v1/chat/completions`, `/v1/images/generations`, `/v1/responses`, and `/v1/models` endpoints.
   - The `/v1/responses` endpoint (OpenAI Responses API) is currently supported for OpenAI models only. Non-OpenAI providers (Gemini, DeepSeek, Moonshot, Grok, Ollama) return a "not supported" response.
 - **Multi-Provider Support:**
-  - **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models, DALL-E 2/3 image generation.
-  - **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models (with native CoT support), Imagen 3 image generation.
-  - **DeepSeek:** DeepSeek Chat and Reasoner (R1) models.
+  - **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models, DALL-E 2/3 image generation. Group: `openai-auto`.
+  - **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models (with native CoT support), Imagen 3 image generation. Group: `gemini-auto`.
+  - **DeepSeek:** DeepSeek Chat and Reasoner (R1) models. Group: `deepseek-auto`.
   - **Moonshot:** Kimi K2.5 and other Kimi models.
   - **xAI Grok:** Grok-4 models.
   - **Ollama:** Local LLMs running on your network.
@@ -20,6 +20,10 @@ A unified, high-performance LLM proxy gateway built in Go. It provides OpenAI-co
 - **Streaming Support:** Full SSE (Server-Sent Events) support for all providers.
 - **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
 - **Image Generation:** DALL-E 2/3 (OpenAI) and Imagen 3 (Gemini) via OpenAI-compatible `/v1/images/generations` endpoint.
+- **Automatic Model Routing:** Define model groups (e.g. `deepseek-auto`) that route to the best concrete model based on the task.
+  - **Heuristic strategy:** Free, zero-latency keyword matching (e.g. "debug" or "step by step" routes to the reasoning model).
+  - **Classifier strategy:** Uses a cheap LLM to rate task complexity, then selects the appropriate model.
+  - Pre-seeded with `openai-auto`, `deepseek-auto`, and `gemini-auto` groups. Managed via the dashboard or API.
 - **Multi-User Access Control:**
   - **Admin Role:** Full access to all dashboard features, user management, and system configuration.
   - **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
@@ -106,6 +110,8 @@ Access the dashboard at `http://localhost:8080`.
 - **Usage:** Summary stats, time-series analytics, and provider breakdown.
 - **Clients:** API key management and per-client usage tracking.
 - **Providers:** Provider configuration and status monitoring.
+- **Model Groups:** Define auto-routing groups with heuristic or classifier strategies.
+- **Models:** Model enable/disable and cost configuration.
 - **Users:** Admin-only user management for dashboard access.
 - **Monitoring:** Live request stream via WebSocket.
 
@@ -172,6 +178,33 @@ print(response.output_text)
 
 **Note:** The `/v1/responses` endpoint is currently supported for OpenAI models only. Requests routed to Gemini, DeepSeek, Moonshot, Grok, or Ollama models return a "not supported" error.
 
+### Automatic Model Routing
+
+Use a model group name to let gophergate pick the best model automatically:
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8080/v1",
+    api_key="YOUR_CLIENT_API_KEY"
+)
+
+# Simple query - routes to the cheap/fast model in the group
+response = client.chat.completions.create(
+    model="deepseek-auto",
+    messages=[{"role": "user", "content": "What is 2+2?"}]
+)
+
+# Complex query - routes to the reasoning model automatically
+response = client.chat.completions.create(
+    model="openai-auto",
+    messages=[{"role": "user", "content": "Debug this Python code and explain step by step..."}]
+)
+```
+
+Pre-seeded groups: `openai-auto` (gpt-4o-mini / gpt-4o), `deepseek-auto` (deepseek-chat / deepseek-reasoner), `gemini-auto` (gemini-2.0-flash / gemini-2.5-pro). By default these use heuristic keyword matching. Switch any group to classifier mode in the dashboard to use LLM-based complexity rating instead.
+
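To make the default heuristic strategy concrete, here is a minimal Python sketch of keyword-based routing. It is an illustration only, not gophergate's actual implementation (which is written in Go and not shown in this change). The `ModelGroup` type, the `route` function, and the keyword list are hypothetical; the only keywords the README names are "debug" and "step by step", and the sketch assumes the first model listed for each pre-seeded group is the cheap/fast one and the second is the reasoning one.

```python
from dataclasses import dataclass

# Hypothetical keyword list: the README only mentions "debug" and "step by step".
REASONING_KEYWORDS = ("debug", "step by step")


@dataclass(frozen=True)
class ModelGroup:
    """Hypothetical pairing of a fast model and a reasoning model."""
    fast_model: str
    reasoning_model: str


# Pre-seeded groups as listed in the README; the fast/reasoning split is assumed.
GROUPS = {
    "openai-auto": ModelGroup("gpt-4o-mini", "gpt-4o"),
    "deepseek-auto": ModelGroup("deepseek-chat", "deepseek-reasoner"),
    "gemini-auto": ModelGroup("gemini-2.0-flash", "gemini-2.5-pro"),
}


def route(group_name: str, prompt: str) -> str:
    """Pick a concrete model for a group by case-insensitive keyword matching."""
    group = GROUPS[group_name]
    text = prompt.lower()
    if any(keyword in text for keyword in REASONING_KEYWORDS):
        return group.reasoning_model
    return group.fast_model


if __name__ == "__main__":
    print(route("deepseek-auto", "What is 2+2?"))
    print(route("openai-auto", "Debug this Python code and explain step by step..."))
```

Because the match is a plain substring check, it adds no extra LLM call, which fits the "free, zero-latency" description. The trade-off is that it can misroute prompts that mention a keyword in passing, which is the case the classifier strategy is meant to handle.
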
 ### Image Generation (DALL-E / Imagen)
 
 ```python