# LLM Proxy Gateway

A unified, high-performance LLM proxy gateway built in Rust. It provides a single OpenAI-compatible API to access multiple providers (OpenAI, Gemini, DeepSeek, Grok, Ollama) with built-in token tracking, real-time cost calculation, multi-user authentication, and a management dashboard.

## Features

- **Unified API:** OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints.
- **Multi-Provider Support:**
  - **OpenAI:** GPT-4o, GPT-4o Mini, o1, o3 reasoning models.
  - **Google Gemini:** Gemini 2.0 Flash, Pro, and vision models.
  - **DeepSeek:** DeepSeek Chat and Reasoner models.
  - **xAI Grok:** Grok-beta models.
  - **Ollama:** Local LLMs running on your network.
- **Observability & Tracking:**
  - **Real-time Costing:** Fetches live pricing and context specs from `models.dev` on startup.
  - **Token Counting:** Precise estimation using `tiktoken-rs`.
  - **Database Logging:** Every request is logged to SQLite for historical analysis.
- **Streaming Support:** Full SSE (Server-Sent Events) with `[DONE]` termination for client compatibility.
- **Multimodal (Vision):** Image processing (Base64 and remote URLs) across compatible providers.
- **Multi-User Access Control:**
  - **Admin Role:** Full access to all dashboard features, user management, and system configuration.
  - **Viewer Role:** Read-only access to usage analytics, costs, and monitoring.
  - **Client API Keys:** Create and manage multiple client tokens for external integrations.
- **Reliability:**
  - **Circuit Breaking:** Automatically stops routing requests to providers that are failing.
  - **Rate Limiting:** Per-client and global rate limits.
  - **Cache-Aware Costing:** Tracks cache hit/miss tokens for accurate billing.

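Cache-aware costing can be illustrated with a short calculation. The sketch below is illustrative only: the per-million-token prices and the cache discount are hypothetical placeholders, not the live figures the proxy fetches from `models.dev`.

```python
# Illustrative cost calculation for one request. Prices are hypothetical
# placeholders, not live figures from models.dev.
PRICE_PER_M_INPUT = 2.50    # USD per 1M uncached input tokens
PRICE_PER_M_CACHED = 1.25   # USD per 1M cache-hit input tokens
PRICE_PER_M_OUTPUT = 10.00  # USD per 1M output tokens

def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Cost in USD, billing cache hits at the discounted rate."""
    uncached = input_tokens - cached_tokens
    return (
        uncached * PRICE_PER_M_INPUT
        + cached_tokens * PRICE_PER_M_CACHED
        + output_tokens * PRICE_PER_M_OUTPUT
    ) / 1_000_000

# 10k input tokens, 4k of them served from cache, 2k output tokens:
print(round(request_cost(10_000, 4_000, 2_000), 6))  # -> 0.04
```

Tracking cached tokens separately matters because providers that support prompt caching bill hits at a lower rate than fresh input.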
## Tech Stack

- **Runtime:** Rust with Tokio.
- **Web Framework:** Axum.
- **Database:** SQLx with SQLite.
- **Frontend:** Vanilla JS/CSS with Chart.js for visualizations.

## Getting Started

### Prerequisites

- Rust (1.80+)
- SQLite3

### Quick Start

1. Clone and build:

   ```bash
   git clone ssh://git.dustin.coffee:2222/hobokenchicken/llm-proxy.git
   cd llm-proxy
   cargo build --release
   ```

2. Configure environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your API keys:
   # OPENAI_API_KEY=sk-...
   # GEMINI_API_KEY=AIza...
   # DEEPSEEK_API_KEY=sk-...
   # GROK_API_KEY=gk-... (optional)
   ```

3. Run the proxy:

   ```bash
   cargo run --release
   ```

The server starts on `http://localhost:8080` by default.

### Configuration

Edit `config.toml` to customize providers, models, and settings:

```toml
[server]
port = 8080
host = "0.0.0.0"

[database]
path = "./data/llm_proxy.db"

[providers.openai]
enabled = true
default_model = "gpt-4o"

[providers.gemini]
enabled = true
default_model = "gemini-2.0-flash"

[providers.deepseek]
enabled = true
default_model = "deepseek-reasoner"

[providers.grok]
enabled = false
default_model = "grok-beta"

[providers.ollama]
enabled = false
base_url = "http://localhost:11434/v1"
```

## Management Dashboard

Access the dashboard at `http://localhost:8080`:

- **Overview:** Real-time request counters, system health, provider status.
- **Analytics:** Time-series charts, filterable by date, client, provider, and model.
- **Costs:** Budget tracking, cost breakdown by provider/client/model, projections.
- **Clients:** Create, revoke, and rotate API tokens; per-client usage stats.
- **Providers:** Enable/disable providers, test connections, configure API keys.
- **Monitoring:** Live request stream via WebSocket, response times, error rates.
- **Users:** Admin/user management with role-based access control.

### Default Credentials

- **Username:** `admin`
- **Password:** `admin123`

Change the admin password in the dashboard after first login!

## API Usage

The proxy is a drop-in replacement for the OpenAI API. Configure your client:

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="YOUR_CLIENT_API_KEY",  # Create in dashboard
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

### Open WebUI

```
API Base URL: http://your-server:8080/v1
API Key: YOUR_CLIENT_API_KEY
```

### cURL

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```

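With `"stream": true`, the same endpoint returns Server-Sent Events terminated by a `data: [DONE]` sentinel. A minimal Python sketch of consuming such a stream; the chunk payloads here are illustrative, but follow the OpenAI-compatible streaming chunk shape:

```python
import json

# Illustrative SSE lines in the OpenAI-compatible streaming chunk format.
raw_events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

def collect_stream(lines):
    """Accumulate streamed content until the [DONE] sentinel."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip SSE comments / keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

print(collect_stream(raw_events))  # -> Hello!
```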
## Model Discovery

The proxy exposes `/v1/models` so OpenAI-compatible clients can discover the available models:

```bash
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_CLIENT_API_KEY"
```

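The response uses the OpenAI list shape (`{"object": "list", "data": [...]}`). A sketch extracting model ids from a sample payload; the payload contents here are illustrative:

```python
import json

# Illustrative /v1/models response in the OpenAI list format.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "gpt-4o", "object": "model", "owned_by": "openai"},
    {"id": "deepseek-reasoner", "object": "model", "owned_by": "deepseek"}
  ]
}
""")

model_ids = [model["id"] for model in sample["data"]]
print(model_ids)  # -> ['gpt-4o', 'deepseek-reasoner']
```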
## Troubleshooting

### Streaming Issues

If clients time out or report "TransferEncodingError", check that:

1. Proxy buffering is disabled in nginx: `proxy_buffering off;`
2. Chunked transfer is enabled: `chunked_transfer_encoding on;`
3. Timeouts are long enough for long-running streams: `proxy_read_timeout 7200s;`

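When the proxy sits behind nginx, the directives above combine into a location block like the following sketch; the upstream address is a placeholder, and `proxy_http_version 1.1` is added on the assumption of an HTTP/1.1 upstream, which streaming generally requires:

```nginx
location /v1/ {
    proxy_pass http://127.0.0.1:8080;    # placeholder upstream address
    proxy_http_version 1.1;              # assumption: HTTP/1.1 upstream for streaming
    proxy_buffering off;                 # don't buffer SSE chunks
    chunked_transfer_encoding on;        # pass chunks through as-is
    proxy_read_timeout 7200s;            # allow long-running streams
}
```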
### Provider Errors

- Check that API keys are set in `.env`.
- Test the provider in the dashboard (Settings → Providers → Test).
- Review logs: `journalctl -u llm-proxy -f`

## License

MIT OR Apache-2.0