8a8d8d1477
- AuthMiddleware now requires auth on /v1/* routes (returns 401) - WebSocket origin check configurable via WSAllowedOrigin - Removed debug fmt.Printf leaks (config, ollama, server) - Registry access protected by sync.RWMutex (race condition fix) - Session cleanup goroutine runs every 15 min - RevokeSession returns error instead of silent no-op
203 lines
8.2 KiB
Markdown
203 lines
8.2 KiB
Markdown
# GopherGate — Remediation Plan
|
|
|
|
> 3 phases, 6 weeks total. Each phase independently shippable.
|
|
|
|
---
|
|
|
|
## Phase 1 — Security & Stability (Weeks 1-2)
|
|
|
|
**Goal:** Patch auth bypass, data races, debug leaks. No new features.
|
|
|
|
### 1.1 Fix auth bypass
|
|
|
|
- [ ] `middleware/auth.go`: Return 401 instead of `c.Next()` when no auth header on `/v1/*`
|
|
- [ ] Add `requireAuth` param to `AuthMiddleware` constructor: `AuthMiddleware(db, requireAuth bool)`
|
|
- [ ] `/v1/*` routes → `requireAuth=true`, leave `/health` unauthed
|
|
- [ ] Add tests: curl request without token → 401
|
|
|
|
### 1.2 Fix WebSocket origin
|
|
|
|
- [ ] `websocket.go`: Replace `return true` with origin check against configured `Server.Host`
|
|
- [ ] Config option `websocket.allowed_origins []string` (default: same origin)
|
|
- [ ] Add `xsrf` check on WS upgrade endpoint if behind proxy
|
|
|
|
### 1.3 Strip debug prints
|
|
|
|
- [ ] `config.go`: Remove `fmt.Printf("Debug Config:...")` and `fmt.Printf("Debug Env:...")`
|
|
- [ ] `server.go` `logRequest()`: Remove `fmt.Printf("[DEBUG] Request logged:...")`
|
|
- [ ] `config.go`: Remove `fmt.Printf("[DEBUG] Final Ollama Config:...")`
|
|
- [ ] `providers/ollama.go`: Remove `fmt.Printf("[Ollama]...")` debug logs or gate behind `LLM_PROXY_DEBUG=1`
|
|
- [ ] Replace all `fmt.Printf` with structured logger (slog from stdlib)
|
|
|
|
### 1.4 Fix registry data race
|
|
|
|
- [ ] `server.go`: Add `sync.RWMutex` around `s.registry`
|
|
- [ ] `handleListModels()`: Lock read
|
|
- [ ] `logRequest()`: Lock read
|
|
- [ ] Background refresh goroutines: Lock write
|
|
- [ ] Verify with `go run -race`
|
|
|
|
### 1.5 Session cleanup
|
|
|
|
- [ ] `sessions.go`: Add periodic cleanup goroutine for expired sessions
|
|
- [ ] Cleanup interval: every 15 minutes
|
|
- [ ] `RevokeSession`: Return error instead of silent no-op
|
|
|
|
---
|
|
|
|
## Phase 2 — Reliability & Observability (Weeks 3-4)
|
|
|
|
**Goal:** Error handling, timeouts, logging maturity, concurrency hardening.
|
|
|
|
### 2.1 Provider HTTP timeouts
|
|
|
|
- [ ] Each provider `New*Provider()`: Set `client.SetTimeout(30 * time.Second)` for non-stream
|
|
- [ ] Streaming: No timeout, but add `context.Context` cancellation from request
|
|
- [ ] `circuit_breaker.go`: Configure real thresholds
|
|
- `MaxRequests: 5`
|
|
- `Interval: 60 * time.Second`
|
|
- `Timeout: 30 * time.Second`
|
|
- `ReadyToTrip: func(counts) bool { return counts.ConsecutiveFailures > 3 }`
|
|
- [ ] Test: Stop Ollama, hit endpoint → circuit opens after 3 failures → auto-recovers after 30s
|
|
|
|
### 2.2 Structured logging (slog)
|
|
|
|
- [ ] Create `internal/logger/logger.go` — `slog.NewJSONHandler`
|
|
- [ ] Log levels: error/warn/info/debug
|
|
- [ ] Replace all `fmt.Printf` in: server, providers, config, logging
|
|
- [ ] `RequestLogger`: Use slog structured fields, remove manual JSON building
|
|
- [ ] Log channel: increase buffer from 100 to 10000 or use batch insert every 5s
|
|
|
|
### 2.3 Stream error propagation
|
|
|
|
- [ ] `ChatCompletionStream`: Send error chunks as SSE events, not just `fmt.Printf`
|
|
- [ ] Format: `data: {"error":"..."}\n\n`
|
|
- [ ] Client sees full error in stream instead of silent truncation
|
|
|
|
### 2.4 Registry fetch retry
|
|
|
|
- [ ] `FetchRegistry()`: Add retry with backoff (3 tries, 1s/2s/4s)
|
|
- [ ] Cache last-known-good registry so startup works offline
|
|
|
|
### 2.5 Token truncation safety
|
|
|
|
- [ ] `helpers.go`: Deep-copy ToolCall before truncation, don't mutate original
|
|
- [ ] Same pattern across all providers that sanitize IDs
|
|
|
|
### 2.6 RevokeSession error handling
|
|
|
|
- [ ] `RevokeSession(token)` → `RevokeSession(token) error`
|
|
- [ ] Update all callers to handle error
|
|
|
|
---
|
|
|
|
## Phase 3 — Architecture & Maintainability (Weeks 5-6)
|
|
|
|
**Goal:** Code splitting, test coverage, billing integrity.
|
|
|
|
### 3.1 Split dashboard.go
|
|
|
|
- [ ] Create `internal/server/clients.go` — client CRUD handlers
|
|
- [ ] Create `internal/server/providers.go` — provider handlers
|
|
- [ ] Create `internal/server/users.go` — user handlers
|
|
- [ ] Create `internal/server/analytics.go` — usage/analytics handlers
|
|
- [ ] Create `internal/server/system.go` — health, metrics, logs, backup
|
|
- [ ] `dashboard.go` shrinks to imports + route wiring only
|
|
|
|
### 3.2 Provider routing via config
|
|
|
|
- [ ] Replace `strings.Contains` routing table with config-driven model→provider map
|
|
- [ ] `config.go`: Add `server.model_routing` map (e.g. `"llama-*": "ollama"`)
|
|
- [ ] Fallback chain: explicit match → prefix match → glob match → default
|
|
- [ ] Backward-compat: keep old prefix logic as fallback
|
|
|
|
### 3.3 Billing integrity
|
|
|
|
- [ ] `logging.go`: Add idempotency key to log entries (unique request ID)
|
|
- [ ] Before deducting balance, check if `request_id` already processed
|
|
- [ ] `processLog`: Wrap in retry on serialization failure (SQLite busy)
|
|
- [ ] Credit deduction: move to separate async worker with replay protection
|
|
|
|
### 3.4 Add tests
|
|
|
|
- [ ] `internal/models/`: Unit tests for `FindModel()`, message conversion
|
|
- [ ] `internal/providers/helpers_test.go`: Unit tests for `MessagesToOpenAIJSON`, `ParseOpenAIResponse`
|
|
- [ ] `internal/utils/`: Tests for `Encrypt`/`Decrypt`, `CalculateCost`
|
|
- [ ] `internal/server/`: Integration test for auth flow (token → chat completion)
|
|
- [ ] `internal/middleware/`: Test auth bypass fix
|
|
- [ ] Goal: ≥40% coverage on non-UI packages
|
|
|
|
### 3.5 go.mod hygiene
|
|
|
|
- [ ] `go mod tidy` (done)
|
|
- [ ] Add `go vet ./...` to CI/pre-commit hook
|
|
- [ ] Pin dependencies with `go mod verify`
|
|
|
|
---
|
|
|
|
## Dependency Map
|
|
|
|
```
|
|
Phase 1 ──────────────────────────▶ Phase 2 ──────────────────────────▶ Phase 3
|
|
│ │ │
|
|
├─ 1.1 Auth bypass ──────────▶ 2.3 Stream errors (depends on auth) │
|
|
├─ 1.2 WS origin │ │
|
|
├─ 1.3 Debug prints │ │
|
|
├─ 1.4 Registry race │ │
|
|
├─ 1.5 Session cleanup │ │
|
|
│ ├─ 2.1 HTTP timeouts │
|
|
│ ├─ 2.2 Structured logging ───────────▶ 3.3 Billing (depends on good logs)
|
|
│ ├─ 2.4 Registry retry │
|
|
│ ├─ 2.5 Token truncation │
|
|
│ ├─ 2.6 RevokeSession errors │
|
|
│ │
|
|
│ ├─ 3.1 Split dashboard.go
|
|
│ ├─ 3.2 Config routing
|
|
│ ├─ 3.4 Tests
|
|
│ ├─ 3.5 go.mod hygiene
|
|
```
|
|
|
|
---
|
|
|
|
## Mermaid Gantt
|
|
|
|
```mermaid
|
|
gantt
|
|
title GopherGate Remediation
|
|
dateFormat YYYY-MM-DD
|
|
axisFormat %b %d
|
|
|
|
section Phase 1 — Security
|
|
Auth bypass fix :p1a, 2026-05-04, 2d
|
|
WS origin lock :p1b, after p1a, 1d
|
|
Strip debug prints :p1c, 2026-05-04, 2d
|
|
Registry race fix :p1d, after p1c, 1d
|
|
Session cleanup :p1e, after p1d, 2d
|
|
|
|
section Phase 2 — Reliability
|
|
HTTP timeouts + CB :p2a, 2026-05-11, 3d
|
|
Structured logging :p2b, 2026-05-11, 3d
|
|
Stream error propagation :p2c, after p2a, 1d
|
|
Registry retry :p2d, after p2b, 1d
|
|
Token truncation fix :p2e, after p2a, 1d
|
|
RevokeSession errors :p2f, after p2b, 1d
|
|
|
|
section Phase 3 — Architecture
|
|
Split dashboard.go :p3a, 2026-05-25, 4d
|
|
Config-driven routing :p3b, 2026-05-25, 3d
|
|
Billing integrity :p3c, after p3a, 3d
|
|
Add tests :p3d, 2026-06-01, 5d
|
|
go.mod hygiene :p3e, after p3d, 1d
|
|
```
|
|
|
|
---
|
|
|
|
## Immediate Next Action
|
|
|
|
**Start 1.1 — Fix auth bypass:**
|
|
|
|
- Edit `middleware/auth.go` → change `c.Next()` to `c.AbortWithStatusJSON(401, ...)` when no header
|
|
- Add `RequireAuth` bool param
|
|
- Update `server.go` `setupRoutes()` to pass `requireAuth=true` for `/v1/*`
|
|
- `curl localhost:8080/v1/chat/completions -d '{}'` → 401
|