Files
GopherGate/PLAN.md
T
hobokenchicken 8a8d8d1477
CI / Lint (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
fix: Phase 1 - security & stability patches
- AuthMiddleware now requires auth on /v1/* routes (returns 401)
- WebSocket origin check configurable via WSAllowedOrigin
- Removed debug fmt.Printf leaks (config, ollama, server)
- Registry access protected by sync.RWMutex (race condition fix)
- Session cleanup goroutine runs every 15 min
- RevokeSession returns error instead of silent no-op
2026-04-26 14:45:22 -04:00

203 lines
8.2 KiB
Markdown

# GopherGate — Remediation Plan
> 3 phases, 6 weeks total. Each phase independently shippable.
---
## Phase 1 — Security & Stability (Weeks 1-2)
**Goal:** Patch auth bypass, data races, debug leaks. No new features.
### 1.1 Fix auth bypass
- [ ] `middleware/auth.go`: Return 401 instead of `c.Next()` when no auth header on `/v1/*`
- [ ] Add `requireAuth` param to `AuthMiddleware` constructor: `AuthMiddleware(db, requireAuth bool)`
- [ ] `/v1/*` routes → `requireAuth=true`, leave `/health` unauthed
- [ ] Add tests: curl request without token → 401
### 1.2 Fix WebSocket origin
- [ ] `websocket.go`: Replace `return true` with origin check against configured `Server.Host`
- [ ] Config option `websocket.allowed_origins []string` (default: same origin)
- [ ] Add `xsrf` check on WS upgrade endpoint if behind proxy
### 1.3 Strip debug prints
- [ ] `config.go`: Remove `fmt.Printf("Debug Config:...")` and `fmt.Printf("Debug Env:...")`
- [ ] `server.go` `logRequest()`: Remove `fmt.Printf("[DEBUG] Request logged:...")`
- [ ] `config.go`: Remove `fmt.Printf("[DEBUG] Final Ollama Config:...")`
- [ ] `providers/ollama.go`: Remove `fmt.Printf("[Ollama]...")` debug logs or gate behind `LLM_PROXY_DEBUG=1`
- [ ] Replace all `fmt.Printf` with structured logger (slog from stdlib)
### 1.4 Fix registry data race
- [ ] `server.go`: Add `sync.RWMutex` around `s.registry`
- [ ] `handleListModels()`: Lock read
- [ ] `logRequest()`: Lock read
- [ ] Background refresh goroutines: Lock write
- [ ] Verify with `go run -race`
### 1.5 Session cleanup
- [ ] `sessions.go`: Add periodic cleanup goroutine for expired sessions
- [ ] Cleanup interval: every 15 minutes
- [ ] `RevokeSession`: Return error instead of silent no-op
---
## Phase 2 — Reliability & Observability (Weeks 3-4)
**Goal:** Error handling, timeouts, logging maturity, concurrency hardening.
### 2.1 Provider HTTP timeouts
- [ ] Each provider `New*Provider()`: Set `client.SetTimeout(30 * time.Second)` for non-stream
- [ ] Streaming: No timeout, but add `context.Context` cancellation from request
- [ ] `circuit_breaker.go`: Configure real thresholds
- `MaxRequests: 5`
- `Interval: 60 * time.Second`
- `Timeout: 30 * time.Second`
- `ReadyToTrip: func(counts) bool { return counts.ConsecutiveFailures > 3 }`
- [ ] Test: Stop Ollama, hit endpoint → circuit opens after 3 failures → auto-recovers after 30s
### 2.2 Structured logging (slog)
- [ ] Create `internal/logger/logger.go``slog.NewJSONHandler`
- [ ] Log levels: error/warn/info/debug
- [ ] Replace all `fmt.Printf` in: server, providers, config, logging
- [ ] `RequestLogger`: Use slog structured fields, remove manual JSON building
- [ ] Log channel: increase buffer from 100 to 10000 or use batch insert every 5s
### 2.3 Stream error propagation
- [ ] `ChatCompletionStream`: Send error chunks as SSE events, not just `fmt.Printf`
- [ ] Format: `data: {"error":"..."}\n\n`
- [ ] Client sees full error in stream instead of silent truncation
### 2.4 Registry fetch retry
- [ ] `FetchRegistry()`: Add retry with backoff (3 tries, 1s/2s/4s)
- [ ] Cache last-known-good registry so startup works offline
### 2.5 Token truncation safety
- [ ] `helpers.go`: Deep-copy ToolCall before truncation, don't mutate original
- [ ] Same pattern across all providers that sanitize IDs
### 2.6 RevokeSession error handling
- [ ] `RevokeSession(token)``RevokeSession(token) error`
- [ ] Update all callers to handle error
---
## Phase 3 — Architecture & Maintainability (Weeks 5-6)
**Goal:** Code splitting, test coverage, billing integrity.
### 3.1 Split dashboard.go
- [ ] Create `internal/server/clients.go` — client CRUD handlers
- [ ] Create `internal/server/providers.go` — provider handlers
- [ ] Create `internal/server/users.go` — user handlers
- [ ] Create `internal/server/analytics.go` — usage/analytics handlers
- [ ] Create `internal/server/system.go` — health, metrics, logs, backup
- [ ] `dashboard.go` shrinks to imports + route wiring only
### 3.2 Provider routing via config
- [ ] Replace `strings.Contains` routing table with config-driven model→provider map
- [ ] `config.go`: Add `server.model_routing` map (e.g. `"llama-*": "ollama"`)
- [ ] Fallback chain: explicit match → prefix match → glob match → default
- [ ] Backward-compat: keep old prefix logic as fallback
### 3.3 Billing integrity
- [ ] `logging.go`: Add idempotency key to log entries (unique request ID)
- [ ] Before deducting balance, check if `request_id` already processed
- [ ] `processLog`: Wrap in retry on serialization failure (SQLite busy)
- [ ] Credit deduction: move to separate async worker with replay protection
### 3.4 Add tests
- [ ] `internal/models/`: Unit tests for `FindModel()`, message conversion
- [ ] `internal/providers/helpers_test.go`: Unit tests for `MessagesToOpenAIJSON`, `ParseOpenAIResponse`
- [ ] `internal/utils/`: Tests for `Encrypt`/`Decrypt`, `CalculateCost`
- [ ] `internal/server/`: Integration test for auth flow (token → chat completion)
- [ ] `internal/middleware/`: Test auth bypass fix
- [ ] Goal: ≥40% coverage on non-UI packages
### 3.5 go.mod hygiene
- [ ] `go mod tidy` (done)
- [ ] Add `go vet ./...` to CI/pre-commit hook
- [ ] Pin dependencies with `go mod verify`
---
## Dependency Map
```
Phase 1 ──────────────────────────▶ Phase 2 ──────────────────────────▶ Phase 3
│ │ │
├─ 1.1 Auth bypass ──────────▶ 2.3 Stream errors (depends on auth) │
├─ 1.2 WS origin │ │
├─ 1.3 Debug prints │ │
├─ 1.4 Registry race │ │
├─ 1.5 Session cleanup │ │
│ ├─ 2.1 HTTP timeouts │
│ ├─ 2.2 Structured logging ───────────▶ 3.3 Billing (depends on good logs)
│ ├─ 2.4 Registry retry │
│ ├─ 2.5 Token truncation │
│ ├─ 2.6 RevokeSession errors │
│ │
│ ├─ 3.1 Split dashboard.go
│ ├─ 3.2 Config routing
│ ├─ 3.4 Tests
│ ├─ 3.5 go.mod hygiene
```
---
## Mermaid Gantt
```mermaid
gantt
title GopherGate Remediation
dateFormat YYYY-MM-DD
axisFormat %b %d
section Phase 1 — Security
Auth bypass fix :p1a, 2026-05-04, 2d
WS origin lock :p1b, after p1a, 1d
Strip debug prints :p1c, 2026-05-04, 2d
Registry race fix :p1d, after p1c, 1d
Session cleanup :p1e, after p1d, 2d
section Phase 2 — Reliability
HTTP timeouts + CB :p2a, 2026-05-11, 3d
Structured logging :p2b, 2026-05-11, 3d
Stream error propagation :p2c, after p2a, 1d
Registry retry :p2d, after p2b, 1d
Token truncation fix :p2e, after p2a, 1d
RevokeSession errors :p2f, after p2b, 1d
section Phase 3 — Architecture
Split dashboard.go :p3a, 2026-05-25, 4d
Config-driven routing :p3b, 2026-05-25, 3d
Billing integrity :p3c, after p3a, 3d
Add tests :p3d, 2026-06-01, 5d
go.mod hygiene :p3e, after p3d, 1d
```
---
## Immediate Next Action
**Start 1.1 — Fix auth bypass:**
- Edit `middleware/auth.go` → change `c.Next()` to `c.AbortWithStatusJSON(401, ...)` when no header
- Add `RequireAuth` bool param
- Update `server.go` `setupRoutes()` to pass `requireAuth=true` for `/v1/*`
- `curl localhost:8080/v1/chat/completions -d '{}'` → 401