Commit Graph

9 Commits

Author SHA1 Message Date
db5824f0fb feat: add cache token tracking and cache-aware cost calculation
Some checks failed
CI / Check (push) Has been cancelled
CI / Clippy (push) Has been cancelled
CI / Formatting (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Release Build (push) Has been cancelled
Track cache_read_tokens and cache_write_tokens end-to-end: parse from
provider responses (OpenAI, DeepSeek, Grok, Gemini), persist to SQLite,
apply cache-aware pricing from the model registry, and surface in API
responses and the dashboard.

- Add cache fields to ProviderResponse, StreamUsage, RequestLog structs
- Parse cached_tokens (OpenAI/Grok), prompt_cache_hit/miss (DeepSeek),
  cachedContentTokenCount (Gemini) from provider responses
- Send stream_options.include_usage for streaming; capture real usage
  from final SSE chunk in AggregatingStream
- ALTER TABLE migration for cache_read_tokens/cache_write_tokens columns
- Cache-aware cost formula using registry cache_read/cache_write rates
- Update Provider trait calculate_cost signature across all providers
- Add cache_read_tokens/cache_write_tokens to Usage API response
- Dashboard: cache hit rate card, cache columns in pricing and usage
  tables, cache token aggregation in SQL queries
- Remove API debug panel and verbose console logging from api.js
- Bump static asset cache-bust to v5
2026-03-02 14:45:21 -05:00
232f092f27 fix(server): map provider names to registry keys for /v1/models
Some checks failed
CI / Check (push) Has been cancelled
CI / Clippy (push) Has been cancelled
CI / Formatting (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Release Build (push) Has been cancelled
The model registry from models.dev uses 'google' and 'xai' as provider
IDs, but internal providers use 'gemini' and 'grok'. Added mapping so
all provider models appear in the listing.
2026-03-02 14:14:56 -05:00
88aae389d2 feat(server): add /v1/models endpoint for OpenAI-compatible model discovery
Some checks failed
CI / Check (push) Has been cancelled
CI / Clippy (push) Has been cancelled
CI / Formatting (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Release Build (push) Has been cancelled
Open WebUI and other OpenAI-compatible clients call GET /v1/models to
discover available models. Lists all models from enabled providers via
the model registry, respects disabled models, and handles Ollama models
from TOML config.
2026-03-02 14:06:31 -05:00
8d50ce7c22 perf: eliminate per-request SQLite queries and optimize proxy latency
Some checks failed
CI / Check (push) Has been cancelled
CI / Clippy (push) Has been cancelled
CI / Formatting (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Release Build (push) Has been cancelled
- Add in-memory ModelConfigCache (30s refresh, explicit invalidation)
  replacing 2 SQLite queries per request (model lookup + cost override)
- Configure all 5 provider HTTP clients with proper timeouts (300s),
  connection pooling (4 idle/host, 90s idle timeout), and TCP keepalive
- Move client_usage update to tokio::spawn in non-streaming path
- Use fast chars/4 heuristic for token estimation on large inputs (>1KB)
- Generate single UUID/timestamp per SSE stream instead of per chunk
- Add shared LazyLock<Client> for image fetching in multimodal module
- Add proxy overhead timing instrumentation for both request paths
- Fix test helper to include new model_config_cache field
2026-03-02 12:53:22 -05:00
9318336f62 feat: add tool-calling passthrough for all providers
Some checks failed
CI / Check (push) Has been cancelled
CI / Clippy (push) Has been cancelled
CI / Formatting (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Release Build (push) Has been cancelled
Implement full OpenAI-compatible tool-calling support across the proxy,
enabling OpenCode to use llm-proxy as its sole LLM backend.

- Add 9 tool-calling types (Tool, FunctionDef, ToolChoice, ToolCall, etc.)
- Update ChatCompletionRequest/ChatMessage/ChatStreamDelta with tool fields
- Update UnifiedRequest/UnifiedMessage to carry tool data through the pipeline
- Shared helpers: messages_to_openai_json handles tool messages, build_openai_body
  includes tools/tool_choice, parse/stream extract tool_calls from responses
- Gemini: full OpenAI<->Gemini format translation (functionDeclarations,
  functionCall/functionResponse, synthetic call IDs, tool_config mapping)
- Gemini: extract duplicated message-conversion into shared convert_messages()
- Server: SSE streams include tool_calls deltas, finish_reason='tool_calls'
- AggregatingStream: accumulate tool call deltas across stream chunks
- OpenAI provider: add o4- prefix to supports_model()
2026-03-02 09:40:57 -05:00
2cdc49d7f2 refactor: comprehensive audit — fix bugs, harden security, deduplicate providers, add CI/Docker
Some checks failed
CI / Check (push) Has been cancelled
CI / Clippy (push) Has been cancelled
CI / Formatting (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Release Build (push) Has been cancelled
Phase 1: Fix compilation (config_path Option<PathBuf>, streaming test, stale test cleanup)
Phase 2: Fix critical bugs (remove block_on deadlocks in 4 providers, fix broken SQL query builder)
Phase 3: Security hardening (session manager, real auth, token masking, Gemini key to header, password policy)
Phase 4: Implement stubs (real provider test, /proc health metrics, client/provider/backup endpoints, has_images)
Phase 5: Code quality (shared provider helpers, explicit re-exports, all Clippy warnings fixed, unwrap removal, 6 unused deps removed, dashboard split into 7 sub-modules)
Phase 6: Infrastructure (GitHub Actions CI, multi-stage Dockerfile, rustfmt.toml, clippy.toml, script fixes)
2026-03-02 00:35:45 -05:00
3165aa1859 feat: implement web UI for provider and model configuration
- Added 'provider_configs' and 'model_configs' tables to database.
- Refactored ProviderManager to support thread-safe dynamic updates and database overrides.
- Implemented 'Models' tab in dashboard to manage model visibility, mapping, and pricing.
- Added provider configuration modal to 'Providers' tab.
- Integrated database overrides into chat completion logic (enabled state, mapping, and cost).
2026-02-26 18:13:04 -05:00
3aaa309d38 feat: enforce master token authentication and reasoning support
- Added strict token validation against LLM_PROXY__SERVER__AUTH_TOKENS.
- Integrated 'reasoning_content' support into providers and server responses.
- Updated AppState to carry valid auth tokens for request-time validation.
2026-02-26 14:12:51 -05:00
1755075657 chore: initial clean commit 2026-02-26 13:56:21 -05:00