feat: add cache token tracking and cache-aware cost calculation

Track cache_read_tokens and cache_write_tokens end-to-end: parse from
provider responses (OpenAI, DeepSeek, Grok, Gemini), persist to SQLite,
apply cache-aware pricing from the model registry, and surface in API
responses and the dashboard.
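
A minimal sketch of the parsing step, in JavaScript for illustration only (the commit's own parser lives in the backend and is not shown on this page). The function name `extractCacheTokens`, the provider keys, and the returned shape are hypothetical; the field paths follow the providers' documented usage payloads. Reporting cache writes as 0 is an assumption, since none of the fields listed in this commit carries a write count.

```javascript
// Hypothetical helper: map a provider's usage payload onto the two counters
// this commit tracks. Names and return shape are illustrative assumptions.
function extractCacheTokens(provider, body) {
  const usage = body.usage || {};
  switch (provider) {
    case 'openai':
    case 'grok':
      // OpenAI-compatible payloads nest the count under prompt_tokens_details.
      return {
        cacheReadTokens: usage.prompt_tokens_details?.cached_tokens ?? 0,
        cacheWriteTokens: 0, // assumption: no write count in this payload
      };
    case 'deepseek':
      // DeepSeek reports cache hits as a top-level usage field.
      return {
        cacheReadTokens: usage.prompt_cache_hit_tokens ?? 0,
        cacheWriteTokens: 0, // assumption: miss tokens are not treated as writes
      };
    case 'gemini':
      // Gemini uses usageMetadata rather than usage.
      return {
        cacheReadTokens: body.usageMetadata?.cachedContentTokenCount ?? 0,
        cacheWriteTokens: 0, // assumption
      };
    default:
      // Unknown provider: report no cache activity rather than failing.
      return { cacheReadTokens: 0, cacheWriteTokens: 0 };
  }
}
```

Defaulting every missing field to 0 keeps older responses, which carry no cache data, from breaking downstream aggregation.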

- Add cache fields to ProviderResponse, StreamUsage, RequestLog structs
- Parse cached_tokens (OpenAI/Grok), prompt_cache_hit_tokens and
  prompt_cache_miss_tokens (DeepSeek), and cachedContentTokenCount
  (Gemini) from provider responses
- Send stream_options.include_usage for streaming; capture real usage
  from final SSE chunk in AggregatingStream
- ALTER TABLE migration for cache_read_tokens/cache_write_tokens columns
- Cache-aware cost formula using registry cache_read/cache_write rates
- Update Provider trait calculate_cost signature across all providers
- Add cache_read_tokens/cache_write_tokens to Usage API response
- Dashboard: cache hit rate card, cache columns in pricing and usage
  tables, cache token aggregation in SQL queries
- Remove API debug panel and verbose console logging from api.js
- Bump static asset cache-bust to v5
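
The cache-aware cost formula is not spelled out in this message, so the following is only one plausible reading, sketched in JavaScript. It assumes rates are USD per 1M tokens (as in the pricing table), that cache-read tokens are a subset of the prompt tokens, that cache-write tokens are billed on top, and that a missing cache-read rate falls back to the normal prompt rate. `calculateCost` and all field names are hypothetical.

```javascript
// Hypothetical cost calculation; every name and rule here is an assumption,
// not the formula actually committed.
function calculateCost(usage, rates) {
  const PER_MILLION = 1e6; // rates are expressed per 1M tokens
  const cacheRead = usage.cacheReadTokens ?? 0;
  const cacheWrite = usage.cacheWriteTokens ?? 0;

  // Assumption: cached reads are already counted in promptTokens, so they
  // are subtracted before the full prompt rate is applied.
  const uncachedPrompt = Math.max(usage.promptTokens - cacheRead, 0);

  // Assumption: without a registry cache-read rate, cached tokens cost the
  // same as ordinary prompt tokens; without a write rate, writes are free.
  const readRate = rates.cacheReadCost ?? rates.promptCost;
  const writeRate = rates.cacheWriteCost ?? 0;

  const total =
    uncachedPrompt * rates.promptCost +
    cacheRead * readRate +
    cacheWrite * writeRate +
    usage.completionTokens * rates.completionCost;
  return total / PER_MILLION;
}
```

Under these assumptions a model with no cache rates in the registry is priced exactly as before the change, which keeps historical cost figures comparable.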
2026-03-02 14:45:21 -05:00
parent 232f092f27
commit db5824f0fb
19 changed files with 352 additions and 109 deletions

@@ -31,7 +31,10 @@ class CostsPage {
avgDailyCost: data.total_cost / 30, // Simplified
costTrend: 5.2,
budgetUsed: Math.min(Math.round((data.total_cost / 100) * 100), 100), // Assuming $100 budget
-projectedMonthEnd: data.today_cost * 30
+projectedMonthEnd: data.today_cost * 30,
+cacheReadTokens: data.total_cache_read_tokens || 0,
+cacheWriteTokens: data.total_cache_write_tokens || 0,
+totalTokens: data.total_tokens || 0,
};
this.renderCostStats();
@@ -44,6 +47,10 @@ class CostsPage {
renderCostStats() {
const container = document.getElementById('cost-stats');
if (!container) return;
+const cacheHitRate = this.costData.totalTokens > 0
+? ((this.costData.cacheReadTokens / this.costData.totalTokens) * 100).toFixed(1)
+: '0.0';
container.innerHTML = `
<div class="stat-card">
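
The hit-rate expression in this hunk can be read as a standalone helper, sketched below. The name `cacheHitRate` as a function is hypothetical; the arithmetic mirrors the diff. If `total_tokens` also counts completion tokens, the figure understates the share of prompt tokens served from cache, so treat the percentage as a rough indicator.

```javascript
// Hypothetical standalone form of the dashboard's hit-rate calculation.
// Returns a string with one decimal place, as the stat card expects.
function cacheHitRate(cacheReadTokens, totalTokens) {
  // Guard against division by zero (and non-numeric totals).
  if (!(totalTokens > 0)) return '0.0';
  return ((cacheReadTokens / totalTokens) * 100).toFixed(1);
}

console.log(cacheHitRate(250, 1000)); // prints "25.0"
console.log(cacheHitRate(0, 0));      // prints "0.0"
```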
@@ -74,6 +81,19 @@ class CostsPage {
</div>
</div>
+<div class="stat-card">
+<div class="stat-icon primary">
+<i class="fas fa-bolt"></i>
+</div>
+<div class="stat-content">
+<div class="stat-value">${cacheHitRate}%</div>
+<div class="stat-label">Cache Hit Rate</div>
+<div class="stat-change">
+${window.api.formatNumber(this.costData.cacheReadTokens)} cached tokens
+</div>
+</div>
+</div>
<div class="stat-card">
<div class="stat-icon danger">
<i class="fas fa-piggy-bank"></i>
@@ -181,15 +201,24 @@ class CostsPage {
const tableBody = document.querySelector('#pricing-table tbody');
if (!tableBody) return;
-tableBody.innerHTML = data.map(row => `
-<tr>
-<td><span class="badge-client">${row.provider.toUpperCase()}</span></td>
-<td><code class="code-sm">${row.id}</code></td>
-<td>${window.api.formatCurrency(row.prompt_cost)} / 1M</td>
-<td>${window.api.formatCurrency(row.completion_cost)} / 1M</td>
-<td>Now</td>
-</tr>
-`).join('');
+tableBody.innerHTML = data.map(row => {
+const cacheRead = row.cache_read_cost != null
+? `${window.api.formatCurrency(row.cache_read_cost)} / 1M`
+: '<span style="color:var(--fg4)">--</span>';
+const cacheWrite = row.cache_write_cost != null
+? `${window.api.formatCurrency(row.cache_write_cost)} / 1M`
+: '<span style="color:var(--fg4)">--</span>';
+return `
+<tr>
+<td><span class="badge-client">${row.provider.toUpperCase()}</span></td>
+<td><code class="code-sm">${row.id}</code></td>
+<td>${window.api.formatCurrency(row.prompt_cost)} / 1M</td>
+<td>${window.api.formatCurrency(row.completion_cost)} / 1M</td>
+<td>${cacheRead}</td>
+<td>${cacheWrite}</td>
+</tr>
+`;
+}).join('');
}
setupEventListeners() {