Compare commits

...

15 Commits

Author SHA1 Message Date
90874a6721 chore: consolidate env files and update gitignore
Some checks failed
CI / Check (push) Has been cancelled
CI / Clippy (push) Has been cancelled
CI / Formatting (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Release Build (push) Has been cancelled
Removed .env and .env.backup from git tracking and consolidated configuration into .env.example. Updated .gitignore to robustly prevent accidental inclusion of sensitive files.
2026-03-19 10:44:22 -04:00
57aa0aa70e fix(openai): unify tool call indexing for both standard and embedded calls
- Sequential next_tool_index is now used for both Responses API 'function_call' events and the proxy's 'tool_uses' JSON extraction.
- This ensures tool_calls arrays in the stream always start at index 0 and are dense, even if standard and embedded calls were somehow mixed.
- Fixed 'payload_idx' logic to correctly align argument chunks with their initialization chunks.
2026-03-18 18:31:24 +00:00
4de457cc5e fix(openai): correctly map tool_call indexes in Responses API stream
- The OpenAI Responses API uses 'output_index' to identify items in the response.
- If a response starts with text (output_index 0) followed by a tool call (output_index 1), the standard Chat Completions streaming format requires the first tool call to have index 0.
- Previously, the proxy was passing output_index (1) as the tool_call index, causing client-side SDKs to fail parsing the stream and silently drop the tool calls.
- Implemented a local mapping within the stream to ensure tool_call indexes are always dense and start at 0.
2026-03-18 18:26:27 +00:00
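The remapping described in this commit can be sketched as a small allocator that hands out dense, zero-based tool_call indexes on first sight of each `output_index`. This is a hypothetical sketch of the approach, not the proxy's actual code; the type and method names are invented here.

```go
package main

import "fmt"

// toolIndexMapper assigns dense, zero-based tool_call indexes to the
// sparse output_index values emitted by the Responses API.
type toolIndexMapper struct {
	mapping map[int]int // output_index -> dense tool_call index
	next    int
}

func newToolIndexMapper() *toolIndexMapper {
	return &toolIndexMapper{mapping: make(map[int]int)}
}

// indexFor returns the dense index for a given output_index, allocating
// the next free slot on first sight so later argument chunks stay aligned
// with their initialization chunks.
func (m *toolIndexMapper) indexFor(outputIndex int) int {
	if idx, ok := m.mapping[outputIndex]; ok {
		return idx
	}
	idx := m.next
	m.mapping[outputIndex] = idx
	m.next++
	return idx
}

func main() {
	m := newToolIndexMapper()
	// A response starting with text (output_index 0) followed by two tool
	// calls (output_index 1 and 2): the tool_call indexes must be 0 and 1.
	fmt.Println(m.indexFor(1)) // first tool call -> 0
	fmt.Println(m.indexFor(2)) // second tool call -> 1
	fmt.Println(m.indexFor(1)) // later argument chunk for the first call -> 0
}
```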
66e8b114b9 fix(openai): split embedded tool_calls into standard chunk format
- Standard OpenAI clients expect tool_calls to be streamed as two parts:
  1. Initialization chunk containing 'id', 'type', and 'name', with empty 'arguments'.
  2. Payload chunk(s) containing 'arguments', with 'id', 'type', and 'name' omitted.
- Previously, the proxy was yielding all fields in a single chunk when parsing the custom 'tool_uses' JSON from gpt-5.4, causing strict clients like opencode to fail silently when delegating parallel tasks.
- The proxy now splits the extracted JSON into the correct two-chunk sequence, restoring subagent compatibility.
2026-03-18 18:05:37 +00:00
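The two-chunk sequence described above can be sketched as follows. The delta shapes mirror the Chat Completions streaming format; the splitting helper itself is an assumption, not the proxy's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal delta shapes for streamed tool calls (sketch; the proxy's real
// types may differ).
type FunctionDelta struct {
	Name      string `json:"name,omitempty"`
	Arguments string `json:"arguments"`
}

type ToolCallDelta struct {
	Index    int            `json:"index"`
	ID       string         `json:"id,omitempty"`
	Type     string         `json:"type,omitempty"`
	Function *FunctionDelta `json:"function,omitempty"`
}

// splitToolCall yields the two-chunk sequence strict clients expect:
// an initialization chunk (id/type/name, empty arguments) followed by a
// payload chunk carrying only the arguments.
func splitToolCall(index int, id, name, args string) []ToolCallDelta {
	return []ToolCallDelta{
		{Index: index, ID: id, Type: "function", Function: &FunctionDelta{Name: name, Arguments: ""}},
		{Index: index, Function: &FunctionDelta{Arguments: args}},
	}
}

func main() {
	for _, c := range splitToolCall(0, "call_abc", "get_weather", `{"city":"Paris"}`) {
		b, _ := json.Marshal(c)
		fmt.Println(string(b))
	}
}
```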
1cac45502a fix(openai): fix stream whitespace loss and finish_reason for gpt-5.4
- Remove overzealous .trim() in strip_internal_metadata which destroyed whitespace between text stream chunks, causing client hangs
- Fix finish_reason logic to only yield once at the end of the stream
- Correctly yield finish_reason: 'tool_calls' instead of 'stop' when tool calls are generated
2026-03-18 17:48:55 +00:00
79dc8fe409 fix(openai): correctly parse Responses API tool call events
- The Responses API does not use 'response.item.delta' for tool calls.
- It uses 'response.output_item.added' to initialize the function call.
- It uses 'response.function_call_arguments.delta' for the payload stream.
- Updated the streaming parser to catch these events and correctly yield ToolCallDelta objects.
- This restores proper streaming of tool calls back to the client.
2026-03-18 16:13:13 +00:00
24a898c9a7 fix(openai): gracefully handle stream endings
- The Responses API ends streams without a final '[DONE]' message.
- This causes reqwest_eventsource to return Error::StreamEnded.
- Previously, this was treated as a premature termination, triggering an error probe.
- We now explicitly match and break on Err(StreamEnded) for normal completion.
2026-03-18 15:39:18 +00:00
7c2a317c01 fix(openai): add missing stream parameter for Responses API
- The OpenAI Responses API requires the 'stream: true' parameter in the JSON body, contrary to some documentation summaries.
- Omitting it caused the API to return a full application/json response instead of an SSE text/event-stream, leading to stream failures and probe warnings in the proxy logs.
2026-03-18 15:32:08 +00:00
cb619f9286 fix(openai): improve Responses API stream robustness and diagnostics
- Implement final buffer flush in streaming path to prevent data loss
- Increase probe response body logging to 500 characters
- Ensure internal metadata is stripped even on final flush
- Fix potential hang when stream ends without explicit [DONE] event
2026-03-18 15:17:56 +00:00
441270317c fix(openai): strip internal metadata from gpt-5.4 responses
- Add strip_internal_metadata helper to remove prefixes like 'to=multi_tool_use.parallel'
- Clean up Thai text preambles reported in the journal
- Apply metadata stripping to both synchronous and streaming response paths
- Improve visual quality of proxied model responses
2026-03-18 15:07:17 +00:00
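The prefix-stripping idea can be sketched as below. Note that a later commit in this list removed an overzealous `.trim()` from this helper, so the sketch deliberately strips only the known prefix and preserves surrounding whitespace. Helper and prefix list are illustrative assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// stripInternalMetadata removes a known internal routing prefix such as
// 'to=multi_tool_use.parallel' from the start of a chunk. It does NOT trim
// whitespace: chunk boundaries depend on spaces surviving intact.
func stripInternalMetadata(s string) string {
	for _, prefix := range []string{"to=multi_tool_use.parallel"} {
		if rest, ok := strings.CutPrefix(s, prefix); ok {
			return rest
		}
	}
	return s
}

func main() {
	fmt.Printf("%q\n", stripInternalMetadata("to=multi_tool_use.parallel hello"))
	fmt.Printf("%q\n", stripInternalMetadata(" leading space kept"))
}
```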
2e4318d84b fix(openai): improve gpt-5.4 parallel tool call intercepting
- Implement cross-delta content buffering in streaming Responses API
- Wait for full 'tool_uses' JSON block before yielding to client
- Handle 'to=multi_tool_use.parallel' preamble by buffering
- Fix stream error probe to not request a new stream
- Remove raw JSON leakage from streaming content
2026-03-18 15:04:15 +00:00
d0be16d8e3 fix(openai): parse embedded 'tool_uses' JSON for gpt-5.4 parallel calls
- Add static parse_tool_uses_json helper to extract embedded tool calls
- Update synchronous and streaming Responses API parsers to detect tool_uses blocks
- Strip tool_uses JSON from content to prevent raw JSON leakage to client
- Resolve lifetime issues by avoiding &self capture in streaming closure
2026-03-18 14:28:38 +00:00
83e0ad0240 fix(openai): flatten tools and tool_choice for Responses API
- Map nested 'function' object to top-level fields
- Support string and object-based 'tool_choice' formats
- Fix 400 Bad Request 'Missing required parameter: tools[0].name'
2026-03-18 14:00:49 +00:00
275ce34d05 fix(openai): fix missing tools and instructions in Responses API
- Add 'tools' and 'tool_choice' parameters to streaming Responses API
- Include 'name' field in message items for Responses API input
- Use string content for text-only messages to improve instruction following
- Fix subagents not triggering and files not being created
2026-03-18 13:51:36 +00:00
cb5b921550 feat(openai): implement tool support for gpt-5.4 via Responses API
- Implement polymorphic 'input' structure for /responses endpoint
- Map 'tool' role to 'function_call_output' items
- Handle assistant 'tool_calls' as separate 'function_call' items
- Add synchronous and streaming parsers for function_call items
- Fix 400 Bad Request 'Invalid value: tool' error
2026-03-18 13:14:51 +00:00
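The role mapping in this commit — a Chat Completions `tool` message becoming a Responses API `function_call_output` item — can be sketched like this. Field names on the Responses side follow the public API; the internal types are assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatToolMessage is the Chat Completions 'tool' role message (input).
type chatToolMessage struct {
	Role       string `json:"role"`
	ToolCallID string `json:"tool_call_id"`
	Content    string `json:"content"`
}

// functionCallOutput is the Responses API item it becomes (output).
type functionCallOutput struct {
	Type   string `json:"type"` // "function_call_output"
	CallID string `json:"call_id"`
	Output string `json:"output"`
}

func toResponsesItem(m chatToolMessage) functionCallOutput {
	return functionCallOutput{
		Type:   "function_call_output",
		CallID: m.ToolCallID,
		Output: m.Content,
	}
}

func main() {
	item := toResponsesItem(chatToolMessage{Role: "tool", ToolCallID: "call_1", Content: `{"temp":21}`})
	b, _ := json.Marshal(item)
	fmt.Println(string(b))
}
```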
8 changed files with 184 additions and 77 deletions

.env

@@ -1,28 +0,0 @@
# LLM Proxy Gateway Environment Variables
# OpenAI
OPENAI_API_KEY=sk-demo-openai-key
# Google Gemini
GEMINI_API_KEY=AIza-demo-gemini-key
# DeepSeek
DEEPSEEK_API_KEY=sk-demo-deepseek-key
# xAI Grok (not yet available)
GROK_API_KEY=gk-demo-grok-key
# Authentication tokens (comma-separated list)
LLM_PROXY__SERVER__AUTH_TOKENS=demo-token-123456,another-token
# Database path (optional)
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
# Session Secret (for signed tokens)
SESSION_SECRET=ki9khXAk9usDkasMrD2UbK4LOgrDRJz0
# Encryption key (required)
LLM_PROXY__ENCRYPTION_KEY=69879f5b7913ba169982190526ae213e830b3f1f33e785ef2b68cf48c7853fcd
# Server port (optional)
LLM_PROXY__SERVER__PORT=8080


@@ -1,22 +0,0 @@
# LLM Proxy Gateway Environment Variables
# OpenAI
OPENAI_API_KEY=sk-demo-openai-key
# Google Gemini
GEMINI_API_KEY=AIza-demo-gemini-key
# DeepSeek
DEEPSEEK_API_KEY=sk-demo-deepseek-key
# xAI Grok (not yet available)
GROK_API_KEY=gk-demo-grok-key
# Authentication tokens (comma-separated list)
LLM_PROXY__SERVER__AUTH_TOKENS=demo-token-123456,another-token
# Server port (optional)
LLM_PROXY__SERVER__PORT=8080
# Database path (optional)
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db


@@ -1,28 +1,43 @@
# LLM Proxy Gateway Environment Variables
# Copy to .env and fill in your API keys
# LLM Proxy Gateway Configuration Example
# Copy this file to .env and fill in your values
# MANDATORY: Encryption key for sessions and stored API keys
# Must be a 32-byte hex or base64 encoded string
# Example (hex): LLM_PROXY__ENCRYPTION_KEY=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
# ==============================================================================
# MANDATORY: Encryption & Security
# ==============================================================================
# A 32-byte hex or base64 encoded string used for session signing and
# database encryption.
# Generate one with: openssl rand -hex 32
LLM_PROXY__ENCRYPTION_KEY=your_secure_32_byte_key_here
# LLM Provider API Keys (Standard Environment Variables)
OPENAI_API_KEY=your_openai_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
DEEPSEEK_API_KEY=your_deepseek_api_key_here
GROK_API_KEY=your_grok_api_key_here
# ==============================================================================
# LLM Provider API Keys
# ==============================================================================
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
DEEPSEEK_API_KEY=sk-...
GROK_API_KEY=xai-...
# ==============================================================================
# Server Configuration
# ==============================================================================
LLM_PROXY__SERVER__PORT=8080
LLM_PROXY__SERVER__HOST=0.0.0.0
# Optional: Bearer tokens for client authentication (comma-separated)
# If not set, the proxy will look up tokens in the database.
# LLM_PROXY__SERVER__AUTH_TOKENS=token1,token2
# ==============================================================================
# Database Configuration
# ==============================================================================
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
LLM_PROXY__DATABASE__MAX_CONNECTIONS=10
# ==============================================================================
# Provider Overrides (Optional)
# ==============================================================================
# LLM_PROXY__PROVIDERS__OPENAI__BASE_URL=https://api.openai.com/v1
# LLM_PROXY__PROVIDERS__GEMINI__ENABLED=true
# LLM_PROXY__PROVIDERS__OLLAMA__BASE_URL=http://localhost:11434/v1
# LLM_PROXY__PROVIDERS__OLLAMA__ENABLED=true
# LLM_PROXY__PROVIDERS__OLLAMA__MODELS=llama3,mistral,llava
# Server Configuration
LLM_PROXY__SERVER__PORT=8080
LLM_PROXY__SERVER__HOST=0.0.0.0
# Database Configuration
LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
LLM_PROXY__DATABASE__MAX_CONNECTIONS=10

.gitignore

@@ -1,5 +1,11 @@
.env
.env.*
!.env.example
/target
/.env
/*.db
/*.db-shm
/*.db-wal
/llm-proxy
/llm-proxy-go
*.db
*.db-shm
*.db-wal
*.log
server.pid


@@ -7,12 +7,15 @@
- [x] Auth Middleware
- [x] Basic Provider implementations (OpenAI, Gemini, DeepSeek, Grok)
- [x] Streaming Support (SSE & Gemini custom streaming)
- [x] Move Rust files to `rust_backup`
- [x] Archive Rust files to `rust` branch
- [x] Clean root and set Go version as `main`
- [x] Enhanced `helpers.go` for Multimodal & Tool Calling (OpenAI compatible)
- [x] Enhanced `server.go` for robust request conversion
- [x] Dashboard Management APIs (Clients, Tokens, Users, Providers)
- [x] Dashboard Analytics & Usage Summary
- [x] WebSocket for real-time dashboard updates
- [x] Asynchronous Request Logging to SQLite
- [x] Update documentation (README, deployment, architecture)
## Feature Parity Checklist (High Priority)
@@ -38,10 +41,9 @@
- [x] Multimodal support
## Infrastructure & Middleware
- [ ] Implement Request Logging to SQLite (asynchronous)
- [ ] Implement Rate Limiting (`golang.org/x/time/rate`)
- [ ] Implement Circuit Breaker (`github.com/sony/gobreaker`)
- [ ] Implement Model Cost Calculation logic
- [ ] Implement Model Cost Calculation logic (needs registry/pricing integration)
## Verification
- [ ] Unit tests for feature-specific mapping (CoT, Tools, Images)


@@ -0,0 +1,58 @@
package models

type ModelRegistry struct {
	Providers map[string]ProviderInfo `json:"-"`
}

type ProviderInfo struct {
	ID     string                   `json:"id"`
	Name   string                   `json:"name"`
	Models map[string]ModelMetadata `json:"models"`
}

type ModelMetadata struct {
	ID         string           `json:"id"`
	Name       string           `json:"name"`
	Cost       *ModelCost       `json:"cost,omitempty"`
	Limit      *ModelLimit      `json:"limit,omitempty"`
	Modalities *ModelModalities `json:"modalities,omitempty"`
	ToolCall   *bool            `json:"tool_call,omitempty"`
	Reasoning  *bool            `json:"reasoning,omitempty"`
}

type ModelCost struct {
	Input      float64  `json:"input"`
	Output     float64  `json:"output"`
	CacheRead  *float64 `json:"cache_read,omitempty"`
	CacheWrite *float64 `json:"cache_write,omitempty"`
}

type ModelLimit struct {
	Context uint32 `json:"context"`
	Output  uint32 `json:"output"`
}

type ModelModalities struct {
	Input  []string `json:"input"`
	Output []string `json:"output"`
}

func (r *ModelRegistry) FindModel(modelID string) *ModelMetadata {
	// First try exact match in models map
	for _, provider := range r.Providers {
		if model, ok := provider.Models[modelID]; ok {
			return &model
		}
	}
	// Try searching by ID in metadata
	for _, provider := range r.Providers {
		for _, model := range provider.Models {
			if model.ID == modelID {
				return &model
			}
		}
	}
	return nil
}


@@ -26,12 +26,20 @@ type Server struct {
sessions *SessionManager
hub *Hub
logger *RequestLogger
registry *models.ModelRegistry
}
func NewServer(cfg *config.Config, database *db.DB) *Server {
router := gin.Default()
hub := NewHub()
// Fetch registry (non-blocking for startup if it fails, but we'll try once)
registry, err := utils.FetchRegistry()
if err != nil {
fmt.Printf("Warning: Failed to fetch initial model registry: %v\n", err)
registry = &models.ModelRegistry{Providers: make(map[string]models.ProviderInfo)}
}
s := &Server{
router: router,
cfg: cfg,
@@ -40,6 +48,7 @@ func NewServer(cfg *config.Config, database *db.DB) *Server {
sessions: NewSessionManager(cfg.KeyBytes, 24*time.Hour),
hub: hub,
logger: NewRequestLogger(database, hub),
registry: registry,
}
// Initialize providers
@@ -311,8 +320,9 @@ func (s *Server) logRequest(start time.Time, clientID, provider, model string, u
if usage.CacheWriteTokens != nil {
entry.CacheWriteTokens = *usage.CacheWriteTokens
}
// TODO: Calculate cost properly based on pricing
entry.Cost = 0.0
// Calculate cost using registry
entry.Cost = utils.CalculateCost(s.registry, model, entry.PromptTokens, entry.CompletionTokens, entry.CacheReadTokens, entry.CacheWriteTokens)
}
s.logger.LogRequest(entry)
@@ -321,6 +331,18 @@ func (s *Server) logRequest(start time.Time, clientID, provider, model string, u
func (s *Server) Run() error {
go s.hub.Run()
s.logger.Start()
// Start registry refresher
go func() {
ticker := time.NewTicker(24 * time.Hour)
for range ticker.C {
newRegistry, err := utils.FetchRegistry()
if err == nil {
s.registry = newRegistry
}
}
}()
addr := fmt.Sprintf("%s:%d", s.cfg.Server.Host, s.cfg.Server.Port)
return s.router.Run(addr)
}


@@ -0,0 +1,54 @@
package utils

import (
	"encoding/json"
	"fmt"
	"log"
	"time"

	"llm-proxy/internal/models"

	"github.com/go-resty/resty/v2"
)

const ModelsDevURL = "https://models.dev/api.json"

func FetchRegistry() (*models.ModelRegistry, error) {
	log.Printf("Fetching model registry from %s", ModelsDevURL)
	client := resty.New().SetTimeout(10 * time.Second)
	resp, err := client.R().Get(ModelsDevURL)
	if err != nil {
		return nil, fmt.Errorf("failed to fetch registry: %w", err)
	}
	if !resp.IsSuccess() {
		return nil, fmt.Errorf("failed to fetch registry: HTTP %d", resp.StatusCode())
	}
	var providers map[string]models.ProviderInfo
	if err := json.Unmarshal(resp.Body(), &providers); err != nil {
		return nil, fmt.Errorf("failed to unmarshal registry: %w", err)
	}
	log.Println("Successfully loaded model registry")
	return &models.ModelRegistry{Providers: providers}, nil
}

func CalculateCost(registry *models.ModelRegistry, modelID string, promptTokens, completionTokens, cacheRead, cacheWrite uint32) float64 {
	meta := registry.FindModel(modelID)
	if meta == nil || meta.Cost == nil {
		return 0.0
	}
	cost := (float64(promptTokens) * meta.Cost.Input / 1000000.0) +
		(float64(completionTokens) * meta.Cost.Output / 1000000.0)
	if meta.Cost.CacheRead != nil {
		cost += float64(cacheRead) * (*meta.Cost.CacheRead) / 1000000.0
	}
	if meta.Cost.CacheWrite != nil {
		cost += float64(cacheWrite) * (*meta.Cost.CacheWrite) / 1000000.0
	}
	return cost
}