chore: consolidate env files and update gitignore

Removed .env and .env.backup from git tracking and consolidated configuration into .env.example. Updated .gitignore to robustly prevent accidental inclusion of sensitive files.
fix(openai): unify tool call indexing for both standard and embedded calls
8 changed files with 184 additions and 77 deletions
@@ -1,28 +0,0 @@
 # LLM Proxy Gateway Environment Variables
 # OpenAI
 OPENAI_API_KEY=sk-demo-openai-key
 # Google Gemini
 GEMINI_API_KEY=AIza-demo-gemini-key
 # DeepSeek
 DEEPSEEK_API_KEY=sk-demo-deepseek-key
 # xAI Grok (not yet available)
 GROK_API_KEY=gk-demo-grok-key
 # Authentication tokens (comma-separated list)
 LLM_PROXY__SERVER__AUTH_TOKENS=demo-token-123456,another-token
 # Database path (optional)
 LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
 # Session Secret (for signed tokens)
 SESSION_SECRET=ki9khXAk9usDkasMrD2UbK4LOgrDRJz0
 # Encryption key (required)
 LLM_PROXY__ENCRYPTION_KEY=69879f5b7913ba169982190526ae213e830b3f1f33e785ef2b68cf48c7853fcd
 # Server port (optional)
 LLM_PROXY__SERVER__PORT=8080
@@ -1,22 +0,0 @@
 # LLM Proxy Gateway Environment Variables
 # OpenAI
 OPENAI_API_KEY=sk-demo-openai-key
 # Google Gemini
 GEMINI_API_KEY=AIza-demo-gemini-key
 # DeepSeek
 DEEPSEEK_API_KEY=sk-demo-deepseek-key
 # xAI Grok (not yet available)
 GROK_API_KEY=gk-demo-grok-key
 # Authentication tokens (comma-separated list)
 LLM_PROXY__SERVER__AUTH_TOKENS=demo-token-123456,another-token
 # Server port (optional)
 LLM_PROXY__SERVER__PORT=8080
 # Database path (optional)
 LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
@@ -1,28 +1,43 @@
-# LLM Proxy Gateway Environment Variables
-# Copy to .env and fill in your API keys
-# MANDATORY: Encryption key for sessions and stored API keys
-# Must be a 32-byte hex or base64 encoded string
-# Example (hex): LLM_PROXY__ENCRYPTION_KEY=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
+# LLM Proxy Gateway Configuration Example
+# Copy this file to .env and fill in your values
+# ==============================================================================
+# MANDATORY: Encryption & Security
+# ==============================================================================
 # A 32-byte hex or base64 encoded string used for session signing and 
 # database encryption.
 # Generate one with: openssl rand -hex 32
 LLM_PROXY__ENCRYPTION_KEY=your_secure_32_byte_key_here
-# LLM Provider API Keys (Standard Environment Variables)
-OPENAI_API_KEY=your_openai_api_key_here
-GEMINI_API_KEY=your_gemini_api_key_here
-DEEPSEEK_API_KEY=your_deepseek_api_key_here
-GROK_API_KEY=your_grok_api_key_here
+# ==============================================================================
+# LLM Provider API Keys
+# ==============================================================================
+OPENAI_API_KEY=sk-...
+GEMINI_API_KEY=AIza...
 DEEPSEEK_API_KEY=sk-...
 GROK_API_KEY=xai-...
 # ==============================================================================
 # Server Configuration
 # ==============================================================================
 LLM_PROXY__SERVER__PORT=8080
 LLM_PROXY__SERVER__HOST=0.0.0.0
 # Optional: Bearer tokens for client authentication (comma-separated)
 # If not set, the proxy will look up tokens in the database.
 # LLM_PROXY__SERVER__AUTH_TOKENS=token1,token2
 # ==============================================================================
 # Database Configuration
 # ==============================================================================
 LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
 LLM_PROXY__DATABASE__MAX_CONNECTIONS=10
 # ==============================================================================
 # Provider Overrides (Optional)
 # ==============================================================================
 # LLM_PROXY__PROVIDERS__OPENAI__BASE_URL=https://api.openai.com/v1
 # LLM_PROXY__PROVIDERS__GEMINI__ENABLED=true
 # LLM_PROXY__PROVIDERS__OLLAMA__BASE_URL=http://localhost:11434/v1
 # LLM_PROXY__PROVIDERS__OLLAMA__ENABLED=true
 # LLM_PROXY__PROVIDERS__OLLAMA__MODELS=llama3,mistral,llava
 # Server Configuration
 LLM_PROXY__SERVER__PORT=8080
 LLM_PROXY__SERVER__HOST=0.0.0.0
 # Database Configuration
 LLM_PROXY__DATABASE__PATH=./data/llm_proxy.db
 LLM_PROXY__DATABASE__MAX_CONNECTIONS=10
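The comments in this file describe `LLM_PROXY__ENCRYPTION_KEY` as a 32-byte value encoded as hex or base64, and `LLM_PROXY__SERVER__AUTH_TOKENS` as a comma-separated list. A minimal Go sketch of how such values might be validated at startup follows. The helper names `decodeKey` and `splitTokens` are hypothetical and are not taken from the proxy's source, and standard (padded) base64 is an assumption.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// decodeKey is a hypothetical helper: it accepts a hex or standard-base64
// string and returns the raw bytes only if they are exactly 32 bytes long.
func decodeKey(s string) ([]byte, error) {
	s = strings.TrimSpace(s)
	// A 32-byte key is 64 hex characters, so try hex first.
	if b, err := hex.DecodeString(s); err == nil && len(b) == 32 {
		return b, nil
	}
	// Otherwise fall back to padded base64 (44 characters for 32 bytes).
	if b, err := base64.StdEncoding.DecodeString(s); err == nil && len(b) == 32 {
		return b, nil
	}
	return nil, errors.New("encryption key must decode to exactly 32 bytes (hex or base64)")
}

// splitTokens is a hypothetical helper that parses a comma-separated token
// list, trimming whitespace and dropping empty entries.
func splitTokens(s string) []string {
	var out []string
	for _, t := range strings.Split(s, ",") {
		if t = strings.TrimSpace(t); t != "" {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	key, err := decodeKey(strings.Repeat("ab", 32))
	fmt.Println(len(key), err)

	// The placeholder from .env.example is neither valid hex nor base64.
	_, err = decodeKey("your_secure_32_byte_key_here")
	fmt.Println(err != nil)

	fmt.Println(splitTokens("token1, token2,"))
}
```

Rejecting the placeholder value early would make a forgotten `.env` edit fail loudly instead of silently running with a weak key.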
@@ -1,5 +1,11 @@
 .env
 .env.*
 !.env.example
 /target
-/.env
-/*.db
-/*.db-shm
-/*.db-wal
+/llm-proxy
+/llm-proxy-go
+*.db
+*.db-shm
 *.db-wal
 *.log
 server.pid
@@ -7,12 +7,15 @@
 - [x] Auth Middleware
 - [x] Basic Provider implementations (OpenAI, Gemini, DeepSeek, Grok)
 - [x] Streaming Support (SSE & Gemini custom streaming)
-- [x] Move Rust files to `rust_backup`
+- [x] Archive Rust files to `rust` branch
 - [x] Clean root and set Go version as `main`
 - [x] Enhanced `helpers.go` for Multimodal & Tool Calling (OpenAI compatible)
 - [x] Enhanced `server.go` for robust request conversion
 - [x] Dashboard Management APIs (Clients, Tokens, Users, Providers)
 - [x] Dashboard Analytics & Usage Summary
 - [x] WebSocket for real-time dashboard updates
 - [x] Asynchronous Request Logging to SQLite
 - [x] Update documentation (README, deployment, architecture)
 ## Feature Parity Checklist (High Priority)
@@ -38,10 +41,9 @@
 - [x] Multimodal support
 ## Infrastructure & Middleware
 - [ ] Implement Request Logging to SQLite (asynchronous)
 - [ ] Implement Rate Limiting (`golang.org/x/time/rate`)
 - [ ] Implement Circuit Breaker (`github.com/sony/gobreaker`)
-- [ ] Implement Model Cost Calculation logic
+- [ ] Implement Model Cost Calculation logic (needs registry/pricing integration)
 ## Verification
 - [ ] Unit tests for feature-specific mapping (CoT, Tools, Images)
@@ -0,0 +1,58 @@
 package models
 type ModelRegistry struct {
 	Providers map[string]ProviderInfo `json:"-"`
 }
 type ProviderInfo struct {
 	ID     string                   `json:"id"`
 	Name   string                   `json:"name"`
 	Models map[string]ModelMetadata `json:"models"`
 }
 type ModelMetadata struct {
 	ID         string           `json:"id"`
 	Name       string           `json:"name"`
 	Cost       *ModelCost       `json:"cost,omitempty"`
 	Limit      *ModelLimit      `json:"limit,omitempty"`
 	Modalities *ModelModalities `json:"modalities,omitempty"`
 	ToolCall   *bool            `json:"tool_call,omitempty"`
 	Reasoning  *bool            `json:"reasoning,omitempty"`
 }
 type ModelCost struct {
 	Input      float64  `json:"input"`
 	Output     float64  `json:"output"`
 	CacheRead  *float64 `json:"cache_read,omitempty"`
 	CacheWrite *float64 `json:"cache_write,omitempty"`
 }
 type ModelLimit struct {
 	Context uint32 `json:"context"`
 	Output  uint32 `json:"output"`
 }
 type ModelModalities struct {
 	Input  []string `json:"input"`
 	Output []string `json:"output"`
 }
 func (r *ModelRegistry) FindModel(modelID string) *ModelMetadata {
 	// First try exact match in models map
 	for _, provider := range r.Providers {
 		if model, ok := provider.Models[modelID]; ok {
 			return &model
 		}
 	}
 	// Try searching by ID in metadata
 	for _, provider := range r.Providers {
 		for _, model := range provider.Models {
 			if model.ID == modelID {
 				return &model
 			}
 		}
 	}
 	return nil
 }
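`FindModel` ranges over the `Providers` map, and Go does not define map iteration order, so a model ID listed under two providers may resolve to either entry from one call to the next. The sketch below shows one possible way to make the lookup deterministic by visiting providers in sorted key order. The types are trimmed copies of the ones above, `findModelSorted` and `loadSample` are hypothetical names, and the JSON payload (keyed by provider ID, matching the `map[string]ProviderInfo` shape) is invented illustration data, not real registry output or pricing.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Trimmed copies of the registry types so the sketch compiles on its own.
type ModelCost struct {
	Input  float64 `json:"input"`
	Output float64 `json:"output"`
}

type ModelMetadata struct {
	ID   string     `json:"id"`
	Name string     `json:"name"`
	Cost *ModelCost `json:"cost,omitempty"`
}

type ProviderInfo struct {
	ID     string                   `json:"id"`
	Name   string                   `json:"name"`
	Models map[string]ModelMetadata `json:"models"`
}

// sampleJSON is invented data: two providers that both list "shared-model".
const sampleJSON = `{
  "alpha": {"id": "alpha", "name": "Alpha", "models": {
    "shared-model": {"id": "shared-model", "name": "Shared (alpha)", "cost": {"input": 1.0, "output": 2.0}}}},
  "beta": {"id": "beta", "name": "Beta", "models": {
    "shared-model": {"id": "shared-model", "name": "Shared (beta)", "cost": {"input": 3.0, "output": 4.0}}}}
}`

// loadSample decodes the sample payload into a provider map.
func loadSample() (map[string]ProviderInfo, error) {
	var providers map[string]ProviderInfo
	err := json.Unmarshal([]byte(sampleJSON), &providers)
	return providers, err
}

// findModelSorted is a hypothetical variant of FindModel that visits
// providers in sorted key order, so duplicate model IDs always resolve
// to the same provider.
func findModelSorted(providers map[string]ProviderInfo, modelID string) *ModelMetadata {
	ids := make([]string, 0, len(providers))
	for id := range providers {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		if m, ok := providers[id].Models[modelID]; ok {
			return &m // pointer to a local copy, as in FindModel
		}
	}
	return nil
}

func main() {
	providers, err := loadSample()
	if err != nil {
		panic(err)
	}
	if m := findModelSorted(providers, "shared-model"); m != nil {
		fmt.Println(m.Name, m.Cost.Input)
	}
	fmt.Println(findModelSorted(providers, "missing") == nil)
}
```

Whether sorted order is the right tie-break is a design choice; matching on the provider name already passed to `logRequest` would be another option.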
@@ -26,12 +26,20 @@ type Server struct {
 	sessions  *SessionManager
 	hub       *Hub
 	logger    *RequestLogger
 	registry  *models.ModelRegistry
 }
 func NewServer(cfg *config.Config, database *db.DB) *Server {
 	router := gin.Default()
 	hub := NewHub()
 	// Fetch registry (non-blocking for startup if it fails, but we'll try once)
 	registry, err := utils.FetchRegistry()
 	if err != nil {
 		fmt.Printf("Warning: Failed to fetch initial model registry: %v\n", err)
 		registry = &models.ModelRegistry{Providers: make(map[string]models.ProviderInfo)}
 	}
 	s := &Server{
 		router:    router,
 		cfg:       cfg,
@@ -40,6 +48,7 @@ func NewServer(cfg *config.Config, database *db.DB) *Server {
 		sessions:  NewSessionManager(cfg.KeyBytes, 24*time.Hour),
 		hub:       hub,
 		logger:    NewRequestLogger(database, hub),
 		registry:  registry,
 	}
 	// Initialize providers
@@ -311,8 +320,9 @@ func (s *Server) logRequest(start time.Time, clientID, provider, model string, u
 		if usage.CacheWriteTokens != nil {
 			entry.CacheWriteTokens = *usage.CacheWriteTokens
 		}
-		// TODO: Calculate cost properly based on pricing
-		entry.Cost = 0.0
+
+		// Calculate cost using registry
 		entry.Cost = utils.CalculateCost(s.registry, model, entry.PromptTokens, entry.CompletionTokens, entry.CacheReadTokens, entry.CacheWriteTokens)
 	}
 	s.logger.LogRequest(entry)
@@ -321,6 +331,18 @@ func (s *Server) logRequest(start time.Time, clientID, provider, model string, u
 func (s *Server) Run() error {
 	go s.hub.Run()
 	s.logger.Start()
 	// Start registry refresher
 	go func() {
 		ticker := time.NewTicker(24 * time.Hour)
 		for range ticker.C {
 			newRegistry, err := utils.FetchRegistry()
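 			if err != nil {
 				// Keep serving the previous registry, but surface the failure.
 				fmt.Printf("Warning: Failed to refresh model registry: %v\n", err)
 				continue
 			}
 			s.registry = newRegistry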
 		}
 	}()
 	addr := fmt.Sprintf("%s:%d", s.cfg.Server.Host, s.cfg.Server.Port)
 	return s.router.Run(addr)
 }
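The refresher goroutine in `Run` assigns `s.registry` while `logRequest` reads it, apparently from request-handling goroutines and without synchronization, which the Go race detector would likely flag. One possible remedy, sketched below, is to publish each new registry through `atomic.Pointer` (Go 1.19 or newer). `Registry` and `registryHolder` are hypothetical stand-ins, not types from this diff.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Registry stands in for models.ModelRegistry; only Version matters here.
type Registry struct {
	Version int
}

// registryHolder is a hypothetical wrapper around an atomically swapped pointer.
type registryHolder struct {
	ptr atomic.Pointer[Registry]
}

// newRegistryHolder stores the initial snapshot before any reader runs.
func newRegistryHolder(initial *Registry) *registryHolder {
	h := &registryHolder{}
	h.ptr.Store(initial)
	return h
}

// Load returns the current snapshot; safe to call from any goroutine.
func (h *registryHolder) Load() *Registry { return h.ptr.Load() }

// Swap publishes a freshly fetched registry to all future readers.
func (h *registryHolder) Swap(next *Registry) { h.ptr.Store(next) }

func main() {
	h := newRegistryHolder(&Registry{Version: 1})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = h.Load().Version // readers, playing the role of logRequest
		}()
	}
	h.Swap(&Registry{Version: 2}) // writer, playing the role of the refresher
	wg.Wait()

	fmt.Println(h.Load().Version)
}
```

Readers always see either the old or the new registry in full, and no lock is held on the request path. A `sync.RWMutex` around the field would work as well.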
@@ -0,0 +1,54 @@
 package utils
 import (
 	"encoding/json"
 	"fmt"
 	"log"
 	"time"
 	"llm-proxy/internal/models"
 	"github.com/go-resty/resty/v2"
 )
 const ModelsDevURL = "https://models.dev/api.json"
 func FetchRegistry() (*models.ModelRegistry, error) {
 	log.Printf("Fetching model registry from %s", ModelsDevURL)
 	client := resty.New().SetTimeout(10 * time.Second)
 	resp, err := client.R().Get(ModelsDevURL)
 	if err != nil {
 		return nil, fmt.Errorf("failed to fetch registry: %w", err)
 	}
 	if !resp.IsSuccess() {
 		return nil, fmt.Errorf("failed to fetch registry: HTTP %d", resp.StatusCode())
 	}
 	var providers map[string]models.ProviderInfo
 	if err := json.Unmarshal(resp.Body(), &providers); err != nil {
 		return nil, fmt.Errorf("failed to unmarshal registry: %w", err)
 	}
 	log.Println("Successfully loaded model registry")
 	return &models.ModelRegistry{Providers: providers}, nil
 }
 func CalculateCost(registry *models.ModelRegistry, modelID string, promptTokens, completionTokens, cacheRead, cacheWrite uint32) float64 {
 	meta := registry.FindModel(modelID)
 	if meta == nil || meta.Cost == nil {
 		return 0.0
 	}
 	cost := (float64(promptTokens) * meta.Cost.Input / 1000000.0) +
 		(float64(completionTokens) * meta.Cost.Output / 1000000.0)
 	if meta.Cost.CacheRead != nil {
 		cost += float64(cacheRead) * (*meta.Cost.CacheRead) / 1000000.0
 	}
 	if meta.Cost.CacheWrite != nil {
 		cost += float64(cacheWrite) * (*meta.Cost.CacheWrite) / 1000000.0
 	}
 	return cost
 }
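Dividing by 1,000,000 in `CalculateCost` implies that registry prices are expressed per million tokens. A small worked example of the same arithmetic, without the registry lookup, is sketched below. The prices are invented for illustration, and `cost` and `almostEqual` are hypothetical helper names.

```go
package main

import "fmt"

// ModelCost is a trimmed copy of models.ModelCost without JSON tags.
type ModelCost struct {
	Input, Output float64
	CacheRead     *float64
	CacheWrite    *float64
}

// Prices are assumed to be quoted per one million tokens.
const tokensPerPriceUnit = 1_000_000.0

// cost mirrors the arithmetic of CalculateCost for an already resolved model.
func cost(c ModelCost, prompt, completion, cacheRead, cacheWrite uint32) float64 {
	total := float64(prompt)*c.Input/tokensPerPriceUnit +
		float64(completion)*c.Output/tokensPerPriceUnit
	if c.CacheRead != nil {
		total += float64(cacheRead) * *c.CacheRead / tokensPerPriceUnit
	}
	if c.CacheWrite != nil {
		total += float64(cacheWrite) * *c.CacheWrite / tokensPerPriceUnit
	}
	return total
}

// almostEqual compares floats with a small absolute tolerance.
func almostEqual(a, b float64) bool {
	d := a - b
	if d < 0 {
		d = -d
	}
	return d < 1e-12
}

func main() {
	cacheRead := 0.25
	c := ModelCost{Input: 2.0, Output: 8.0, CacheRead: &cacheRead}

	// 1000*2.0/1e6 + 500*8.0/1e6 + 2000*0.25/1e6
	//   = 0.002 + 0.004 + 0.0005 = 0.0065
	fmt.Printf("%.4f\n", cost(c, 1000, 500, 2000, 0))
}
```

When a cache price is missing (nil), the corresponding tokens contribute nothing, matching the nil checks in `CalculateCost`.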
Author	SHA1	Message	Date
hobokenchicken	90874a6721	chore: consolidate env files and update gitignore Removed .env and .env.backup from git tracking and consolidated configuration into .env.example. Updated .gitignore to robustly prevent accidental inclusion of sensitive files.	2026-03-19 10:44:22 -04:00
hobokenchicken	57aa0aa70e	fix(openai): unify tool call indexing for both standard and embedded calls - Sequential next_tool_index is now used for both Responses API 'function_call' events and the proxy's 'tool_uses' JSON extraction. - This ensures tool_calls arrays in the stream always start at index 0 and are dense, even if standard and embedded calls were somehow mixed. - Fixed 'payload_idx' logic to correctly align argument chunks with their initialization chunks.	2026-03-18 18:31:24 +00:00
hobokenchicken	4de457cc5e	fix(openai): correctly map tool_call indexes in Responses API stream - The OpenAI Responses API uses 'output_index' to identify items in the response. - If a response starts with text (output_index 0) followed by a tool call (output_index 1), the standard Chat Completions streaming format requires the first tool call to have index 0. - Previously, the proxy was passing output_index (1) as the tool_call index, causing client-side SDKs to fail parsing the stream and silently drop the tool calls. - Implemented a local mapping within the stream to ensure tool_call indexes are always dense and start at 0.	2026-03-18 18:26:27 +00:00
hobokenchicken	66e8b114b9	fix(openai): split embedded tool_calls into standard chunk format - Standard OpenAI clients expect tool_calls to be streamed as two parts: 1. Initialization chunk containing 'id', 'type', and 'name', with empty 'arguments'. 2. Payload chunk(s) containing 'arguments', with 'id', 'type', and 'name' omitted. - Previously, the proxy was yielding all fields in a single chunk when parsing the custom 'tool_uses' JSON from gpt-5.4, causing strict clients like opencode to fail silently when delegating parallel tasks. - The proxy now splits the extracted JSON into the correct two-chunk sequence, restoring subagent compatibility.	2026-03-18 18:05:37 +00:00
hobokenchicken	1cac45502a	fix(openai): fix stream whitespace loss and finish_reason for gpt-5.4 - Remove overzealous .trim() in strip_internal_metadata which destroyed whitespace between text stream chunks, causing client hangs - Fix finish_reason logic to only yield once at the end of the stream - Correctly yield finish_reason: 'tool_calls' instead of 'stop' when tool calls are generated	2026-03-18 17:48:55 +00:00
hobokenchicken	79dc8fe409	fix(openai): correctly parse Responses API tool call events - The Responses API does not use 'response.item.delta' for tool calls. - It uses 'response.output_item.added' to initialize the function call. - It uses 'response.function_call_arguments.delta' for the payload stream. - Updated the streaming parser to catch these events and correctly yield ToolCallDelta objects. - This restores proper streaming of tool calls back to the client.	2026-03-18 16:13:13 +00:00
hobokenchicken	24a898c9a7	fix(openai): gracefully handle stream endings - The Responses API ends streams without a final '[DONE]' message. - This causes reqwest_eventsource to return Error::StreamEnded. - Previously, this was treated as a premature termination, triggering an error probe. - We now explicitly match and break on Err(StreamEnded) for normal completion.	2026-03-18 15:39:18 +00:00
hobokenchicken	7c2a317c01	fix(openai): add missing stream parameter for Responses API - The OpenAI Responses API actually requires the 'stream: true' parameter in the JSON body, contrary to some documentation summaries. - Omitting it caused the API to return a full application/json response instead of SSE text/event-stream, leading to stream failures and probe warnings in the proxy logs.	2026-03-18 15:32:08 +00:00
hobokenchicken	cb619f9286	fix(openai): improve Responses API stream robustness and diagnostics - Implement final buffer flush in streaming path to prevent data loss - Increase probe response body logging to 500 characters - Ensure internal metadata is stripped even on final flush - Fix potential hang when stream ends without explicit [DONE] event	2026-03-18 15:17:56 +00:00
hobokenchicken	441270317c	fix(openai): strip internal metadata from gpt-5.4 responses - Add strip_internal_metadata helper to remove prefixes like 'to=multi_tool_use.parallel' - Clean up Thai text preambles reported in the journal - Apply metadata stripping to both synchronous and streaming response paths - Improve visual quality of proxied model responses	2026-03-18 15:07:17 +00:00
hobokenchicken	2e4318d84b	fix(openai): improve gpt-5.4 parallel tool call intercepting - Implement cross-delta content buffering in streaming Responses API - Wait for full 'tool_uses' JSON block before yielding to client - Handle 'to=multi_tool_use.parallel' preamble by buffering - Fix stream error probe to not request a new stream - Remove raw JSON leakage from streaming content	2026-03-18 15:04:15 +00:00
hobokenchicken	d0be16d8e3	fix(openai): parse embedded 'tool_uses' JSON for gpt-5.4 parallel calls - Add static parse_tool_uses_json helper to extract embedded tool calls - Update synchronous and streaming Responses API parsers to detect tool_uses blocks - Strip tool_uses JSON from content to prevent raw JSON leakage to client - Resolve lifetime issues by avoiding &self capture in streaming closure	2026-03-18 14:28:38 +00:00
hobokenchicken	83e0ad0240	fix(openai): flatten tools and tool_choice for Responses API - Map nested 'function' object to top-level fields - Support string and object-based 'tool_choice' formats - Fix 400 Bad Request 'Missing required parameter: tools[0].name'	2026-03-18 14:00:49 +00:00
hobokenchicken	275ce34d05	fix(openai): fix missing tools and instructions in Responses API - Add 'tools' and 'tool_choice' parameters to streaming Responses API - Include 'name' field in message items for Responses API input - Use string content for text-only messages to improve instruction following - Fix subagents not triggering and files not being created	2026-03-18 13:51:36 +00:00
hobokenchicken	cb5b921550	feat(openai): implement tool support for gpt-5.4 via Responses API - Implement polymorphic 'input' structure for /responses endpoint - Map 'tool' role to 'function_call_output' items - Handle assistant 'tool_calls' as separate 'function_call' items - Add synchronous and streaming parsers for function_call items - Fix 400 Bad Request 'Invalid value: tool' error	2026-03-18 13:14:51 +00:00
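Several of the commit messages above describe two streaming rules: client-facing tool_call indexes must be dense and start at 0 even when the upstream `output_index` does not, and each tool call is emitted as an initialization chunk (id, type, name, empty arguments) followed by payload chunks that carry only arguments. The sketch below illustrates both rules in Go to match the diff above. The commits themselves appear to target the Rust implementation, so every name here (`ToolCallDelta` fields, `indexMapper`, `splitToolCall`) is a hypothetical simplification rather than the proxy's actual code.

```go
package main

import "fmt"

// ToolCallDelta is a simplified stand-in for one streamed tool_calls entry.
type ToolCallDelta struct {
	Index     int
	ID        string
	Type      string
	Name      string
	Arguments string
}

// indexMapper assigns dense, zero-based indexes in first-seen order.
type indexMapper struct {
	next  int
	byOut map[int]int // upstream output_index -> client-facing index
}

func newIndexMapper() *indexMapper {
	return &indexMapper{byOut: make(map[int]int)}
}

// indexFor returns a stable client-facing index for an upstream output_index.
func (m *indexMapper) indexFor(outputIndex int) int {
	if idx, ok := m.byOut[outputIndex]; ok {
		return idx
	}
	idx := m.next
	m.byOut[outputIndex] = idx
	m.next++
	return idx
}

// splitToolCall emits an initialization chunk followed by a single payload
// chunk; a real stream could carry the arguments across several payloads.
func splitToolCall(m *indexMapper, outputIndex int, id, name, args string) []ToolCallDelta {
	idx := m.indexFor(outputIndex)
	return []ToolCallDelta{
		{Index: idx, ID: id, Type: "function", Name: name, Arguments: ""},
		{Index: idx, Arguments: args},
	}
}

func main() {
	m := newIndexMapper()
	// output_index 0 is assumed to be a text item, so tool calls start at 1.
	calls := [][]ToolCallDelta{
		splitToolCall(m, 1, "call_a", "read_file", `{"path":"a.txt"}`),
		splitToolCall(m, 3, "call_b", "list_dir", `{}`),
	}
	for _, chunks := range calls {
		for _, d := range chunks {
			fmt.Printf("index=%d id=%q name=%q args=%q\n", d.Index, d.ID, d.Name, d.Arguments)
		}
	}
}
```

Keeping the mapping local to one stream means the counter resets per response, which is what a client parsing a single `tool_calls` array would expect.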