Cost Optimization Examples

This example demonstrates how to use Umwelten for cost-effective AI model evaluation and optimization. These examples correspond to the migrated google-pricing.ts script and show how to compare costs, optimize spending, and make informed decisions about model selection.

Basic Cost Comparison

Google Models Pricing Analysis (google-pricing.ts equivalent)

Test Google models for pricing and cost analysis:

bash

umwelten eval run \
  --prompt "Write a detailed analysis of machine learning trends in 2024, including key developments in LLMs, computer vision, and AI safety. Include specific examples and future predictions." \
  --models "google:gemini-2.0-flash,google:gemini-2.5-pro-exp-03-25,google:gemini-1.5-flash-8b" \
  --id "google-pricing-comparison" \
  --temperature 0.3 \
  --concurrent

Cross-Provider Cost Comparison

Compare costs across different providers for the same task:

bash

umwelten eval run \
  --prompt "Analyze the competitive landscape for cloud computing services, focusing on AWS, Google Cloud, and Microsoft Azure. Include market share, pricing strategies, and key differentiators." \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini,openrouter:openai/gpt-4o,ollama:gemma3:12b" \
  --id "provider-cost-comparison" \
  --concurrent \
  --max-concurrency 3

Generate Cost Analysis Report

bash

# Generate detailed cost report
umwelten eval report --id google-pricing-comparison --format markdown

# Export cost data for further analysis  
umwelten eval report --id provider-cost-comparison --format csv --output cost-analysis.csv

Cost Optimization Strategies

Model Tier Comparison

Compare different model tiers for the same task:

bash

# Premium tier
umwelten eval run \
  --prompt "Create a comprehensive business plan for a sustainable energy startup" \
  --models "google:gemini-2.5-pro-exp-03-25,openrouter:openai/gpt-4o" \
  --id "business-plan-premium" \
  --system "You are a business consultant with expertise in sustainable energy and startup development"

# Standard tier
umwelten eval run \
  --prompt "Create a comprehensive business plan for a sustainable energy startup" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini" \
  --id "business-plan-standard" \
  --system "You are a business consultant with expertise in sustainable energy and startup development"

# Budget tier (free)
umwelten eval run \
  --prompt "Create a comprehensive business plan for a sustainable energy startup" \
  --models "ollama:gemma3:27b,ollama:llama3.2:latest" \
  --id "business-plan-budget" \
  --system "You are a business consultant with expertise in sustainable energy and startup development"

Prompt Length Optimization

Test how prompt length affects costs:

bash

# Detailed prompt (higher cost)
umwelten eval run \
  --prompt "You are a financial analyst. Please provide a comprehensive analysis of Tesla's stock performance over the past 5 years, including quarterly earnings, market trends, competitive positioning, regulatory environment, technological developments, executive leadership changes, and forward-looking projections. Include specific metrics, comparisons to competitors, and risk assessments." \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini" \
  --id "detailed-prompt-cost"

# Concise prompt (lower cost)
umwelten eval run \
  --prompt "Analyze Tesla's 5-year stock performance including earnings, market trends, competition, and future outlook." \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini" \
  --id "concise-prompt-cost"

Batch vs Individual Processing

Compare costs of batch vs individual processing:

bash

# Individual processing (potentially higher cost due to overhead)
umwelten eval run \
  --prompt "Summarize this document in 100 words" \
  --models "google:gemini-2.0-flash" \
  --id "individual-doc-1" \
  --attach "./docs/doc1.pdf"

umwelten eval run \
  --prompt "Summarize this document in 100 words" \
  --models "google:gemini-2.0-flash" \
  --id "individual-doc-2" \
  --attach "./docs/doc2.pdf"

# Batch processing (more efficient)
umwelten eval batch \
  --prompt "Summarize this document in 100 words" \
  --models "google:gemini-2.0-flash" \
  --id "batch-docs" \
  --directory "./docs" \
  --file-pattern "*.pdf" \
  --concurrent

Advanced Cost Optimization

Response Length Impact

Test how response length requirements affect costs:

bash

# Short response
umwelten eval run \
  --prompt "Explain quantum computing in 50 words or less" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini,ollama:gemma3:12b" \
  --id "short-response-cost" \
  --concurrent

# Medium response  
umwelten eval run \
  --prompt "Explain quantum computing in approximately 200 words" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini,ollama:gemma3:12b" \
  --id "medium-response-cost" \
  --concurrent

# Long response
umwelten eval run \
  --prompt "Write a comprehensive explanation of quantum computing (800-1000 words)" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini,ollama:gemma3:12b" \
  --id "long-response-cost" \
  --concurrent

Context Window Optimization

Optimize for different context window needs:

bash

# Large context (higher cost)
umwelten eval run \
  --prompt "Analyze this entire document and provide insights" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "large-context-test" \
  --attach "./large-document.pdf"

# Chunked processing (potentially lower cost)
umwelten eval run \
  --prompt "Analyze the executive summary and key findings sections" \
  --models "google:gemini-2.0-flash" \
  --id "chunked-processing-test" \
  --attach "./large-document.pdf"

Temperature and Quality Trade-offs

Balance cost with output quality using temperature:

bash

# High quality, deterministic (may use more tokens for consistency)
umwelten eval run \
  --prompt "Write a professional email responding to a customer complaint" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini" \
  --id "high-quality-email" \
  --temperature 0.1 \
  --concurrent

# Balanced creativity and cost
umwelten eval run \
  --prompt "Write a professional email responding to a customer complaint" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini" \
  --id "balanced-email" \
  --temperature 0.7 \
  --concurrent

Expected Cost Analysis Results

Sample Cost Comparison Report

markdown

# Cost Optimization Report: provider-cost-comparison

**Generated:** 2025-01-27T23:00:00.000Z  
**Task:** Cloud Computing Analysis (1000+ word responses)
**Total Models:** 4

## Cost Breakdown

| Model | Provider | Input Tokens | Output Tokens | Total Cost | Cost/1K Out | Response Quality |
|-------|----------|-------------|---------------|------------|-------------|------------------|
| gemini-2.0-flash | google | 45 | 1,240 | $0.000405 | $0.327 | Excellent |
| gpt-4o-mini | openrouter | 45 | 1,180 | $0.000713 | $0.604 | Very Good |
| gpt-4o | openrouter | 45 | 1,320 | $0.013245 | $10.034 | Outstanding |
| gemma3:12b | ollama | 45 | 1,200 | $0.000000 | $0.000 | Good |

## Cost Efficiency Analysis

### Best Value Models
1. **Ollama Gemma3:12b** - Free local processing, good quality
2. **Google Gemini 2.0 Flash** - Excellent quality at low cost ($0.0004/response)
3. **OpenRouter GPT-4o-mini** - Premium quality at reasonable cost ($0.0007/response)

### Cost vs Quality Trade-offs
- **Gemini 2.0 Flash**: 18x cheaper than GPT-4o with 90% of the quality
- **GPT-4o-mini**: 19x cheaper than GPT-4o with 85% of the quality  
- **Ollama**: Free but requires local hardware and setup

### Scaling Projections
- **1,000 evaluations**: Gemini ($0.41) vs GPT-4o-mini ($0.71) vs GPT-4o ($13.25)
- **10,000 evaluations**: Gemini ($4.10) vs GPT-4o-mini ($7.10) vs GPT-4o ($132.50)
- **100,000 evaluations**: Gemini ($41) vs GPT-4o-mini ($71) vs GPT-4o ($1,325)

## Optimization Recommendations

### For Budget-Conscious Projects
- Use Ollama models for experimentation and development
- Switch to Gemini 2.0 Flash for production workloads
- Reserve premium models for critical tasks only

### For Quality-Critical Projects  
- Use GPT-4o for highest stakes content
- Use Gemini 2.5 Pro for complex analysis tasks
- Use multiple models and select best outputs

### For High-Volume Processing
- Implement model cascading (cheap model first, expensive model for edge cases)
- Use concurrent processing to reduce wall-clock time
- Consider batch processing for efficiency gains

Detailed Cost Metrics

json

{
  "cost_analysis": {
    "models": [
      {
        "name": "gemini-2.0-flash",
        "provider": "google",
        "total_cost": 0.000405,
        "cost_per_1k_input": 0.075,
        "cost_per_1k_output": 0.300,
        "avg_response_time_ms": 3200,
        "cost_per_second": 0.000127
      },
      {
        "name": "gpt-4o-mini", 
        "provider": "openrouter",
        "total_cost": 0.000713,
        "cost_per_1k_input": 0.150,
        "cost_per_1k_output": 0.600,
        "avg_response_time_ms": 4100,
        "cost_per_second": 0.000174
      }
    ],
    "recommendations": {
      "most_cost_effective": "gemini-2.0-flash",
      "best_quality_per_dollar": "gemini-2.0-flash", 
      "fastest_response": "gemini-2.0-flash",
      "free_alternative": "ollama:gemma3:12b"
    }
  }
}

Cost Optimization Strategies

1. Model Selection Framework

bash

# Tier 1: Experimentation and Development (Free)
umwelten eval run \
  --models "ollama:gemma3:12b,ollama:llama3.2:latest" \
  --id "dev-testing" \
  --prompt "Test prompt for development"

# Tier 2: Production Workloads (Low Cost)  
umwelten eval run \
  --models "google:gemini-2.0-flash,google:gemini-1.5-flash-8b" \
  --id "production-standard" \
  --prompt "Production prompt"

# Tier 3: Premium Quality (High Cost)
umwelten eval run \
  --models "google:gemini-2.5-pro-exp-03-25,openrouter:openai/gpt-4o" \
  --id "premium-quality" \
  --prompt "Critical quality prompt"

2. Prompt Engineering for Cost

bash

# Cost-inefficient prompt
umwelten eval run \
  --prompt "Please provide a very detailed, comprehensive, extensive, and thorough analysis with multiple examples, extensive background information, detailed explanations, and comprehensive coverage of all aspects..." \
  --models "google:gemini-2.0-flash" \
  --id "inefficient-prompt"

# Cost-efficient prompt
umwelten eval run \
  --prompt "Analyze X focusing on key points A, B, C. Include 2-3 specific examples. Target length: 300 words." \
  --models "google:gemini-2.0-flash" \
  --id "efficient-prompt"

3. Batch Processing for Scale

bash

# Process 100 documents efficiently
umwelten eval batch \
  --prompt "Summarize key points in 100 words" \
  --models "google:gemini-2.0-flash" \
  --id "cost-efficient-batch" \
  --directory "./documents" \
  --file-pattern "*.pdf" \
  --concurrent \
  --max-concurrency 5

4. Quality Validation Strategies

bash

# Use cheap model for first pass
umwelten eval run \
  --prompt "Quick analysis of this document" \
  --models "google:gemini-2.0-flash" \
  --id "first-pass"

# Use expensive model only for validation/refinement
umwelten eval run \
  --prompt "Detailed analysis building on initial findings" \
  --models "openrouter:openai/gpt-4o" \
  --id "validation-pass"

Real-World Cost Scenarios

Scenario 1: Content Creation (1000 articles/month)

bash

# Option A: Premium models ($150/month)
umwelten eval batch \
  --models "openrouter:openai/gpt-4o" \
  --prompt "Write a 500-word article about [topic]" \
  --id "premium-content"

# Option B: Balanced approach ($15/month)  
umwelten eval batch \
  --models "google:gemini-2.0-flash" \
  --prompt "Write a 500-word article about [topic]" \
  --id "balanced-content"

# Option C: Free local processing ($0/month + hardware)
umwelten eval batch \
  --models "ollama:gemma3:27b" \
  --prompt "Write a 500-word article about [topic]" \
  --id "free-content"

Scenario 2: Document Analysis (10,000 PDFs/month)

bash

# High-volume processing optimization
umwelten eval batch \
  --prompt "Extract key information: title, date, summary" \
  --models "google:gemini-2.0-flash" \
  --schema "title, date, summary, confidence int" \
  --directory "./monthly-documents" \
  --file-pattern "*.pdf" \
  --concurrent \
  --max-concurrency 8 \
  --id "monthly-document-processing"

Scenario 3: Research Analysis (100 papers/week)

bash

# Mixed approach: cheap for screening, expensive for deep analysis
umwelten eval batch \
  --prompt "Quick relevance assessment (relevant/not relevant)" \
  --models "google:gemini-2.0-flash" \
  --id "paper-screening" \
  --directory "./papers" \
  --file-pattern "*.pdf"

# Follow-up with detailed analysis for relevant papers only
umwelten eval run \
  --prompt "Detailed analysis of methodology, findings, and significance" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "deep-paper-analysis" \
  --attach "./relevant-papers/*.pdf"

Monitoring and Optimization Tools

Cost Tracking Commands

bash

# List all evaluations with cost information
umwelten eval list --details

# Export cost data for analysis
umwelten eval report --id all --format csv --output monthly-costs.csv

# Compare costs across time periods
umwelten eval report --id january-batch --format json > jan-costs.json
umwelten eval report --id february-batch --format json > feb-costs.json

Budget Management

bash

# Set evaluation limits for cost control
umwelten eval batch \
  --file-limit 100 \
  --timeout 30000 \
  --models "google:gemini-2.0-flash" \
  --id "budget-controlled-batch"

Tips for Cost Optimization

Model Selection Guidelines

Development: Use Ollama models (free)
Production: Start with Gemini 2.0 Flash
Premium: Use GPT-4o only when quality is critical
Validation: Use multiple cheap models instead of one expensive model

Prompt Engineering

Be specific about desired output length
Avoid redundant instructions and examples
Use structured output to reduce post-processing
Test with cheap models first

Processing Optimization

Use batch processing for similar tasks
Enable concurrent processing for speed
Set appropriate timeouts to avoid waste
Use resume capability for interrupted jobs

Quality vs Cost Balance

Define quality thresholds for different use cases
Use cheap models for screening, expensive for final analysis
Implement human review for critical decisions
Track quality metrics alongside cost metrics

Next Steps

Explore batch processing for large-scale cost optimization
See structured output for reducing post-processing costs
Try model evaluation for systematic quality comparison
Review analysis & reasoning for complex task optimization

Cost Optimization Examples ​

Basic Cost Comparison ​

Google Models Pricing Analysis (google-pricing.ts equivalent) ​

Cross-Provider Cost Comparison ​

Generate Cost Analysis Report ​

Cost Optimization Strategies ​

Model Tier Comparison ​

Prompt Length Optimization ​

Batch vs Individual Processing ​

Advanced Cost Optimization ​

Response Length Impact ​

Context Window Optimization ​

Temperature and Quality Trade-offs ​

Expected Cost Analysis Results ​

Sample Cost Comparison Report ​

Detailed Cost Metrics ​

Cost Optimization Strategies ​

1. Model Selection Framework ​

2. Prompt Engineering for Cost ​

3. Batch Processing for Scale ​

4. Quality Validation Strategies ​

Real-World Cost Scenarios ​

Scenario 1: Content Creation (1000 articles/month) ​

Scenario 2: Document Analysis (10,000 PDFs/month) ​

Scenario 3: Research Analysis (100 papers/week) ​

Monitoring and Optimization Tools ​

Cost Tracking Commands ​

Budget Management ​

Tips for Cost Optimization ​

Model Selection Guidelines ​

Prompt Engineering ​

Processing Optimization ​

Quality vs Cost Balance ​

Next Steps ​

Cost Optimization Examples

Basic Cost Comparison

Google Models Pricing Analysis (google-pricing.ts equivalent)

Cross-Provider Cost Comparison

Generate Cost Analysis Report

Cost Optimization Strategies

Model Tier Comparison

Prompt Length Optimization

Batch vs Individual Processing

Advanced Cost Optimization

Response Length Impact

Context Window Optimization

Temperature and Quality Trade-offs

Expected Cost Analysis Results

Sample Cost Comparison Report

Detailed Cost Metrics

Cost Optimization Strategies

1. Model Selection Framework

2. Prompt Engineering for Cost

3. Batch Processing for Scale

4. Quality Validation Strategies

Real-World Cost Scenarios

Scenario 1: Content Creation (1000 articles/month)

Scenario 2: Document Analysis (10,000 PDFs/month)

Scenario 3: Research Analysis (100 papers/week)

Monitoring and Optimization Tools

Cost Tracking Commands

Budget Management

Tips for Cost Optimization

Model Selection Guidelines

Prompt Engineering

Processing Optimization

Quality vs Cost Balance

Next Steps