Cost Optimization Examples
These examples show how to use Umwelten for cost-effective AI model evaluation and optimization. They correspond to the migrated google-pricing.ts script and cover comparing costs, optimizing spending, and making informed decisions about model selection.
Basic Cost Comparison
Google Models Pricing Analysis (google-pricing.ts equivalent)
Test Google models for pricing and cost analysis:
```bash
umwelten eval run \
  --prompt "Write a detailed analysis of machine learning trends in 2024, including key developments in LLMs, computer vision, and AI safety. Include specific examples and future predictions." \
  --models "google:gemini-2.0-flash,google:gemini-2.5-pro-exp-03-25,google:gemini-1.5-flash-8b" \
  --id "google-pricing-comparison" \
  --temperature 0.3 \
  --concurrent
```
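Providers bill input and output tokens separately, usually quoted in USD per million tokens. The sketch below shows the arithmetic behind a per-response cost figure. The `Pricing` type, the `requestCostUsd` helper, and the example rates are illustrative assumptions, not Umwelten's API or current list prices; check each provider's pricing page before relying on them.

```typescript
// Illustrative pricing shape; the rates used below are assumptions, not current list prices.
interface Pricing {
  inputPerMillion: number; // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Cost of one request: input and output tokens are billed at separate rates.
function requestCostUsd(inputTokens: number, outputTokens: number, price: Pricing): number {
  return (inputTokens * price.inputPerMillion + outputTokens * price.outputPerMillion) / 1_000_000;
}

// Example: a 45-token prompt and a 1,180-token answer at an assumed $0.15 / $0.60 per 1M tokens.
const assumedRates: Pricing = { inputPerMillion: 0.15, outputPerMillion: 0.6 };
console.log(requestCostUsd(45, 1180, assumedRates).toFixed(6)); // 0.000715
```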
Cross-Provider Cost Comparison
Compare costs across different providers for the same task:
```bash
umwelten eval run \
  --prompt "Analyze the competitive landscape for cloud computing services, focusing on AWS, Google Cloud, and Microsoft Azure. Include market share, pricing strategies, and key differentiators." \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini,openrouter:openai/gpt-4o,ollama:gemma3:12b" \
  --id "provider-cost-comparison" \
  --concurrent \
  --max-concurrency 3
```
Generate Cost Analysis Report
```bash
# Generate detailed cost report
umwelten eval report --id google-pricing-comparison --format markdown

# Export cost data for further analysis
umwelten eval report --id provider-cost-comparison --format csv --output cost-analysis.csv
```
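Once cost data is exported to CSV, a total is a simple sum over one column. This sketch assumes a header row and a column named `cost`; the actual column names in Umwelten's CSV export may differ, and the naive comma split does not handle quoted fields that contain commas.

```typescript
// Sum one numeric column of a simple CSV export (naive: no quoted commas).
// The column name is an assumption; check the header of your own export.
function sumColumn(csv: string, column: string): number {
  const [headerLine = "", ...rows] = csv.trim().split(/\r?\n/);
  const idx = headerLine.split(",").map((h) => h.trim()).indexOf(column);
  if (idx === -1) throw new Error(`column "${column}" not found`);
  let total = 0;
  for (const row of rows) {
    const value = Number(row.split(",")[idx]);
    // Missing or non-numeric cells are skipped; empty cells convert to 0.
    if (!Number.isNaN(value)) total += value;
  }
  return total;
}

// Hypothetical export with a "cost" column.
const exported = "model,cost\ngemini-2.0-flash,0.000405\ngpt-4o-mini,0.000713";
console.log(sumColumn(exported, "cost").toFixed(6)); // 0.001118
```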
Cost Optimization Strategies
Model Tier Comparison
Compare different model tiers for the same task:
```bash
# Premium tier
umwelten eval run \
  --prompt "Create a comprehensive business plan for a sustainable energy startup" \
  --models "google:gemini-2.5-pro-exp-03-25,openrouter:openai/gpt-4o" \
  --id "business-plan-premium" \
  --system "You are a business consultant with expertise in sustainable energy and startup development"

# Standard tier
umwelten eval run \
  --prompt "Create a comprehensive business plan for a sustainable energy startup" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini" \
  --id "business-plan-standard" \
  --system "You are a business consultant with expertise in sustainable energy and startup development"

# Budget tier (free)
umwelten eval run \
  --prompt "Create a comprehensive business plan for a sustainable energy startup" \
  --models "ollama:gemma3:27b,ollama:llama3.2:latest" \
  --id "business-plan-budget" \
  --system "You are a business consultant with expertise in sustainable energy and startup development"
```
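After running the three tiers, a common decision rule is to pick the cheapest model whose quality clears the bar for the task. In this sketch the `qualityScore` values are hypothetical placeholders you would supply from your own review of the business plans; the commands above do not produce them. The costs are likewise illustrative.

```typescript
interface Candidate {
  model: string;
  costPerResponseUsd: number; // from your eval report
  qualityScore: number; // your own 0-10 rating (hypothetical)
}

// Return the cheapest candidate that meets the quality threshold, or undefined if none does.
function cheapestAcceptable(candidates: Candidate[], minQuality: number): Candidate | undefined {
  return candidates
    .filter((c) => c.qualityScore >= minQuality)
    .reduce<Candidate | undefined>(
      (best, c) => (best === undefined || c.costPerResponseUsd < best.costPerResponseUsd ? c : best),
      undefined,
    );
}

// Placeholder numbers for illustration only.
const results: Candidate[] = [
  { model: "openrouter:openai/gpt-4o", costPerResponseUsd: 0.013, qualityScore: 9 },
  { model: "google:gemini-2.0-flash", costPerResponseUsd: 0.0004, qualityScore: 8 },
  { model: "ollama:gemma3:27b", costPerResponseUsd: 0, qualityScore: 6 },
];
console.log(cheapestAcceptable(results, 7)?.model); // google:gemini-2.0-flash
```

Raising or lowering `minQuality` moves the choice between tiers, which makes the trade-off explicit per use case.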
Prompt Length Optimization
Test how prompt length affects costs:
```bash
# Detailed prompt (higher cost)
umwelten eval run \
  --prompt "You are a financial analyst. Please provide a comprehensive analysis of Tesla's stock performance over the past 5 years, including quarterly earnings, market trends, competitive positioning, regulatory environment, technological developments, executive leadership changes, and forward-looking projections. Include specific metrics, comparisons to competitors, and risk assessments." \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini" \
  --id "detailed-prompt-cost"

# Concise prompt (lower cost)
umwelten eval run \
  --prompt "Analyze Tesla's 5-year stock performance including earnings, market trends, competition, and future outlook." \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini" \
  --id "concise-prompt-cost"
```
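Input tokens scale roughly with prompt length. A common rule of thumb for English text is about four characters per token; real counts depend on each model's tokenizer, so this sketch is an estimate only. It compares the input cost of the two Tesla prompts over many requests at an assumed input price of $0.10 per 1M tokens.

```typescript
// Rough token estimate: ~4 characters per token for English text.
// This is a heuristic, not a real tokenizer; actual counts vary by model.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Input-side cost of sending `prompt` `requests` times at an assumed USD-per-1M rate.
function promptCostUsd(prompt: string, requests: number, inputPerMillion: number): number {
  return (estimateTokens(prompt) * requests * inputPerMillion) / 1_000_000;
}

const detailed =
  "You are a financial analyst. Please provide a comprehensive analysis of Tesla's stock performance over the past 5 years, including quarterly earnings, market trends, competitive positioning, regulatory environment, technological developments, executive leadership changes, and forward-looking projections. Include specific metrics, comparisons to competitors, and risk assessments.";
const concise =
  "Analyze Tesla's 5-year stock performance including earnings, market trends, competition, and future outlook.";

// 10,000 requests at an assumed $0.10 per 1M input tokens.
for (const [label, prompt] of [["detailed", detailed], ["concise", concise]] as const) {
  console.log(label, estimateTokens(prompt), "tokens,", promptCostUsd(prompt, 10_000, 0.1).toFixed(4), "USD");
}
```

The input-side difference is small in absolute terms; the larger effect of a concise prompt is usually that it asks for less, which shortens the more expensive output.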
Batch vs Individual Processing
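Compare individual runs with batch processing. Providers bill per token, so the same documents cost about the same either way; the batch command mainly saves manual effort and, with --concurrent, wall-clock time: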
```bash
# Individual processing: one command per document
umwelten eval run \
  --prompt "Summarize this document in 100 words" \
  --models "google:gemini-2.0-flash" \
  --id "individual-doc-1" \
  --attach "./docs/doc1.pdf"

umwelten eval run \
  --prompt "Summarize this document in 100 words" \
  --models "google:gemini-2.0-flash" \
  --id "individual-doc-2" \
  --attach "./docs/doc2.pdf"

# Batch processing: one command for the whole directory
umwelten eval batch \
  --prompt "Summarize this document in 100 words" \
  --models "google:gemini-2.0-flash" \
  --id "batch-docs" \
  --directory "./docs" \
  --file-pattern "*.pdf" \
  --concurrent
```
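With `--concurrent`, a batch finishes sooner than the same jobs run one after another. This sketch estimates elapsed time for a list of request latencies when at most `k` requests are in flight, using simple greedy scheduling (each job goes to whichever worker frees up first). It is a model of bounded concurrency, not Umwelten's actual scheduler, and the 3.2 s latency is borrowed from the sample report as an assumption.

```typescript
// Estimate wall-clock time for jobs run with at most `k` in flight.
// Greedy list scheduling: each job starts on whichever worker frees up first.
function estimateWallClockMs(durationsMs: number[], k: number): number {
  if (k < 1) throw new Error("k must be at least 1");
  const workers: number[] = Array.from({ length: k }, () => 0);
  for (const d of durationsMs) {
    const idx = workers.indexOf(Math.min(...workers)); // earliest-free worker
    workers[idx] += d;
  }
  return Math.max(...workers);
}

// Assumed ~3.2 s per document.
const latencies = [3200, 3200, 3200, 3200];
console.log(estimateWallClockMs(latencies, 1)); // 12800
console.log(estimateWallClockMs(latencies, 2)); // 6400
```

Token usage, and therefore API cost, is the same in both runs; only the elapsed time changes.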
Advanced Cost Optimization
Response Length Impact
Test how response length requirements affect costs:
```bash
# Short response
umwelten eval run \
  --prompt "Explain quantum computing in 50 words or less" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini,ollama:gemma3:12b" \
  --id "short-response-cost" \
  --concurrent

# Medium response
umwelten eval run \
  --prompt "Explain quantum computing in approximately 200 words" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini,ollama:gemma3:12b" \
  --id "medium-response-cost" \
  --concurrent

# Long response
umwelten eval run \
  --prompt "Write a comprehensive explanation of quantum computing (800-1000 words)" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini,ollama:gemma3:12b" \
  --id "long-response-cost" \
  --concurrent
```
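Output tokens typically cost several times more than input tokens, and a long answer contains far more tokens than a short prompt, so the requested length dominates the bill. This sketch computes the share of one request's cost that comes from output tokens; the $0.10 / $0.40 per-1M prices and the token counts are assumed examples.

```typescript
// Fraction of a request's cost attributable to output tokens.
// The per-million divisor cancels in the ratio, so it is omitted.
function outputCostShare(
  inputTokens: number,
  outputTokens: number,
  inputPerMillion: number,
  outputPerMillion: number,
): number {
  const inputCost = inputTokens * inputPerMillion;
  const outputCost = outputTokens * outputPerMillion;
  const total = inputCost + outputCost;
  return total === 0 ? 0 : outputCost / total;
}

// 45-token prompt with a ~70-token vs a ~1,300-token answer, at assumed $0.10 / $0.40 per 1M tokens.
console.log(outputCostShare(45, 70, 0.1, 0.4).toFixed(2)); // 0.86
console.log(outputCostShare(45, 1300, 0.1, 0.4).toFixed(2)); // 0.99
```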
Context Window Optimization
Optimize for different context window needs:
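```bash
# Large context: full document with a premium model (higher cost)
umwelten eval run \
  --prompt "Analyze this entire document and provide insights" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "large-context-test" \
  --attach "./large-document.pdf"

# Targeted analysis with a cheaper model (potentially lower cost). The full PDF is
# still sent as input, so savings come from the lower per-token price and a
# narrower response; extracting only the needed sections first also cuts input tokens.
umwelten eval run \
  --prompt "Analyze the executive summary and key findings sections" \
  --models "google:gemini-2.0-flash" \
  --id "chunked-processing-test" \
  --attach "./large-document.pdf"
```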
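Sending only the sections you need, rather than the whole document, is what actually reduces input tokens. This sketch splits already-extracted text into sections on blank lines and keeps those whose heading matches a wanted title. PDF text extraction is left to you, and the section layout and titles are assumptions; title matching is a crude stand-in for proper retrieval.

```typescript
// Split extracted text into sections on blank lines.
function splitSections(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Keep sections whose first line (the heading) contains a wanted title, case-insensitively.
function selectSections(text: string, titles: string[]): string[] {
  const wanted = titles.map((t) => t.toLowerCase());
  return splitSections(text).filter((section) => {
    const heading = (section.split("\n")[0] ?? "").toLowerCase();
    return wanted.some((w) => heading.includes(w));
  });
}

// Hypothetical extracted text with three sections.
const extracted = ["Executive Summary\nBody text.", "Appendix A\nRaw tables.", "Key Findings\nBody text."].join("\n\n");
const selected = selectSections(extracted, ["executive summary", "key findings"]);
console.log(selected.length, "of", splitSections(extracted).length, "sections kept"); // 2 of 3 sections kept
```

The joined `selected` text could then be passed in the prompt instead of attaching the full PDF.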
Temperature and Quality Trade-offs
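Temperature does not change per-token prices. It trades consistency for variety, which affects output quality and can indirectly change response length: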
```bash
# Low temperature: more consistent, predictable wording
umwelten eval run \
  --prompt "Write a professional email responding to a customer complaint" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini" \
  --id "high-quality-email" \
  --temperature 0.1 \
  --concurrent

# Higher temperature: more varied wording
umwelten eval run \
  --prompt "Write a professional email responding to a customer complaint" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini" \
  --id "balanced-email" \
  --temperature 0.7 \
  --concurrent
```
Example Cost Analysis Results
Sample Cost Comparison Report
```markdown
# Cost Optimization Report: provider-cost-comparison

**Generated:** 2025-01-27T23:00:00.000Z
**Task:** Cloud Computing Analysis (1000+ word responses)
**Total Models:** 4

## Cost Breakdown

| Model | Provider | Input Tokens | Output Tokens | Total Cost | Cost/1M Out | Response Quality |
|-------|----------|-------------|---------------|------------|-------------|------------------|
| gemini-2.0-flash | google | 45 | 1,240 | $0.000405 | $0.327 | Excellent |
| gpt-4o-mini | openrouter | 45 | 1,180 | $0.000713 | $0.604 | Very Good |
| gpt-4o | openrouter | 45 | 1,320 | $0.013245 | $10.034 | Outstanding |
| gemma3:12b | ollama | 45 | 1,200 | $0.000000 | $0.000 | Good |

## Cost Efficiency Analysis

### Best Value Models
1. **Ollama Gemma3:12b** - Free local processing, good quality
2. **Google Gemini 2.0 Flash** - Excellent quality at low cost ($0.0004/response)
3. **OpenRouter GPT-4o-mini** - Very good quality at reasonable cost ($0.0007/response)

### Cost vs Quality Trade-offs
- **Gemini 2.0 Flash**: ~33x cheaper than GPT-4o with 90% of the quality
- **GPT-4o-mini**: ~19x cheaper than GPT-4o with 85% of the quality
- **Ollama**: Free but requires local hardware and setup

### Scaling Projections
- **1,000 evaluations**: Gemini ($0.41) vs GPT-4o-mini ($0.71) vs GPT-4o ($13.25)
- **10,000 evaluations**: Gemini ($4.05) vs GPT-4o-mini ($7.13) vs GPT-4o ($132.45)
- **100,000 evaluations**: Gemini ($40.50) vs GPT-4o-mini ($71.30) vs GPT-4o ($1,324.50)

## Optimization Recommendations

### For Budget-Conscious Projects
- Use Ollama models for experimentation and development
- Switch to Gemini 2.0 Flash for production workloads
- Reserve premium models for critical tasks only

### For Quality-Critical Projects
- Use GPT-4o for highest stakes content
- Use Gemini 2.5 Pro for complex analysis tasks
- Use multiple models and select best outputs

### For High-Volume Processing
- Implement model cascading (cheap model first, expensive model for edge cases)
- Use concurrent processing to reduce wall-clock time
- Consider batch processing for efficiency gains
```
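The scaling projections are per-response cost multiplied by volume. This sketch reproduces that arithmetic from the sample per-response costs, which are illustrative figures rather than guaranteed prices, and computes the GPT-4o to Gemini 2.0 Flash price ratio.

```typescript
// Per-response costs from the sample report (illustrative).
const perResponseUsd: Record<string, number> = {
  "gemini-2.0-flash": 0.000405,
  "gpt-4o-mini": 0.000713,
  "gpt-4o": 0.013245,
};

// Projected total spend for `volume` evaluations, rounded to cents.
function project(volume: number): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [model, cost] of Object.entries(perResponseUsd)) {
    out[model] = Math.round(cost * volume * 100) / 100;
  }
  return out;
}

for (const volume of [1_000, 10_000, 100_000]) {
  console.log(volume, project(volume));
}

// How many times more expensive GPT-4o is per response than Gemini 2.0 Flash.
console.log((perResponseUsd["gpt-4o"] / perResponseUsd["gemini-2.0-flash"]).toFixed(1)); // 32.7
```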
Detailed Cost Metrics
```json
{
  "cost_analysis": {
    "models": [
      {
        "name": "gemini-2.0-flash",
        "provider": "google",
        "total_cost": 0.000405,
        "cost_per_1m_input": 0.075,
        "cost_per_1m_output": 0.300,
        "avg_response_time_ms": 3200,
        "cost_per_second": 0.000127
      },
      {
        "name": "gpt-4o-mini",
        "provider": "openrouter",
        "total_cost": 0.000713,
        "cost_per_1m_input": 0.150,
        "cost_per_1m_output": 0.600,
        "avg_response_time_ms": 4100,
        "cost_per_second": 0.000174
      }
    ],
    "recommendations": {
      "most_cost_effective": "gemini-2.0-flash",
      "best_quality_per_dollar": "gemini-2.0-flash",
      "fastest_response": "gemini-2.0-flash",
      "free_alternative": "ollama:gemma3:12b"
    }
  }
}
```
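In the sample JSON, `cost_per_second` equals total cost divided by average response time in seconds, and the cost and speed recommendations are minimums over the model list. This sketch derives both from the two sample entries. The field names mirror the sample JSON, but whether Umwelten computes them exactly this way is an assumption.

```typescript
interface ModelMetrics {
  name: string;
  total_cost: number;
  avg_response_time_ms: number;
}

// Cost per second of response time: total cost / (response time in seconds).
function costPerSecond(m: ModelMetrics): number {
  return m.total_cost / (m.avg_response_time_ms / 1000);
}

// Name of the model that minimizes `key` (ties keep the earlier entry).
function argMin(models: ModelMetrics[], key: (m: ModelMetrics) => number): string {
  if (models.length === 0) throw new Error("no models");
  return models.reduce((best, m) => (key(m) < key(best) ? m : best)).name;
}

const sampleModels: ModelMetrics[] = [
  { name: "gemini-2.0-flash", total_cost: 0.000405, avg_response_time_ms: 3200 },
  { name: "gpt-4o-mini", total_cost: 0.000713, avg_response_time_ms: 4100 },
];

console.log(sampleModels.map((m) => [m.name, costPerSecond(m).toFixed(6)]));
console.log("most_cost_effective:", argMin(sampleModels, (m) => m.total_cost));
console.log("fastest_response:", argMin(sampleModels, (m) => m.avg_response_time_ms));
```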
Cost Optimization Strategies in Practice
1. Model Selection Framework
```bash
# Tier 1: Experimentation and Development (Free)
umwelten eval run \
  --models "ollama:gemma3:12b,ollama:llama3.2:latest" \
  --id "dev-testing" \
  --prompt "Test prompt for development"

# Tier 2: Production Workloads (Low Cost)
umwelten eval run \
  --models "google:gemini-2.0-flash,google:gemini-1.5-flash-8b" \
  --id "production-standard" \
  --prompt "Production prompt"

# Tier 3: Premium Quality (High Cost)
umwelten eval run \
  --models "google:gemini-2.5-pro-exp-03-25,openrouter:openai/gpt-4o" \
  --id "premium-quality" \
  --prompt "Critical quality prompt"
```
2. Prompt Engineering for Cost
```bash
# Cost-inefficient prompt
umwelten eval run \
  --prompt "Please provide a very detailed, comprehensive, extensive, and thorough analysis with multiple examples, extensive background information, detailed explanations, and comprehensive coverage of all aspects..." \
  --models "google:gemini-2.0-flash" \
  --id "inefficient-prompt"

# Cost-efficient prompt
umwelten eval run \
  --prompt "Analyze X focusing on key points A, B, C. Include 2-3 specific examples. Target length: 300 words." \
  --models "google:gemini-2.0-flash" \
  --id "efficient-prompt"
```
3. Batch Processing for Scale
```bash
# Process 100 documents efficiently
umwelten eval batch \
  --prompt "Summarize key points in 100 words" \
  --models "google:gemini-2.0-flash" \
  --id "cost-efficient-batch" \
  --directory "./documents" \
  --file-pattern "*.pdf" \
  --concurrent \
  --max-concurrency 5
```
4. Quality Validation Strategies
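```bash
# Use a cheap model for the first pass (./document.pdf is a placeholder path)
umwelten eval run \
  --prompt "Quick analysis of this document" \
  --models "google:gemini-2.0-flash" \
  --id "first-pass" \
  --attach "./document.pdf"

# Use the expensive model only for validation/refinement where the first pass
# falls short; include the first-pass findings in this prompt
umwelten eval run \
  --prompt "Detailed analysis building on initial findings" \
  --models "openrouter:openai/gpt-4o" \
  --id "validation-pass" \
  --attach "./document.pdf"
```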
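The two-pass pattern is a model cascade: every item gets the cheap pass, and only a fraction is escalated. Its expected cost per item is the cheap cost plus the escalation rate times the expensive cost. This sketch computes that and the break-even escalation rate; the per-item costs are the sample-report figures, and the escalation rate is an assumption you would measure on your own data.

```typescript
// Expected per-item cost when every item gets the cheap pass and a fraction
// `escalationRate` (between 0 and 1) also gets the expensive pass.
function cascadeCostUsd(cheapUsd: number, expensiveUsd: number, escalationRate: number): number {
  if (escalationRate < 0 || escalationRate > 1) throw new Error("rate must be in [0, 1]");
  return cheapUsd + escalationRate * expensiveUsd;
}

// Escalation rate above which the cascade costs more than using the expensive model alone.
function breakEvenRate(cheapUsd: number, expensiveUsd: number): number {
  return 1 - cheapUsd / expensiveUsd;
}

const cheap = 0.000405; // gemini-2.0-flash, sample report
const expensive = 0.013245; // gpt-4o, sample report
console.log(cascadeCostUsd(cheap, expensive, 0.1).toFixed(6));
console.log(breakEvenRate(cheap, expensive).toFixed(3)); // 0.969
```

This assumes the second pass costs about the same as a standalone run; in practice, including first-pass findings in the prompt adds some input tokens.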
Real-World Cost Scenarios
Scenario 1: Content Creation (1000 articles/month)
```bash
# Option A: Premium model (roughly $7/month at GPT-4o list pricing,
# assuming ~700 output tokens per 500-word article)
umwelten eval batch \
  --models "openrouter:openai/gpt-4o" \
  --prompt "Write a 500-word article about [topic]" \
  --id "premium-content"

# Option B: Balanced approach (well under $1/month with Gemini 2.0 Flash)
umwelten eval batch \
  --models "google:gemini-2.0-flash" \
  --prompt "Write a 500-word article about [topic]" \
  --id "balanced-content"

# Option C: Free local processing ($0/month in API fees, plus hardware)
umwelten eval batch \
  --models "ollama:gemma3:27b" \
  --prompt "Write a 500-word article about [topic]" \
  --id "free-content"
```
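A rough monthly estimate is articles × output tokens per article × output price. This sketch makes the assumptions explicit: about 1.33 tokens per English word (a heuristic; tokenizers vary) and assumed output prices of $10 and $0.40 per 1M tokens. Input tokens for a one-line prompt are negligible here and are ignored.

```typescript
// Rough monthly output-token cost for generated articles.
function monthlyArticleCostUsd(
  articles: number,
  wordsPerArticle: number,
  outputPerMillion: number,
  tokensPerWord = 1.33, // heuristic for English text
): number {
  const outputTokens = articles * wordsPerArticle * tokensPerWord;
  return (outputTokens * outputPerMillion) / 1_000_000;
}

// Assumed output prices in USD per 1M tokens; check current provider pricing.
console.log("gpt-4o:", monthlyArticleCostUsd(1000, 500, 10).toFixed(2));
console.log("gemini-2.0-flash:", monthlyArticleCostUsd(1000, 500, 0.4).toFixed(2));
```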
Scenario 2: Document Analysis (10,000 PDFs/month)
```bash
# High-volume processing optimization
umwelten eval batch \
  --prompt "Extract key information: title, date, summary" \
  --models "google:gemini-2.0-flash" \
  --schema "title, date, summary, confidence int" \
  --directory "./monthly-documents" \
  --file-pattern "*.pdf" \
  --concurrent \
  --max-concurrency 8 \
  --id "monthly-document-processing"
```
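At 10,000 documents a month, malformed records are worth catching before they reach downstream systems. The `--schema` string asks for `title`, `date`, `summary`, and an integer `confidence`; this type guard checks a parsed JSON record against that shape, treating the untyped fields as strings. How Umwelten stores each result is not shown here, so the input (one parsed JSON object per result) is an assumption.

```typescript
interface DocumentRecord {
  title: string;
  date: string;
  summary: string;
  confidence: number; // integer per the schema
}

// Runtime check that a parsed value matches the expected record shape.
function isDocumentRecord(value: unknown): value is DocumentRecord {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    typeof r["title"] === "string" &&
    typeof r["date"] === "string" &&
    typeof r["summary"] === "string" &&
    typeof r["confidence"] === "number" &&
    Number.isInteger(r["confidence"])
  );
}

// Hypothetical parsed results.
const good: unknown = JSON.parse('{"title":"Q3 report","date":"2025-01-15","summary":"Short summary.","confidence":8}');
const bad: unknown = JSON.parse('{"title":"Q3 report","confidence":"high"}');
console.log(isDocumentRecord(good), isDocumentRecord(bad)); // true false
```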
Scenario 3: Research Analysis (100 papers/week)
```bash
# Mixed approach: cheap model for screening, expensive model for deep analysis
umwelten eval batch \
  --prompt "Quick relevance assessment (relevant/not relevant)" \
  --models "google:gemini-2.0-flash" \
  --id "paper-screening" \
  --directory "./papers" \
  --file-pattern "*.pdf"

# Follow up with detailed analysis of the relevant papers only
# (copy the papers flagged as relevant into ./relevant-papers first;
# a quoted glob such as "./relevant-papers/*.pdf" is not expanded by the shell)
umwelten eval batch \
  --prompt "Detailed analysis of methodology, findings, and significance" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "deep-paper-analysis" \
  --directory "./relevant-papers" \
  --file-pattern "*.pdf"
```
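Between the two stages you need to decide which papers go forward, and free-text answers need careful parsing: a naive check for the word "relevant" also matches "not relevant". This sketch classifies screening answers, assuming the model replies with one of the two phrases from the prompt; anything else is flagged as unclear for manual review. The file names and answers are hypothetical.

```typescript
type Verdict = "relevant" | "not relevant" | "unclear";

// Normalize a screening answer; the negative phrase is checked first.
function classify(answer: string): Verdict {
  const a = answer.trim().toLowerCase();
  if (/\bnot relevant\b/.test(a)) return "not relevant";
  if (/\brelevant\b/.test(a)) return "relevant";
  return "unclear";
}

// Hypothetical screening output: file name -> model answer.
const screening: Record<string, string> = {
  "paper-01.pdf": "Relevant",
  "paper-02.pdf": "Not relevant.",
  "paper-03.pdf": "Hard to say",
};

const forward = Object.entries(screening)
  .filter(([, answer]) => classify(answer) === "relevant")
  .map(([file]) => file);
console.log(forward); // [ 'paper-01.pdf' ]
```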
Monitoring and Optimization Tools
Cost Tracking Commands
```bash
# List all evaluations with cost information
umwelten eval list --details

# Export cost data for analysis
umwelten eval report --id all --format csv --output monthly-costs.csv

# Compare costs across time periods
umwelten eval report --id january-batch --format json > jan-costs.json
umwelten eval report --id february-batch --format json > feb-costs.json
```
Budget Management
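```bash
# Set evaluation limits for cost control
umwelten eval batch \
  --prompt "Summarize key points in 100 words" \
  --models "google:gemini-2.0-flash" \
  --id "budget-controlled-batch" \
  --directory "./documents" \
  --file-pattern "*.pdf" \
  --file-limit 100 \
  --timeout 30000
```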
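`--file-limit` caps how many files a batch processes, so a spend ceiling can be turned into a file limit by dividing the budget by an estimated cost per file. The per-file estimate is an assumption you would take from the report of a small pilot run; the helper name is hypothetical.

```typescript
// Largest number of files whose estimated cost stays within the budget.
// A small epsilon guards against floating-point results like 2499.9999999999995.
function fileLimitForBudget(budgetUsd: number, estCostPerFileUsd: number): number {
  if (estCostPerFileUsd <= 0) throw new Error("cost per file must be positive");
  return Math.floor(budgetUsd / estCostPerFileUsd + 1e-9);
}

// Example: a $5 ceiling with an estimated $0.002 per document from a pilot run.
console.log(fileLimitForBudget(5, 0.002)); // 2500
```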
Tips for Cost Optimization
Model Selection Guidelines
- Development: Use Ollama models (free)
- Production: Start with Gemini 2.0 Flash
- Premium: Use GPT-4o only when quality is critical
- Validation: Use multiple cheap models instead of one expensive model
Prompt Engineering
- Be specific about desired output length
- Avoid redundant instructions and examples
- Use structured output to reduce post-processing
- Test with cheap models first
Processing Optimization
- Use batch processing for similar tasks
- Enable concurrent processing for speed
- Set appropriate timeouts to avoid waste
- Use resume capability for interrupted jobs
Quality vs Cost Balance
- Define quality thresholds for different use cases
- Use cheap models for screening, expensive for final analysis
- Implement human review for critical decisions
- Track quality metrics alongside cost metrics
Next Steps
- Explore batch processing for large-scale cost optimization
- See structured output for reducing post-processing costs
- Try model evaluation for systematic quality comparison
- Review analysis & reasoning for complex task optimization