# Concurrent Processing

Learn how to use Umwelten's concurrent processing capabilities to dramatically speed up your evaluations and batch operations. Concurrent processing runs multiple model evaluations simultaneously, making it ideal for large-scale testing and analysis.
## Overview

Concurrent processing in Umwelten enables you to:

- **Speed up evaluations**: Run multiple models simultaneously instead of sequentially
- **Process large datasets**: Handle hundreds of files efficiently
- **Optimize resource usage**: Make better use of available computing resources
- **Scale operations**: Handle production workloads with predictable performance
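At the shell level, the difference looks like this (the `call_model_*.sh` scripts below are hypothetical stand-ins for individual model calls, not part of Umwelten):

```bash
# Sequential: total time is the sum of all three calls
./call_model_a.sh; ./call_model_b.sh; ./call_model_c.sh

# Concurrent: total time is roughly the slowest single call
./call_model_a.sh & ./call_model_b.sh & ./call_model_c.sh &
wait
```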
## Basic Concurrent Processing

### The `--concurrent` Flag

Enable concurrent processing for any evaluation:

```bash
# Basic concurrent evaluation
umwelten eval run \
  --prompt "Explain quantum computing" \
  --models "ollama:gemma3:12b,ollama:codestral:latest,google:gemini-2.0-flash" \
  --id "quantum-comparison" \
  --concurrent
```
### Concurrent Batch Processing

Process multiple files concurrently:

```bash
# Process images concurrently
umwelten eval batch \
  --prompt "Analyze this image and describe key features" \
  --models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
  --id "image-analysis" \
  --directory "input/images" \
  --file-pattern "*.jpg" \
  --concurrent
```
## Performance Optimization

### Concurrency Control

Control the level of concurrency to optimize performance:

```bash
# Set maximum concurrency (default: 5)
umwelten eval batch \
  --prompt "Process this file" \
  --models "ollama:gemma3:12b,google:gemini-2.0-flash" \
  --id "batch-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 10
```
### Provider-Specific Considerations

Different providers have different rate limits and capabilities:
#### Google Models

```bash
# Google models handle high concurrency well
umwelten eval run \
  --prompt "Your prompt" \
  --models "google:gemini-2.0-flash,google:gemini-2.5-pro-exp-03-25" \
  --id "google-comparison" \
  --concurrent \
  --max-concurrency 20
```
#### OpenRouter Models

```bash
# OpenRouter has rate limits - use moderate concurrency
umwelten eval run \
  --prompt "Your prompt" \
  --models "openai/gpt-4o,anthropic/claude-3.7-sonnet" \
  --id "openrouter-comparison" \
  --concurrent \
  --max-concurrency 5
```
#### Ollama Models (Local)

```bash
# Local models - concurrency limited by hardware
umwelten eval run \
  --prompt "Your prompt" \
  --models "ollama:gemma3:12b,ollama:codestral:latest" \
  --id "ollama-comparison" \
  --concurrent \
  --max-concurrency 3
```
## Advanced Concurrent Patterns

### Multi-Model Evaluation

Compare multiple models on the same prompt:

```bash
# Evaluate 5 models concurrently
umwelten eval run \
  --prompt "Write a function to calculate fibonacci numbers in Python" \
  --models "ollama:gemma3:12b,ollama:codestral:latest,google:gemini-2.0-flash,openai/gpt-4o,anthropic/claude-3.7-sonnet" \
  --id "fibonacci-comparison" \
  --concurrent \
  --max-concurrency 5
```
### Multi-File Processing

Process large datasets efficiently:

```bash
# Process 100+ images concurrently
umwelten eval batch \
  --prompt "Analyze this image and extract: objects, colors, text, people" \
  --models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
  --id "large-image-dataset" \
  --directory "input/dataset" \
  --file-pattern "*.jpg" \
  --concurrent \
  --max-concurrency 10 \
  --max-files 1000
```
### Mixed Provider Evaluation

Combine local and cloud models:

```bash
# Mix local and cloud models for cost optimization
umwelten eval run \
  --prompt "Explain machine learning concepts" \
  --models "ollama:gemma3:12b,ollama:codestral:latest,google:gemini-2.0-flash" \
  --id "mixed-provider-eval" \
  --concurrent \
  --max-concurrency 5
```
## Performance Monitoring

### Progress Tracking

Monitor concurrent operations in real-time:

```bash
# Enable verbose output for progress tracking
umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "progress-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --verbose
```
### Performance Metrics

Track performance improvements:

```bash
# Compare sequential vs concurrent performance
echo "Sequential processing:"
time umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "sequential-test" \
  --directory "input/files" \
  --file-pattern "*.txt"

echo "Concurrent processing:"
time umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "concurrent-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 5
```
## Best Practices

### 1. Start Conservative

Begin with lower concurrency and increase gradually:

```bash
# Start with 2-3 concurrent operations
umwelten eval run \
  --prompt "Your prompt" \
  --models "model1,model2,model3" \
  --id "conservative-test" \
  --concurrent \
  --max-concurrency 2

# Increase if performance is good
umwelten eval run \
  --prompt "Your prompt" \
  --models "model1,model2,model3" \
  --id "optimized-test" \
  --concurrent \
  --max-concurrency 5
```
### 2. Consider Provider Limits

Respect provider-specific rate limits:

```bash
# Google: High concurrency (20+)
umwelten eval run \
  --models "google:gemini-2.0-flash,google:gemini-2.5-pro-exp-03-25" \
  --concurrent \
  --max-concurrency 20

# OpenRouter: Moderate concurrency (5-10)
umwelten eval run \
  --models "openai/gpt-4o,anthropic/claude-3.7-sonnet" \
  --concurrent \
  --max-concurrency 5

# Ollama: Limited by hardware (2-5)
umwelten eval run \
  --models "ollama:gemma3:12b,ollama:codestral:latest" \
  --concurrent \
  --max-concurrency 3
```
### 3. Monitor Resource Usage

Keep an eye on system resources:

```bash
# Monitor CPU and memory usage during concurrent operations
htop  # or top

# Check for memory leaks or excessive CPU usage
umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "resource-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 10
```
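For numbers you can log rather than watch, a small wrapper can sample the process while it runs. A minimal sketch using standard `ps` and `kill` (the 5-second interval is arbitrary):

```bash
# Run the batch in the background and sample its resident memory until it exits
umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "resource-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 10 &
pid=$!

while kill -0 "$pid" 2>/dev/null; do
  ps -o rss= -p "$pid"   # resident set size in KB
  sleep 5
done
wait "$pid"              # propagate the batch's exit status
```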
### 4. Handle Failures Gracefully

Concurrent operations can fail - handle them properly:

```bash
# Use resume capability for large batch operations
umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "resume-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 5 \
  --resume  # Skip completed files if interrupted
```
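Because `--resume` skips files that already completed, a small wrapper can keep retrying an interrupted batch until it finishes. A sketch, assuming the CLI exits non-zero when a run fails:

```bash
# Retry the batch until it completes; --resume skips already-finished files
until umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "resume-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 5 \
  --resume
do
  echo "Batch interrupted; retrying in 30s..." >&2
  sleep 30
done
```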
## Error Handling

### Rate Limiting

Handle provider rate limits:

```bash
# If you hit rate limits, reduce concurrency
umwelten eval run \
  --prompt "Your prompt" \
  --models "openai/gpt-4o,anthropic/claude-3.7-sonnet" \
  --id "rate-limit-test" \
  --concurrent \
  --max-concurrency 2  # Reduced from 5
```
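For transient rate-limit failures you can also retry from the shell with exponential backoff (again assuming a non-zero exit code on failure; the delay values are illustrative):

```bash
# Double the delay after each failed attempt, capped at 160s
delay=10
until umwelten eval run \
  --prompt "Your prompt" \
  --models "openai/gpt-4o,anthropic/claude-3.7-sonnet" \
  --id "rate-limit-test" \
  --concurrent \
  --max-concurrency 2
do
  echo "Run failed; backing off for ${delay}s..." >&2
  sleep "$delay"
  delay=$(( delay < 160 ? delay * 2 : delay ))
done
```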
### Network Issues

Handle network connectivity problems:

```bash
# Reduce concurrency and enable verbose output to diagnose network issues
umwelten eval run \
  --prompt "Your prompt" \
  --models "google:gemini-2.0-flash" \
  --id "network-test" \
  --concurrent \
  --max-concurrency 3 \
  --verbose  # See detailed error messages
```
### Memory Management

Handle memory constraints:

```bash
# Reduce concurrency if memory usage is high
umwelten eval batch \
  --prompt "Process this file" \
  --models "ollama:gemma3:12b" \
  --id "memory-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 1  # Reduced for memory constraints
```
## Performance Benchmarks

### Typical Performance Improvements

| Operation Type | Sequential Time | Concurrent Time | Improvement |
|---|---|---|---|
| 5 models, 1 prompt | 25 seconds | 8 seconds | 3.1x faster |
| 1 model, 100 files | 300 seconds | 60 seconds | 5x faster |
| 3 models, 50 files | 450 seconds | 90 seconds | 5x faster |
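These figures follow a simple back-of-envelope model: with `n` independent requests averaging `t` seconds each and a concurrency limit of `c`, wall-clock time is roughly `ceil(n/c) × t`. For example, 100 files at ~3 seconds each take about 300 seconds sequentially, but roughly 20 waves × 3s ≈ 60 seconds at `c = 5`. Real timings vary with provider latency, rate limits, and hardware.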
### Optimal Concurrency Settings

| Provider | Recommended Max Concurrency | Notes |
|---|---|---|
| Google | 20-30 | High rate limits, good performance |
| OpenRouter | 5-10 | Moderate rate limits, cost considerations |
| Ollama (Local) | 2-5 | Limited by hardware resources |
| LM Studio (Local) | 2-3 | Limited by local server capacity |
## Advanced Use Cases

### Pipeline Processing

Chain multiple concurrent operations:

```bash
# Step 1: Process images concurrently
umwelten eval batch \
  --prompt "Extract text from this image" \
  --models "google:gemini-2.0-flash" \
  --id "text-extraction" \
  --directory "input/images" \
  --file-pattern "*.jpg" \
  --concurrent \
  --max-concurrency 10

# Step 2: Process extracted text concurrently
umwelten eval batch \
  --prompt "Analyze this extracted text" \
  --models "ollama:gemma3:12b,google:gemini-2.0-flash" \
  --id "text-analysis" \
  --directory "output/text-extraction" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 5
```
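To make the hand-off between stages robust, you can wrap both commands in one script so stage 2 runs only if stage 1 succeeds (the output path is the one assumed in the example above):

```bash
#!/usr/bin/env bash
# Abort the pipeline immediately if either stage fails
set -euo pipefail

# Stage 1: extract text from images
umwelten eval batch \
  --prompt "Extract text from this image" \
  --models "google:gemini-2.0-flash" \
  --id "text-extraction" \
  --directory "input/images" \
  --file-pattern "*.jpg" \
  --concurrent --max-concurrency 10

# Stage 2: only reached if stage 1 exited successfully
umwelten eval batch \
  --prompt "Analyze this extracted text" \
  --models "ollama:gemma3:12b,google:gemini-2.0-flash" \
  --id "text-analysis" \
  --directory "output/text-extraction" \
  --file-pattern "*.txt" \
  --concurrent --max-concurrency 5
```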
### Load Balancing

Distribute load across different providers:

```bash
# Balance load between local and cloud models
umwelten eval run \
  --prompt "Your prompt" \
  --models "ollama:gemma3:12b,ollama:codestral:latest,google:gemini-2.0-flash" \
  --id "load-balanced" \
  --concurrent \
  --max-concurrency 6  # Shared limit across both providers
```
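If you want an explicit per-provider split rather than one shared limit, you can launch one run per provider from the shell and wait for both (a shell-level pattern, not a built-in feature):

```bash
# One eval run per provider, each with its own concurrency budget
umwelten eval run \
  --prompt "Your prompt" \
  --models "ollama:gemma3:12b,ollama:codestral:latest" \
  --id "load-balanced-local" \
  --concurrent \
  --max-concurrency 2 &

umwelten eval run \
  --prompt "Your prompt" \
  --models "google:gemini-2.0-flash" \
  --id "load-balanced-cloud" \
  --concurrent \
  --max-concurrency 4 &

wait  # block until both runs finish
```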
## Troubleshooting

### Common Issues

- **"Rate limit exceeded"**: Reduce `--max-concurrency`
- **"Memory usage too high"**: Reduce concurrency or use smaller models
- **"Network timeouts"**: Check connectivity and reduce concurrency
- **"Provider unavailable"**: Check provider status and API keys
### Debugging Concurrent Operations

```bash
# Enable verbose output for debugging
umwelten eval run \
  --prompt "Your prompt" \
  --models "model1,model2,model3" \
  --id "debug-test" \
  --concurrent \
  --max-concurrency 3 \
  --verbose \
  --debug
```
## Next Steps

Now that you understand concurrent processing, explore:

- 📊 Reports & Analysis - Generate comprehensive reports from concurrent evaluations
- 💰 Cost Analysis - Optimize costs with concurrent processing
- 🔧 Memory & Tools - Combine concurrent processing with memory and tools
- 📈 Batch Processing - Scale concurrent operations to large datasets