Concurrent Processing

Learn how to use Umwelten's concurrent processing capabilities to dramatically speed up your evaluations and batch operations. Concurrent processing allows you to run multiple model evaluations simultaneously, making it ideal for large-scale testing and analysis.

Overview

Concurrent processing in Umwelten enables you to:

  • Speed up evaluations: Run multiple models simultaneously instead of sequentially
  • Process large datasets: Handle hundreds of files efficiently
  • Optimize resource usage: Make better use of available computing resources
  • Scale operations: Handle production workloads with predictable performance

Basic Concurrent Processing

The --concurrent Flag

Enable concurrent processing for any evaluation:

bash
# Basic concurrent evaluation
umwelten eval run \
  --prompt "Explain quantum computing" \
  --models "ollama:gemma3:12b,ollama:codestral:latest,google:gemini-2.0-flash" \
  --id "quantum-comparison" \
  --concurrent

Concurrent Batch Processing

Process multiple files concurrently:

bash
# Process images concurrently
umwelten eval batch \
  --prompt "Analyze this image and describe key features" \
  --models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
  --id "image-analysis" \
  --directory "input/images" \
  --file-pattern "*.jpg" \
  --concurrent

Performance Optimization

Concurrency Control

Control the level of concurrency to optimize performance:

bash
# Set maximum concurrency (default: 5)
umwelten eval batch \
  --prompt "Process this file" \
  --models "ollama:gemma3:12b,google:gemini-2.0-flash" \
  --id "batch-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 10

Provider-Specific Considerations

Different providers have different rate limits and capabilities:

Google Models

bash
# Google models handle high concurrency well
umwelten eval run \
  --prompt "Your prompt" \
  --models "google:gemini-2.0-flash,google:gemini-2.5-pro-exp-03-25" \
  --id "google-comparison" \
  --concurrent \
  --max-concurrency 20

OpenRouter Models

bash
# OpenRouter has rate limits - use moderate concurrency
umwelten eval run \
  --prompt "Your prompt" \
  --models "openai/gpt-4o,anthropic/claude-3.7-sonnet" \
  --id "openrouter-comparison" \
  --concurrent \
  --max-concurrency 5

Ollama Models (Local)

bash
# Local models - concurrency limited by hardware
umwelten eval run \
  --prompt "Your prompt" \
  --models "ollama:gemma3:12b,ollama:codestral:latest" \
  --id "ollama-comparison" \
  --concurrent \
  --max-concurrency 3
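
Before raising concurrency for local models, check what the hardware can actually support. A quick sketch using standard system tools (Linux commands shown; macOS equivalents noted in the comments):

bash
# Count CPU cores and check available memory before picking a concurrency level
nproc     # CPU cores (on macOS: sysctl -n hw.ncpu)
free -h   # available memory (on macOS: sysctl -n hw.memsize)

# Rule of thumb: keep --max-concurrency at or below the number of model
# instances that fit in memory at the same time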

Advanced Concurrent Patterns

Multi-Model Evaluation

Compare multiple models on the same prompt:

bash
# Evaluate 5 models concurrently
umwelten eval run \
  --prompt "Write a function to calculate fibonacci numbers in Python" \
  --models "ollama:gemma3:12b,ollama:codestral:latest,google:gemini-2.0-flash,openai/gpt-4o,anthropic/claude-3.7-sonnet" \
  --id "fibonacci-comparison" \
  --concurrent \
  --max-concurrency 5

Multi-File Processing

Process large datasets efficiently:

bash
# Process 100+ images concurrently
umwelten eval batch \
  --prompt "Analyze this image and extract: objects, colors, text, people" \
  --models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
  --id "large-image-dataset" \
  --directory "input/dataset" \
  --file-pattern "*.jpg" \
  --concurrent \
  --max-concurrency 10 \
  --max-files 1000

Mixed Provider Evaluation

Combine local and cloud models:

bash
# Mix local and cloud models for cost optimization
umwelten eval run \
  --prompt "Explain machine learning concepts" \
  --models "ollama:gemma3:12b,ollama:codestral:latest,google:gemini-2.0-flash" \
  --id "mixed-provider-eval" \
  --concurrent \
  --max-concurrency 5

Performance Monitoring

Progress Tracking

Monitor concurrent operations in real-time:

bash
# Enable verbose output for progress tracking
umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "progress-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --verbose

Performance Metrics

Track performance improvements:

bash
# Compare sequential vs concurrent performance
echo "Sequential processing:"
time umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "sequential-test" \
  --directory "input/files" \
  --file-pattern "*.txt"

echo "Concurrent processing:"
time umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "concurrent-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 5

Best Practices

1. Start Conservative

Begin with lower concurrency and increase gradually:

bash
# Start with 2-3 concurrent operations
umwelten eval run \
  --prompt "Your prompt" \
  --models "model1,model2,model3" \
  --id "conservative-test" \
  --concurrent \
  --max-concurrency 2

# Increase if performance is good
umwelten eval run \
  --prompt "Your prompt" \
  --models "model1,model2,model3" \
  --id "optimized-test" \
  --concurrent \
  --max-concurrency 5
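
One way to find the sweet spot is to sweep a few concurrency levels and time each run. A minimal sketch (the levels and IDs are illustrative; distinct IDs keep the runs from interfering with each other):

bash
# Time the same batch at increasing concurrency levels
for n in 2 5 10; do
  echo "max-concurrency: $n"
  time umwelten eval batch \
    --prompt "Process this file" \
    --models "google:gemini-2.0-flash" \
    --id "sweep-test-$n" \
    --directory "input/files" \
    --file-pattern "*.txt" \
    --concurrent \
    --max-concurrency "$n"
done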

2. Consider Provider Limits

Respect provider-specific rate limits:

bash
# Google: High concurrency (20+)
umwelten eval run \
  --models "google:gemini-2.0-flash,google:gemini-2.5-pro-exp-03-25" \
  --concurrent \
  --max-concurrency 20

# OpenRouter: Moderate concurrency (5-10)
umwelten eval run \
  --models "openai/gpt-4o,anthropic/claude-3.7-sonnet" \
  --concurrent \
  --max-concurrency 5

# Ollama: Limited by hardware (2-5)
umwelten eval run \
  --models "ollama:gemma3:12b,ollama:codestral:latest" \
  --concurrent \
  --max-concurrency 3

3. Monitor Resource Usage

Keep an eye on system resources:

bash
# Monitor CPU and memory usage during concurrent operations
htop  # or top

# Check for memory leaks or excessive CPU usage
umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "resource-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 10
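
To watch the evaluation process itself rather than the whole system, you can poll it from a second terminal. A sketch with standard tools (the pgrep pattern is an assumption about how the process appears in the process table):

bash
# Sample memory (RSS, in KB) and CPU of the running evaluation every 2 seconds
pid=$(pgrep -f 'umwelten eval' | head -1)
while kill -0 "$pid" 2>/dev/null; do
  ps -o pid,rss,pcpu,etime -p "$pid"
  sleep 2
done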

4. Handle Failures Gracefully

Individual operations within a concurrent run can fail. Handle interruptions gracefully by resuming instead of starting over:

bash
# Use resume capability for large batch operations
umwelten eval batch \
  --prompt "Process this file" \
  --models "google:gemini-2.0-flash" \
  --id "resume-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 5 \
  --resume  # Skip completed files if interrupted

Error Handling

Rate Limiting

Handle provider rate limits:

bash
# If you hit rate limits, reduce concurrency
umwelten eval run \
  --prompt "Your prompt" \
  --models "openai/gpt-4o,anthropic/claude-3.7-sonnet" \
  --id "rate-limit-test" \
  --concurrent \
  --max-concurrency 2  # Reduced from 5

Network Issues

Handle network connectivity problems:

bash
# Surface network errors with verbose output (a retry wrapper sketch follows below)
umwelten eval run \
  --prompt "Your prompt" \
  --models "google:gemini-2.0-flash" \
  --id "network-test" \
  --concurrent \
  --max-concurrency 3 \
  --verbose  # See detailed error messages
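
If the CLI's built-in handling isn't enough, a small wrapper with exponential backoff is one option for flaky connections. A minimal sketch (attempt counts and delays are arbitrary):

bash
# Retry the evaluation up to 3 times, doubling the wait between attempts
for attempt in 1 2 3; do
  if umwelten eval run \
       --prompt "Your prompt" \
       --models "google:gemini-2.0-flash" \
       --id "network-test" \
       --concurrent \
       --max-concurrency 3; then
    break  # success, stop retrying
  fi
  echo "Attempt $attempt failed, backing off..."
  sleep $((10 * 2 ** (attempt - 1)))  # waits 10s, 20s, then 40s
done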

Memory Management

Handle memory constraints:

bash
# Reduce concurrency if memory usage is high
umwelten eval batch \
  --prompt "Process this file" \
  --models "ollama:gemma3:12b" \
  --id "memory-test" \
  --directory "input/files" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 1  # Reduced for memory constraints

Performance Benchmarks

Typical Performance Improvements

Operation Type        Sequential Time    Concurrent Time    Improvement
5 models, 1 prompt    25 seconds         8 seconds          3.1x faster
1 model, 100 files    300 seconds        60 seconds         5x faster
3 models, 50 files    450 seconds        90 seconds         5x faster
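
These figures follow from the concurrency level once per-item latency dominates: 100 files at roughly 3 seconds each take about 300 seconds sequentially, while 5 concurrent slots cut the wall-clock time to roughly 300 / 5 = 60 seconds. Real-world speedups land a bit below the concurrency factor once rate limits and slow outliers are factored in.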

Optimal Concurrency Settings

Provider             Recommended Max Concurrency    Notes
Google               20-30                          High rate limits, good performance
OpenRouter           5-10                           Moderate rate limits, cost considerations
Ollama (Local)       2-5                            Limited by hardware resources
LM Studio (Local)    2-3                            Limited by local server capacity

Advanced Use Cases

Pipeline Processing

Chain multiple concurrent operations:

bash
# Step 1: Process images concurrently
umwelten eval batch \
  --prompt "Extract text from this image" \
  --models "google:gemini-2.0-flash" \
  --id "text-extraction" \
  --directory "input/images" \
  --file-pattern "*.jpg" \
  --concurrent \
  --max-concurrency 10

# Step 2: Process extracted text concurrently
umwelten eval batch \
  --prompt "Analyze this extracted text" \
  --models "ollama:gemma3:12b,google:gemini-2.0-flash" \
  --id "text-analysis" \
  --directory "output/text-extraction" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 5
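
When running a pipeline like this from a script, make sure a failed stage stops the stages that depend on it. A minimal wrapper sketch (whether step 1 actually writes to output/text-extraction depends on your output configuration):

bash
#!/usr/bin/env bash
set -euo pipefail  # abort immediately if any stage fails

# Stage 1: extract text from images
umwelten eval batch \
  --prompt "Extract text from this image" \
  --models "google:gemini-2.0-flash" \
  --id "text-extraction" \
  --directory "input/images" \
  --file-pattern "*.jpg" \
  --concurrent \
  --max-concurrency 10

# Stage 2 only runs if stage 1 exited cleanly
umwelten eval batch \
  --prompt "Analyze this extracted text" \
  --models "ollama:gemma3:12b,google:gemini-2.0-flash" \
  --id "text-analysis" \
  --directory "output/text-extraction" \
  --file-pattern "*.txt" \
  --concurrent \
  --max-concurrency 5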

Load Balancing

Distribute load across different providers:

bash
# Balance load between local and cloud models
umwelten eval run \
  --prompt "Your prompt" \
  --models "ollama:gemma3:12b,ollama:codestral:latest,google:gemini-2.0-flash" \
  --id "load-balanced" \
  --concurrent \
  --max-concurrency 3  # one slot per model (two local, one cloud)

Troubleshooting

Common Issues

  1. "Rate limit exceeded": Reduce --max-concurrency
  2. "Memory usage too high": Reduce concurrency or use smaller models
  3. "Network timeouts": Check connectivity and reduce concurrency
  4. "Provider unavailable": Check provider status and API keys

Debugging Concurrent Operations

bash
# Enable verbose output for debugging
umwelten eval run \
  --prompt "Your prompt" \
  --models "model1,model2,model3" \
  --id "debug-test" \
  --concurrent \
  --max-concurrency 3 \
  --verbose \
  --debug

Next Steps

Now that you understand concurrent processing, explore the other evaluation guides in this documentation to apply it at scale.
