Batch Processing
Learn how to process multiple files efficiently using Umwelten's batch processing capabilities with concurrent execution and intelligent error handling.
Overview
Batch processing allows you to evaluate multiple files with the same prompt across multiple models concurrently. This is ideal for processing document libraries, image collections, or any set of files that need consistent analysis.
Basic Batch Processing
Simple File Processing
Process all files in a directory:
umwelten eval batch \
--prompt "Analyze this document and provide a summary" \
--models "google:gemini-2.0-flash,ollama:gemma3:12b" \
--id "document-analysis" \
--directory "./documents" \
--file-pattern "*.pdf" \
--concurrent
File Pattern Matching
Use glob patterns to target specific files:
# Process only JPEG images
umwelten eval batch \
--prompt "Describe this image in detail" \
--models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
--id "image-descriptions" \
--directory "./photos" \
--file-pattern "*.{jpg,jpeg}" \
--concurrent
# Process files with specific naming patterns
umwelten eval batch \
--prompt "Analyze this report" \
--models "google:gemini-2.0-flash" \
--id "monthly-reports" \
--directory "./reports" \
--file-pattern "report_2024_*.pdf"
Recursive Directory Processing
Process files in subdirectories:
umwelten eval batch \
--prompt "Categorize this document by type and content" \
--models "google:gemini-2.0-flash" \
--id "document-categorization" \
--directory "./document-library" \
--file-pattern "**/*.{pdf,docx,txt}" \
--concurrent
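Before starting a recursive batch, it can help to preview which files a pattern is likely to pick up. The following is a minimal sketch using plain find, not an Umwelten feature: list_batch_candidates is a hypothetical helper name, and Umwelten's own glob matching may differ in details such as case sensitivity or hidden files.

```shell
# List files under a directory (recursively) whose extension is in the given set.
# Approximates a pattern such as **/*.{pdf,docx,txt}.
# $1 = directory to search, remaining arguments = extensions without the dot.
list_batch_candidates() (
  dir=$1
  shift
  for ext in "$@"; do
    find "$dir" -type f -name "*.$ext"
  done | sort
)
```

For example, list_batch_candidates ./document-library pdf docx txt prints one matching path per line; piping the result to wc -l gives a rough file count for the batch.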
Advanced Batch Options
Concurrency Control
Optimize processing speed with concurrency settings:
# High concurrency for fast processing
umwelten eval batch \
--prompt "Extract key information from this file" \
--models "google:gemini-2.0-flash" \
--id "high-speed-processing" \
--directory "./files" \
--file-pattern "*.pdf" \
--concurrent \
--max-concurrency 8
# Conservative concurrency to avoid rate limits
umwelten eval batch \
--prompt "Detailed analysis of this document" \
--models "google:gemini-2.5-pro-exp-03-25" \
--id "detailed-analysis" \
--directory "./important-docs" \
--file-pattern "*.pdf" \
--concurrent \
--max-concurrency 2
File Limits
Control the number of files processed:
# Process only the first 10 files (for testing)
umwelten eval batch \
--prompt "Analyze this document" \
--models "google:gemini-2.0-flash" \
--id "test-batch" \
--directory "./large-collection" \
--file-pattern "*.pdf" \
--file-limit 10 \
--concurrent
# Process all files (default behavior)
umwelten eval batch \
--prompt "Full analysis" \
--models "google:gemini-2.0-flash" \
--id "complete-batch" \
--directory "./documents" \
--file-pattern "*.pdf" \
--concurrent
Resume Interrupted Processing
Continue from where you left off:
umwelten eval batch \
--prompt "Continue processing from where we left off" \
--models "google:gemini-2.0-flash" \
--id "large-document-batch" \
--directory "./documents" \
--file-pattern "*.pdf" \
--resume \
--concurrent
Structured Output in Batches
Schema Validation
Apply structured output schemas to batch processing:
umwelten eval batch \
--prompt "Extract structured metadata from this document" \
--models "google:gemini-2.0-flash" \
--id "metadata-extraction" \
--directory "./documents" \
--file-pattern "*.pdf" \
--schema "title, author, date, category, summary" \
--concurrent
Complex Zod Schemas
Use TypeScript schemas for complex validation:
umwelten eval batch \
--prompt "Analyze this image and extract detailed features" \
--models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
--id "image-feature-batch" \
--directory "./images" \
--file-pattern "*.{jpg,png}" \
--zod-schema "./schemas/image-features.ts" \
--validate-output \
--concurrent
Interactive Batch Processing
Real-time Progress Monitoring
Watch batch processing progress in real-time:
umwelten eval batch \
--prompt "Process this file and extract insights" \
--models "google:gemini-2.0-flash,ollama:gemma3:12b" \
--id "interactive-batch" \
--directory "./documents" \
--file-pattern "*.pdf" \
--ui \
--concurrent \
--max-concurrency 4
File Type Specific Examples
Document Processing
# PDF documents
umwelten eval batch \
--prompt "Summarize this PDF document in 200 words" \
--models "google:gemini-2.0-flash" \
--id "pdf-summaries" \
--directory "./pdfs" \
--file-pattern "*.pdf" \
--concurrent
# Text files
umwelten eval batch \
--prompt "Analyze the sentiment and key themes in this text" \
--models "ollama:gemma3:12b" \
--id "text-analysis" \
--directory "./texts" \
--file-pattern "*.{txt,md}" \
--concurrent
Image Processing
# Photo analysis
umwelten eval batch \
--prompt "Describe this photo including objects, setting, and mood" \
--models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
--id "photo-descriptions" \
--directory "./photos" \
--file-pattern "*.{jpg,jpeg,png}" \
--concurrent
# Screenshot analysis
umwelten eval batch \
--prompt "Identify the type of application or website in this screenshot" \
--models "google:gemini-2.0-flash" \
--id "screenshot-classification" \
--directory "./screenshots" \
--file-pattern "screenshot_*.png" \
--concurrent
Mixed Media Processing
# All supported file types
umwelten eval batch \
--prompt "Analyze this file and determine its content type and key information" \
--models "google:gemini-2.0-flash" \
--id "mixed-media-analysis" \
--directory "./mixed-files" \
--file-pattern "*.{pdf,jpg,png,txt,md}" \
--concurrent
Output Structure
Directory Organization
Batch processing creates organized output directories:
output/evaluations/batch-id/
├── responses/
│   ├── file1.pdf/
│   │   ├── google_gemini-2.0-flash.json
│   │   └── ollama_gemma3_12b.json
│   ├── file2.pdf/
│   │   ├── google_gemini-2.0-flash.json
│   │   └── ollama_gemma3_12b.json
│   └── file3.pdf/
│       ├── google_gemini-2.0-flash.json
│       └── ollama_gemma3_12b.json
└── reports/
    ├── results.md
    ├── results.html
    └── results.csv
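Because each input file gets one response file per model, the layout above can be checked for gaps after a run. This is a rough sketch that assumes the directory structure shown here; list_incomplete_files is a hypothetical helper, not part of the Umwelten CLI.

```shell
# Print each input-file directory under <batch>/responses that holds fewer
# model response files (*.json) than expected.
# $1 = batch output directory, $2 = number of models used in the batch.
list_incomplete_files() (
  batch_dir=$1
  expected=$2
  for file_dir in "$batch_dir"/responses/*/; do
    # Skip the unexpanded pattern when there are no subdirectories.
    [ -d "$file_dir" ] || continue
    count=$(find "$file_dir" -maxdepth 1 -type f -name '*.json' | wc -l | tr -d ' ')
    if [ "$count" -lt "$expected" ]; then
      printf '%s %s/%s\n' "$(basename "$file_dir")" "$count" "$expected"
    fi
  done
)
```

Running list_incomplete_files output/evaluations/batch-id 2 prints lines such as "file2.pdf 1/2" for files missing a response. Files that were never started have no directory at all, so they do not appear in this list.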
Generate Batch Reports
# Comprehensive markdown report
umwelten eval report --id document-analysis --format markdown
# HTML report with rich formatting
umwelten eval report --id image-descriptions --format html --output batch-report.html
# CSV export for data analysis
umwelten eval report --id metadata-extraction --format csv --output batch-results.csv
# JSON for programmatic processing
umwelten eval report --id interactive-batch --format json
Performance Optimization
Concurrency Guidelines
- Start Conservative: Begin with 2-3 concurrent processes
- Monitor Resources: Watch CPU, memory, and network usage
- Adjust Based on Model: Expensive models may need lower concurrency
- Rate Limit Awareness: Some providers enforce rate limits; lower --max-concurrency if requests start failing or being throttled
File Organization Tips
- Group Similar Files: Process similar file types together
- Use Descriptive Names: Help with organization and debugging
- Consider File Sizes: Large files may need longer timeouts
- Test with Samples: Use --file-limit to test on small batches first
Model Selection for Batches
- Google Gemini 2.0 Flash: Best balance of speed, quality, and cost
- Ollama Models: Run locally with no per-request API cost, good for large batches
- Premium Models: Reserve for high-value or critical files
- Multiple Models: Use for comparison and validation
Error Handling
Robust Processing
# With timeout and validation
umwelten eval batch \
--prompt "Analyze this file with error handling" \
--models "google:gemini-2.0-flash" \
--id "robust-batch" \
--directory "./files" \
--file-pattern "*.pdf" \
--timeout 45000 \
--concurrent \
--validate-output
Resume on Failures
# Resume after fixing issues
umwelten eval batch \
--prompt "Continue processing after resolving errors" \
--models "google:gemini-2.0-flash" \
--id "robust-batch" \
--directory "./files" \
--file-pattern "*.pdf" \
--resume \
--concurrent
Common Batch Patterns
Document Library Processing
# Categorize and tag documents
umwelten eval batch \
--prompt "Categorize this document and extract tags" \
--models "google:gemini-2.0-flash" \
--id "document-library" \
--directory "./library" \
--file-pattern "**/*.{pdf,docx}" \
--schema "category, tags array, summary, confidence int: 1-10" \
--concurrent
Content Moderation
# Screen content for appropriateness
umwelten eval batch \
--prompt "Analyze this content for safety and appropriateness" \
--models "google:gemini-2.0-flash" \
--id "content-moderation" \
--directory "./user-uploads" \
--file-pattern "*.{jpg,png,pdf}" \
--schema "safe bool, category, issues array, confidence int: 1-10" \
--concurrent
Data Extraction Pipeline
# Extract structured data from forms
umwelten eval batch \
--prompt "Extract form data from this document" \
--models "google:gemini-2.0-flash" \
--id "form-extraction" \
--directory "./forms" \
--file-pattern "*.pdf" \
--zod-schema "./schemas/form-data.ts" \
--concurrent
Monitoring and Analytics
List Batch Evaluations
# Show all batch evaluations
umwelten eval list --details
# Filter for batch evaluations only
umwelten eval list --json | jq '.[] | select(.type == "batch")'
Performance Analysis
# Generate performance report
umwelten eval report --id large-batch --format json > performance.json
# Analyze timing and costs
umwelten eval report --id document-batch --format csv --output analysis.csv
Best Practices
Planning
- Test First: Use --file-limit 5 to test on small samples
- Estimate Costs: Calculate costs before processing large batches
- Organize Files: Use clear directory structure and naming
- Check Capacity: Ensure sufficient disk space for outputs
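A first step in estimating cost is knowing how many model calls a batch makes. The sketch below assumes each file is sent once to each model, which matches the one-response-per-model output layout; estimate_calls is a hypothetical helper, and per-call pricing is left out because it depends on the provider.

```shell
# Estimate the number of model calls in a batch: files x models.
# $1 = number of files, $2 = comma-separated model list as passed to --models.
estimate_calls() (
  files=$1
  models=$2
  # Count models by counting comma-separated fields.
  model_count=$(printf '%s\n' "$models" | awk -F',' '{ print NF }')
  echo $((files * model_count))
)
```

For 120 files and the model list "google:gemini-2.0-flash,ollama:gemma3:12b", estimate_calls 120 "google:gemini-2.0-flash,ollama:gemma3:12b" prints 240.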
Execution
- Monitor Progress: Use --ui for long-running batches
- Set Timeouts: Use --timeout to prevent hanging on problematic files
- Enable Resume: Use --resume to continue large batches after an interruption
- Concurrent Processing: Enable --concurrent for significant speed improvements
Quality Control
- Validate Schema: Test schemas on individual files first
- Random Sampling: Review random samples from large batches
- Error Analysis: Check failed files and adjust parameters
- Version Control: Keep track of batch parameters and results
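Random sampling can be done with standard tools once a batch has finished. This sketch assumes responses are stored as .json files under the batch's responses directory; sample_responses is a hypothetical helper, and paths containing tab characters are not handled.

```shell
# Print up to N randomly chosen response files from a batch for manual review.
# $1 = batch output directory, $2 = sample size.
sample_responses() (
  batch_dir=$1
  n=$2
  find "$batch_dir/responses" -type f -name '*.json' |
    # Prefix each path with a random key, sort by it, then drop the key.
    awk 'BEGIN { srand() } { print rand() "\t" $0 }' |
    sort -n |
    cut -f2- |
    head -n "$n"
)
```

For example, sample_responses output/evaluations/batch-id 10 lists ten response files to open and check by hand.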
Troubleshooting
Common Issues
- Rate Limiting: Reduce --max-concurrency
- Memory Issues: Process smaller batches or increase system memory
- Timeout Errors: Increase the --timeout value
- Schema Validation Failures: Test schema on individual files first
- File Not Found: Check file patterns and directory paths
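For transient failures such as rate limiting, one option is to wrap the command in a retry loop with exponential backoff. This is a generic shell sketch, not an Umwelten feature: retry_with_backoff is a hypothetical helper, and it only helps if the wrapped command exits with a non-zero status on failure, which is an assumption here.

```shell
# Run a command, retrying with exponential backoff while it exits non-zero.
# $1 = maximum attempts, $2 = initial delay in seconds, rest = command to run.
retry_with_backoff() (
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Success ends the loop; the function returns 0.
    "$@" && exit 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      exit 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
)
```

A call such as retry_with_backoff 3 30 followed by the batch command (with --resume so completed files are not repeated) would wait 30 and then 60 seconds between attempts.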
Debugging Commands
# Test single file from batch
umwelten eval run \
--prompt "Test prompt" \
--models "google:gemini-2.0-flash" \
--id "debug-single" \
--attach "./problematic-file.pdf"
# List files that would be processed
ls ./directory/*.pdf | head -10
# Check batch status
umwelten eval list --details | grep batch-id
Next Steps
- Try structured output for consistent data extraction
- Explore cost optimization for budget-conscious batches
- See model evaluation for systematic testing
- Review examples for specific use cases