Batch Processing

Learn how to process multiple files efficiently using Umwelten's batch processing capabilities with concurrent execution and intelligent error handling.

Overview

Batch processing allows you to evaluate multiple files with the same prompt across multiple models concurrently. This is ideal for processing document libraries, image collections, or any set of files that need consistent analysis.

Basic Batch Processing

Simple File Processing

Process all files in a directory:

bash

umwelten eval batch \
  --prompt "Analyze this document and provide a summary" \
  --models "google:gemini-2.0-flash,ollama:gemma3:12b" \
  --id "document-analysis" \
  --directory "./documents" \
  --file-pattern "*.pdf" \
  --concurrent

File Pattern Matching

Use glob patterns to target specific files:

bash

# Process only JPEG images
umwelten eval batch \
  --prompt "Describe this image in detail" \
  --models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
  --id "image-descriptions" \
  --directory "./photos" \
  --file-pattern "*.{jpg,jpeg}" \
  --concurrent

# Process files with specific naming patterns
umwelten eval batch \
  --prompt "Analyze this report" \
  --models "google:gemini-2.0-flash" \
  --id "monthly-reports" \
  --directory "./reports" \
  --file-pattern "report_2024_*.pdf"

Recursive Directory Processing

Process files in subdirectories:

bash

umwelten eval batch \
  --prompt "Categorize this document by type and content" \
  --models "google:gemini-2.0-flash" \
  --id "document-categorization" \
  --directory "./document-library" \
  --file-pattern "**/*.{pdf,docx,txt}" \
  --concurrent

Advanced Batch Options

Concurrency Control

Optimize processing speed with concurrency settings:

bash

# High concurrency for fast processing
umwelten eval batch \
  --prompt "Extract key information from this file" \
  --models "google:gemini-2.0-flash" \
  --id "high-speed-processing" \
  --directory "./files" \
  --file-pattern "*.pdf" \
  --concurrent \
  --max-concurrency 8

# Conservative concurrency to avoid rate limits
umwelten eval batch \
  --prompt "Detailed analysis of this document" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "detailed-analysis" \
  --directory "./important-docs" \
  --file-pattern "*.pdf" \
  --concurrent \
  --max-concurrency 2

File Limits

Control the number of files processed:

bash

# Process only the first 10 files (for testing)
umwelten eval batch \
  --prompt "Analyze this document" \
  --models "google:gemini-2.0-flash" \
  --id "test-batch" \
  --directory "./large-collection" \
  --file-pattern "*.pdf" \
  --file-limit 10 \
  --concurrent

# Process all files (default behavior)
umwelten eval batch \
  --prompt "Full analysis" \
  --models "google:gemini-2.0-flash" \
  --id "complete-batch" \
  --directory "./documents" \
  --file-pattern "*.pdf" \
  --concurrent

Resume Interrupted Processing

Continue from where you left off:

bash

umwelten eval batch \
  --prompt "Continue processing from where we left off" \
  --models "google:gemini-2.0-flash" \
  --id "large-document-batch" \
  --directory "./documents" \
  --file-pattern "*.pdf" \
  --resume \
  --concurrent

Structured Output in Batches

Schema Validation

Apply structured output schemas to batch processing:

bash

umwelten eval batch \
  --prompt "Extract structured metadata from this document" \
  --models "google:gemini-2.0-flash" \
  --id "metadata-extraction" \
  --directory "./documents" \
  --file-pattern "*.pdf" \
  --schema "title, author, date, category, summary" \
  --concurrent

Complex Zod Schemas

Use TypeScript schemas for complex validation:

bash

umwelten eval batch \
  --prompt "Analyze this image and extract detailed features" \
  --models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
  --id "image-feature-batch" \
  --directory "./images" \
  --file-pattern "*.{jpg,png}" \
  --zod-schema "./schemas/image-features.ts" \
  --validate-output \
  --concurrent

Interactive Batch Processing

Real-time Progress Monitoring

Watch batch processing progress in real-time:

bash

umwelten eval batch \
  --prompt "Process this file and extract insights" \
  --models "google:gemini-2.0-flash,ollama:gemma3:12b" \
  --id "interactive-batch" \
  --directory "./documents" \
  --file-pattern "*.pdf" \
  --ui \
  --concurrent \
  --max-concurrency 4

File Type Specific Examples

Document Processing

bash

# PDF documents
umwelten eval batch \
  --prompt "Summarize this PDF document in 200 words" \
  --models "google:gemini-2.0-flash" \
  --id "pdf-summaries" \
  --directory "./pdfs" \
  --file-pattern "*.pdf" \
  --concurrent

# Text files
umwelten eval batch \
  --prompt "Analyze the sentiment and key themes in this text" \
  --models "ollama:gemma3:12b" \
  --id "text-analysis" \
  --directory "./texts" \
  --file-pattern "*.{txt,md}" \
  --concurrent

Image Processing

bash

# Photo analysis
umwelten eval batch \
  --prompt "Describe this photo including objects, setting, and mood" \
  --models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
  --id "photo-descriptions" \
  --directory "./photos" \
  --file-pattern "*.{jpg,jpeg,png}" \
  --concurrent

# Screenshot analysis
umwelten eval batch \
  --prompt "Identify the type of application or website in this screenshot" \
  --models "google:gemini-2.0-flash" \
  --id "screenshot-classification" \
  --directory "./screenshots" \
  --file-pattern "screenshot_*.png" \
  --concurrent

Mixed Media Processing

bash

# All supported file types
umwelten eval batch \
  --prompt "Analyze this file and determine its content type and key information" \
  --models "google:gemini-2.0-flash" \
  --id "mixed-media-analysis" \
  --directory "./mixed-files" \
  --file-pattern "*.{pdf,jpg,png,txt,md}" \
  --concurrent

Output Structure

Directory Organization

Batch processing creates organized output directories:

output/evaluations/batch-id/
├── responses/
│   ├── file1.pdf/
│   │   ├── google_gemini-2.0-flash.json
│   │   └── ollama_gemma3_12b.json
│   ├── file2.pdf/
│   │   ├── google_gemini-2.0-flash.json
│   │   └── ollama_gemma3_12b.json
│   └── file3.pdf/
│       ├── google_gemini-2.0-flash.json
│       └── ollama_gemma3_12b.json
└── reports/
    ├── results.md
    ├── results.html
    └── results.csv

Generate Batch Reports

bash

# Comprehensive markdown report
umwelten eval report --id document-analysis --format markdown

# HTML report with rich formatting
umwelten eval report --id image-descriptions --format html --output batch-report.html

# CSV export for data analysis
umwelten eval report --id metadata-extraction --format csv --output batch-results.csv

# JSON for programmatic processing
umwelten eval report --id interactive-batch --format json

Performance Optimization

Concurrency Guidelines

Start Conservative: Begin with 2-3 concurrent processes
Monitor Resources: Watch CPU, memory, and network usage
Adjust Based on Model: Expensive models may need lower concurrency
Rate Limit Awareness: Some providers have rate limits

File Organization Tips

Group Similar Files: Process similar file types together
Use Descriptive Names: Help with organization and debugging
Consider File Sizes: Large files may need longer timeouts
Test with Samples: Use --file-limit to test on small batches first

Model Selection for Batches

Google Gemini 2.0 Flash: Best balance of speed, quality, and cost
Ollama Models: Free processing, good for large batches
Premium Models: Reserve for high-value or critical files
Multiple Models: Use for comparison and validation

Error Handling

Robust Processing

bash

# With timeout and validation
umwelten eval batch \
  --prompt "Analyze this file with error handling" \
  --models "google:gemini-2.0-flash" \
  --id "robust-batch" \
  --directory "./files" \
  --file-pattern "*.pdf" \
  --timeout 45000 \
  --concurrent \
  --validate-output

Resume on Failures

bash

# Resume after fixing issues
umwelten eval batch \
  --prompt "Continue processing after resolving errors" \
  --models "google:gemini-2.0-flash" \
  --id "robust-batch" \
  --directory "./files" \
  --file-pattern "*.pdf" \
  --resume \
  --concurrent

Common Batch Patterns

Document Library Processing

bash

# Categorize and tag documents
umwelten eval batch \
  --prompt "Categorize this document and extract tags" \
  --models "google:gemini-2.0-flash" \
  --id "document-library" \
  --directory "./library" \
  --file-pattern "**/*.{pdf,docx}" \
  --schema "category, tags array, summary, confidence int: 1-10" \
  --concurrent

Content Moderation

bash

# Screen content for appropriateness
umwelten eval batch \
  --prompt "Analyze this content for safety and appropriateness" \
  --models "google:gemini-2.0-flash" \
  --id "content-moderation" \
  --directory "./user-uploads" \
  --file-pattern "*.{jpg,png,pdf}" \
  --schema "safe bool, category, issues array, confidence int: 1-10" \
  --concurrent

Data Extraction Pipeline

bash

# Extract structured data from forms
umwelten eval batch \
  --prompt "Extract form data from this document" \
  --models "google:gemini-2.0-flash" \
  --id "form-extraction" \
  --directory "./forms" \
  --file-pattern "*.pdf" \
  --zod-schema "./schemas/form-data.ts" \
  --concurrent

Monitoring and Analytics

List Batch Evaluations

bash

# Show all batch evaluations
umwelten eval list --details

# Filter for batch evaluations only
umwelten eval list --json | jq '.[] | select(.type == "batch")'

Performance Analysis

bash

# Generate performance report
umwelten eval report --id large-batch --format json > performance.json

# Analyze timing and costs
umwelten eval report --id document-batch --format csv --output analysis.csv

Best Practices

Planning

Test First: Use --file-limit 5 to test on small samples
Estimate Costs: Calculate costs before processing large batches
Organize Files: Use clear directory structure and naming
Check Capacity: Ensure sufficient disk space for outputs

Execution

Monitor Progress: Use --ui for long-running batches
Set Timeouts: Prevent hanging on problematic files
Enable Resume: Always use --resume capability for large batches
Concurrent Processing: Enable for significant speed improvements

Quality Control

Validate Schema: Test schemas on individual files first
Random Sampling: Review random samples from large batches
Error Analysis: Check failed files and adjust parameters
Version Control: Keep track of batch parameters and results

Troubleshooting

Common Issues

Rate Limiting: Reduce --max-concurrency
Memory Issues: Process smaller batches or increase system memory
Timeout Errors: Increase --timeout value
Schema Validation Failures: Test schema on individual files first
File Not Found: Check file patterns and directory paths

Debugging Commands

bash

# Test single file from batch
umwelten eval run \
  --prompt "Test prompt" \
  --models "google:gemini-2.0-flash" \
  --id "debug-single" \
  --attach "./problematic-file.pdf"

# List files that would be processed
ls ./directory/*.pdf | head -10

# Check batch status
umwelten eval list --details | grep batch-id

Next Steps

Try structured output for consistent data extraction
Explore cost optimization for budget-conscious batches
See model evaluation for systematic testing
Review examples for specific use cases

Batch Processing ​

Overview ​

Basic Batch Processing ​

Simple File Processing ​

File Pattern Matching ​

Recursive Directory Processing ​

Advanced Batch Options ​

Concurrency Control ​

File Limits ​

Resume Interrupted Processing ​

Structured Output in Batches ​

Schema Validation ​

Complex Zod Schemas ​

Interactive Batch Processing ​

Real-time Progress Monitoring ​

File Type Specific Examples ​

Document Processing ​

Image Processing ​

Mixed Media Processing ​

Output Structure ​

Directory Organization ​

Generate Batch Reports ​

Performance Optimization ​

Concurrency Guidelines ​

File Organization Tips ​

Model Selection for Batches ​

Error Handling ​

Robust Processing ​

Resume on Failures ​

Common Batch Patterns ​

Document Library Processing ​

Content Moderation ​

Data Extraction Pipeline ​

Monitoring and Analytics ​

List Batch Evaluations ​

Performance Analysis ​

Best Practices ​

Planning ​

Execution ​

Quality Control ​

Troubleshooting ​

Common Issues ​

Debugging Commands ​

Next Steps ​

Batch Processing

Overview

Basic Batch Processing

Simple File Processing

File Pattern Matching

Recursive Directory Processing

Advanced Batch Options

Concurrency Control

File Limits

Resume Interrupted Processing

Structured Output in Batches

Schema Validation

Complex Zod Schemas

Interactive Batch Processing

Real-time Progress Monitoring

File Type Specific Examples

Document Processing

Image Processing

Mixed Media Processing

Output Structure

Directory Organization

Generate Batch Reports

Performance Optimization

Concurrency Guidelines

File Organization Tips

Model Selection for Batches

Error Handling

Robust Processing

Resume on Failures

Common Batch Patterns

Document Library Processing

Content Moderation

Data Extraction Pipeline

Monitoring and Analytics

List Batch Evaluations

Performance Analysis

Best Practices

Planning

Execution

Quality Control

Troubleshooting

Common Issues

Debugging Commands

Next Steps