Batch Processing
Learn how to process multiple files efficiently using Umwelten's batch processing capabilities with concurrent execution and intelligent error handling.
Overview
Batch processing allows you to evaluate multiple files with the same prompt across multiple models concurrently. This is ideal for processing document libraries, image collections, or any set of files that need consistent analysis.
Basic Batch Processing
Simple File Processing
Process all files in a directory:
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Analyze this document and provide a summary" \
--models "google:gemini-3-flash-preview,ollama:gemma3:12b" \
--id "document-analysis" \
--directory "./documents" \
--file-pattern "*.pdf" \
--concurrent
File Pattern Matching
Use glob patterns to target specific files:
# Process only JPEG images
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Describe this image in detail" \
--models "google:gemini-3-flash-preview,ollama:qwen2.5vl:latest" \
--id "image-descriptions" \
--directory "./photos" \
--file-pattern "*.{jpg,jpeg}" \
--concurrent
# Process files with specific naming patterns
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Analyze this report" \
--models "google:gemini-3-flash-preview" \
--id "monthly-reports" \
--directory "./reports" \
--file-pattern "report_2024_*.pdf"
Recursive Directory Processing
Process files in subdirectories:
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Categorize this document by type and content" \
--models "google:gemini-3-flash-preview" \
--id "document-categorization" \
--directory "./document-library" \
--file-pattern "**/*.{pdf,docx,txt}" \
--concurrent
Advanced Batch Options
Concurrency Control
Optimize processing speed with concurrency settings:
# High concurrency for fast processing
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Extract key information from this file" \
--models "google:gemini-3-flash-preview" \
--id "high-speed-processing" \
--directory "./files" \
--file-pattern "*.pdf" \
--concurrent \
--max-concurrency 8
# Conservative concurrency to avoid rate limits
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Detailed analysis of this document" \
--models "google:gemini-2.5-pro-exp-03-25" \
--id "detailed-analysis" \
--directory "./important-docs" \
--file-pattern "*.pdf" \
--concurrent \
--max-concurrency 2
File Limits
Control the number of files processed:
# Process only the first 10 files (for testing)
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Analyze this document" \
--models "google:gemini-3-flash-preview" \
--id "test-batch" \
--directory "./large-collection" \
--file-pattern "*.pdf" \
--file-limit 10 \
--concurrent
# Process all files (default behavior)
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Full analysis" \
--models "google:gemini-3-flash-preview" \
--id "complete-batch" \
--directory "./documents" \
--file-pattern "*.pdf" \
--concurrent
Resume Interrupted Processing
Continue from where you left off:
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Continue processing from where we left off" \
--models "google:gemini-3-flash-preview" \
--id "large-document-batch" \
--directory "./documents" \
--file-pattern "*.pdf" \
--resume \
--concurrent
Structured Output in Batches
Schema Validation
Apply structured output schemas to batch processing:
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Extract structured metadata from this document" \
--models "google:gemini-3-flash-preview" \
--id "metadata-extraction" \
--directory "./documents" \
--file-pattern "*.pdf" \
--schema "title, author, date, category, summary" \
--concurrent
Complex Zod Schemas
Use TypeScript schemas for complex validation:
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Analyze this image and extract detailed features" \
--models "google:gemini-3-flash-preview,ollama:qwen2.5vl:latest" \
--id "image-feature-batch" \
--directory "./images" \
--file-pattern "*.{jpg,png}" \
--zod-schema "./schemas/image-features.ts" \
--validate-output \
--concurrent
Interactive Batch Processing
Real-time Progress Monitoring
Watch batch processing progress in real-time:
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Process this file and extract insights" \
--models "google:gemini-3-flash-preview,ollama:gemma3:12b" \
--id "interactive-batch" \
--directory "./documents" \
--file-pattern "*.pdf" \
--ui \
--concurrent \
--max-concurrency 4
File Type Specific Examples
Document Processing
# PDF documents
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Summarize this PDF document in 200 words" \
--models "google:gemini-3-flash-preview" \
--id "pdf-summaries" \
--directory "./pdfs" \
--file-pattern "*.pdf" \
--concurrent
# Text files
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Analyze the sentiment and key themes in this text" \
--models "ollama:gemma3:12b" \
--id "text-analysis" \
--directory "./texts" \
--file-pattern "*.{txt,md}" \
--concurrent
Image Processing
# Photo analysis
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Describe this photo including objects, setting, and mood" \
--models "google:gemini-3-flash-preview,ollama:qwen2.5vl:latest" \
--id "photo-descriptions" \
--directory "./photos" \
--file-pattern "*.{jpg,jpeg,png}" \
--concurrent
# Screenshot analysis
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Identify the type of application or website in this screenshot" \
--models "google:gemini-3-flash-preview" \
--id "screenshot-classification" \
--directory "./screenshots" \
--file-pattern "screenshot_*.png" \
--concurrent
Mixed Media Processing
# All supported file types
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Analyze this file and determine its content type and key information" \
--models "google:gemini-3-flash-preview" \
--id "mixed-media-analysis" \
--directory "./mixed-files" \
--file-pattern "*.{pdf,jpg,png,txt,md}" \
--concurrent
Output Structure
Directory Organization
Batch processing creates organized output directories:
output/evaluations/batch-id/
├── responses/
│   ├── file1.pdf/
│   │   ├── google_gemini-3-flash-preview.json
│   │   └── ollama_gemma3_12b.json
│   ├── file2.pdf/
│   │   ├── google_gemini-3-flash-preview.json
│   │   └── ollama_gemma3_12b.json
│   └── file3.pdf/
│       ├── google_gemini-3-flash-preview.json
│       └── ollama_gemma3_12b.json
└── reports/
    ├── results.md
    ├── results.html
    └── results.csv
Generate Batch Reports
# Comprehensive markdown report
dotenvx run -- pnpm run cli -- eval report --id document-analysis --format markdown
# HTML report with rich formatting
dotenvx run -- pnpm run cli -- eval report --id image-descriptions --format html --output batch-report.html
# CSV export for data analysis
dotenvx run -- pnpm run cli -- eval report --id metadata-extraction --format csv --output batch-results.csv
# JSON for programmatic processing
dotenvx run -- pnpm run cli -- eval report --id interactive-batch --format json
Performance Optimization
Concurrency Guidelines
- Start Conservative: Begin with 2-3 concurrent processes
- Monitor Resources: Watch CPU, memory, and network usage
- Adjust Based on Model: Expensive models may need lower concurrency
- Rate Limit Awareness: Some providers have rate limits
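One way to apply these guidelines is to time a small sample at a few concurrency levels before committing to a full run. The sketch below is illustrative only: it uses flags that appear elsewhere in this guide, but the helper names (`run_or_print`, `ramp_concurrency`), the `ramp-test` ids, the prompt, and the sample size of 5 are assumptions. By default it only prints the commands it would run.

```shell
# Sketch: run a small sample at several concurrency levels and time each run.
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to execute.
# The prompt, model, sample size, and "ramp-test" ids are placeholders.

run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"      # preview only; shell quoting is not preserved
  else
    "$@"
  fi
}

ramp_concurrency() {
  local dir="$1" pattern="$2" level start
  shift 2
  for level in "$@"; do
    start=$(date +%s)
    # Stop at the first level that fails (for example, because of rate limits).
    run_or_print dotenvx run -- pnpm run cli -- eval batch \
      --prompt "Analyze this document" \
      --models "google:gemini-3-flash-preview" \
      --id "ramp-test-c${level}" \
      --directory "$dir" \
      --file-pattern "$pattern" \
      --file-limit 5 \
      --concurrent \
      --max-concurrency "$level" || return 1
    if [ "${DRY_RUN:-1}" != "1" ]; then
      echo "concurrency ${level}: $(( $(date +%s) - start ))s"
    fi
  done
}
```

Calling `ramp_concurrency ./files '*.pdf' 2 4 8` prints three commands in dry-run mode. With `DRY_RUN=0` it reports elapsed seconds per level, which gives a rough basis for choosing a `--max-concurrency` value.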
File Organization Tips
- Group Similar Files: Process similar file types together
- Use Descriptive Names: Help with organization and debugging
- Consider File Sizes: Large files may need longer timeouts
- Test with Samples: Use --file-limit to test on small batches first
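To act on the file-size tip, it can help to list unusually large inputs before starting a batch. The following is a minimal sketch using standard `find`, `du`, and `sort`; the function name `large_files` and the 20 MiB default threshold are assumptions, not part of Umwelten.

```shell
# Sketch: list files larger than a threshold (in MiB), largest first.
# Output is "size-in-KiB<TAB>path", as printed by du -k.
large_files() {
  local dir="$1" min_mib="${2:-20}"
  find "$dir" -type f -size +"${min_mib}"M -exec du -k {} + | sort -rn
}
```

For example, `large_files ./documents 50` shows inputs over 50 MiB. Those files are candidates for a separate batch with a higher `--timeout`.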
Model Selection for Batches
- Google Gemini Flash models (e.g. google:gemini-3-flash-preview, as used in the examples in this guide): Best balance of speed, quality, and cost
- Ollama Models: Free processing, good for large batches
- Premium Models: Reserve for high-value or critical files
- Multiple Models: Use for comparison and validation
Error Handling
Robust Processing
# With timeout and validation
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Analyze this file with error handling" \
--models "google:gemini-3-flash-preview" \
--id "robust-batch" \
--directory "./files" \
--file-pattern "*.pdf" \
--timeout 45000 \
--concurrent \
--validate-output
Resume on Failures
# Resume after fixing issues
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Continue processing after resolving errors" \
--models "google:gemini-3-flash-preview" \
--id "robust-batch" \
--directory "./files" \
--file-pattern "*.pdf" \
--resume \
--concurrent
Common Batch Patterns
Document Library Processing
# Categorize and tag documents
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Categorize this document and extract tags" \
--models "google:gemini-3-flash-preview" \
--id "document-library" \
--directory "./library" \
--file-pattern "**/*.{pdf,docx}" \
--schema "category, tags array, summary, confidence int: 1-10" \
--concurrent
Content Moderation
# Screen content for appropriateness
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Analyze this content for safety and appropriateness" \
--models "google:gemini-3-flash-preview" \
--id "content-moderation" \
--directory "./user-uploads" \
--file-pattern "*.{jpg,png,pdf}" \
--schema "safe bool, category, issues array, confidence int: 1-10" \
--concurrent
Data Extraction Pipeline
# Extract structured data from forms
dotenvx run -- pnpm run cli -- eval batch \
--prompt "Extract form data from this document" \
--models "google:gemini-3-flash-preview" \
--id "form-extraction" \
--directory "./forms" \
--file-pattern "*.pdf" \
--zod-schema "./schemas/form-data.ts" \
--concurrent
Monitoring and Analytics
List Batch Evaluations
# Show all batch evaluations
dotenvx run -- pnpm run cli -- eval list --details
# Filter for batch evaluations only
dotenvx run -- pnpm run cli -- eval list --json | jq '.[] | select(.type == "batch")'
Performance Analysis
# Generate performance report
dotenvx run -- pnpm run cli -- eval report --id large-batch --format json > performance.json
# Analyze timing and costs
dotenvx run -- pnpm run cli -- eval report --id document-batch --format csv --output analysis.csv
Best Practices
Planning
- Test First: Use --file-limit 5 to test on small samples
- Estimate Costs: Calculate costs before processing large batches
- Organize Files: Use clear directory structure and naming
- Check Capacity: Ensure sufficient disk space for outputs
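A rough starting point for cost estimation is the number of model calls a batch will make. The sketch below assumes one request per file per model, which may not match how Umwelten actually schedules work (retries, for instance, would add calls). The function name `estimate_requests` is hypothetical. It matches files with `find -name`, which does not understand brace patterns such as `*.{pdf,docx}`, so run it once per extension.

```shell
# Sketch: estimate the number of model calls as (matching files) x (models).
# Assumes one request per file per model; only looks at the top level of the directory.
estimate_requests() {
  local dir="$1" pattern="$2" models="$3" files model_count
  files=$(find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l | tr -d ' ')
  # Models are comma-separated, as in the --models flag.
  model_count=$(printf '%s\n' "$models" | tr ',' '\n' | grep -c .)
  echo "files=${files} models=${model_count} requests=$(( files * model_count ))"
}
```

For instance, `estimate_requests ./documents '*.pdf' "google:gemini-3-flash-preview,ollama:gemma3:12b"` prints the file count, the model count, and their product. Running `df -h .` alongside it gives a quick view of free disk space for outputs.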
Execution
- Monitor Progress: Use --ui for long-running batches
- Set Timeouts: Prevent hanging on problematic files
- Enable Resume: Always use the --resume capability for large batches
- Concurrent Processing: Enable for significant speed improvements
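Before resuming an interrupted batch, it can be useful to see which inputs still lack a response. This sketch relies on the layout shown under Output Structure (`responses/<file>/<model>.json`) and assumes that colons in the model name become underscores in the file name, as the example listing suggests. The function name `pending_files` is hypothetical, and the real naming scheme may differ.

```shell
# Sketch: print input files that have no response JSON yet for a given model.
# Assumes output/evaluations/<id>/responses/<file>/<model-with-underscores>.json.
pending_files() {
  local input_dir="$1" eval_dir="$2" model="$3" model_file f name
  # google:gemini-3-flash-preview -> google_gemini-3-flash-preview.json
  model_file="$(printf '%s' "$model" | tr ':' '_').json"
  for f in "$input_dir"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    [ -f "$eval_dir/responses/$name/$model_file" ] || echo "$name"
  done
}
```

For example, `pending_files ./documents output/evaluations/large-document-batch google:gemini-3-flash-preview` lists the files a resumed run would still need to cover, under the assumptions above.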
Quality Control
- Validate Schema: Test schemas on individual files first
- Random Sampling: Review random samples from large batches
- Error Analysis: Check failed files and adjust parameters
- Version Control: Keep track of batch parameters and results
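Random sampling for review can be done with standard tools once a batch has finished. The sketch below picks a few response files at random from the `responses/` directory described under Output Structure. The function name `sample_responses` and the default sample size are assumptions. It seeds `awk` from the clock, so two calls within the same second may return the same sample.

```shell
# Sketch: print N randomly chosen response JSON files from an evaluation directory.
# Paths containing tab characters are not handled.
sample_responses() {
  local eval_dir="$1" n="${2:-5}"
  find "$eval_dir/responses" -type f -name '*.json' \
    | awk 'BEGIN { srand() } { printf "%.6f\t%s\n", rand(), $0 }' \
    | LC_ALL=C sort -n \
    | cut -f2- \
    | head -n "$n"
}
```

A call such as `sample_responses output/evaluations/document-analysis 10` yields ten paths to open and inspect by hand.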
Troubleshooting
Common Issues
- Rate Limiting: Reduce --max-concurrency
- Memory Issues: Process smaller batches or increase system memory
- Timeout Errors: Increase the --timeout value
- Schema Validation Failures: Test schema on individual files first
- File Not Found: Check file patterns and directory paths
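The rate-limiting advice can be automated with a small retry wrapper that halves the concurrency level after each failed attempt. This is a sketch under a significant assumption: that the batch command exits with a non-zero status when it fails, which this guide does not confirm. The names `run_with_backoff` and `batch_at` are hypothetical.

```shell
# Sketch: retry a command with the concurrency level halved after each failure.
# Usage: run_with_backoff <start_level> <command...>
# The current level is passed to the command as its last argument.
# Assumes the command exits non-zero on failure.
run_with_backoff() {
  local level="$1"
  shift
  while [ "$level" -ge 1 ]; do
    if "$@" "$level"; then
      echo "succeeded at concurrency $level"
      return 0
    fi
    echo "failed at concurrency $level" >&2
    level=$(( level / 2 ))
  done
  return 1
}

# Example wrapper (hypothetical; flags are the ones used elsewhere in this guide):
# batch_at() {
#   dotenvx run -- pnpm run cli -- eval batch \
#     --prompt "Analyze this document" \
#     --models "google:gemini-3-flash-preview" \
#     --id "robust-batch" \
#     --directory "./files" \
#     --file-pattern "*.pdf" \
#     --resume --concurrent --max-concurrency "$1"
# }
# run_with_backoff 8 batch_at
```

Combining the wrapper with `--resume`, as in the commented example, is intended to let each retry continue from files already processed.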
Debugging Commands
# Test single file from batch
dotenvx run -- pnpm run cli -- eval run \
--prompt "Test prompt" \
--models "google:gemini-3-flash-preview" \
--id "debug-single" \
--attach "./problematic-file.pdf"
# List files that would be processed
ls ./directory/*.pdf | head -10
# Check batch status
dotenvx run -- pnpm run cli -- eval list --details | grep batch-id
Next Steps
- Try structured output for consistent data extraction
- Explore cost optimization for budget-conscious batches
- See model evaluation for systematic testing
- Review examples for specific use cases