Examples Overview
This section collects examples of using Umwelten for AI model evaluation tasks. The examples demonstrate the new infrastructure-first architecture and its stimulus-centric design.
🚀 Quick Start Examples
Run these examples to get started immediately:
bash
# Simple evaluation example
pnpm tsx scripts/examples/simple-evaluation-example.ts
# Matrix evaluation (compare multiple models)
pnpm tsx scripts/examples/matrix-evaluation-example.ts
# Batch evaluation (process multiple inputs)
pnpm tsx scripts/examples/batch-evaluation-example.ts
# Complex pipeline (multi-step evaluation)
pnpm tsx scripts/examples/complex-pipeline-example.ts
# Comprehensive analysis
pnpm tsx scripts/examples/comprehensive-analysis-example.ts
📚 Infrastructure Examples
These examples use the new infrastructure-first approach, built from reusable components:
- Simple Evaluation - Basic single-model evaluation using stimulus templates
- Matrix Evaluation - Multi-model comparison and benchmarking
- Batch Evaluation - Batch processing with tool integration
- Complex Pipeline - Multi-step evaluations with dependencies
- Comprehensive Analysis - Performance and quality analysis
Basic Examples
Perfect for getting started with Umwelten:
- Simple Text Generation - Basic prompt evaluation across models
- Creative Writing - Poetry and story generation with temperature control
- Analysis & Reasoning - Complex reasoning tasks and literary analysis
- Tool Integration - Using and creating tools to enhance AI capabilities
Image Processing Examples
Working with visual content and structured data extraction:
- Basic Image Analysis - Simple image description and analysis
- Structured Image Features - Extract structured data with confidence scores
- Batch Image Processing - Process multiple images concurrently
Document Processing
Handle various document formats:
- PDF Analysis - Test native PDF parsing capabilities
- Multi-format Documents - Work with different document types
Advanced Workflows
Complex evaluation patterns and optimization:
- Multi-language Evaluation - Code generation across programming languages
- Complex Structured Output - Advanced schema validation with nested objects
- Cost Optimization - Compare model costs and performance
Migration Reference
These examples show CLI equivalents for scripts that have been migrated:
| Script | Example | Status |
|---|---|---|
| cat-poem.ts | Creative Writing | ✅ Complete |
| temperature.ts | Creative Writing | ✅ Complete |
| frankenstein.ts | Analysis & Reasoning | ✅ Complete |
| google-pricing.ts | Cost Optimization | ✅ Complete |
| image-parsing.ts | Basic Image Analysis | ✅ Complete |
| image-feature-extract.ts | Structured Image Features | ✅ Complete |
| image-feature-batch.ts | Batch Image Processing | ✅ Complete |
| pdf-identify.ts | PDF Analysis | ✅ Complete |
| pdf-parsing.ts | PDF Analysis | ✅ Complete |
| roadtrip.ts | Complex Structured Output | 🔄 Partial |
| multi-language-evaluation.ts | Multi-language Evaluation | 🔄 Needs Pipeline |
Quick Examples
🆕 New Pattern Examples
bash
# Interactive chat with tools
pnpm tsx src/cli/cli.ts chat-new -p ollama -m llama3.2:latest
# Tools demonstration
pnpm tsx scripts/tools.ts -p ollama -m llama3.2:latest --prompt "What's the weather in New York?"
# Programmatic usage
pnpm tsx scripts/new-pattern-example.ts
Traditional CLI Examples
The following quick examples call the umwelten CLI directly.
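Every `--models` value in the commands in this section is a comma-separated list of `provider:model` pairs, and the model part can itself contain colons or slashes (`ollama:gemma3:12b`, `openrouter:openai/gpt-4o-mini`). The sketch below shows one way such a value could be split when preparing runs from a script. It assumes the provider is everything before the first colon, which matches the examples on this page; `ModelSpec` and `parseModelSpecs` are hypothetical names, not Umwelten's own parser.

```typescript
// Hypothetical helper for illustration; not part of Umwelten's API.
interface ModelSpec {
  provider: string;
  model: string;
}

// Split a --models value into provider/model pairs.
// Assumption: the provider is the text before the FIRST colon, so the
// model name may contain further colons (e.g. "gemma3:12b") or slashes.
function parseModelSpecs(value: string): ModelSpec[] {
  return value
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const colon = entry.indexOf(":");
      // Reject entries with no colon, an empty provider, or an empty model.
      if (colon <= 0 || colon === entry.length - 1) {
        throw new Error(`Expected "provider:model", got "${entry}"`);
      }
      return {
        provider: entry.slice(0, colon),
        model: entry.slice(colon + 1),
      };
    });
}

const specs = parseModelSpecs("ollama:gemma3:12b,openrouter:openai/gpt-4o-mini");
console.log(specs);
```

Splitting on the first colon only is what keeps Ollama-style tags such as `gemma3:12b` intact.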
Basic Evaluation
bash
umwelten eval run \
--prompt "Explain quantum computing in simple terms" \
--models "ollama:gemma3:12b,google:gemini-2.0-flash" \
--id "quantum-explanation"
With Structured Output
bash
umwelten eval run \
--prompt "Extract person info: John is 25 and works as a developer" \
--models "google:gemini-2.0-flash" \
--id "person-extraction" \
--schema "name, age int, job"
Batch Processing
bash
umwelten eval batch \
--prompt "Analyze this image and describe key features" \
--models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
--id "image-batch" \
--directory "input/images" \
--file-pattern "*.jpg" \
--concurrent
Generate Reports
bash
# Markdown report
umwelten eval report --id quantum-explanation --format markdown
# HTML report with export
umwelten eval report --id image-batch --format html --output report.html
Common Patterns
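The patterns in this section vary the same few `umwelten eval run` flags: `--prompt`, `--models`, and `--id`, plus `--temperature`, `--concurrent`, or `--attach`. When runs are driven from a script, the argument list can be assembled programmatically. The following is a minimal sketch of that idea; `EvalRunOptions` and `buildEvalRunArgs` are hypothetical names, not part of Umwelten, and only flags that appear on this page are used.

```typescript
// Hypothetical helper for illustration; not part of Umwelten's API.
// It only assembles flags that appear in the commands on this page.
interface EvalRunOptions {
  prompt: string;
  models: string[]; // each entry is "provider:model"
  id: string;
  temperature?: number;
  concurrent?: boolean;
  attach?: string;
}

// Build the argument list that follows the `umwelten` executable name.
function buildEvalRunArgs(opts: EvalRunOptions): string[] {
  const args = [
    "eval", "run",
    "--prompt", opts.prompt,
    "--models", opts.models.join(","),
    "--id", opts.id,
  ];
  if (opts.temperature !== undefined) {
    args.push("--temperature", String(opts.temperature));
  }
  if (opts.attach !== undefined) {
    args.push("--attach", opts.attach);
  }
  if (opts.concurrent) {
    args.push("--concurrent");
  }
  return args;
}

// Temperature sweep: one run per setting, each with its own --id.
const settings = [
  { label: "low", temperature: 0.2 },
  { label: "high", temperature: 1.5 },
];

const sweep = settings.map(({ label, temperature }) =>
  buildEvalRunArgs({
    prompt: "Write a creative story",
    models: ["ollama:gemma3:12b"],
    id: `creative-${label}`,
    temperature,
  })
);

// Print each argument list as JSON so the quoting stays unambiguous.
for (const args of sweep) {
  console.log(JSON.stringify(["umwelten", ...args]));
}
```

Keeping the arguments as an array, not a single string, avoids shell-quoting problems with prompts that contain spaces or quotes. Such an array could then be handed to a process launcher such as Node's `child_process.spawnSync("umwelten", args)`.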
Temperature Testing
Compare model outputs at different creativity levels:
bash
# High creativity
umwelten eval run --prompt "Write a creative story" --models "ollama:gemma3:12b" --temperature 1.5 --id "creative-high"
# Low creativity
umwelten eval run --prompt "Write a creative story" --models "ollama:gemma3:12b" --temperature 0.2 --id "creative-low"
Cost Comparison
Evaluate cost vs. quality trade-offs:
bash
umwelten eval run \
--prompt "Write a detailed analysis of renewable energy trends" \
--models "google:gemini-2.0-flash,openrouter:openai/gpt-4o-mini,openrouter:openai/gpt-4o" \
--id "cost-comparison" \
--concurrent
Multi-modal Evaluation
Test vision capabilities across models:
bash
umwelten eval run \
--prompt "Describe this image in detail and identify any text" \
--models "google:gemini-2.0-flash,ollama:qwen2.5vl:latest" \
--id "vision-test" \
--attach "./test-image.jpg"
Next Steps
- Browse specific examples for your use case
- Check the Migration Guide to see how scripts were converted
- Review Advanced Features for complex workflows