# Walkthroughs
Practical, step-by-step guides demonstrating real-world usage of umwelten features.
## Habitat

### Setting Up an Agent in Habitat
Build a complete agent environment from scratch with a custom persona, tools, sub-agents, and multiple interfaces:
- Create and customize a habitat work directory
- Write a custom persona (STIMULUS.md)
- Add tools (direct export and factory pattern)
- Register and delegate to sub-agents
- Run as CLI REPL and Telegram bot
- Configure environment variables and model defaults
**Time Required:** 15-20 minutes
**Prerequisites:** Node.js 20+, pnpm, a provider API key
**Optional:** Telegram bot token, Tavily API key
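The two tool styles mentioned above can be sketched as follows. The `Tool` interface here is a hypothetical shape for illustration; umwelten's actual tool API may differ.

```typescript
// Hypothetical Tool shape -- umwelten's actual interface may differ.
interface Tool {
  name: string;
  description: string;
  execute: (input: Record<string, unknown>) => Promise<string>;
}

// Direct export: a stateless tool defined once at module scope.
export const echoTool: Tool = {
  name: "echo",
  description: "Returns its input unchanged",
  execute: async (input) => JSON.stringify(input),
};

// Factory pattern: build a tool around runtime configuration
// (API keys, habitat paths) at registration time.
export function makeSearchTool(apiKey: string): Tool {
  return {
    name: "search",
    description: "Searches the web (key injected at creation)",
    execute: async (input) =>
      `searched "${input.query}" with key ${apiKey.slice(0, 4)}...`,
  };
}
```

The factory style is what lets a habitat inject environment-specific secrets (such as a Tavily key) without hardcoding them in the tool module.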
### Habitat Bridge Walkthrough
Use the Habitat Bridge System to manage remote repositories in isolated containers:
- Three-phase design: Create, Start, Inspect
- Saved provisioning for instant subsequent starts
- Execute commands in persistent containers via MCP
- Work with git repositories programmatically
- Debug and monitor bridge containers
**Time Required:** 10-15 minutes
**Prerequisites:** Docker running (Dagger uses Docker)
**Best For:** Remote repo execution, complex containerized workflows
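The three-phase lifecycle above can be modeled as a small state sketch. The phase names follow the walkthrough; the transition logic here is illustrative, not the bridge's real implementation.

```typescript
// Sketch of the Create -> Start -> Inspect bridge lifecycle (illustrative).
type BridgePhase = "created" | "started" | "inspecting";

interface BridgeState {
  phase: BridgePhase;
  repoUrl: string;
  provisioned: boolean; // saved provisioning skips setup on restart
}

function createBridge(repoUrl: string): BridgeState {
  return { phase: "created", repoUrl, provisioned: false };
}

function startBridge(b: BridgeState): BridgeState {
  // With saved provisioning, start is near-instant; otherwise the
  // container is provisioned first and the result is cached for next time.
  return { ...b, phase: "started", provisioned: true };
}

function inspectBridge(b: BridgeState): string {
  return `${b.repoUrl}: phase=${b.phase}, provisioned=${b.provisioned}`;
}
```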
### Bridge MCP Test
Manual testing guide for the Bridge MCP system:
- Start a bridge and test MCP tools with curl
- Use the TypeScript client programmatically
- Inspect saved provisioning and logs
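MCP messages are JSON-RPC 2.0, so a `tools/list` call can be built as plain JSON and POSTed with curl or fetch. The bridge endpoint URL below is a placeholder, not the walkthrough's actual address.

```typescript
// Build a JSON-RPC 2.0 request for MCP's tools/list method.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

function toolsListRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Example (not executed here): POST to the bridge's MCP endpoint.
// await fetch("http://localhost:3000/mcp", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(toolsListRequest(1)),
// });
```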
## Session Management

### Session Analysis Walkthrough
Learn how to use the session management tools to understand your Claude Code work:
- List and inspect sessions
- Index sessions with LLM analysis
- Search through your work semantically
- Analyze patterns and trends
- Find past solutions quickly
**Time Required:** 10-15 minutes
**Prerequisites:** Claude Code sessions in a project
**Cost:** ~$0.03 per 100 sessions indexed
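At its core, the semantic search step ranks stored session embeddings by cosine similarity to a query embedding. The sketch below uses toy two-dimensional vectors; real ones come from an embedding model.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return session ids ordered from most to least similar to the query.
function rankSessions(
  query: number[],
  sessions: { id: string; embedding: number[] }[],
): string[] {
  return [...sessions]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) -
        cosineSimilarity(query, x.embedding),
    )
    .map((s) => s.id);
}
```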
### TRMNL Project Analysis
Real example of analyzing a project using session management tools:
- **Project:** TRMNL Image Agent (automated e-ink dashboard)
- **Sessions Analyzed:** 42
- **Insights:** Success rates, technology stack, key learnings, optimization patterns
- **Outcome:** Comprehensive understanding of 42 Claude sessions
This walkthrough demonstrates the actual output and insights you can gain from analyzing your own projects.
## Evaluation

### Building a Multi-Model Evaluation with LLM Judging
Build a complete evaluation pipeline from scratch — the "Car Wash Test" that benchmarks 131 models on common-sense reasoning:
- Define model lists across Google, OpenRouter, and Ollama
- Configure Stimulus and SimpleEvaluation with caching
- Build an LLM judge with Zod-validated structured output
- Run-based caching for resumable, comparable evaluations
- Analyze and categorize results (correct / lucky / failed)
**Time Required:** 30 minutes to build, 15 minutes to run
**Prerequisites:** Node.js 20+, pnpm, Google + OpenRouter API keys
**Cost:** ~$0.50 for a full 131-model run
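The correct / lucky / failed categorization step can be sketched like this. The `JudgeVerdict` shape is hypothetical; in the walkthrough a Zod schema validates the judge's structured output before anything like this runs.

```typescript
// Hypothetical judge verdict shape (the real one is Zod-validated).
interface JudgeVerdict {
  model: string;
  answerCorrect: boolean;
  reasoningSound: boolean;
}

type Category = "correct" | "lucky" | "failed";

// "lucky" = right answer reached with unsound reasoning.
function categorize(v: JudgeVerdict): Category {
  if (!v.answerCorrect) return "failed";
  return v.reasoningSound ? "correct" : "lucky";
}

// Tally verdicts across the full model list.
function tally(verdicts: JudgeVerdict[]): Record<Category, number> {
  const counts: Record<Category, number> = { correct: 0, lucky: 0, failed: 0 };
  for (const v of verdicts) counts[categorize(v)]++;
  return counts;
}
```

Separating "lucky" from "correct" is what makes a common-sense benchmark meaningful: a model that guesses the right answer for the wrong reason should not score the same as one that reasons its way there.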
## Coming Soon
- **Batch Processing Workflows**: Process large document sets efficiently
- **Cost Optimization Strategies**: Minimize API costs while maximizing quality
- **Custom Tool Integration**: Build and integrate your own tools
## Contributing Walkthroughs
Have an interesting use case or workflow? We'd love to feature it!
- Create a markdown file in `docs/walkthroughs/`
- Follow the format: Overview -> Prerequisites -> Step-by-step -> Results
- Include real commands and actual output where possible
- Submit a PR with your walkthrough
See our Contributing Guide for details.