Walkthroughs

Practical, step-by-step guides demonstrating real-world usage of umwelten features.

Habitat

Setting Up an Agent in Habitat

Build a complete agent environment from scratch with a custom persona, tools, sub-agents, and multiple interfaces:

Create and customize a habitat work directory
Write a custom persona (STIMULUS.md)
Add tools (direct export and factory pattern)
Register and delegate to sub-agents
Run as CLI REPL and Telegram bot
Configure environment variables and model defaults

Time Required: 15-20 minutes Prerequisites: Node.js 20+, pnpm, a provider API key Optional: Telegram bot token, Tavily API key

Habitat Bridge Walkthrough

Use the Habitat Bridge System to manage remote repositories in isolated containers:

Three-phase design: Create, Start, Inspect
Saved provisioning for instant subsequent starts
Execute commands in persistent containers via MCP
Work with git repositories programmatically
Debug and monitor bridge containers

Time Required: 10-15 minutes Prerequisites: Docker running (Dagger uses Docker) Best For: Remote repo execution, complex containerized workflows

Bridge MCP Test

Manual testing guide for the Bridge MCP system:

Start a bridge and test MCP tools with curl
Use the TypeScript client programmatically
Inspect saved provisioning and logs

Session Management

Session Search Walkthrough

Search every Claude Code session you've ever had by full message content, then hop into the Exploration Browser and back:

umwelten search — interactive two-pane TUI with live debounced re-scan
Open a hit in the dashboard with Enter; q bounces back to your results
--json and --no-tui stdout modes for scripting and piping
Powered by ripgrep — sub-second cold scans, no index to maintain

Time Required: 5 minutes Prerequisites: ripgrep (rg) on PATH, Claude Code sessions on disk

Knowledge Pipeline Walkthrough

End-to-end through the Exploration-centered knowledge workflow — the modern path that replaces the older sessions index/search flow:

Project Source Sessions (Claude Code / Cursor / Habitat) into Explorations
Run the digester to extract topics, tags, key learnings, and solutionType (including planning)
Identify planning sessions specifically — by filter or by tag
Build a reflective Interaction (no new runner) to ask questions across one or more Explorations
Classify and promote answers to AGENTS.md, FACTS.md, ADRs, Skills, or Artifacts
Save Explorations for later reuse

Time Required: 15-20 minutes Prerequisites: Sessions in a project, a provider API key Best For: Finding past decisions, turning planning sessions into ADRs

Session Analysis Walkthrough (legacy)

Older sessions index/search/analyze flow. Still works, but the knowledge pipeline above is the recommended path:

List and inspect sessions
Index sessions with LLM analysis
Search through your work semantically
Analyze patterns and trends
Find past solutions quickly

Time Required: 10-15 minutes Prerequisites: Claude Code sessions in a project Cost: ~$0.03 per 100 sessions indexed

TRMNL Project Analysis

Real example of analyzing a project using session management tools:

Project: TRMNL Image Agent (automated e-ink dashboard)
Sessions Analyzed: 42
Insights: Success rates, technology stack, key learnings, optimization patterns
Outcome: Comprehensive understanding of 44 Claude sessions

This walkthrough demonstrates the actual output and insights you can gain from analyzing your own projects.

Evaluation

Building a Multi-Model Evaluation with LLM Judging

Build a complete evaluation pipeline from scratch — the "Car Wash Test" that benchmarks 131 models on common-sense reasoning:

Define model lists across Google, OpenRouter, and Ollama
Configure Stimulus and SimpleEvaluation with caching
Build an LLM judge with Zod-validated structured output
Run-based caching for resumable, comparable evaluations
Analyze and categorize results (correct / lucky / failed)

Time Required: 30 minutes to build, 15 minutes to run Prerequisites: Node.js 20+, pnpm, Google + OpenRouter API keys Cost: ~$0.50 for a full 131-model run

Building a Multi-Dimension Model Showdown

Build a comprehensive evaluation suite that tests models across 5 dimensions and generates a unified leaderboard with narrative analysis:

Define 5 evaluation dimensions (reasoning, knowledge, instruction, coding, MCP tool use)
Combine LLM-judged and deterministic scoring in one suite
Use the eval combine system for cross-evaluation aggregation
Generate narrative reports with per-dimension analysis and judge explanations
Compare cost efficiency and speed across providers

Time Required: 30 minutes to build, 2–4 hours to run Prerequisites: Node.js 20+, pnpm, Google + OpenRouter + DeepInfra API keys Cost: ~$4.63 for a full 49-model run

Model Showdown Results

Detailed analysis of 49 models tested across 5 dimensions — reasoning, knowledge, instruction following, coding, and MCP tool use:

49 models across 4 providers + local Ollama, 41 with full 5-dim MCP scores
Claude Sonnet 4.6 leads at 93.8% across all 5 dimensions
openai/gpt-oss-120b scores 89.9% for $0.01
Local nemotron-3-nano:latest on Ollama scores 84.0% for free
Deep dives into each dimension with judge explanations
Provider effect analysis: same weights, different results

Coming Soon

Batch Processing Workflows - Process large document sets efficiently
Cost Optimization Strategies - Minimize API costs while maximizing quality
Custom Tool Integration - Build and integrate your own tools

Contributing Walkthroughs

Have an interesting use case or workflow? We'd love to feature it!

Create a markdown file in docs/walkthroughs/
Follow the format: Overview -> Prerequisites -> Step-by-step -> Results
Include real commands and actual output where possible
Submit a PR with your walkthrough

See our Contributing Guide for details.

Walkthroughs ​

Habitat ​

Session Management ​

Evaluation ​

Coming Soon ​

Contributing Walkthroughs ​