Getting Started

Quick Start

Welcome to umwelten! This guide will help you get up and running with the evaluation framework in just a few minutes.

Prerequisites

Node.js 20+
pnpm (recommended) or npm
API keys for your chosen AI providers

Installation

Clone the repository

bash

git clone https://github.com/The-Focus-AI/umwelten.git
cd umwelten

Install dependencies

bash

pnpm install

Set up environment variables

bash

cp env.template .env
# Edit .env with your API keys

Build the project

bash

pnpm build

Running Examples

The easiest way to get started is by running the example scripts:

bash

# Simple evaluation example
pnpm tsx scripts/examples/simple-evaluation-example.ts

# Matrix evaluation (compare multiple models)
pnpm tsx scripts/examples/matrix-evaluation-example.ts

# Batch evaluation (process multiple inputs)
pnpm tsx scripts/examples/batch-evaluation-example.ts

# Complex pipeline (multi-step evaluation)
pnpm tsx scripts/examples/complex-pipeline-example.ts

# Comprehensive analysis
pnpm tsx scripts/examples/comprehensive-analysis-example.ts

CLI Usage

You can also use the umwelten CLI directly:

bash

# List available models
pnpm cli models

# Run a simple evaluation
pnpm cli run "Hello, world!" --model gpt-4

# Interactive chat
pnpm cli chat --memory

# Run evaluation with specific model
pnpm cli eval run --prompt "Explain quantum computing" --models "gpt-4,claude-3"

Your First Evaluation

Let's create a simple evaluation to test a model's creative writing capabilities.

1. Create a Basic Script

Create a new file my-first-evaluation.ts:

typescript

import { ModelDetails } from "./src/cognition/types.js";
import { Interaction } from "./src/interaction/interaction.js";
import { LiteraryAnalysisTemplate } from "./src/stimulus/templates/creative-templates.js";
import { SimpleEvaluation } from "./src/evaluation/strategies/simple-evaluation.js";

// Create evaluation
const evaluation = new SimpleEvaluation({
  id: "my-first-evaluation",
  name: "My First Evaluation",
  description: "A simple creative writing test"
});

// Define test case
const testCase = {
  id: "creative-writing-test",
  name: "Creative Writing Test",
  stimulus: LiteraryAnalysisTemplate,
  input: {
    prompt: "Write a short story about a robot learning to paint"
  }
};

// Run evaluation
const result = await evaluation.run({
  model: {
    name: "gpt-4",
    provider: "openrouter",
    costs: {
      promptTokens: 0.0001,
      completionTokens: 0.0001
    },
    maxTokens: 1000,
    temperature: 0.7
  },
  testCases: [testCase]
});

console.log("Evaluation completed!");
console.log("Response:", result.responses[0]?.content);

2. Run the Evaluation

bash

npx tsx my-first-evaluation.ts

Understanding the Results

The evaluation returns a result object with:

Responses: Model responses for each test case
Metrics: Performance data (time, tokens, cost)
Metadata: Additional information about the evaluation

typescript

console.log("Response:", result.responses[0]?.content);
console.log("Tokens used:", result.metrics?.totalTokens);
console.log("Cost:", result.metrics?.totalCost);
console.log("Time taken:", result.metrics?.totalTime);

Next Steps

1. Try Different Models

typescript

const models = [
  { name: "gpt-4", provider: "openrouter" },
  { name: "claude-3", provider: "openrouter" },
  { name: "gemini-pro", provider: "google" }
];

for (const model of models) {
  const result = await evaluation.run({ model, testCases: [testCase] });
  console.log(`${model.name}: ${result.responses[0]?.content?.substring(0, 100)}...`);
}

2. Use Different Stimulus Templates

typescript

import { PoetryGenerationTemplate } from "./src/stimulus/templates/creative-templates.js";
import { CodeGenerationTemplate } from "./src/stimulus/templates/coding-templates.js";

// Creative writing
const creativeTest = {
  id: "poetry-test",
  name: "Poetry Generation",
  stimulus: PoetryGenerationTemplate,
  input: { prompt: "Write a haiku about the ocean" }
};

// Code generation
const codingTest = {
  id: "code-test",
  name: "Code Generation",
  stimulus: CodeGenerationTemplate,
  input: { prompt: "Write a Python function to calculate fibonacci numbers" }
};

3. Compare Multiple Models

typescript

import { MatrixEvaluation } from "./src/evaluation/strategies/matrix-evaluation.js";

const matrixEvaluation = new MatrixEvaluation({
  id: "model-comparison",
  name: "Model Comparison",
  description: "Compare multiple models on the same task"
});

const result = await matrixEvaluation.run({
  models: [
    { name: "gpt-4", provider: "openrouter" },
    { name: "claude-3", provider: "openrouter" },
    { name: "gemini-pro", provider: "google" }
  ],
  testCases: [testCase]
});

Common Patterns

1. Batch Processing

typescript

import { BatchEvaluation } from "./src/evaluation/strategies/batch-evaluation.js";

const batchEvaluation = new BatchEvaluation({
  id: "batch-processing",
  name: "Batch Processing",
  description: "Process multiple inputs with the same model"
});

const testCases = [
  { id: "test-1", stimulus: myStimulus, input: { prompt: "Input 1" } },
  { id: "test-2", stimulus: myStimulus, input: { prompt: "Input 2" } },
  { id: "test-3", stimulus: myStimulus, input: { prompt: "Input 3" } }
];

const result = await batchEvaluation.run({
  model: { name: "gpt-4", provider: "openrouter" },
  testCases
});

2. Custom Stimulus

typescript

import { Stimulus } from "./src/stimulus/stimulus.js";

const customStimulus = new Stimulus({
  id: "custom-stimulus",
  name: "Custom Stimulus",
  description: "A custom stimulus for specific needs",
  
  role: "expert analyst",
  objective: "analyze data and provide insights",
  instructions: [
    "Examine the data carefully",
    "Identify key patterns and trends",
    "Provide actionable recommendations"
  ],
  output: [
    "Structured analysis report",
    "Key findings and insights",
    "Recommendations for next steps"
  ],
  
  temperature: 0.7,
  maxTokens: 1500,
  runnerType: 'base'
});

3. Using Tools

typescript

import { PDFTools } from "./src/stimulus/tools/pdf-tools.js";

const stimulusWithTools = new Stimulus({
  // ... other properties
  tools: {
    extractText: PDFTools.extractText,
    extractMetadata: PDFTools.extractMetadata
  }
});

Configuration

Environment Variables

bash

# OpenRouter API key
OPENROUTER_API_KEY=your_openrouter_key

# Google API key
GOOGLE_API_KEY=your_google_key

# Ollama base URL (optional)
OLLAMA_BASE_URL=http://localhost:11434

# LM Studio base URL (optional)
LMSTUDIO_BASE_URL=http://localhost:1234

Model Configuration

typescript

const model: ModelDetails = {
  name: "gpt-4",
  provider: "openrouter",
  costs: {
    promptTokens: 0.0001,
    completionTokens: 0.0001
  },
  maxTokens: 1000,
  temperature: 0.7,
  // ... other model-specific settings
};

Troubleshooting

Common Issues

API Key Errors

Verify your API keys are set correctly
Check the provider documentation for key format
Ensure the key has the necessary permissions

Model Not Found

Verify the model name is correct
Check if the model is available in your region
Ensure you have access to the model

Rate Limit Errors

The framework automatically handles rate limits
Consider using caching to reduce API calls
Implement delays between requests if needed

Getting Help

Check the API Reference
Look at examples
Review troubleshooting guide
Open an issue on GitHub

What's Next?

Now that you have the basics, explore:

Creating Evaluations - Learn how to create more complex evaluations
Writing Scripts - Best practices for writing test scripts
Stimulus Templates - Using and creating stimulus templates
Tool Integration - Integrating external tools
Advanced Features - Complex evaluation patterns

Happy evaluating! 🚀

Getting Started ​

Quick Start ​

Prerequisites ​

Installation ​

Running Examples ​

CLI Usage ​

Your First Evaluation ​

1. Create a Basic Script ​

2. Run the Evaluation ​

Understanding the Results ​

Next Steps ​

1. Try Different Models ​

2. Use Different Stimulus Templates ​

3. Compare Multiple Models ​

Common Patterns ​

1. Batch Processing ​

2. Custom Stimulus ​

3. Using Tools ​

Configuration ​

Environment Variables ​

Model Configuration ​

Troubleshooting ​

Common Issues ​

API Key Errors ​

Model Not Found ​

Rate Limit Errors ​

Getting Help ​

What's Next? ​

Getting Started

Quick Start

Prerequisites

Installation

Running Examples

CLI Usage

Your First Evaluation

1. Create a Basic Script

2. Run the Evaluation

Understanding the Results

Next Steps

1. Try Different Models

2. Use Different Stimulus Templates

3. Compare Multiple Models

Common Patterns

1. Batch Processing

2. Custom Stimulus

3. Using Tools

Configuration

Environment Variables

Model Configuration

Troubleshooting

Common Issues

API Key Errors

Model Not Found

Rate Limit Errors

Getting Help

What's Next?