Multi-Language Processing

Process content in multiple languages with language detection, translation, and cross-lingual analysis. This example demonstrates working with international content using AI models with strong multilingual capabilities.

Overview

Multi-language processing enables you to work with content in various languages, perform translations, detect languages automatically, and conduct cross-lingual analysis. Modern AI models excel at understanding and generating content across dozens of languages.

Language Detection and Analysis

Automatic Language Detection

bash

umwelten eval batch \
  --prompt "Detect the language of this content and provide a confidence score" \
  --models "google:gemini-2.0-flash,ollama:gemma3:12b" \
  --id "language-detection" \
  --directory "./multilingual-content" \
  --file-pattern "*.{txt,pdf}" \
  --schema "detected_language, confidence int: 1-10, secondary_languages array" \
  --concurrent

Language-Specific Analysis

bash

umwelten eval run \
  --prompt "Analyze this text in its original language, then provide an English summary" \
  --models "google:gemini-2.0-flash" \
  --file "./spanish-document.pdf" \
  --id "spanish-analysis"

umwelten eval run \
  --prompt "この日本語の文書を分析し、英語で要約してください" \
  --models "google:gemini-2.0-flash" \
  --file "./japanese-text.txt" \
  --id "japanese-analysis"

Translation Services

Document Translation

bash

umwelten eval batch \
  --prompt "Translate this document to English while preserving formatting and context" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o" \
  --id "document-translation" \
  --directory "./foreign-documents" \
  --file-pattern "*.{pdf,txt}" \
  --concurrent

Multi-Target Translation

bash

umwelten eval run \
  --prompt "Translate this English text to Spanish, French, German, and Japanese" \
  --models "google:gemini-2.0-flash" \
  --id "multi-target-translation" \
  --schema "spanish, french, german, japanese"

Context-Aware Translation

bash

umwelten eval run \
  --prompt "Translate this technical document to English, preserving technical terminology and context" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --file "./technical-manual-german.pdf" \
  --timeout 60000 \
  --id "technical-translation"

Cross-Lingual Content Analysis

Sentiment Analysis Across Languages

bash

umwelten eval batch \
  --prompt "Analyze the sentiment of this content regardless of language and provide reasoning" \
  --models "google:gemini-2.0-flash" \
  --id "multilingual-sentiment" \
  --directory "./reviews-international" \
  --file-pattern "*.txt" \
  --schema "sentiment, confidence int: 1-10, detected_language, reasoning" \
  --concurrent

Cultural Context Analysis

bash

umwelten eval batch \
  --prompt "Analyze this content for cultural references, idioms, and context that might not translate directly" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "cultural-analysis" \
  --directory "./cultural-content" \
  --file-pattern "*.{txt,pdf}" \
  --schema "cultural_elements array, idioms array, translation_challenges array, context_notes" \
  --concurrent

Language-Specific Use Cases

International Business Documents

bash

# Process contracts in multiple languages
umwelten eval batch \
  --prompt "Extract key terms, obligations, and dates from this business document" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "international-contracts" \
  --directory "./contracts" \
  --file-pattern "*.pdf" \
  --schema "language, parties array, key_terms array, obligations array, important_dates array" \
  --concurrent

Academic Research Processing

bash

# Analyze research papers in various languages
umwelten eval batch \
  --prompt "Extract methodology, findings, and conclusions from this academic paper" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "multilingual-research" \
  --directory "./academic-papers" \
  --file-pattern "*.pdf" \
  --schema "language, title, methodology, key_findings array, conclusions array, citations int" \
  --concurrent

News and Media Analysis

bash

# Analyze international news articles
umwelten eval batch \
  --prompt "Summarize this news article and identify key events, people, and implications" \
  --models "google:gemini-2.0-flash" \
  --id "international-news" \
  --directory "./news-articles" \
  --file-pattern "*.{txt,pdf}" \
  --schema "language, headline, key_events array, people_mentioned array, implications array" \
  --concurrent

Advanced Multilingual Workflows

Translation Quality Assessment

bash

umwelten eval run \
  --prompt "Compare these two translations of the same source text and assess quality, accuracy, and fluency" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --file "./translation-comparison.txt" \
  --schema "better_translation, accuracy_score int: 1-10, fluency_score int: 1-10, issues array" \
  --id "translation-qa"

Code Comment Translation

bash

umwelten eval batch \
  --prompt "Translate code comments to English while preserving technical accuracy" \
  --models "ollama:codestral:latest,google:gemini-2.0-flash" \
  --id "code-translation" \
  --directory "./international-code" \
  --file-pattern "*.{py,js,java,cpp}" \
  --concurrent

Multilingual Customer Support

bash

umwelten eval batch \
  --prompt "Analyze this customer inquiry and provide response in the same language" \
  --models "google:gemini-2.0-flash" \
  --id "multilingual-support" \
  --directory "./customer-inquiries" \
  --file-pattern "*.txt" \
  --schema "detected_language, inquiry_type, urgency int: 1-5, suggested_response" \
  --concurrent

Language-Specific Model Performance

Model Comparison by Language

bash

# Test different models on the same multilingual content
umwelten eval run \
  --prompt "Summarize this content in English" \
  --models "google:gemini-2.0-flash,google:gemini-2.5-pro-exp-03-25,openrouter:openai/gpt-4o" \
  --file "./chinese-article.txt" \
  --id "chinese-model-comparison" \
  --concurrent

Language Coverage Testing

bash

# Test model performance across different languages
for lang in spanish french german japanese chinese arabic; do
  umwelten eval run \
    --prompt "Analyze this ${lang} text and provide insights in English" \
    --models "google:gemini-2.0-flash" \
    --file "./${lang}-sample.txt" \
    --id "${lang}-processing-test"
done

Structured Multilingual Output

Consistent Cross-Language Schema

bash

umwelten eval batch \
  --prompt "Extract structured information from this document regardless of language" \
  --models "google:gemini-2.0-flash" \
  --id "multilingual-extraction" \
  --directory "./international-documents" \
  --file-pattern "*.pdf" \
  --schema "source_language, title, summary, key_points array, document_type, confidence int: 1-10" \
  --concurrent

Language Metadata Enrichment

bash

umwelten eval batch \
  --prompt "Analyze this content and provide detailed language metadata" \
  --models "google:gemini-2.0-flash" \
  --id "language-metadata" \
  --directory "./texts" \
  --file-pattern "*.txt" \
  --schema "primary_language, secondary_languages array, dialect, formality_level int: 1-5, technical_level int: 1-5" \
  --concurrent

Interactive Multilingual Chat

Language-Adaptive Chat

bash

# Start multilingual chat session
umwelten chat --provider google --model gemini-2.0-flash

# Within chat:
> "Please respond in Spanish: ¿Cómo está el clima hoy?"
> "Now switch to French: Comment allez-vous?"
> "Respond in Japanese: こんにちは、元気ですか？"

Translation Chat Assistant

bash

umwelten chat \
  --provider google \
  --model gemini-2.0-flash \
  --system "You are a professional translator. Help users translate text between languages while preserving meaning and context."

Performance and Cost Optimization

Model Selection for Languages

Language Family	Recommended Model	Notes
European Languages	google:gemini-2.0-flash	Excellent coverage, cost-effective
East Asian Languages	google:gemini-2.5-pro	Better handling of complex scripts
Arabic/Hebrew	google:gemini-2.0-flash	Strong RTL language support
Programming Languages	ollama:codestral:latest	Code context preservation
Technical Translation	openrouter:openai/gpt-4o	Highest accuracy for specialized content

Batch Processing by Language

bash

# Group by language family for efficiency
umwelten eval batch \
  --prompt "Process European language content" \
  --models "google:gemini-2.0-flash" \
  --directory "./european-texts" \
  --file-pattern "*{en,es,fr,de,it}*.txt" \
  --concurrent

umwelten eval batch \
  --prompt "Process Asian language content" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --directory "./asian-texts" \
  --file-pattern "*{zh,ja,ko}*.txt" \
  --concurrent

Real-World Examples

International Legal Documents

bash

umwelten eval batch \
  --prompt "Extract legal obligations and key clauses from this document" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "legal-multilingual" \
  --directory "./legal-docs" \
  --file-pattern "*.pdf" \
  --schema "language, document_type, parties array, obligations array, key_clauses array, jurisdiction" \
  --timeout 90000 \
  --concurrent

E-commerce Product Descriptions

bash

umwelten eval batch \
  --prompt "Translate this product description to English and extract key product features" \
  --models "google:gemini-2.0-flash" \
  --id "product-translation" \
  --directory "./product-descriptions" \
  --file-pattern "*.txt" \
  --schema "source_language, translated_title, translated_description, features array, price_mentioned bool" \
  --concurrent

bash

umwelten eval batch \
  --prompt "Analyze sentiment and extract hashtags from this social media content" \
  --models "google:gemini-2.0-flash" \
  --id "social-multilingual" \
  --directory "./social-posts" \
  --file-pattern "*.txt" \
  --schema "language, sentiment, hashtags array, mentions array, engagement_indicators array" \
  --concurrent

Quality Assurance

Translation Validation

bash

umwelten eval run \
  --prompt "Check this translation for accuracy, fluency, and cultural appropriateness" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --file "./translation-to-check.txt" \
  --schema "accuracy_score int: 1-10, fluency_score int: 1-10, cultural_score int: 1-10, issues array" \
  --id "translation-validation"

Cross-Language Consistency

bash

umwelten eval batch \
  --prompt "Ensure consistent terminology and style across these translated documents" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "consistency-check" \
  --directory "./translated-series" \
  --file-pattern "*.txt" \
  --schema "consistent_terminology bool, style_consistency int: 1-10, inconsistencies array" \
  --concurrent

Output Analysis and Reporting

Language Distribution Analysis

bash

# Generate language statistics
umwelten eval report --id multilingual-extraction --format json | jq '
  .results | 
  group_by(.response.source_language) | 
  map({language: .[0].response.source_language, count: length}) | 
  sort_by(.count) | 
  reverse
'

Translation Quality Metrics

bash

# Analyze translation quality across different models
umwelten eval report --id translation-comparison --format csv --output translation-metrics.csv

Cross-Language Report Generation

bash

# Generate reports in multiple languages
umwelten eval report --id multilingual-analysis --format markdown --output report-en.md
umwelten eval run \
  --prompt "Translate this English report to Spanish while preserving structure" \
  --models "google:gemini-2.0-flash" \
  --file "./report-en.md" \
  --id "spanish-report"

Best Practices

Language Handling

Language Detection: Always detect language before processing
Model Selection: Choose models with strong multilingual capabilities
Context Preservation: Maintain cultural and contextual nuances
Quality Control: Validate translations and cross-language consistency

Performance Optimization

Batch by Language: Group similar languages when possible
Model Efficiency: Use cost-effective models for simple tasks
Timeout Management: Allow extra time for complex translations
Resource Planning: Account for increased processing time

Quality Assurance

Native Review: Have native speakers review critical translations
Consistency Checks: Ensure terminology consistency across documents
Cultural Sensitivity: Be aware of cultural context and appropriateness
Accuracy Validation: Cross-check important translations

Troubleshooting

Common Issues

Character Encoding: Ensure proper UTF-8 encoding for all text files
Right-to-Left Languages: Some models handle RTL languages differently
Mixed Scripts: Documents with multiple writing systems may need special handling
Cultural Context: Idiomatic expressions may not translate directly

Debug Commands

bash

# Test language detection
umwelten run --models "google:gemini-2.0-flash" "Detect the language: Bonjour, comment allez-vous?"

# Test basic translation  
umwelten run --models "google:gemini-2.0-flash" "Translate to English: Hola, ¿cómo estás?"

# Check file encoding
file -i ./multilingual-document.txt

Next Steps

Explore structured output for consistent multilingual data extraction
Try batch processing for large multilingual document collections
See cost analysis for optimizing multilingual processing costs
Learn about model evaluation for language-specific model selection

Multi-Language Processing ​

Overview ​

Language Detection and Analysis ​

Automatic Language Detection ​

Language-Specific Analysis ​

Translation Services ​

Document Translation ​

Multi-Target Translation ​

Context-Aware Translation ​

Cross-Lingual Content Analysis ​

Sentiment Analysis Across Languages ​

Cultural Context Analysis ​

Language-Specific Use Cases ​

International Business Documents ​

Academic Research Processing ​

News and Media Analysis ​

Advanced Multilingual Workflows ​

Translation Quality Assessment ​

Code Comment Translation ​

Multilingual Customer Support ​

Language-Specific Model Performance ​

Model Comparison by Language ​

Language Coverage Testing ​

Structured Multilingual Output ​

Consistent Cross-Language Schema ​

Language Metadata Enrichment ​

Interactive Multilingual Chat ​

Language-Adaptive Chat ​

Translation Chat Assistant ​

Performance and Cost Optimization ​

Model Selection for Languages ​

Batch Processing by Language ​

Real-World Examples ​

International Legal Documents ​

E-commerce Product Descriptions ​

Social Media Content Analysis ​

Quality Assurance ​

Translation Validation ​

Cross-Language Consistency ​

Output Analysis and Reporting ​

Language Distribution Analysis ​

Translation Quality Metrics ​

Cross-Language Report Generation ​

Best Practices ​

Language Handling ​

Performance Optimization ​

Quality Assurance ​

Troubleshooting ​

Common Issues ​

Debug Commands ​

Next Steps ​

Multi-Language Processing

Overview

Language Detection and Analysis

Automatic Language Detection

Language-Specific Analysis

Translation Services

Document Translation

Multi-Target Translation

Context-Aware Translation

Cross-Lingual Content Analysis

Sentiment Analysis Across Languages

Cultural Context Analysis

Language-Specific Use Cases

International Business Documents

Academic Research Processing

News and Media Analysis

Advanced Multilingual Workflows

Translation Quality Assessment

Code Comment Translation

Multilingual Customer Support

Language-Specific Model Performance

Model Comparison by Language

Language Coverage Testing

Structured Multilingual Output

Consistent Cross-Language Schema

Language Metadata Enrichment

Interactive Multilingual Chat

Language-Adaptive Chat

Translation Chat Assistant

Performance and Cost Optimization

Model Selection for Languages

Batch Processing by Language

Real-World Examples

International Legal Documents

E-commerce Product Descriptions

Social Media Content Analysis

Quality Assurance

Translation Validation

Cross-Language Consistency

Output Analysis and Reporting

Language Distribution Analysis

Translation Quality Metrics

Cross-Language Report Generation

Best Practices

Language Handling

Performance Optimization

Quality Assurance

Troubleshooting

Common Issues

Debug Commands

Next Steps