
Multi-Language Processing

Process content in multiple languages with language detection, translation, and cross-lingual analysis. This example demonstrates working with international content using AI models with strong multilingual capabilities.

Overview

Multi-language processing lets you detect languages automatically, translate content, and run cross-lingual analysis over documents written in many languages. Current AI models can read and write dozens of languages, although quality varies by language and by model, so it is worth testing your target languages directly (see Language-Specific Model Performance below).

Language Detection and Analysis

Automatic Language Detection

bash
umwelten eval batch \
  --prompt "Detect the language of this content and provide a confidence score" \
  --models "google:gemini-2.0-flash,ollama:gemma3:12b" \
  --id "language-detection" \
  --directory "./multilingual-content" \
  --file-pattern "*.{txt,pdf}" \
  --schema "detected_language, confidence int: 1-10, secondary_languages array" \
  --concurrent

Language-Specific Analysis

bash
umwelten eval run \
  --prompt "Analyze this text in its original language, then provide an English summary" \
  --models "google:gemini-2.0-flash" \
  --file "./spanish-document.pdf" \
  --id "spanish-analysis"

# Japanese prompt: "Analyze this Japanese document and summarize it in English."
umwelten eval run \
  --prompt "この日本語の文書を分析し、英語で要約してください" \
  --models "google:gemini-2.0-flash" \
  --file "./japanese-text.txt" \
  --id "japanese-analysis"

Translation Services

Document Translation

bash
umwelten eval batch \
  --prompt "Translate this document to English while preserving formatting and context" \
  --models "google:gemini-2.0-flash,openrouter:openai/gpt-4o" \
  --id "document-translation" \
  --directory "./foreign-documents" \
  --file-pattern "*.{pdf,txt}" \
  --concurrent

Multi-Target Translation

bash
umwelten eval run \
  --prompt "Translate the following English text to Spanish, French, German, and Japanese: The meeting has been moved to Thursday at 10 a.m." \
  --models "google:gemini-2.0-flash" \
  --id "multi-target-translation" \
  --schema "spanish, french, german, japanese"

Context-Aware Translation

bash
umwelten eval run \
  --prompt "Translate this technical document to English, preserving technical terminology and context" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --file "./technical-manual-german.pdf" \
  --timeout 60000 \
  --id "technical-translation"

Cross-Lingual Content Analysis

Sentiment Analysis Across Languages

bash
umwelten eval batch \
  --prompt "Analyze the sentiment of this content regardless of language and provide reasoning" \
  --models "google:gemini-2.0-flash" \
  --id "multilingual-sentiment" \
  --directory "./reviews-international" \
  --file-pattern "*.txt" \
  --schema "sentiment, confidence int: 1-10, detected_language, reasoning" \
  --concurrent

Cultural Context Analysis

bash
umwelten eval batch \
  --prompt "Analyze this content for cultural references, idioms, and context that might not translate directly" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "cultural-analysis" \
  --directory "./cultural-content" \
  --file-pattern "*.{txt,pdf}" \
  --schema "cultural_elements array, idioms array, translation_challenges array, context_notes" \
  --concurrent

Language-Specific Use Cases

International Business Documents

bash
# Process contracts in multiple languages
umwelten eval batch \
  --prompt "Extract key terms, obligations, and dates from this business document" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "international-contracts" \
  --directory "./contracts" \
  --file-pattern "*.pdf" \
  --schema "language, parties array, key_terms array, obligations array, important_dates array" \
  --concurrent

Academic Research Processing

bash
# Analyze research papers in various languages
umwelten eval batch \
  --prompt "Extract methodology, findings, and conclusions from this academic paper" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "multilingual-research" \
  --directory "./academic-papers" \
  --file-pattern "*.pdf" \
  --schema "language, title, methodology, key_findings array, conclusions array, citations int" \
  --concurrent

News and Media Analysis

bash
# Analyze international news articles
umwelten eval batch \
  --prompt "Summarize this news article and identify key events, people, and implications" \
  --models "google:gemini-2.0-flash" \
  --id "international-news" \
  --directory "./news-articles" \
  --file-pattern "*.{txt,pdf}" \
  --schema "language, headline, key_events array, people_mentioned array, implications array" \
  --concurrent

Advanced Multilingual Workflows

Translation Quality Assessment

bash
umwelten eval run \
  --prompt "Compare these two translations of the same source text and assess quality, accuracy, and fluency" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --file "./translation-comparison.txt" \
  --schema "better_translation, accuracy_score int: 1-10, fluency_score int: 1-10, issues array" \
  --id "translation-qa"
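The translation-qa run reads both candidate translations from a single file, and the model also needs the source text to judge accuracy. A minimal sketch for assembling that file; the helper name `build_comparison` and the section labels are illustrative assumptions, not a format umwelten requires:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the source text and two candidate translations,
# each under a clear label, so they can be saved as one input file.
build_comparison() {
  local source_file=$1 translation_a=$2 translation_b=$3
  printf '=== SOURCE TEXT ===\n'
  cat -- "$source_file"
  # Leading \n keeps each label on its own line even if a file lacks a final newline.
  printf '\n=== TRANSLATION A ===\n'
  cat -- "$translation_a"
  printf '\n=== TRANSLATION B ===\n'
  cat -- "$translation_b"
  printf '\n'
}
```

For example, `build_comparison source.txt translation-a.txt translation-b.txt > ./translation-comparison.txt`. With these labels, the `better_translation` field in the schema can be answered as "A" or "B".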

Code Comment Translation

bash
umwelten eval batch \
  --prompt "Translate code comments to English while preserving technical accuracy" \
  --models "ollama:codestral:latest,google:gemini-2.0-flash" \
  --id "code-translation" \
  --directory "./international-code" \
  --file-pattern "*.{py,js,java,cpp}" \
  --concurrent

Multilingual Customer Support

bash
umwelten eval batch \
  --prompt "Analyze this customer inquiry and provide a response in the same language" \
  --models "google:gemini-2.0-flash" \
  --id "multilingual-support" \
  --directory "./customer-inquiries" \
  --file-pattern "*.txt" \
  --schema "detected_language, inquiry_type, urgency int: 1-5, suggested_response" \
  --concurrent

Language-Specific Model Performance

Model Comparison by Language

bash
# Test different models on the same multilingual content
umwelten eval run \
  --prompt "Summarize this content in English" \
  --models "google:gemini-2.0-flash,google:gemini-2.5-pro-exp-03-25,openrouter:openai/gpt-4o" \
  --file "./chinese-article.txt" \
  --id "chinese-model-comparison" \
  --concurrent

Language Coverage Testing

bash
# Test model performance across different languages
for lang in spanish french german japanese chinese arabic; do
  umwelten eval run \
    --prompt "Analyze this ${lang} text and provide insights in English" \
    --models "google:gemini-2.0-flash" \
    --file "./${lang}-sample.txt" \
    --id "${lang}-processing-test"
done
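The loop above makes one `eval run` call per language, so a missing `./<lang>-sample.txt` presumably surfaces as an error partway through the run. A small pre-flight check, sketched under the assumption that samples follow that same naming, lists the languages that have no sample file so you can fix them first:

```bash
#!/usr/bin/env bash
# List each language (from the arguments) whose <dir>/<lang>-sample.txt is missing.
# Assumes the <lang>-sample.txt naming used by the coverage loop.
missing_samples() {
  local dir=$1
  shift
  local lang
  for lang in "$@"; do
    [ -f "${dir}/${lang}-sample.txt" ] || printf '%s\n' "$lang"
  done
}
```

For example, `missing_samples . spanish french german japanese chinese arabic` prints nothing when every sample is present.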

Structured Multilingual Output

Consistent Cross-Language Schema

bash
umwelten eval batch \
  --prompt "Extract structured information from this document regardless of language" \
  --models "google:gemini-2.0-flash" \
  --id "multilingual-extraction" \
  --directory "./international-documents" \
  --file-pattern "*.pdf" \
  --schema "source_language, title, summary, key_points array, document_type, confidence int: 1-10" \
  --concurrent

Language Metadata Enrichment

bash
umwelten eval batch \
  --prompt "Analyze this content and provide detailed language metadata" \
  --models "google:gemini-2.0-flash" \
  --id "language-metadata" \
  --directory "./texts" \
  --file-pattern "*.txt" \
  --schema "primary_language, secondary_languages array, dialect, formality_level int: 1-5, technical_level int: 1-5" \
  --concurrent

Interactive Multilingual Chat

Language-Adaptive Chat

bash
# Start multilingual chat session
umwelten chat --provider google --model gemini-2.0-flash

# Within chat (English glosses: "How is the weather today?", "How are you?", "Hello, how are you?"):
> "Please respond in Spanish: ¿Cómo está el clima hoy?"
> "Now switch to French: Comment allez-vous?"
> "Respond in Japanese: こんにちは、元気ですか?"

Translation Chat Assistant

bash
umwelten chat \
  --provider google \
  --model gemini-2.0-flash \
  --system "You are a professional translator. Help users translate text between languages while preserving meaning and context."

Performance and Cost Optimization

Model Selection for Languages

| Language family | Recommended model | Notes |
|---|---|---|
| European languages | google:gemini-2.0-flash | Broad coverage, cost-effective |
| East Asian languages | google:gemini-2.5-pro-exp-03-25 | Better handling of complex scripts |
| Arabic/Hebrew | google:gemini-2.0-flash | Strong right-to-left (RTL) support |
| Source code (comments) | ollama:codestral:latest | Preserves code context |
| Technical translation | openrouter:openai/gpt-4o | Strong accuracy on specialized content |

Batch Processing by Language

bash
# Group by language family for efficiency
umwelten eval batch \
  --prompt "Process European language content" \
  --models "google:gemini-2.0-flash" \
  --directory "./european-texts" \
  --file-pattern "*{en,es,fr,de,it}*.txt" \
  --concurrent

umwelten eval batch \
  --prompt "Process Asian language content" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --directory "./asian-texts" \
  --file-pattern "*{zh,ja,ko}*.txt" \
  --concurrent
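Assuming `--file-pattern` follows standard glob semantics, `*{en,es,fr,de,it}*.txt` matches any name that contains one of those two-letter strings anywhere, so files such as `notes.txt` (contains `es`) or `content.txt` (contains `en`) are picked up as well. One way to avoid this is a naming convention that puts the language code in a fixed position, such as `report.es.txt`, and staging files into the per-family directories before the batch runs. The sketch below assumes that convention; the helper names are hypothetical, and the family lists and directory names mirror the two commands above:

```bash
#!/usr/bin/env bash
# Assumed convention: files are named <name>.<lang>.txt (not required by umwelten).

# Map a two-letter language code to the directory used by the batch commands above.
lang_family_dir() {
  case "$1" in
    en|es|fr|de|it) printf 'european-texts\n' ;;
    zh|ja|ko)       printf 'asian-texts\n' ;;
    *)              return 1 ;;  # unknown code: caller skips the file
  esac
}

# Copy each <name>.<lang>.txt from $1 into $2/<family-dir>/.
stage_by_family() {
  local src=$1 dest=$2 file base code family
  for file in "$src"/*.txt; do
    [ -e "$file" ] || continue          # the glob matched nothing
    base=${file##*/}                    # strip the directory part
    code=${base%.txt}                   # drop the .txt extension
    code=${code##*.}                    # keep the text after the last dot
    family=$(lang_family_dir "$code") || continue
    mkdir -p "$dest/$family"
    cp -- "$file" "$dest/$family/"
  done
}
```

Running `stage_by_family ./incoming .` fills `./european-texts` and `./asian-texts`, after which `--file-pattern "*.txt"` is enough for each batch. Files without a recognized code, such as `notes.txt`, stay where they are.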

Real-World Examples

Multilingual Legal Documents

bash
umwelten eval batch \
  --prompt "Extract legal obligations and key clauses from this document" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "legal-multilingual" \
  --directory "./legal-docs" \
  --file-pattern "*.pdf" \
  --schema "language, document_type, parties array, obligations array, key_clauses array, jurisdiction" \
  --timeout 90000 \
  --concurrent

E-commerce Product Descriptions

bash
umwelten eval batch \
  --prompt "Translate this product description to English and extract key product features" \
  --models "google:gemini-2.0-flash" \
  --id "product-translation" \
  --directory "./product-descriptions" \
  --file-pattern "*.txt" \
  --schema "source_language, translated_title, translated_description, features array, price_mentioned bool" \
  --concurrent

Social Media Content Analysis

bash
umwelten eval batch \
  --prompt "Analyze sentiment and extract hashtags from this social media content" \
  --models "google:gemini-2.0-flash" \
  --id "social-multilingual" \
  --directory "./social-posts" \
  --file-pattern "*.txt" \
  --schema "language, sentiment, hashtags array, mentions array, engagement_indicators array" \
  --concurrent

Quality Assurance

Translation Validation

bash
umwelten eval run \
  --prompt "Check this translation for accuracy, fluency, and cultural appropriateness" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --file "./translation-to-check.txt" \
  --schema "accuracy_score int: 1-10, fluency_score int: 1-10, cultural_score int: 1-10, issues array" \
  --id "translation-validation"

Cross-Language Consistency

bash
umwelten eval batch \
  --prompt "Check this translated document for consistent terminology and style throughout" \
  --models "google:gemini-2.5-pro-exp-03-25" \
  --id "consistency-check" \
  --directory "./translated-series" \
  --file-pattern "*.txt" \
  --schema "consistent_terminology bool, style_consistency int: 1-10, inconsistencies array" \
  --concurrent
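Because batch mode evaluates the prompt against each matched file separately, the model sees one document at a time and cannot compare terminology between documents. To check consistency across the whole series, one option is to combine the documents into a single file and run the consistency prompt once with `eval run --file`. A minimal sketch; the helper name `combine_series` and the header format are assumptions:

```bash
#!/usr/bin/env bash
# Concatenate every .txt file in a directory into one document, with a header
# naming each source file so the model can say where an inconsistency occurs.
combine_series() {
  local dir=$1 file
  for file in "$dir"/*.txt; do
    [ -e "$file" ] || continue   # the glob matched nothing
    printf '=== FILE: %s ===\n' "${file##*/}"
    cat -- "$file"
    printf '\n'
  done
}
```

For example, run `combine_series ./translated-series > ./translated-series-combined.txt`, then pass that file to `umwelten eval run` with `--file`, reusing the prompt, model, and schema above. Write the combined file outside `./translated-series`; otherwise the shell creates it before the loop runs and the glob picks it up as an input.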

Output Analysis and Reporting

Language Distribution Analysis

bash
# Generate language statistics
umwelten eval report --id multilingual-extraction --format json | jq '
  .results | 
  group_by(.response.source_language) | 
  map({language: .[0].response.source_language, count: length}) | 
  sort_by(.count) | 
  reverse
'

Translation Quality Metrics

bash
# Analyze translation quality across different models
umwelten eval report --id translation-comparison --format csv --output translation-metrics.csv

Cross-Language Report Generation

bash
# Generate reports in multiple languages
umwelten eval report --id multilingual-analysis --format markdown --output report-en.md
umwelten eval run \
  --prompt "Translate this English report to Spanish while preserving structure" \
  --models "google:gemini-2.0-flash" \
  --file "./report-en.md" \
  --id "spanish-report"

Best Practices

Language Handling

  • Language Detection: Detect the language first when later steps depend on it, such as model choice or the language of a generated response
  • Model Selection: Choose models with strong multilingual capabilities
  • Context Preservation: Maintain cultural and contextual nuances
  • Quality Control: Validate translations and cross-language consistency

Performance Optimization

  • Batch by Language: Group similar languages when possible
  • Model Efficiency: Use cost-effective models for simple tasks
  • Timeout Management: Allow extra time for long or complex translations (the examples above raise --timeout to 60000 and 90000)
  • Resource Planning: Budget for longer processing times on large documents and slower models

Quality Assurance

  • Native Review: Have native speakers review critical translations
  • Consistency Checks: Ensure terminology consistency across documents
  • Cultural Sensitivity: Be aware of cultural context and appropriateness
  • Accuracy Validation: Cross-check important translations

Troubleshooting

Common Issues

  1. Character Encoding: Ensure proper UTF-8 encoding for all text files
  2. Right-to-Left Languages: Models differ in how reliably they handle RTL scripts such as Arabic and Hebrew, so spot-check their output
  3. Mixed Scripts: Documents with multiple writing systems may need special handling
  4. Cultural Context: Idiomatic expressions may not translate directly

Debug Commands

bash
# Test language detection
umwelten run --models "google:gemini-2.0-flash" "Detect the language: Bonjour, comment allez-vous?"

# Test basic translation  
umwelten run --models "google:gemini-2.0-flash" "Translate to English: Hola, ¿cómo estás?"

# Check file encoding (GNU file; on macOS use file -I)
file -i ./multilingual-document.txt
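The charset that `file` reports is a heuristic guess. For the UTF-8 requirement in Common Issues item 1, a stricter check is to ask the standard `iconv` utility to decode the file as UTF-8, which fails on invalid byte sequences. A minimal sketch; the helper names are hypothetical, and the source encoding passed to `to_utf8` must be the file's actual encoding (ISO-8859-1 below is only an example):

```bash
#!/usr/bin/env bash
# Succeeds (exit 0) only if the file decodes cleanly as UTF-8.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Convert a file from a known legacy encoding to UTF-8, writing a new file.
# Usage: to_utf8 <source-encoding> <input> <output>
to_utf8() {
  iconv -f "$1" -t UTF-8 "$2" > "$3"
}
```

For example, `is_utf8 ./multilingual-document.txt || to_utf8 ISO-8859-1 ./multilingual-document.txt ./multilingual-document.utf8.txt`.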


Released under the MIT License.