SRT Translator Architecture

This document describes the high-level architecture of the SRT Translator project.

Overview

The SRT Translator is designed with a clean separation between:

  • Core Translation Engine: Pure functions that process SRT files
  • Interface Layers: CLI and GUI that collect configuration and drive the core
  • Evaluation System: Post-processing analysis of translation quality

Core Architecture

Translation Pipeline

CLI/GUI → TranslationConfig → Core Engine → Batch Output
     ↑           ↑              ↑            ↑
  Collects    Contains      Processes    Creates
  params      ALL params    SRT files    batch dir

Configuration Flow

  • Entry points (CLI/GUI) collect user preferences and environment data
  • TranslationConfig object contains all runtime configuration
  • Core engine reads only from TranslationConfig - no global state
  • Batch output includes frozen configuration snapshot in ai_config.json
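
The flow above can be sketched as a frozen dataclass that is serialized into the batch. This is a minimal illustration, not the project's actual code: the field names (`source_lang`, `target_langs`, `model`, `dnt_terms`) and the helper `write_config_snapshot` are assumptions, since this document does not list the real fields of TranslationConfig. Only the artifacts/ai_config.json location comes from the text.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class TranslationConfig:
    """Hypothetical shape; the real fields are not listed in this document."""

    source_lang: str
    target_langs: tuple[str, ...]
    model: str
    dnt_terms: tuple[str, ...] = ()


def write_config_snapshot(config: TranslationConfig, batch_root: Path) -> Path:
    """Write the configuration to batch_root/artifacts/ai_config.json."""
    artifacts = batch_root / "artifacts"
    artifacts.mkdir(parents=True, exist_ok=True)
    path = artifacts / "ai_config.json"
    # asdict() keeps tuples as tuples; json.dumps() writes them as JSON arrays.
    path.write_text(
        json.dumps(asdict(config), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return path
```

Because the dataclass is frozen, the core engine cannot mutate the configuration mid-run, so the snapshot on disk matches what the engine actually used. The API key is deliberately not a field here, in line with the Data Handling notes.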

Evaluation Subsystem (v1.0)

Purpose

The evaluation subsystem provides post-translation quality analysis and reporting. It operates independently of the translation process, reading only batch artifacts to ensure reproducibility.

Input Contract

The evaluator reads only files on disk from the batch directory:

REQUIRED inputs:

  • batch_root/artifacts/ai_config.json - Schema-versioned configuration snapshot
  • batch_root/originals/ - Source SRT files
  • batch_root/<lang>/ - Translated SRT files for each target language (e.g., batch_root/es/, batch_root/fr/)

OPTIONAL inputs (inside ai_config.json):

  • dnt_terms - Batch-wide "Do Not Translate" terms list
  • termbase - Per-language termbase mappings (source → target)

Required vs Optional Policy

  • Required inputs missing/invalid → ERROR and STOP evaluation. No fallbacks.
  • Optional inputs missing/empty → INFO log and CONTINUE evaluation. Report coverage explicitly.
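
One way to express this policy is sketched below. The names `EvaluationInputError` and `load_batch_inputs`, the logger name, and the choice to pass `target_langs` in as an argument are all assumptions for illustration. The document does not say how the evaluator discovers the target languages.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("evaluator")  # logger name is an assumption


class EvaluationInputError(Exception):
    """Hypothetical error type: a required input is missing or invalid."""


def load_batch_inputs(batch_root: Path, target_langs: list[str]) -> dict:
    """Return the parsed ai_config.json, enforcing the required/optional policy."""
    config_path = batch_root / "artifacts" / "ai_config.json"
    if not config_path.is_file():
        raise EvaluationInputError(f"Required input missing: {config_path}")
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise EvaluationInputError(f"Required input invalid: {config_path}") from exc
    if not isinstance(config, dict):
        raise EvaluationInputError(f"Required input invalid: {config_path}")

    # Required directories: stop on the first one that is absent, no fallbacks.
    for name in ["originals", *target_langs]:
        if not (batch_root / name).is_dir():
            raise EvaluationInputError(f"Required input missing: {batch_root / name}")

    # Optional keys: log at INFO level and carry on.
    for key in ("dnt_terms", "termbase"):
        if not config.get(key):
            logger.info("Optional input %r missing or empty; continuing", key)
    return config
```

Raising a dedicated exception keeps the "ERROR and STOP" path explicit at the call site, while the optional keys never alter control flow.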

Data Normalization

The evaluator normalizes the ai_config.json shape to internal formats:

# Input (ai_config.json)
{
  "dnt_terms": ["Operating Plan", "Module"],
  "termbase": {
    "es": {"Operating Plan": "Plan Operativo"},
    "fr": {"Module": "Module"}
  }
}

# Normalized (internal)
{
  "dnt_terms": ["Operating Plan", "Module"],
  "termbase": {
    "es": [{"source": "Operating Plan", "target": "Plan Operativo"}],
    "fr": [{"source": "Module", "target": "Module"}]
  }
}
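
The mapping between the two shapes above is small enough to sketch directly. The function name `normalize_config` is hypothetical, and treating a missing or null key as empty is an assumption based on the optional-input policy.

```python
def normalize_config(raw: dict) -> dict:
    """Convert the ai_config.json shape into the internal list-of-entries shape."""
    termbase = {
        lang: [{"source": src, "target": tgt} for src, tgt in mapping.items()]
        for lang, mapping in (raw.get("termbase") or {}).items()
    }
    return {
        "dnt_terms": list(raw.get("dnt_terms") or []),
        "termbase": termbase,
    }
```

Called on the input example above, this produces the normalized structure shown, with one `{"source": ..., "target": ...}` entry per term.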

Artifacts for Audit Only

Any files under artifacts/<lang>/ (e.g., dnt_summary.json, termbase_summary.json) are derived outputs for auditing only. The evaluator neither requires nor prefers these files as inputs.

Reporting Requirements

The eval_report.json MUST include:

  • config_source: "ai_config.json" - Source of configuration data
  • dnt_coverage: "present"|"none" - Whether DNT terms were available
  • termbase_coverage: "full"|"partial"|"none" - Coverage across target languages
  • termbase_entry_counts: {"es": 5, "fr": 3} - Per-language entry counts
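
As a rough sketch, the four required fields could be derived from the normalized configuration as follows. The document does not define "full" and "partial" precisely. This example assumes "full" means every target language has at least one termbase entry and "partial" means only some do. The function name and the `target_langs` parameter are likewise hypothetical.

```python
def build_coverage_fields(normalized: dict, target_langs: list[str]) -> dict:
    """Compute the coverage fields required in eval_report.json."""
    termbase = normalized.get("termbase", {})
    counts = {lang: len(termbase.get(lang, [])) for lang in target_langs}
    covered = sum(1 for n in counts.values() if n > 0)

    if covered == 0:
        termbase_coverage = "none"
    elif covered == len(target_langs):
        termbase_coverage = "full"
    else:
        termbase_coverage = "partial"

    return {
        "config_source": "ai_config.json",
        "dnt_coverage": "present" if normalized.get("dnt_terms") else "none",
        "termbase_coverage": termbase_coverage,
        "termbase_entry_counts": counts,
    }
```

Languages with no entries are reported with a count of 0 rather than omitted, so a reader of the report can see which languages lacked a termbase.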

Logging Requirements

All evaluation logs (INFO/ERROR) MUST appear in:

  • Console output - For real-time monitoring
  • Batch log file - For permanent record and debugging

This ensures that coverage decisions and evaluation results are captured in the same log file as the translation process.
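
With Python's standard logging module, the dual-destination requirement can be met by attaching two handlers to one logger. The logger name, the message format, and the `batch_log` path parameter below are assumptions. The document does not name the batch log file.

```python
import logging
import sys
from pathlib import Path


def configure_eval_logging(batch_log: Path) -> logging.Logger:
    """Send evaluator records to both the console and the batch log file."""
    logger = logging.getLogger("evaluator")  # logger name is an assumption
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger

    # Drop handlers from a previous batch so records are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    console = logging.StreamHandler(sys.stderr)
    # Append mode, so evaluation lines follow the translation run's lines.
    to_file = logging.FileHandler(batch_log, mode="a", encoding="utf-8")
    for handler in (console, to_file):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Opening the file in append mode is what lets evaluation output land in the same log file as the translation process instead of overwriting it.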

Independence from TranslationConfig

The evaluator MUST NOT import or depend on TranslationConfig objects. It reads only batch artifacts to ensure:

  • Reproducibility: Evaluation can be re-run on old batches
  • Decoupling: Changes to translation config don't affect evaluation
  • Auditability: Complete configuration snapshot preserved in batch

Quality Assurance

Translation Quality

  • Characters-per-second (CPS) analysis
  • Placeholder consistency checks
  • DNT term coverage analysis
  • Termbase collision detection
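
As an illustration of the first check, CPS for a single cue is the number of visible characters divided by the cue's duration. The sketch below makes several assumptions that the document does not state: formatting tags such as `<i>` are stripped, line breaks are not counted, and spaces are counted. The function name `cue_cps` is hypothetical.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,000".
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(hours: str, minutes: str, seconds: str, millis: str) -> float:
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def cue_cps(timing_line: str, text: str) -> float:
    """Return characters per second for one subtitle cue."""
    match = _TIMING.search(timing_line)
    if match is None:
        raise ValueError(f"Not an SRT timing line: {timing_line!r}")
    parts = match.groups()
    duration = _to_seconds(*parts[4:]) - _to_seconds(*parts[:4])
    if duration <= 0:
        raise ValueError(f"Cue has non-positive duration: {timing_line!r}")
    # Assumption: strip markup tags and ignore line breaks before counting.
    visible = re.sub(r"<[^>]+>", "", text).replace("\n", "")
    return len(visible) / duration
```

For example, `cue_cps("00:00:01,000 --> 00:00:03,000", "Hello world!")` divides 12 characters by 2 seconds. Comparing original and translated cues this way highlights translations that grew too long for their time slot.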

Batch Validation

  • File structure verification
  • Configuration completeness checks
  • Artifact generation validation

Security Considerations

Configuration Isolation

  • CLI reads API key from environment or .env file
  • GUI stores API key via QSettings, the OS-native settings store (note: QSettings does not encrypt stored values)
  • All translation configuration frozen in batch artifacts
  • No external API calls during evaluation

Data Handling

  • SRT content processed in memory during translation
  • Translated files written to batch output directory
  • API key persisted locally (not included in batch artifacts)