SRT Translator Architecture

This document describes the high-level architecture of the SRT Translator project.

Overview

The SRT Translator is designed with a clean separation between:

- Core Translation Engine: Pure functions that process SRT files
- Interface Layers: CLI and GUI that collect configuration and drive the core
- Evaluation System: Post-processing analysis of translation quality

Core Architecture

Translation Pipeline

CLI/GUI → TranslationConfig → Core Engine → Batch Output
     ↑           ↑              ↑            ↑
  Collects    Contains      Processes    Creates
  params      ALL params    SRT files    batch dir

Configuration Flow

Entry points (CLI/GUI) collect user preferences and environment data
TranslationConfig object contains all runtime configuration
Core engine reads only from TranslationConfig - no global state
Batch output includes frozen configuration snapshot in ai_config.json
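
The flow above can be sketched as follows. This is only an illustration under stated assumptions: the field names, the `schema_version` value, and the helper names `write_config_snapshot` and `translate_batch` are hypothetical, and the real TranslationConfig may look different. The point is that the core receives one immutable object and that the snapshot written to ai_config.json omits the API key.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class TranslationConfig:
    """Hypothetical shape; the real class may carry different fields."""

    source_lang: str
    target_langs: tuple[str, ...]
    model: str
    api_key: str = field(repr=False)  # runtime-only, never snapshotted
    dnt_terms: tuple[str, ...] = ()


def write_config_snapshot(config: TranslationConfig, batch_root: Path) -> Path:
    """Freeze the configuration into batch_root/artifacts/ai_config.json."""
    snapshot = asdict(config)
    snapshot.pop("api_key")  # secrets stay out of batch artifacts
    snapshot["schema_version"] = 1  # hypothetical version marker
    artifacts = batch_root / "artifacts"
    artifacts.mkdir(parents=True, exist_ok=True)
    path = artifacts / "ai_config.json"
    path.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
    return path


def translate_batch(config: TranslationConfig, batch_root: Path) -> Path:
    """Stand-in for the core engine: it reads only from `config`."""
    # ... translation of the SRT files would happen here ...
    return write_config_snapshot(config, batch_root)
```

Because the dataclass is frozen, an entry point cannot mutate the configuration after handing it to the core, which is one way to enforce the "no global state" rule.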

Evaluation Subsystem (v1.0)

Purpose

The evaluation subsystem provides post-translation quality analysis and reporting. It operates independently of the translation process, reading only batch artifacts to ensure reproducibility.

Input Contract

The evaluator reads only files on disk from the batch directory:

REQUIRED inputs:

- batch_root/artifacts/ai_config.json - Schema-versioned configuration snapshot
- batch_root/originals/ - Source SRT files
- batch_root/<lang>/ - Translated SRT files for each target language (e.g., batch_root/es/, batch_root/fr/)

OPTIONAL inputs (inside ai_config.json):

- dnt_terms - Batch-wide "Do Not Translate" terms list
- termbase - Per-language termbase mappings (source → target)

Required vs Optional Policy

Required inputs missing/invalid → ERROR and STOP evaluation. No fallbacks.
Optional inputs missing/empty → INFO log and CONTINUE evaluation. Report coverage explicitly.
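
A minimal sketch of this policy, assuming a hypothetical `EvalInputError` exception, a hypothetical `load_eval_inputs` function, and a logger named `srt_eval` (none of these names come from the project). The check for the per-language batch_root/<lang>/ directories is left out because this document does not say where the list of target languages is stored.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("srt_eval")  # hypothetical logger name


class EvalInputError(Exception):
    """Raised when a REQUIRED input is missing or invalid."""


def load_eval_inputs(batch_root: Path) -> dict:
    config_path = batch_root / "artifacts" / "ai_config.json"
    originals = batch_root / "originals"

    # Required inputs: any problem stops the evaluation, with no fallback.
    if not config_path.is_file():
        raise EvalInputError(f"missing required input: {config_path}")
    if not originals.is_dir():
        raise EvalInputError(f"missing required input: {originals}")
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise EvalInputError(f"invalid JSON in {config_path}: {exc}") from exc
    if not isinstance(config, dict):
        raise EvalInputError(f"{config_path} must contain a JSON object")

    # Optional inputs: log at INFO and continue with empty values.
    dnt_terms = config.get("dnt_terms") or []
    termbase = config.get("termbase") or {}
    if not dnt_terms:
        log.info("ai_config.json has no dnt_terms; DNT coverage is 'none'")
    if not termbase:
        log.info("ai_config.json has no termbase; termbase coverage is 'none'")
    return {"dnt_terms": dnt_terms, "termbase": termbase}
```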

Data Normalization

The evaluator normalizes the ai_config.json shape to internal formats:

# Input (ai_config.json)
{
  "dnt_terms": ["Operating Plan", "Module"],
  "termbase": {
    "es": {"Operating Plan": "Plan Operativo"},
    "fr": {"Module": "Module"}
  }
}

# Normalized (internal)
{
  "dnt_terms": ["Operating Plan", "Module"],
  "termbase": {
    "es": [{"source": "Operating Plan", "target": "Plan Operativo"}],
    "fr": [{"source": "Module", "target": "Module"}]
  }
}
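
The mapping between the two shapes is small enough to sketch directly. The function name `normalize_config` is an assumption; only the input and output shapes are taken from the example above.

```python
def normalize_config(raw: dict) -> dict:
    """Convert the ai_config.json shape into the internal shape.

    Each per-language {source: target} mapping becomes a list of
    {"source": ..., "target": ...} entries. Missing optional keys
    normalize to empty containers.
    """
    termbase = {
        lang: [{"source": src, "target": tgt} for src, tgt in mapping.items()]
        for lang, mapping in (raw.get("termbase") or {}).items()
    }
    return {
        "dnt_terms": list(raw.get("dnt_terms") or []),
        "termbase": termbase,
    }
```

Normalizing absent keys to empty containers keeps later stages free of `None` checks while still letting the report state that coverage is "none".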

Artifacts for Audit Only

Any files under artifacts/<lang>/ (e.g., dnt_summary.json, termbase_summary.json) are derived outputs for auditing only. The evaluator neither requires nor prefers these files as inputs.

Reporting Requirements

The eval_report.json MUST include:

- config_source: "ai_config.json" - Source of configuration data
- dnt_coverage: "present"|"none" - Whether DNT terms were available
- termbase_coverage: "full"|"partial"|"none" - Coverage across target languages
- termbase_entry_counts: {"es": 5, "fr": 3} - Per-language entry counts
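
One way these fields could be derived from the normalized configuration is sketched below. The function name `build_coverage_fields` is hypothetical, and the reading of "full" (every target language has at least one termbase entry), "partial" (some do), and "none" (no language does) is an assumption, since the exact thresholds are not spelled out here.

```python
def build_coverage_fields(normalized: dict, target_langs: list[str]) -> dict:
    """Compute the mandatory coverage fields for eval_report.json."""
    termbase = normalized.get("termbase") or {}
    # Languages without entries are reported with a count of 0 (assumption).
    counts = {lang: len(termbase.get(lang, [])) for lang in target_langs}
    covered = sum(1 for n in counts.values() if n > 0)

    if covered == 0:
        termbase_coverage = "none"
    elif covered == len(target_langs):
        termbase_coverage = "full"
    else:
        termbase_coverage = "partial"

    return {
        "config_source": "ai_config.json",
        "dnt_coverage": "present" if normalized.get("dnt_terms") else "none",
        "termbase_coverage": termbase_coverage,
        "termbase_entry_counts": counts,
    }
```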

Logging Requirements

All evaluation logs (INFO/ERROR) MUST appear in:

- Console output - For real-time monitoring
- Batch log file - For permanent record and debugging

This ensures that coverage decisions and evaluation results are captured in the same log file as the translation process.
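
With Python's standard `logging` module, the dual-destination requirement could be met by attaching two handlers to one logger. The logger name `srt_eval`, the function name, and the choice of stderr for the console are assumptions; the location of the batch log file is passed in because it is not specified in this document.

```python
import logging
import sys
from pathlib import Path


def configure_eval_logging(batch_log: Path) -> logging.Logger:
    """Send evaluation logs to both the console and the batch log file."""
    logger = logging.getLogger("srt_eval")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger

    # Drop handlers from a previous run so each call starts clean.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    console = logging.StreamHandler(sys.stderr)
    # Append mode, so evaluation lines follow the translation run's lines.
    file_handler = logging.FileHandler(batch_log, mode="a", encoding="utf-8")
    for handler in (console, file_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```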

Independence from TranslationConfig

The evaluator MUST NOT import or depend on TranslationConfig objects. It reads only batch artifacts to ensure:

- Reproducibility: Evaluation can be re-run on old batches
- Decoupling: Changes to translation config don't affect evaluation
- Auditability: Complete configuration snapshot preserved in batch

Quality Assurance

Translation Quality

Characters-per-second (CPS) analysis
Placeholder consistency checks
DNT term coverage analysis
Termbase collision detection
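
As an illustration of the first check, CPS for a single SRT cue is the number of displayed characters divided by the cue's on-screen duration. The sketch below is not the project's implementation: the function names are hypothetical, and counting spaces while ignoring line breaks and inline tags such as `<i>` is an assumption about what "characters" means here.

```python
import re

# SRT timestamps look like 00:01:02,500 (hours:minutes:seconds,milliseconds).
_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(groups: tuple) -> float:
    hours, minutes, seconds, millis = (int(g) for g in groups)
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def cue_cps(timing_line: str, text: str) -> float:
    """Return characters per second for one cue.

    `timing_line` is the cue's "start --> end" line; `text` is its body.
    """
    stamps = _TIMESTAMP.findall(timing_line)
    if len(stamps) != 2:
        raise ValueError(f"expected two timestamps in {timing_line!r}")
    start, end = (_to_seconds(groups) for groups in stamps)
    duration = end - start
    if duration <= 0:
        raise ValueError(f"non-positive cue duration in {timing_line!r}")
    # Strip inline formatting tags and line breaks before counting.
    visible = re.sub(r"<[^>]+>", "", text).replace("\n", "")
    return len(visible) / duration
```

For example, `cue_cps("00:00:01,000 --> 00:00:03,000", "Hello world")` yields 11 characters over 2 seconds, i.e. 5.5 CPS.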

Batch Validation

File structure verification
Configuration completeness checks
Artifact generation validation

Security Considerations

Configuration Isolation

CLI reads API key from environment or .env file
GUI stores API key via QSettings, which persists to the OS-native settings store (registry, property list, or INI file) and does not encrypt values by itself
All translation configuration frozen in batch artifacts
No external API calls during evaluation
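
The CLI lookup in the first item above could be sketched as follows. The variable name `SRT_TRANSLATOR_API_KEY`, the function name, and the rule that the process environment takes precedence over the .env file are all assumptions; the project may use a different name or a dedicated .env library.

```python
import os
from pathlib import Path
from typing import Optional

API_KEY_VAR = "SRT_TRANSLATOR_API_KEY"  # hypothetical variable name


def read_api_key(env_file: Path = Path(".env")) -> Optional[str]:
    """Return the API key from the environment, else from a .env file."""
    key = os.environ.get(API_KEY_VAR)
    if key:
        return key  # assumption: the environment wins over .env
    if not env_file.is_file():
        return None
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not NAME=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == API_KEY_VAR:
            return value.strip().strip("'\"") or None
    return None
```

Returning the key to the caller, rather than writing it into any configuration snapshot, keeps it out of the batch artifacts.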

Data Handling

SRT content processed in memory during translation
Translated files written to batch output directory
API key persisted locally (not included in batch artifacts)