Machine translation quality estimation
Machine translation quality estimation (MTQE or QE) is the task of automatically estimating the quality of a machine translation, without using a reference translation.
This research area within machine translation and machine learning gave birth to the quality prediction models used in production systems like ModelFront.
Evaluation vs. estimation
Quality evaluation metrics, like BLEU, are for comparing machine translation systems. They basically work like average edit distance.
They are only intended to be directionally correct, not accurate at the sentence level — BLEU doesn't even consider the source text.
But to calculate a score, they require (human) reference translations.
So they cannot be used for new content in production, for which there are no human translations yet.
Quality estimation scores a translation, based on the original and the translation. No reference translation. So it can be used for new input.
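To make the difference concrete, here is a minimal Python sketch. It assumes the sacrebleu library for the evaluation side; the estimate_quality function is a hypothetical stand-in for a trained quality estimation model, not a real API.

```python
import sacrebleu

# Evaluation (e.g. BLEU): compares the machine translation to a human reference.
hypothesis = "The cat sat on the mat."         # machine translation
reference = "The cat was sitting on the mat."  # human reference translation
bleu = sacrebleu.sentence_bleu(hypothesis, [reference])
print(f"BLEU: {bleu.score:.1f}")  # 0-100; only directionally meaningful at the sentence level

# Estimation: scores the translation from the source and the translation alone,
# so it also works for new content that has no human translation yet.
source = "Le chat était assis sur le tapis."   # original source text

def estimate_quality(source: str, translation: str) -> float:
    """Hypothetical stand-in for a trained quality estimation model."""
    return 0.9  # placeholder; a real model would return a learned score from 0.0 to 1.0

print(f"Estimated quality: {estimate_quality(source, hypothesis):.2f}")
```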
Estimation vs. prediction
A raw quality estimation model outputs only a score, typically 0.0 to 1.0, that estimates the quality of a translation.
But to convert scores like 0.9 (90%) into decisions, thresholds need to be carefully chosen, for each language and content type.
So in practice, raw scores were not usable in production workflows by translation buyers, translation teams or professional human translators, who often confused them with fuzzy match scores.
Quality prediction, by contrast, outputs a boolean flag: ✓ or ✗.
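As a rough sketch of how a score becomes a decision, the Python snippet below converts a raw 0.0-1.0 score into a ✓ or ✗ flag using per-language-pair, per-content-type thresholds. The threshold values and the language and content-type labels are illustrative assumptions, not values from any production system.

```python
# Illustrative thresholds per (language pair, content type).
# These values are assumptions for the sketch, not production settings.
THRESHOLDS = {
    ("en-de", "support"): 0.85,
    ("en-ja", "support"): 0.90,
    ("en-de", "marketing"): 0.95,
}

def predict(score: float, language_pair: str, content_type: str) -> bool:
    """Convert a raw quality estimation score into a boolean quality prediction."""
    threshold = THRESHOLDS[(language_pair, content_type)]
    return score >= threshold

# The same 0.9 score clears the bar for en-de support content,
# but not for en-de marketing content.
print(predict(0.9, "en-de", "support"))    # True  -> ✓ approve as-is
print(predict(0.9, "en-de", "marketing"))  # False -> ✗ send to human review
```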
Timeline
The evolution of quality estimation models mirrored the evolution of machine translation and machine learning models in general.
Quality estimation research began at the Workshop on Statistical Machine Translation (WMT) in 2013, where researchers defined the task.
They competed using the feature engineering approaches that were state of the art at the time, at the end of the statistical machine translation era.
| 2013 QuEst | 2016 QuEst++ | 2019 OpenKiwi | 2020 ModelFront |
|---|---|---|---|
| Open-source library | Open-source library | Open-source library | Production system |
| Feature engineering | Feature engineering | Deep learning | Multilingual large language models (LLMs) |
| Python scripts and Java program | Python scripts and Java program | Python framework | API and integrations |
| Score 0.0-1.0 | Score 0.0-1.0 | Score 0.0-1.0 | Flag ✓ or ✗ |
The early research community was driven by researchers like Lucia Specia, then at the University of Sheffield and Imperial College London, and Radu Soricut from Google Research. Unbabel researchers like Fábio Kepler and André Martins led the development of OpenKiwi.
Learn More
- Quality estimation in A practical guide to quality prediction
- Quality estimation on machinetranslate.org ↗
- Quality estimation in r/machinetranslation ↗
- Quality estimation in Slator ↗
- Quality estimation on Stack Overflow ↗
- Quality estimation on arXiv ↗
Join the mission
Are you interested in joining the mission to accelerate human-quality translation?
Browse jobs at ModelFront