MT Summit XI, Copenhagen - 11 September 2007

Workshop organised by the ELRA Evaluation Committee
- Gregor Thurmair, Linguatec
- Khalid Choukri, ELDA
- Bente Maegaard, University of Copenhagen
The purpose of this workshop was to discuss automatic evaluation procedures in MT. Among the discussion points were:
- What do the scores really measure?
- What kind of implicit assumptions do they make?
- What kind of initial effort do they require (e.g. pre-translating the test corpus)?
- What kind of resources do they need (e.g. third-party grammars)?
- Are they biased towards specific MT technologies?
- What kind of diagnostic support can they give (i.e. where to improve the system)?
- What kind of evaluation criteria (e.g. related to the FEMTI framework) do they support (adequacy, fluency, …)?
The objective of the workshop was to gain a better understanding of the strengths and limitations of the respective approaches, and perhaps to take steps towards defining a common methodology for MT output evaluation.
Programme
9.00 Welcome and introduction
9.20 The place of automatic evaluation metrics in external quality models for machine translation (pdf, 104 KB, 19 slides)
Andrei Popescu-Belis, University of Geneva
10.00 Evaluating Evaluation --- Lessons from the WMT’07 Shared Task (pdf, 420 KB, 38 slides)
Philipp Koehn, University of Edinburgh
10.30 Coffee break
11.00 Investigating Why BLEU Penalizes Non-Statistical Systems (pdf, 261 KB, 10 slides)
Eduard Hovy, University of Southern California
11.30 Edit distance as an evaluation metric (pdf, 997 KB, 34 slides)
Christopher Cieri, Linguistic Data Consortium
12.00 Experience and conclusions from the CESTA evaluation project (pdf, 102 KB, 22 slides)
Olivier Hamon, ELDA
12.30 Lunch
13.30 Automatic Evaluation in MT system production (pdf, 147 KB, 28 slides)
Gregor Thurmair, Linguatec
14.00 Sensitivity of performance-based and proximity-based models for MT evaluation (pdf, 144 KB, 22 slides)
Bogdan Babych, University of Leeds
14.30 Automatic & human Evaluations of MT in the framework of a speech to speech communication (pdf, 178 KB, 33 slides)
Khalid Choukri, ELDA
15.00 Coffee break
15.30 Discussion and conclusions
17.00 Close