LTI Colloquium: COMET: a Neural Framework for State-of-the-art MT Evaluation

Speaker: Alon Lavie @ Unbabel / Carnegie Mellon University
Website: https://unbabel.com/research/people/a...
Title: COMET: a Neural Framework for State-of-the-art MT Evaluation

Abstract: Delivery of high-quality Machine Translation (MT) is only possible with reliable evaluation metrics to inform modelling and system development. The translation workflows we develop at Unbabel require highly adapted MT systems which are regularly retrained to continuously deliver customer-specific, accurate translations. Unfortunately, with current state-of-the-art neural MT systems, traditional metrics such as BLEU and METEOR have been shown to no longer correlate well with human judgments, and in particular, they are poor at capturing fine-grained accuracy differences between top-performing MT models. This can result in misinformed MT development decisions that affect the quality of translations for our customers.

To address this challenge, we recently developed COMET - a new neural framework for training automated MT evaluation models that are demonstrated to exhibit new state-of-the-art levels of correlation with human judgments. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling, resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality. We showcase our framework by training and evaluating COMET models for three different types of human judgments: Direct Assessments, Human-mediated Translation Edit Rate (HTER) and Multidimensional Quality Metrics (MQM). Our models achieve new state-of-the-art performance on the WMT 2019 and 2020 Metrics shared tasks and are sensitive to fine distinctions typical of high-performing MT systems. The COMET framework and our top-performing pretrained evaluation models are freely available as open source.

In this presentation, we give an overview of the COMET framework and highlight its capabilities through assessments of the COMET models we have trained, their correlation with human judgments of translation quality, and their utility in practice for evaluating and contrasting MT models developed at Unbabel.

#NLProc #MachineTranslation
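The abstract judges metrics by how well their scores correlate with human judgments. As a rough illustration of what that comparison involves, the sketch below computes Pearson's r and a simplified Kendall's tau (ties ignored) between human scores and two metrics. All scores and the names `human`, `metric_a` and `metric_b` are invented for illustration; this is not the WMT Metrics task's exact formulation, nor code from the COMET framework.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def kendall_tau(xs, ys):
    """Simplified Kendall's tau: (concordant - discordant) / (concordant + discordant).

    Tied pairs are skipped, which is an assumption made for brevity.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Hypothetical segment-level scores, invented for this example.
human = [0.9, 0.4, 0.7, 0.1]      # human quality judgments
metric_a = [0.8, 0.5, 0.6, 0.2]   # ranks segments the same way as the humans
metric_b = [0.3, 0.6, 0.2, 0.5]   # mostly disagrees with the human ranking

for name, scores in [("metric_a", metric_a), ("metric_b", metric_b)]:
    print(name, "pearson:", pearson(human, scores), "tau:", kendall_tau(human, scores))
```

A metric that tracks human judgments closely, like the made-up `metric_a` here, yields values near 1 on both statistics, while one that ranks segments differently yields low or negative values.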