Results of the Quality Estimation Shared Task 2023
Note: because Codalab was unstable during the competition (submissions failing due to congested servers, server outages, etc.), the automatic scoring of predictions did not go as planned. As a result, the leaderboards of the "competition" phases are not representative and should not be considered. Instead:
- participants are listed only for the tasks and language pairs in which they officially declared to the organisers (via a form) their wish to participate; participants who did not fill in the form are therefore not considered for the official ranking of the shared task;
- for a given language pair, each participant was ranked by their highest-scoring submission (on the primary metric) for that language pair;
- only participants who officially entered and submitted to all language pairs of a given task were considered for the "Multilingual" ranking. In that case, we retained the highest macro-average score (as reported by our scoring programmes) over all of a participant's submissions that contain predictions for every language pair (a sketch of this procedure is shown below).
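For concreteness, the ranking rules above can be expressed as a short script. The following is a minimal sketch, not the organisers' actual scoring programme: the Submission layout, the function names, and the example scores are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Submission:
    participant: str
    scores: dict[str, float]  # language pair -> primary-metric score

def rank_language_pair(submissions: list[Submission], lp: str) -> list[tuple[str, float]]:
    """Rank participants by their best-scoring submission for one language pair."""
    best: dict[str, float] = {}
    for sub in submissions:
        if lp in sub.scores:
            best[sub.participant] = max(best.get(sub.participant, float("-inf")),
                                        sub.scores[lp])
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

def rank_multilingual(submissions: list[Submission], all_lps: set[str]) -> list[tuple[str, float]]:
    """Rank participants by their best macro-average over complete submissions,
    i.e. submissions covering every language pair of the task."""
    best: dict[str, float] = {}
    for sub in submissions:
        if all_lps <= sub.scores.keys():  # only complete submissions count
            macro = mean(sub.scores[lp] for lp in all_lps)
            best[sub.participant] = max(best.get(sub.participant, float("-inf")), macro)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

# Made-up scores, purely for illustration:
subs = [
    Submission("TeamA", {"en-de": 0.52, "zh-en": 0.41, "he-en": 0.37}),
    Submission("TeamB", {"en-de": 0.55}),
]
print(rank_language_pair(subs, "en-de"))                     # TeamB ranks first
print(rank_multilingual(subs, {"en-de", "zh-en", "he-en"}))  # only TeamA is eligible
```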
Task 1 -- Sentence-level
- Multilingual (Average over all LPs)
- English-German (MQM)
- Chinese-English (MQM)
- Hebrew-English (MQM)
- English-Marathi (DA)
- English-Hindi (DA)
- English-Tamil (DA)
- English-Telugu (DA)
- English-Gujarati (DA)
Task 1 -- Word-level
- Multilingual (Average over all LPs)
- English-German (MQM)
- Chinese-English (MQM)
- Hebrew-English (MQM)
- English-Marathi (PE)
- English-Farsi (PE)
Task 2 -- Error Span Detection
- Multilingual (Average over all LPs)
- English-German
- Chinese-English
- Hebrew-English