Shared Task: Automatic Post-Editing


The 9th round of the APE shared task follows the success of the previous rounds organized from 2015 to 2022. The aim is to examine automatic methods for correcting errors produced by an unknown machine translation (MT) system. This has to be done by exploiting knowledge acquired from human post-edits, which are provided as training material.


The aim of this task is to improve MT output in black-box scenarios, in which the MT system is used "as is" and cannot be modified. From the application point of view, APE components would make it possible to:

Task Description

Similar to last year, the language setting this year focuses on an Indian language pair: English --> Marathi. The provided data cover several domains, such as healthcare, tourism, and general/news. In these datasets, the source sentences have been translated into the target language by a state-of-the-art neural MT system whose configuration is unknown to the participants, and then manually post-edited.

At the training stage, the collected human post-edits have to be used to learn correction rules for the APE systems. At the test stage they will be used for system evaluation with automatic metrics (TER, BLEU, and chrF).


Compared to the previous round, the main differences are:


Training, development, and test data consist of (source, target, post-edit) triplets. The source sentences in English come from the healthcare, tourism, and general/news domains. The target sentences are automatic translations into the target language (Marathi). The post-edits are human revisions of the automatic translations.
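The triplet structure can be sketched as a small loader. This is only an illustration: it assumes the three sides are distributed as separate, line-aligned UTF-8 text files, which this page does not state, and the names `Triplet`, `read_lines`, and `read_triplets` are hypothetical.

```python
from typing import List, NamedTuple


class Triplet(NamedTuple):
    source: str     # English source sentence
    target: str     # raw MT output in the target language
    post_edit: str  # human revision of the MT output


def read_lines(path: str) -> List[str]:
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def read_triplets(src_path: str, mt_path: str, pe_path: str) -> List[Triplet]:
    """Combine three line-aligned files into (source, target, post-edit) triplets."""
    sources = read_lines(src_path)
    targets = read_lines(mt_path)
    post_edits = read_lines(pe_path)
    # Line alignment is an assumption of this sketch, so fail loudly if it breaks.
    if not (len(sources) == len(targets) == len(post_edits)):
        raise ValueError("the three files must contain the same number of lines")
    return [Triplet(s, t, p) for s, t, p in zip(sources, targets, post_edits)]
```

Calling `read_triplets` with the paths of the source, MT, and post-edit files would return one `Triplet` per line; the length check guards against silently misaligned data.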

To download the data, click on the links in the table below:

Language pair         Data                Additional resource
English --> Marathi   train, dev#, test   Synthetic training data+

+: This synthetic training data was prepared as part of the 2022 APE shared task. It was created by taking a parallel corpus, translating the source side with an MT system, and treating the references as post-edits.
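The construction of the synthetic data can be sketched as follows. This is a minimal illustration under stated assumptions, not the organizers' actual pipeline: `make_synthetic_triplets` and `toy_translate` are hypothetical names, and the toy lookup table merely stands in for a real MT system.

```python
from typing import Callable, Iterable, List, Tuple


def make_synthetic_triplets(
    parallel_corpus: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Turn (source, reference) pairs into (source, MT output, pseudo post-edit) triplets."""
    triplets = []
    for source, reference in parallel_corpus:
        mt_output = translate(source)  # machine-translate the source side
        # The human reference plays the role of the post-edit.
        triplets.append((source, mt_output, reference))
    return triplets


# Hypothetical stand-in for an MT system: a fixed lookup table.
TOY_MT = {"good morning": "mt-good-morning", "thank you": "mt-thank-you"}


def toy_translate(sentence: str) -> str:
    return TOY_MT.get(sentence, sentence)


corpus = [("good morning", "ref-good-morning"), ("thank you", "ref-thank-you")]
for triplet in make_synthetic_triplets(corpus, toy_translate):
    print(triplet)
```

Because the references were not produced by editing the MT output, such pseudo post-edits can be further from the MT output than genuine human post-edits would be.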

#: The training and development data are the same as those used in the 2022 APE shared task.

Participants are allowed to use any additional data for system training.

Data Citation

Please cite the following paper if you use the datasets released in this shared task:
(will be added during the camera-ready period)


Systems' performance will be evaluated with respect to their capability to reduce the distance that separates an automatic translation from its human-revised version.

This distance will be measured in terms of TER, computed between automatic and human post-edits in case-sensitive mode.

BLEU and chrF will also be considered as secondary evaluation metrics. To gain further insight into final output quality, a subset of the submitted systems' outputs will also be manually evaluated.

The submitted runs will be ranked based on the average HTER calculated on the test set by using the tercom software.

The HTER calculated between the raw MT output and human post-edits in the test set will be used as baseline (i.e. the baseline is a system that leaves all the test instances unmodified).
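The comparison against the do-nothing baseline can be illustrated with a small sketch. It is not the tercom implementation: it computes a case-sensitive, word-level edit distance normalized by the length of the human post-edit and omits TER's shift operation, so its scores may differ from the official ones. The function names and the toy English sentences are hypothetical.

```python
def word_edit_distance(hyp_tokens, ref_tokens):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(ref_tokens) + 1))
    for i, hyp_word in enumerate(hyp_tokens, start=1):
        current = [i]
        for j, ref_word in enumerate(ref_tokens, start=1):
            # Comparison is case-sensitive: "The" and "the" count as different words.
            substitution = previous[j - 1] + (hyp_word != ref_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def approx_ter(hypothesis, post_edit):
    """Edits divided by post-edit length; block shifts are NOT modeled."""
    hyp_tokens, ref_tokens = hypothesis.split(), post_edit.split()
    # Avoid division by zero for an empty post-edit (a choice made for this sketch).
    return word_edit_distance(hyp_tokens, ref_tokens) / max(len(ref_tokens), 1)


human_post_edit = "the cat sat on the mat"
raw_mt = "the cat sat on mat"              # baseline: MT output left unmodified
ape_output = "the cat sat on the mat"      # output of a hypothetical APE system

baseline_score = approx_ter(raw_mt, human_post_edit)
system_score = approx_ter(ape_output, human_post_edit)
print(f"baseline: {baseline_score:.3f}")   # 0.167 (one missing word out of six)
print(f"system:   {system_score:.3f}")     # 0.000
print("improves over baseline:", system_score < baseline_score)
```

A lower score is better: a system is useful only if its output ends up closer to the human post-edit than the unmodified MT output was.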

The evaluation script can be downloaded here

Submission Format

Participants' submissions should contain automatic post-edits of the target sentences in the test set. The submission format is as follows (each column is tab separated):


Each field should be delimited by a single tab character.
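The tab-delimited layout can be sketched as below. The list of columns is not shown on this page, so the three example fields (a method name, a segment number, and the post-edited sentence) are purely hypothetical placeholders; only the single-tab delimiter comes from the description above. `format_submission_line` is likewise a hypothetical helper.

```python
def format_submission_line(fields):
    """Join the fields of one submission line with exactly one tab between them."""
    cleaned = []
    for field in fields:
        text = str(field)
        # A tab or newline inside a field would break the column layout.
        if "\t" in text or "\n" in text:
            raise ValueError(f"field contains a tab or newline: {text!r}")
        cleaned.append(text)
    return "\t".join(cleaned)


# Hypothetical columns, for illustration only.
line = format_submission_line(["my_method", 1, "the cat sat on the mat"])
print(repr(line))
```

Rejecting embedded tabs and newlines up front is a design choice of this sketch; it keeps every output line at a fixed number of columns.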

Submission Requirements

Each participating team can submit at most two systems and must explicitly indicate which one is its primary submission. If no run is marked as primary, the latest submission received will be used as the primary submission.
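The rule for picking the primary submission can be sketched as follows. The record layout (a `subtype` string and a `received` timestamp) and the function name `select_primary` are hypothetical, and rejecting two runs that are both marked as primary is an assumption of this sketch, since that case is not covered by the rule above.

```python
from datetime import datetime


def select_primary(submissions):
    """Return the primary run of one team according to the rule described above."""
    if not submissions:
        raise ValueError("no submissions received")
    if len(submissions) > 2:
        raise ValueError("a team can submit at most 2 systems")
    marked = [run for run in submissions if run["subtype"] == "PRIMARY"]
    if len(marked) > 1:
        # Not specified by the task description; treated as an error here.
        raise ValueError("more than one run is marked as primary")
    if marked:
        return marked[0]
    # No run marked as primary: fall back to the latest submission received.
    return max(submissions, key=lambda run: run["received"])


runs = [
    {"name": "run_a", "subtype": "CONTRASTIVE", "received": datetime(2023, 7, 28, 9, 0)},
    {"name": "run_b", "subtype": "CONTRASTIVE", "received": datetime(2023, 7, 30, 17, 30)},
]
print(select_primary(runs)["name"])  # run_b, the later of two unmarked runs
```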

Submissions should be sent via email. Please use the following pattern to name your files:


INSTITUTION-NAME is an acronym/short name for your institution, e.g. "UniXY"

METHOD-NAME is an identifier for your method, e.g. "pt_1_pruned"

SUBTYPE indicates whether the submission is primary or contrastive with the two alternative values: PRIMARY, CONTRASTIVE.
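A file name built from these three components can be sketched as below. The exact pattern, including the separator, is not shown on this page, so joining the parts with an underscore is an assumption, and `build_submission_name` is a hypothetical helper. Only the component names and the two SUBTYPE values come from the description above.

```python
VALID_SUBTYPES = {"PRIMARY", "CONTRASTIVE"}


def build_submission_name(institution, method, subtype, separator="_"):
    """Compose a file name from INSTITUTION-NAME, METHOD-NAME, and SUBTYPE.

    The separator is an assumption of this sketch, not the official pattern.
    """
    if subtype not in VALID_SUBTYPES:
        raise ValueError(f"SUBTYPE must be one of {sorted(VALID_SUBTYPES)}")
    if not institution or not method:
        raise ValueError("institution and method names must be non-empty")
    return separator.join([institution, method, subtype])


print(build_submission_name("UniXY", "pt_1_pruned", "PRIMARY"))
# prints "UniXY_pt_1_pruned_PRIMARY"
```

Validating SUBTYPE before sending helps avoid runs that cannot be classified as primary or contrastive.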

You are also invited to submit a short paper (4 to 6 pages) to WMT describing your APE method(s). Submitting a paper is optional. If you do not submit one, we ask you to provide an appropriate reference describing your method(s) that we can cite in the WMT overview paper.


The official results of the 2023 APE shared task will be available here (TBD)

Important dates

Release of training and development data   April 14, 2023
Release of test data                       July 13, 2023
APE system submission deadline             July 31, 2023 (extended from July 20, 2023)
Manual evaluation                          August
Paper submission deadline                  (check WMT homepage)
Notification of acceptance                 (check WMT homepage)
Camera-ready deadline                      (check WMT homepage)
All deadlines are in AoE (Anywhere on Earth).


Pushpak Bhattacharyya (Indian Institute of Technology, Bombay)
Rajen Chatterjee (Apple Inc.)
Markus Freitag (Google Research)
Diptesh Kanojia (University of Surrey)
Matteo Negri (Fondazione Bruno Kessler)
Marco Turchi (Zoom Video Communications)


For any information or questions about the task, please send an email.
To stay up to date with this year's edition of the APE task, you can also join the wmt-ape group.