Document Level Translation In Moses
Project leader: Liane
Desirable skills for participants: Experience with Moses and/or Machine Learning
Overview
Most current Statistical Machine Translation (SMT) systems translate each sentence in a document in isolation. Whilst this reduces the complexity of translating large documents, it means that important discourse-level information is lost. One example of such information is the sense of previously translated instances of a word: it is unlikely that the word “bank” refers to a riverbank in one sentence and a financial institution in another, within the same document. Another example is knowledge of the antecedent(s) of a referring pronoun where the antecedent appears in a previous sentence in the text - this is particularly relevant when translating into languages in which agreement must hold between pronouns and their antecedents. The aim of document-level translation is to provide a means of retaining such discourse-level information, thereby moving translation beyond the sentence level.
Previous Work
Previous work within the framework of phrase-based SMT has included cache-based approaches which retain bilingual phrase-pairs from the best hypothesis of previously translated sentences. Tiedemann (2010) integrated cache-based language and translation models within a phrase-based SMT decoder and used an exponential decay factor to account for recency - giving greater “weight” to information extracted from recently translated sentences than to older sentences. When a source language phrase is considered for translation, its cache translation score is computed (zero if the phrase is not contained in the cache) using the phrase probabilities of matching phrases found in the cache and the decay factor. Tiedemann reports a small improvement in BLEU score of 0.275 when the cache is used.
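Tiedemann's decay-weighted cache score can be illustrated with a minimal Python sketch. All names here are hypothetical: `decay` stands in for the exponential decay constant, ages are counted in sentences, and the stored probabilities stand in for the phrase translation probabilities of the cached pairs.

```python
import math
from collections import defaultdict

class DecayCache:
    """Sketch of a Tiedemann-style decaying phrase cache (names hypothetical)."""

    def __init__(self, decay=0.5):
        self.decay = decay
        # source phrase -> list of (target phrase, probability, insertion time)
        self.entries = defaultdict(list)
        self.time = 0

    def add_hypothesis(self, phrase_pairs):
        """Store the phrase pairs of the best hypothesis of one sentence."""
        self.time += 1
        for src, tgt, prob in phrase_pairs:
            self.entries[src].append((tgt, prob, self.time))

    def score(self, src, tgt):
        """Decay-weighted cache score; zero if the pair is not in the cache."""
        total = 0.0
        for cached_tgt, prob, t in self.entries.get(src, []):
            if cached_tgt == tgt:
                age = self.time - t  # older entries contribute less
                total += prob * math.exp(-self.decay * age)
        return total
```

A freshly cached pair contributes its full probability; with each further sentence its contribution shrinks by the factor exp(-decay), capturing the intuition that recent sentences matter more than older ones.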
Gong et al. (2011) extended the approach taken by Tiedemann by using a three-tiered cache system. In addition to the dynamic cache which is used to retain bilingual phrase pairs from the best hypothesis of each translated sentence, they incorporate a static cache (which stores relevant bilingual phrase pairs from “similar” bilingual documents) and a topic cache (which stores a set of target-language topic words from “similar” documents, relevant to the source-language content of the document). The static cache is most useful at the beginning of the translation process when the size of the dynamic cache is relatively small and gradually loses “weight” as the dynamic cache grows. The topic cache is included to reduce the effects of noisy data in the other caches by restricting the options to those deemed to be relevant to the topic of the document. Rather than compute a decay factor, they simply use the dynamic cache as a lookup to determine whether a bilingual phrase pair matching the source-language phrase exists in the cache. Gong et al. show a BLEU score improvement of 0.66 when all three caches are used in combination.
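The lookup behaviour of the three caches can be sketched as follows. This is a minimal Python sketch with hypothetical names; in particular, the static pairs and topic words would in practice be extracted from “similar” bilingual documents, and the binary matches would feed into the decoder's scoring rather than be consumed directly.

```python
class TieredCache:
    """Sketch of a Gong et al.-style three-tier cache (names hypothetical)."""

    def __init__(self, static_pairs, topic_words):
        self.static = set(static_pairs)   # (src, tgt) pairs from similar documents
        self.topic = set(topic_words)     # target-language topic words
        self.dynamic = set()              # pairs from previously decoded sentences

    def add(self, src, tgt):
        """Grow the dynamic cache from the best hypothesis of a sentence."""
        self.dynamic.add((src, tgt))

    def match(self, src, tgt):
        """Binary lookups rather than a decayed score, as in Gong et al."""
        return {
            "dynamic": (src, tgt) in self.dynamic,
            "static": (src, tgt) in self.static,
            "topic": any(w in self.topic for w in tgt.split()),
        }
```

Early in a document the dynamic set is empty and only the static and topic tiers can fire; as decoding proceeds, the dynamic tier takes over, mirroring the weighting behaviour described above.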
In both approaches, the information from the best hypothesis of a translated sentence is always retained. There are no constraints placed on the quality of the output hypotheses and both approaches would potentially benefit from the incorporation of confidence estimation to ensure that only the information from those hypotheses deemed to be of high quality is added to the cache.
Other efforts in document-level decoding include the forced-decoding method of Xiao et al. (2011) and the optimisation method of Hardmeier et al. (2012). Xiao et al. (2011) identify ambiguous words in the SMT system output and then re-decode using a filtered set of translation options (keeping only the most frequent translation of each ambiguous word). They focus on document-level consistency and their method is not general enough to be extended to other discourse-level phenomena. Hardmeier et al. (2012) approach translation as an optimisation task. They first translate using a baseline SMT system and then iteratively apply a number of local modification steps. If the amended document receives a higher score under the document-level model than the current document, it is retained. The process continues until the specified termination criteria are reached.
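The optimisation loop of Hardmeier et al. amounts to a hill-climbing search, which can be sketched as below. This is a minimal Python sketch under stated assumptions: `neighbours` and `score` are hypothetical stand-ins for their document modification operations and document-level scoring model.

```python
def hill_climb(document, neighbours, score, max_steps=100):
    """Sketch of a Hardmeier et al.-style search: try local modifications,
    keep a modified document only if its document-level score improves."""
    current = document
    current_score = score(current)
    for _ in range(max_steps):
        improved = False
        for candidate in neighbours(current):
            s = score(candidate)
            if s > current_score:
                current, current_score = candidate, s
                improved = True
                break  # greedily accept the first improving change
        if not improved:
            break  # local optimum: termination criterion reached
    return current
```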
Previous work in the field of confidence estimation has focussed on word-level (Ueffing and Ney, 2007) and sentence-level (Blatz et al., 2004) estimation, with little attention paid to phrase-level estimation. The 2012 Shared Task on Quality Estimation (Callison-Burch et al., 2012) provides a number of resources that may be used in this project: training and testing data (for training a classifier) annotated with sentence-level quality (human) judgements, a baseline system for extracting features, and a set of resources - including a language model, phrase translation table and word alignments - from which to extract the features.
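At its very simplest, such a sentence-level confidence component could combine a few shallow features with thresholds, as in the toy Python sketch below. The feature set and threshold values here are hypothetical stand-ins for the richer baseline feature set and a properly trained classifier from the Shared Task resources.

```python
def qe_features(source, target, lm_score):
    """Toy feature extraction loosely modelled on shallow QE baseline
    features: sentence lengths, length ratio and a language model score."""
    src_len = len(source.split())
    tgt_len = len(target.split())
    return {
        "src_len": src_len,
        "tgt_len": tgt_len,
        "len_ratio": tgt_len / max(src_len, 1),
        "lm": lm_score,
    }

def is_high_quality(feats, ratio_band=(0.7, 1.5), lm_threshold=-50.0):
    """Hypothetical thresholds standing in for a trained classifier:
    accept a hypothesis only if its length ratio is plausible and its
    language model score is not too low."""
    lo, hi = ratio_band
    return lo <= feats["len_ratio"] <= hi and feats["lm"] >= lm_threshold
```

In the cache-based setting, a hypothesis failing this check would simply not contribute its phrase pairs to the cache.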
Project Goals
1) Design and implement a cache-based document-level translation strategy using Moses.
2) Incorporate the notion of confidence estimation to ensure that only those best hypotheses that are deemed to be of “high quality” will be retained. For example, use the tools and resources of the 2012 Shared Task on Quality Estimation to develop a confidence estimation classifier.
Project Tasks
1) Implement a document-level translation strategy making use of the Moses decoder to perform the translations. This may be incorporated within Moses, or implemented externally, using an interface to the Moses decoder.
a) Presence of the phrase / word in the cache or a score based on recency will be used as an additional feature in scoring hypotheses.
2) Investigate ways in which confidence estimation can be used to decide whether the information from the best hypothesis of a translated sentence should be retained.
a) This may be at the sentence or sub-sentence level. The complexity of the method will be influenced by the time available. At the simplest level, it could make direct use of the tools and resources already available from the 2012 Shared Task on Quality Estimation.
3) Devise and conduct an evaluation of the document-level decoder.
4) Possible extensions (if time permits):
a) Devise a suitable method for incorporating the notion of recency - the information from more recently translated sentences is potentially more relevant than that from “older” sentences. If recency is not incorporated, simply check the cache to see if the phrase / word is contained.
b) Select discourse-level information used to enrich the translation at the document level.
c) Devise and conduct an evaluation of the performance of the document-level decoder, focussing on the handling of the selected discourse-level information. N.B. the changes may be small and therefore not captured by BLEU.
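Putting tasks 1 and 2 together, the overall strategy can be sketched as a single loop over the document. This is a minimal Python sketch of the proposed control flow; `decode`, `cache` and `is_high_quality` are hypothetical interfaces to the Moses decoder, the phrase cache and the confidence estimation classifier respectively.

```python
def translate_document(sentences, decode, cache, is_high_quality):
    """Sketch of the proposed document-level loop: decode each sentence
    with cache-based features available, then update the cache only from
    best hypotheses that pass the confidence check."""
    output = []
    for src in sentences:
        hyp = decode(src, cache)   # best hypothesis, assumed to carry
        output.append(hyp.target)  # its target text and phrase pairs
        if is_high_quality(hyp):
            cache.add_hypothesis(hyp.phrase_pairs)
    return output
```

Whether this loop lives inside the Moses decoder itself or in an external driver talking to Moses through an interface is exactly the design decision left open in task 1.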
References
John Blatz, Erin Fitzgerald, George Foster, Simona Gandrabur, Cyril Goutte, Alex Kulesza, Alberto Sanchis and Nicola Ueffing. 2004. Confidence Estimation for Machine Translation. In Proceedings of the 20th International Conference on Computational Linguistics, Geneva, Switzerland.
Chris Callison-Burch, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut and Lucia Specia. 2012. Findings of the 2012 Workshop on Statistical Machine Translation. In Proceedings of the Seventh Workshop on Statistical Machine Translation, pages 10-51, Montreal, Canada.
Data and resources available from: http://statmt.org/wmt12/quality-estimation-task.html
Zhengxian Gong, Min Zhang and Guodong Zhou. 2011. Cache-based Document-Level Statistical Machine Translation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 909-919, Edinburgh, Scotland.
Christian Hardmeier, Joakim Nivre and Jörg Tiedemann. 2012. Document-Wide Decoding for Phrase-Based Statistical Machine Translation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1179-1190, Jeju Island, Korea.
Jörg Tiedemann. 2010. Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache. In Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010, pages 8-15, Uppsala, Sweden.
Nicola Ueffing and Hermann Ney. 2007. Word-level Confidence Estimation for Machine Translation Using Phrase-Based Translation Models. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 763-770, Vancouver, British Columbia, Canada.
Tong Xiao, Jingbo Zhu, Shujie Yao and Hao Zhang. 2011. Document-Level Consistency Verification in Machine Translation. In Proceedings of the Thirteenth Machine Translation Summit (MT Summit XIII), pages 19-23, Xiamen, China.