machine translation



Moses is an implementation of the statistical (or data-driven) approach to machine translation (MT). This is the dominant approach in the field at the moment, and is employed by the online translation systems deployed by the likes of Google and Microsoft. In statistical machine translation (SMT), translation systems are trained on large quantities of parallel data (from which the systems learn how to translate small segments), as well as even larger quantities of monolingual data (from which the systems learn what the target language should look like). Parallel data is a collection of sentences in two different languages, which is sentence-aligned, in that each sentence in one language is matched with its corresponding translated sentence in the other language. It is also known as a bitext.

The training process in Moses takes in the parallel data and uses co-occurrences of words and segments (known as phrases) to infer translation correspondences between the two languages of interest. In phrase-based machine translation, these correspondences are simply between continuous sequences of words, whereas in hierarchical phrase-based machine translation or syntax-based translation, more structure is added to the correspondences. For instance, a hierarchical MT system could learn that the German hat X gegessen corresponds to the English ate X, where the Xs are replaced by any German-English word pair. The extra structure used in these types of systems may or may not be derived from a linguistic analysis of the parallel data. Moses also implements an extension of phrase-based machine translation known as factored translation, which enables extra linguistic information to be added to a phrase-based system.
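As a rough illustration of the hat X gegessen / ate X example, the following Python sketch applies a single rule with one gap to a tokenised sentence. It is not how Moses represents or applies rules internally; the function name, the rule encoding as token lists, and the toy lexicon are all assumptions made for this example.

```python
def apply_hierarchical_rule(tokens, src_rule, tgt_rule, translate_gap):
    """Apply a rule with a single gap "X" to a whole token sequence.

    Returns the translated tokens, or None if the rule does not match.
    """
    gap = src_rule.index("X")
    prefix, suffix = src_rule[:gap], src_rule[gap + 1:]
    # The gap must cover at least one token.
    if len(tokens) <= len(prefix) + len(suffix):
        return None
    if tokens[:len(prefix)] != prefix:
        return None
    if suffix and tokens[-len(suffix):] != suffix:
        return None
    filler = tokens[len(prefix):len(tokens) - len(suffix)]
    translated_filler = translate_gap(filler)
    output = []
    for symbol in tgt_rule:
        if symbol == "X":
            output.extend(translated_filler)
        else:
            output.append(symbol)
    return output


# Toy word-for-word lexicon, invented for this example.
LEXICON = {"einen": "an", "apfel": "apple"}


def translate_words(words):
    return [LEXICON[w] for w in words]


result = apply_hierarchical_rule(
    "hat einen apfel gegessen".split(),
    ["hat", "X", "gegessen"],
    ["ate", "X"],
    translate_words,
)
print(" ".join(result))
```

The point of the sketch is only that the gap lets one rule reorder the verb around whatever fills X, which a purely contiguous phrase pair cannot do.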

For more information about the Moses translation models, please refer to the tutorials on phrase-based MT, syntactic MT or factored MT.

Whichever type of machine translation model you use, the key to creating a good system is lots of good quality data. There are many free sources of parallel data which you can use to train sample systems, but (in general) the closer the data you use is to the type of data you want to translate, the better the results will be. This is one of the advantages of using an open-source tool like Moses: if you have your own data, you can tailor the system to your needs and potentially get better performance than a general-purpose translation system. Moses needs sentence-aligned data for its training process, but if data is aligned at the document level, it can often be converted to sentence-aligned data using a tool like hunalign.


The two main components in Moses are the training pipeline and the decoder. There is also a variety of contributed tools and utilities. The training pipeline is really a collection of tools (mainly written in Perl, with some in C++) which take the raw data (parallel and monolingual) and turn it into a machine translation model. The decoder is a single C++ application which, given a trained machine translation model and a source sentence, will translate the source sentence into the target language.

The Training Pipeline

There are various stages involved in producing a translation system from training data, which are described in more detail in the training documentation and in the baseline system guide. These are implemented as a pipeline, which can be controlled by the Moses experiment management system, and Moses in general makes it easy to insert different types of external tools into the training pipeline.

The data typically needs to be prepared before it is used in training, by tokenising the text and converting tokens to a standard case. Heuristics are used to remove sentence pairs which look to be misaligned, and long sentences are removed. The parallel sentences are then word-aligned, typically using GIZA++, which implements a set of statistical models developed at IBM in the 1980s. These word alignments are used to extract phrase-phrase translations, or hierarchical rules as required, and corpus-wide statistics on these rules are used to estimate probabilities.
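The preparation and cleaning steps can be sketched in Python as follows. This is a simplified stand-in, not the Moses scripts themselves: whitespace splitting replaces a real tokeniser, and the thresholds max_len and max_ratio are assumed values chosen for illustration.

```python
def prepare(sentence):
    """Lowercase and whitespace-tokenise; a stand-in for a real tokeniser."""
    return sentence.lower().split()


def clean_bitext(pairs, max_len=80, max_ratio=3.0):
    """Drop empty, overlong, or badly length-mismatched sentence pairs."""
    kept = []
    for src, tgt in pairs:
        s, t = prepare(src), prepare(tgt)
        if not s or not t:
            continue  # one side is empty
        if len(s) > max_len or len(t) > max_len:
            continue  # too long to align reliably
        ratio = max(len(s), len(t)) / min(len(s), len(t))
        if ratio > max_ratio:
            continue  # lengths differ so much the pair is probably misaligned
        kept.append((s, t))
    return kept


pairs = [
    ("Das ist ein Haus", "This is a house"),
    ("Ja", "Yes , I completely agree with everything you said"),
    ("", "Empty source side"),
]
cleaned = clean_bitext(pairs)
print(cleaned)
```

Only the first pair survives here: the second fails the length-ratio heuristic and the third has an empty source side.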

An important part of the translation system is the language model, a statistical model built using monolingual data in the target language and used by the decoder to try to ensure the fluency of the output. Moses relies on external tools for language model building.
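To make the role of the language model concrete, here is a toy bigram model with add-one smoothing in Python. It is only a sketch of the idea: the toolkits Moses works with use higher-order n-grams and better smoothing, and the training sentences and function names below are invented for this example.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count histories and bigrams over sentences padded with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return histories, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    histories, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size))
    return total


model = train_bigram_lm([
    "this is a house",
    "this is a small house",
    "the house is small",
])
fluent = log_prob("this is a house", model)
scrambled = log_prob("house a is this", model)
print(fluent > scrambled)
```

The same words in a scrambled order receive a lower score, which is the signal the decoder uses to prefer fluent output.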

The final step in the creation of the machine translation system is tuning, where the different statistical models are weighted against each other to produce the best possible translations. Moses contains implementations of the most popular tuning algorithms.
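The effect of weighting the models against each other can be sketched with a weighted sum of feature scores. The grid search below is a deliberately naive stand-in for a real tuning algorithm, which would optimise a translation quality metric over a whole development set; the feature names "tm" and "lm", the scores, and the grid are all invented for this example.

```python
def model_score(features, weights):
    """Weighted sum of the individual model scores (log-probabilities)."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    return max(candidates, key=lambda c: model_score(c["features"], weights))


def tune(candidates, reference, lm_grid):
    """Return the first weight setting whose top candidate matches the reference."""
    for lm_weight in lm_grid:
        weights = {"tm": 1.0, "lm": lm_weight}
        if best_candidate(candidates, weights)["text"] == reference:
            return weights
    return None


# Invented scores: the second candidate looks better to the translation
# model, but much worse to the language model.
candidates = [
    {"text": "this is a house", "features": {"tm": -4.0, "lm": -2.0}},
    {"text": "this a house is", "features": {"tm": -3.0, "lm": -6.0}},
]
tuned = tune(candidates, "this is a house", [0.0, 0.1, 0.5, 1.0])
print(tuned)
```

With too little weight on the language model the disfluent candidate wins; the search stops at the first setting where the reference translation comes out on top.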

The Decoder

The job of the Moses decoder is to find the highest scoring sentence in the target language (according to the translation model) corresponding to a given source sentence. It is also possible for the decoder to output a ranked list of the translation candidates, and also to supply various types of information about how it came to its decision (for instance the phrase-phrase correspondences that it used).

The decoder is written in a modular fashion and allows the user to vary the decoding process in various ways, such as:

  • Input: This can be a plain sentence, or it can be annotated with xml-like elements to guide the translation process, or it can be a more complex structure like a lattice or confusion network (say, from the output of speech recognition)
  • Translation model: This can use phrase-phrase rules, or hierarchical (perhaps syntactic) rules. It can be compiled into a binarised form for faster loading. It can be supplemented with features to add extra information to the translation process, for instance features which indicate the sources of the phrase pairs in order to weight their reliability.
  • Decoding algorithm: Decoding is a huge search problem, generally too big for exact search, and Moses implements several different strategies for this search, such as stack-based, cube-pruning, chart parsing etc.
  • Language model: Moses supports several different language model toolkits (SRILM, KenLM, IRSTLM, RandLM), each of which has its own strengths and weaknesses, and adding a new LM toolkit is straightforward.

The Moses decoder also supports multi-threaded decoding (since translation is embarrassingly parallelisable), and has scripts to enable multi-process decoding if you have access to a cluster.
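Translation parallelises easily because each sentence is translated independently of the others. The Python sketch below shows that structure with a thread pool; the translate function is a hypothetical placeholder (it just upper-cases its input) and does not call Moses.

```python
from concurrent.futures import ThreadPoolExecutor


def translate(sentence):
    """Hypothetical stand-in for a decoder call; it just upper-cases the input."""
    return sentence.upper()


def translate_all(sentences, workers=4):
    """Translate sentences concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(translate, sentences))


outputs = translate_all(["das ist ein haus", "ich habe hunger"])
print(outputs)
```

Because no sentence depends on another, the only coordination needed is to return the outputs in the original order, which pool.map does.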

Contributed Tools

There are many contributed tools in Moses which supply additional functionality over and above the standard training and decoding pipelines. These include:

  • Moses server: Provides an XML-RPC interface to the decoder
  • Web translation: A set of scripts to enable Moses to be used to translate web pages
  • Analysis tools: Scripts to enable the analysis and visualisation of Moses output, in comparison with a reference.

There are also tools to evaluate translations, alternative phrase scoring methods, an implementation of a technique for weighting phrase tables, a tool to reduce the size of the phrase table, and other contributed tools.
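As a sketch of what talking to the Moses server over XML-RPC might look like, the Python snippet below builds a request body with the standard library and parses it back, without contacting any server. The method name translate, the text parameter, and the URL in the comment are assumptions here; check them against the server version you are running.

```python
import xmlrpc.client


def build_translate_request(sentence):
    """Serialise an XML-RPC call; method and field names are assumptions."""
    return xmlrpc.client.dumps(({"text": sentence},), methodname="translate")


payload = build_translate_request("das ist ein haus")
params, method = xmlrpc.client.loads(payload)
print(method, params)

# With a server running, the equivalent live call would be along these lines
# (hypothetical host, port and path):
#   proxy = xmlrpc.client.ServerProxy("http://localhost:8080/RPC2")
#   result = proxy.translate({"text": "das ist ein haus"})
```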


Moses is an open-source project, licensed under the LGPL, which incorporates contributions from many sources. There is no formal management structure in Moses, so if you want to contribute then just mail support and take it from there. There is a list of possible projects on this website, but any new MT techniques are fair game for inclusion into Moses.

In general, the Moses administrators are fairly open about giving out push access to the git repository, preferring the approach of removing/fixing bad commits, rather than vetting commits as they come in. This means that trunk occasionally breaks, but given the active Moses user community, it doesn't stay broken for long. The nightly builds and tests of trunk are reported on the cruise control web page, but if you want a more stable version then look for one of the releases.

Moses in Use

The liberal licensing policy in Moses, together with its wide coverage of current SMT technology and complete tool chain, makes it probably the most widely used open-source SMT system. It is used in teaching, research, and, increasingly, in commercial settings.

Commercial use of Moses is promoted and tracked by TAUS. The most common current use for SMT in commercial settings is post-editing, where machine translation is used as a first pass and the results are then edited by human translators. This can often reduce the time (and hence total cost) of translation. There is also work on using SMT in computer-aided translation, which is the research topic of two current EU projects, Casmacat and MateCat.


2005: Hieu Hoang (then student of Philipp Koehn) starts Moses as successor to Pharaoh
2006: Moses is the subject of the JHU workshop, first check-in to public repository
2006: Start of Euromatrix, EU project which helps fund Moses development
2007: First machine translation marathon held in Edinburgh
2009: Moses receives support from EuromatrixPlus, also EU-funded
2010: Moses now supports hierarchical and syntax-based models, using chart decoding
2011: Moses moves from sourceforge to github, after over 4000 sourceforge check-ins
2012: EU-funded MosesCore launched to support continued development of Moses
Page last modified on August 13, 2013, at 10:38 AM