Moses statistical machine translation system

Regression Testing

Goals

The goal of regression testing is to ensure that changes made to the decoder do not break behavior that was previously determined to be correct. The regression test suite is fast enough to run often, yet should still provide adequate confidence that nothing substantial has changed in the internal workings of moses. It is designed to run on most UNIX-like systems. It is also run as part of the nightly build, so if you have problems with the regression tests, first check whether the nightly build succeeded.

Test suite

The following regression tests are currently implemented (and many more have been added since this list was written):

  • basic-surface-only Tests basic translation, compares output strings and probability scores.
  • basic-surface-binptable Tests binary phrase table
  • consensus-decoding-surface Basic test of consensus decoding
  • ptable-filtering Tests the filtering of the phrase table by estimated phrase cost; ensures that the estimated phrase costs stay the same and that the list of retained phrases is unchanged. Matches pharaoh.
  • multi-factor Tests that moses can translate with two factors (currently a very basic test; it should be enhanced to at least include OOV words).
  • multi-factor-binptable Tests factored setup with binary phrase table.
  • multi-factor-drop Test of dropping words in a multi-factor model.
  • nbest-multi-factor Tests n-best list generation for multi-factor models
  • n-best Test n-best filtering, ensure consistency of top scores and score components. This will require ensuring that any moses binary is capable of generating n-best lists.
  • lattice-surface Tests lattice decoding
  • lattice-distortion Tests lattice decoding with distortion (?)
  • confusionNet-surface-only Tests confusion network decoding
  • confusionNet-multi-factor Tests confusion network decoding with multiple factors
  • lexicalized-reordering Tests lexical reordering model
  • lexicalized-reordering-cn Tests lexical reordering model in combination with confusion network
  • xml-markup Tests XML Markup in input to specify translations

Running the test suite

Download the regression tests:

  git clone https://github.com/moses-smt/moses-regression-tests.git

From the Moses root, run:

  ./bjam --with-irstlm=/path/to/irst --with-cmph=/path/to/cmph --with-regtest=/path/to/moses-regression-tests -j8

This runs the regression tests in parallel (-j8), so be sure to choose a number of parallel jobs that your machine can handle.
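One way to choose the -j value is to derive it from the CPU count reported by the operating system. The sketch below only assembles the command line shown above as an argument list. The helper name `bjam_command` is hypothetical and not part of Moses, and the paths are the same placeholders used on this page.

```python
import os
import shlex

def bjam_command(regtest_dir, irstlm_dir, cmph_dir, jobs=None):
    """Assemble the bjam invocation shown above as an argument list.

    Hypothetical helper: it uses only the flags that appear on this page.
    """
    if jobs is None:
        # os.cpu_count() may return None; fall back to a single job.
        jobs = os.cpu_count() or 1
    return [
        "./bjam",
        f"--with-irstlm={irstlm_dir}",
        f"--with-cmph={cmph_dir}",
        f"--with-regtest={regtest_dir}",
        f"-j{jobs}",
    ]

if __name__ == "__main__":
    # Placeholder paths, as in the command above.
    cmd = bjam_command("/path/to/moses-regression-tests",
                       "/path/to/irst", "/path/to/cmph")
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell at the Moses root. Passing `jobs` explicitly overrides the detected CPU count, which is useful on shared machines.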

If all goes well, you will see a list of the tests run, their status (hopefully pass), and a path where the results are archived.

Running an individual test

You can run a specific test by providing the name followed by ".passed":

  ./bjam --with-irstlm=/path/to/irst --with-cmph=/path/to/cmph --with-regtest=/path/to/moses-regression-tests mert.basic.passed

The test name is the same as the name of the test's directory under /path/to/moses-regression-tests/tests.

How it works

The test suite invokes moses to decode a few sample phrases with well-known models. The output from these invocations is then scraped for information (for example, the output translation of a sentence or its probability score) which is stored in a file called results.dat. These values are then compared to a ground truth, which was established either by hand, from a prior moses run, or from a pharaoh run.

The comparison provides a point-by-point analysis of each failure or success in the test.
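The comparison step can be pictured as follows. This is a sketch of the idea, not the code the test suite actually runs: it assumes results are stored as lines of the form KEY = value, handles exact matches only, and uses hypothetical function names (`parse_results`, `compare`).

```python
def parse_results(text):
    """Parse 'KEY = value' lines into a dict; blank lines are skipped."""
    values = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(" = ")
        if not sep:
            raise ValueError(f"not a KEY = value line: {line!r}")
        values[key.strip()] = value.strip()
    return values

def compare(truth, observed):
    """Yield (key, status, expected, actual) for every key in the truth."""
    for key, expected in truth.items():
        actual = observed.get(key)  # None if the run produced no such key
        status = "pass" if actual == expected else "FAIL"
        yield key, status, expected, actual
```

Iterating over `compare(truth, observed)` gives one row per expected value, which is the kind of point-by-point report described above. A key missing from the run shows up as a failure with an actual value of `None`.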

Note: Since the test suite relies on the output of moses, changes to the output format may result in broken tests. If you make changes that affect presentation only, you will need to update the testing filters (which convert the raw moses output into the results.dat format).

Writing regression tests

Writing regression tests is easy, but since these tests must be able to run anywhere, it is important to keep a few things in mind. First, check out the regression-testing module from the Git repository. Decide what you would like to test and choose a test name (henceforth TEST-NAME). Create a directory for it under regression-testing/tests.

Place the following into the directory regression-testing/tests/TEST-NAME:

  • to-translate, which contains the text that will be translated by moses.
  • moses.ini. This moses.ini file should have no absolute paths. All paths should be expressed in terms of the variables ${LM_PATH} and ${MODELS_PATH}.
  • The filter files, filter-stderr and filter-stdout. These files should read from STDIN and write results of the form KEY = value to STDOUT. No other output should be generated. Numeric values (such as times) that do not require exact matches can have the form KEY ~ value. These files are the trickiest part of writing a new regression test. However, they allow great flexibility in verifying specific aspects of a decoding run.
  • truth/results.txt. This file should contain the values (as produced by filter-stderr and filter-stdout) that are expected from the test run.
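To make the filter contract concrete, here is a minimal sketch of what a filter-stdout might do, assuming moses writes one translation per line to STDOUT. The key name TRANSLATION_<n> and the function names are illustrative assumptions. Real filters may be written in another language and extract different values.

```python
def format_result(key, value, exact=True):
    """Format one result line: 'KEY = value', or 'KEY ~ value' when inexact."""
    return f"{key} {'=' if exact else '~'} {value}"

def run_filter(in_stream, out_stream):
    """Write one TRANSLATION_<n> line per input line, and nothing else."""
    for n, line in enumerate(in_stream, start=1):
        # Hypothetical key scheme: number the translations in input order.
        out_stream.write(format_result(f"TRANSLATION_{n}", line.strip()) + "\n")
```

A script built on this would end by calling `run_filter(sys.stdin, sys.stdout)` after `import sys`. A filter-stderr could use `format_result(key, value, exact=False)` for values such as times that only need an approximate match.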

If you need to add language models, phrase tables, generation tables or anything like this, you will need to increment the required data version number in MosesRegressionTesting.pm. Then, you will need to create a new .tgz file that contains the data for all the tests (the data dependencies are not checked into the Git repository because they are extremely large). This must then be made available for download.

Page last modified on October 31, 2015, at 10:40 PM