The goal of regression testing is to ensure that changes to the decoder do not break behavior that was previously determined to be correct. The regression test suite is fast enough to run often, yet should still provide adequate confidence that nothing substantial has changed in the internal workings of Moses. It is designed to run on most UNIX-like systems and is run as part of the nightly build, so if you have problems with the regression tests, first check whether the nightly build succeeded.
The following regression tests are among those implemented (many more have been added since this list was written):
basic-surface-only
Tests basic translation; compares output strings and probability scores.
basic-surface-binptable
Tests binary phrase table
consensus-decoding-surface
Basic test of consensus decoding
ptable-filtering
Tests the filtering of the phrase table by estimated phrase cost; ensures that the estimated phrase costs stay the same and that the list of phrases is consistent. Matches pharaoh.
multi-factor
Tests that moses can translate with two factors (currently a very basic test; it should be enhanced to at least include OOV words).
multi-factor-binptable
Tests factored setup with binary phrase table.
multi-factor-drop
Test of dropping words in a multi-factor model.
nbest-multi-factor
Tests n-best list generation for multi-factor models
n-best
Tests n-best filtering and ensures consistency of top scores and score components. This requires that any moses binary be capable of generating n-best lists.
lattice-surface
Tests lattice decoding
lattice-distortion
Tests lattice decoding with distortion (?)
confusionNet-surface-only
Tests confusion network decoding
confusionNet-multi-factor
Tests confusion network decoding with multiple factors
lexicalized-reordering
Tests lexical reordering model
lexicalized-reordering-cn
Tests lexical reordering model in combination with confusion network
xml-markup
Tests XML markup in the input to specify translations
Download the regression tests:
git clone https://github.com/moses-smt/moses-regression-tests.git
From the Moses root, run:
./bjam --with-irstlm=/path/to/irst --with-cmph=/path/to/cmph --with-regtest=/path/to/moses-regression-tests -j8
This runs the regression tests in parallel (-j8 requests eight parallel jobs), so set the job count to a number that your machine can handle.
If all goes well, you will see a list of the tests run, their status (hopefully pass), and a path where the results are archived.
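Rather than hard-coding the -j value used above, the job count can be taken from the number of processors the system reports. The following is only a sketch and not part of the Moses build: the function name job_count is hypothetical, and it assumes that getconf understands _NPROCESSORS_ONLN (true on Linux and macOS, not guaranteed elsewhere), falling back to 1 otherwise.

```shell
#!/bin/sh
# Hypothetical helper: print a job count suitable for bjam's -j option.
# Assumes getconf supports _NPROCESSORS_ONLN; falls back to 1 if it does not.
job_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric answer: be conservative
    esac
    [ "$n" -ge 1 ] || n=1
    echo "$n"
}

# Print the command that would be run (paths are the placeholders used above).
echo "./bjam --with-irstlm=/path/to/irst --with-cmph=/path/to/cmph --with-regtest=/path/to/moses-regression-tests -j$(job_count)"
```

On a shared machine it may be kinder to use fewer jobs than the reported processor count.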
You can run a specific test by providing its name followed by ".passed":
./bjam --with-irstlm=/path/to/irst --with-cmph=/path/to/cmph --with-regtest=/path/to/moses-regression-tests mert.basic.passed
The test name is the same as the directory name in /path/to/moses-regression-tests/tests.
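Because test names mirror directory names, the available targets can be listed from the checkout itself. This is a minimal sketch, assuming every test is a direct subdirectory of tests/; the function name list_test_targets is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<test-name>.passed" for every test directory.
# Argument: path to the moses-regression-tests checkout.
list_test_targets() {
    for dir in "$1"/tests/*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        name=$(basename "$dir")
        echo "$name.passed"
    done
}

# Example: list_test_targets /path/to/moses-regression-tests
```

Any name it prints can be appended to the bjam command line shown above.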
The test suite invokes moses to decode a few sample phrases with well-known models. The output from these invocations is then scraped for information (for example, the output translation of a sentence or its probability score), which is stored in a file called results.dat. These values are then compared to a ground truth, which was established either by hand, from a prior moses run, or from a pharaoh run.
This comparison provides a point-by-point analysis of each failure or success in the test.
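The comparison against the ground truth can be pictured as a key-by-key check. The sketch below is not the suite's own comparison code: it assumes both files hold one KEY = value line per entry and checks exact string equality only. The function name compare_results and the PASS, FAIL and MISSING labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical comparison: compare_results TRUTH_FILE RESULTS_FILE
# Prints one line per key in TRUTH_FILE; returns non-zero on any mismatch.
# Assumes TRUTH_FILE is not empty.
compare_results() {
    awk '
        # Split "KEY = value" at the first " = " only, so values may contain "=".
        function parse(line) {
            pos = index(line, " = ")
            if (pos == 0) return 0
            key = substr(line, 1, pos - 1)
            val = substr(line, pos + 3)
            return 1
        }
        # First file: remember expected values and their order.
        NR == FNR { if (parse($0)) { want[key] = val; order[++n] = key }; next }
        # Second file: remember produced values.
        { if (parse($0)) got[key] = val }
        END {
            for (i = 1; i <= n; i++) {
                k = order[i]
                if (!(k in got))            { print "MISSING " k; bad = 1 }
                else if (got[k] == want[k]) { print "PASS " k }
                else { print "FAIL " k ": expected \"" want[k] "\", got \"" got[k] "\""; bad = 1 }
            }
            exit bad
        }
    ' "$1" "$2"
}

# Example: compare_results truth/results.txt results.dat
```

Keys that appear only in the results file are ignored here; a stricter check could report them as well.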
Note: Since the test suite relies on the output of moses, changes to the output format may result in broken tests. If you make changes that affect presentation only, you will need to update the testing filters (which convert the raw moses output into the results.dat format).
Writing regression tests is easy, but since these tests must be runnable anywhere, it is important to keep a few things in mind. First, check out the regression-testing module from the Git repository. Decide what you would like to test and choose a test name (henceforth, this name will be TEST-NAME). Create a directory for it under regression-testing/tests.
Place the following into the directory regression-testing/tests/TEST-NAME:
to-translate, which contains the text that will be translated by moses.
moses.ini. This moses.ini file should have no absolute paths. All paths should be expressed in terms of the variables ${LM_PATH} and ${MODELS_PATH}.
filter-stderr and filter-stdout. These files should read from STDIN and write results of the form KEY = value to STDOUT. No other output should be generated. Numeric values (such as times) that do not require exact matches can have the form KEY ~ value. These files are the trickiest part of writing a new regression test, but they allow great flexibility in verifying specific aspects of a decoding run.
truth/results.txt. This file should contain the values (as produced by filter-stderr and filter-stdout) that are expected from the test run.
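To illustrate the filter contract described in the list above, the sketch below reads decoder output on STDIN and writes nothing but KEY = value lines to STDOUT. It assumes one translation per input line; the key name TRANSLATION_<n> and the function name filter_stdout are hypothetical, and the filters shipped with existing tests may be written quite differently.

```shell
#!/bin/sh
# Hypothetical body for a filter-stdout file.
# Reads one translation per line and emits "TRANSLATION_<n> = <text>".
filter_stdout() {
    awk '{
        sub(/[ \t]+$/, "")                        # drop trailing whitespace
        printf "TRANSLATION_%d = %s\n", NR, $0    # NR is the sentence number
    }'
}

# Example: printf "das ist ein haus\n" | filter_stdout
```

In an actual filter file the function would be called on the last line so that it consumes the script's STDIN. Trimming trailing whitespace keeps exact-match comparisons from depending on invisible characters.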
If you need to add language models, phrase tables, generation tables, or similar data, you will need to increment the required data version number in MosesRegressionTesting.pm. Then, you will need to create a new .tgz file that contains the data for all the tests (the data dependencies are not checked into the Git repository because they are extremely large). This must then be made available for download.
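Creating the archive could look like the following sketch. The archive name regtest-data-<version>.tgz and the function name pack_regtest_data are hypothetical; use whatever name and layout MosesRegressionTesting.pm actually expects.

```shell
#!/bin/sh
# Hypothetical helper: pack_regtest_data VERSION DATA_DIR
# Creates regtest-data-VERSION.tgz in the current directory and prints its name.
pack_regtest_data() {
    version=$1
    data_dir=$2
    archive="regtest-data-$version.tgz"
    # -C makes paths inside the archive relative to the parent of DATA_DIR.
    tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || return 1
    echo "$archive"
}

# Example: pack_regtest_data 9 /path/to/regression-data
```

Listing the result with tar -tzf before uploading is a cheap way to confirm that the paths inside the archive are relative.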