Agenda for short-term work
(the person responsible and the deadline, if any, are given in parentheses)
- [DONE] read from gzipped nbest files (Nicola, by Wednesday May 21)
- [PARTIALLY DONE] add consistency check for features and scores files (Nicola, by Friday May 23)
- modify mert-moses.pl and enhanced-mert.pl to interface with new mert (Barry, ???)
- check correctness of optimization algorithm (Jean-Baptiste, ???)
- add NIST score (???, ???)
- performance evaluation (Nicola, by Friday June 14)
Improvement Of Minimum Error Rate Training
Developers
- Nicola Bertoldi
- Jean-Baptiste Fouet
- Barry Haddow
Goal
- improve Minimum Error Rate Training (MERT) to increase:
- efficiency (speed, disk occupancy)
- modularity
- support:
- distributed computation
- new error measures
- new optimization algorithms
- reranking
- implement:
- new error measures
- new optimization algorithms
- rewrite in C++
- documentation
Brainstorming
- modification of the inner loop of mert moses
- optimization algorithm is independent of the error measure
- store statistics and features in a binary format to speed up I/O
- provide a text format for debugging
- compare performance (results and speed) wrt old code
- ...
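The separation between optimization algorithm and error measure can be illustrated with a minimal Python sketch. It is not the code in this directory: the names one_best, random_search and corpus_score are hypothetical, and drawing weights uniformly from [-1, 1] is an assumption made only for the example. The optimizer sees the error measure only through a callable, so any measure that can be computed from per-entry statistics can be plugged in.

```python
import random


def one_best(nbest_features, weights):
    """Index of the entry with the highest linear model score."""
    def model_score(feats):
        return sum(w * f for w, f in zip(weights, feats))
    return max(range(len(nbest_features)),
               key=lambda k: model_score(nbest_features[k]))


def random_search(features, stats, corpus_score, dim, trials=200, seed=0):
    """Dummy random optimizer (hypothetical sketch).

    features[i][k] is the feature vector of entry k of utterance i,
    stats[i][k] holds the error statistics of the same entry, and
    corpus_score maps the selected statistics (one per utterance)
    to a number where higher is better.
    """
    rng = random.Random(seed)
    best_weights, best_score = None, float("-inf")
    for _ in range(trials):
        # Assumption: sample each weight uniformly from [-1, 1].
        weights = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        picked = [stats[i][one_best(features[i], weights)]
                  for i in range(len(features))]
        score = corpus_score(picked)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

Swapping the error measure only means passing a different corpus_score; swapping the optimizer only means replacing the sampling loop.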
Work to do
- define new architecture
- define new objects
- define correlation between objects
- define new formats for features and error statistics
- implement more error measures:
BLEU, NIST, WER, PER, METEOR(?), AER, ...
- extract feature scores and statistics for many error measures at once
- combine more error measures
- implement more optimization algorithms: Simplex, Powell, Sampling, ..., dummy random search
- figure out why random search only works on the debug build
- implement interfaces for more nbest formats (Moses, BTEC, ...) (if required)
- optimization over a subset of features (not finished)
- extract 1best given a set of feature weights
- provide pointers between statistics and actual nbest texts
- efficient binarization
- add consistency check for files, ...
- add support for reading gzipped files
- modify interface with mert-moses.pl and enhanced-mert.pl
- investigate METEOR support within the current interface
- documentation
- regression tests
- evaluation of speed wrt old code
- evaluation of error measures
- evaluation of optimization algorithm
Work done
- defined new architecture
- defined new objects
- defined correlation between objects
- defined new formats for features and error statistics
- created normalise.py, to perform nist-bleu normalisation of nbest file and references
- implemented BLEU4 (multiple references, shortest/closest/average ref length) and PER (single reference)
- implemented dummy random optimization
- reading gzipped (text) files is now supported (they should have the .gz suffix)
- efficient binarization
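The suffix-based handling of gzipped text files can be sketched in Python as follows. This is only an illustration of the convention described above (a .gz suffix selects decompression); open_text is a hypothetical helper, not a function of this package, and the UTF-8 encoding is an assumption.

```python
import gzip


def open_text(path):
    """Open a text file for reading, decompressing it if it ends in .gz.

    Hypothetical helper: the decision is made on the file name only,
    mirroring the rule that gzipped files must carry the .gz suffix.
    """
    if path.endswith(".gz"):
        # "rt" makes gzip return decoded text rather than bytes.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Callers can then iterate over lines in the same way for both kinds of file, for example `with open_text(name) as f: for line in f: ...`.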
User guide
(More details will follow)
In trunk/mert/example there is a toy example
- extraction of feature scores and the statistics of an error measure
extractor --nbest NBEST
  --reference REF.0 REF.1 REF.2
  --sctype [BLEU4|PER]
  --ffile FEATSTAT.out
  --scfile SCORESTAT.out
  [--prev-ffile FEATSTAT.in]
  [--prev-scfile SCORESTAT.in]
  [--binary]
--binary: save data in a binary format
--prev-ffile: file with already computed feature scores
--prev-scfile: file with already computed error statistics
FEATSTAT.in and SCORESTAT.in can be in gzipped text, plain text, or binary format
NBEST can be in gzipped or plain text format
- optimization of feature weights given the feature scores and the error statistics
mert --ffile FEATSTAT.in
  --scfile SCORESTAT.in
  -t Powell
FEATSTAT.in and SCORESTAT.in can be in gzipped text, plain text, or binary format
Formats
- textual format for feature scores:
- a header "FEATURES_BEGIN i N_i R f1 f2 ... fR" reporting:
  - the string "FEATURES_BEGIN", identifying the type of the file
  - the index "i" of the utterance the nbest list refers to
  - the size "N_i" of the nbest list
  - the number "R" of features
  - the list of the names of the features
- one line for each nbest entry, reporting its R feature values
- a footer consisting of the string "FEATURES_END"
FEATURES_BEGIN 0 4 5 w_0 lm_0 lm_1 tm_0 tm_1
0.1 2.1 3.1 0.4 -2.1
2.3 3.1 1.1 -20.4 -2.0
0.9 2.1 3.1 0.4 -2.1
1.2 2.1 3.1 0.4 -2.1
FEATURES_END
FEATURES_BEGIN 1 3 5 w_0 lm_0 lm_1 tm_0 tm_1
0.1 2.1 3.1 0.4 -2.1
....
FEATURES_END
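A reader for the textual feature format above could look like the following Python sketch. It is an illustration written against the description and example in this file, not the C++ reader of the package; read_features and the dictionary keys are hypothetical names. It checks that the number of feature names matches R and that each block holds exactly N_i entries.

```python
def read_features(lines):
    """Parse FEATURES_BEGIN ... FEATURES_END blocks from text lines.

    Returns one dict per block with the keys "index", "size",
    "names" and "rows" (hypothetical layout chosen for this sketch).
    """
    blocks, current = [], None
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0] == "FEATURES_BEGIN":
            if current is not None:
                raise ValueError("FEATURES_BEGIN inside an open block")
            index, size, nfeat = (int(t) for t in tokens[1:4])
            names = tokens[4:]
            if len(names) != nfeat:
                raise ValueError("header names do not match R")
            current = {"index": index, "size": size,
                       "names": names, "rows": []}
        elif tokens[0] == "FEATURES_END":
            if current is None:
                raise ValueError("FEATURES_END without FEATURES_BEGIN")
            if len(current["rows"]) != current["size"]:
                raise ValueError("block does not contain N_i entries")
            blocks.append(current)
            current = None
        else:
            if current is None:
                raise ValueError("feature line outside a block")
            row = [float(t) for t in tokens]
            if len(row) != len(current["names"]):
                raise ValueError("entry does not contain R values")
            current["rows"].append(row)
    if current is not None:
        raise ValueError("missing FEATURES_END")
    return blocks
```

The same loop works on any iterable of lines, so it can be fed an open plain-text or gzipped file.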
- textual format for error scores:
- a header "SCORES_BEGIN i N_i S name" reporting:
  - the string "SCORES_BEGIN", identifying the type of the file
  - the index "i" of the utterance the nbest list refers to
  - the size "N_i" of the nbest list
  - the number "S" of score statistics
  - the name of the score measure (BLEU, NIST, PER)
- one line for each nbest entry, reporting its S score statistics
- a footer consisting of the string "SCORES_END"
SCORES_BEGIN 0 4 9 BLEU
3 4 3 3 1 2 1 1 5
2 5 1 4 1 3 1 2 6
9 12 5 11 3 10 4 9 11
6 21 4 20 3 19 2 18 20
SCORES_END
SCORES_BEGIN 1 3 9 BLEU
3 4 3 3 1 2 1 1 5
...
SCORES_END
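To show how such statistics can be turned into a score, here is a Python sketch that computes BLEU4 from a line of 9 statistics. The meaning of the 9 numbers is not specified in this file: the sketch assumes, from the shape of the example only, that they are (matched, total) n-gram counts for n = 1..4 followed by the reference length. The function names bleu4 and corpus_bleu4 are hypothetical, and the formula is the standard BLEU definition with brevity penalty, not necessarily what the package computes.

```python
import math


def bleu4(stats):
    """BLEU4 from 9 statistics.

    Assumed layout: matched_1 total_1 ... matched_4 total_4 ref_length.
    """
    matched = stats[0:8:2]
    total = stats[1:8:2]
    ref_len = stats[8]
    if min(matched) == 0 or min(total) == 0:
        return 0.0  # no smoothing in this sketch
    # Geometric mean of the four n-gram precisions, in log space.
    log_precision = sum(math.log(m / t)
                        for m, t in zip(matched, total)) / 4.0
    # The unigram total equals the hypothesis length.
    hyp_len = total[0]
    log_brevity = min(0.0, 1.0 - ref_len / hyp_len)
    return math.exp(log_precision + log_brevity)


def corpus_bleu4(rows):
    """Corpus-level BLEU4: sum the statistics first, then score."""
    summed = [sum(column) for column in zip(*rows)]
    return bleu4(summed)
```

Because the statistics are additive, the optimizer only needs to sum the lines of the selected entries, one per utterance, before calling the scoring function.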
- keep the features file and the scores file aligned with each other
- header of error scores file depends on the error measure
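A consistency check between the two files can be sketched in Python as below. What "aligned" means exactly is an assumption here: the sketch requires that the k-th block of each file carries the same utterance index and the same nbest size. The names block_headers and check_alignment are hypothetical, and the check only looks at header lines.

```python
def block_headers(lines, begin_tag):
    """Collect (utterance index, nbest size) from every header line."""
    headers = []
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] == begin_tag:
            headers.append((int(tokens[1]), int(tokens[2])))
    return headers


def check_alignment(feature_lines, score_lines):
    """Return a list of human-readable problems; empty means aligned."""
    feats = block_headers(feature_lines, "FEATURES_BEGIN")
    scores = block_headers(score_lines, "SCORES_BEGIN")
    problems = []
    if len(feats) != len(scores):
        problems.append("block count differs: %d feature blocks, "
                        "%d score blocks" % (len(feats), len(scores)))
    for position, (f, s) in enumerate(zip(feats, scores)):
        if f != s:
            problems.append("block %d: features (index, size) = %s, "
                            "scores (index, size) = %s"
                            % (position, f, s))
    return problems
```

An empty result means every pair of blocks agrees on index and size; it does not verify the number of entry lines inside each block.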