Agenda for short-term work

(the person responsible and the deadline, if any, are given in parentheses)

[DONE] read from gzipped nbest files (Nicola, by Wednesday May 21)
[PARTIALLY DONE] add consistency checks for feature and score files
(Nicola, by Friday May 23)
modify mert-moses.pl and enhanced-mert.pl to interface with the new mert
(Barry, ???)
check the correctness of the optimization algorithm
(Jean-Baptiste, ???)
add NIST score (???, ???)
performance evaluation (Nicola, by Friday June 14)

Improvement Of Minimum Error Rate Training

Developers

Nicola Bertoldi
Jean-Baptiste Fouet
Barry Haddow

Goal

improve Minimum Error Rate Training (MERT) to increase:
- efficiency (speed, disk occupancy)
- modularity
support:
- distributed computation
- new error measures
- new optimization algorithms
- reranking
implement:
- new error measures
- new optimization algorithms
rewrite in C++
documentation

Brainstorming

modify the inner loop of mert-moses.pl
the optimization algorithm is independent of the error measure
store statistics and features in a binary format to speed up I/O
provide a text format for debugging
compare performance (results and speed) with the old code
...
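A minimal sketch of why a binary format speeds up I/O: the values are copied as raw bytes, with no number parsing or formatting, while a text form stays readable for debugging. The layout (a 32-bit count followed by raw floats in host byte order) and the names writeBinary, readBinary and writeText are hypothetical, not the actual mert format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical binary layout: a 32-bit entry count followed by the raw
// float values, in host byte order (not the actual mert format).
void writeBinary(std::ostream& out, const std::vector<float>& values) {
  const std::uint32_t n = static_cast<std::uint32_t>(values.size());
  out.write(reinterpret_cast<const char*>(&n), sizeof n);
  out.write(reinterpret_cast<const char*>(values.data()),
            static_cast<std::streamsize>(n * sizeof(float)));
}

// Reads back what writeBinary produced; no text parsing is involved.
std::vector<float> readBinary(std::istream& in) {
  std::uint32_t n = 0;
  in.read(reinterpret_cast<char*>(&n), sizeof n);
  std::vector<float> values(n);
  in.read(reinterpret_cast<char*>(values.data()),
          static_cast<std::streamsize>(n * sizeof(float)));
  assert(in && "truncated binary data");
  return values;
}

// Text counterpart for debugging: one space-separated line.
void writeText(std::ostream& out, const std::vector<float>& values) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out << ' ';
    out << values[i];
  }
  out << '\n';
}
```

The text writer uses the default stream precision (6 significant digits), so it can lose information that the binary form keeps; host byte order, on the other hand, makes the binary files non-portable across architectures unless the format fixes or records the endianness.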

Work to do

~~define new architecture~~
~~define new objects~~
~~define correlation between objects~~
~~define new formats for features and error statistics~~
implement more error measures: ~~BLEU~~, NIST, WER, ~~PER~~, METEOR(?), AER, ...
extract feature scores and statistics for many error measures at once
combine more error measures
implement more optimization algorithms: Simplex, ~~Powell~~, Sampling, ..., ~~dummy random search~~
~~figure out why random search only works in debug builds!~~
implement interfaces for more nbest formats (~~Moses~~, BTEC, ...) (if required)
optimization over a subset of features (not finished)
extract 1best given a set of feature weights
provide pointers between statistics and actual nbest texts
~~efficient binarization~~
add consistency checks for files, ...
~~add support for reading gzipped files~~
modify interface with mert-moses.pl and enhanced-mert.pl
investigate METEOR support within the current interface
documentation
regression tests
evaluation of speed with respect to the old code
evaluation of error measures
evaluation of optimization algorithms
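Extracting the 1best for a given set of feature weights amounts to scoring every candidate with the weighted sum of its feature values and keeping the highest. A minimal sketch, assuming one nbest list held as a matrix of candidates by features; the function name oneBest is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the index of the candidate whose weighted feature sum is highest,
// i.e. the 1best of one nbest list under a linear model with these weights.
// Ties are resolved in favour of the earlier candidate.
std::size_t oneBest(const std::vector<std::vector<double>>& nbest,
                    const std::vector<double>& weights) {
  assert(!nbest.empty());
  std::size_t best = 0;
  double bestScore = 0.0;
  for (std::size_t c = 0; c < nbest.size(); ++c) {
    assert(nbest[c].size() == weights.size());
    const double score = std::inner_product(weights.begin(), weights.end(),
                                            nbest[c].begin(), 0.0);
    if (c == 0 || score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}
```

With the four candidates of utterance 0 from the feature-file example and all weights set to 1.0, the weighted sums are 3.6, -15.9, 4.4 and 4.7, so the sketch returns index 3.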

Work done

defined new architecture
defined new objects
defined correlation between objects
defined new formats for features and error statistics
created normalise.py to perform NIST-BLEU normalisation of nbest files and references
implemented BLEU4 (multiple references, shortest/closest/average ref length)
and PER (single reference)
implemented dummy random optimization
reading gzipped (text) files is now supported (they should have the .gz suffix)
efficient binarization
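The shortest/closest/average options for BLEU4 differ only in which reference length enters the brevity penalty when several references exist. A sketch under stated assumptions: the enum and function names are hypothetical, and ties in the "closest" case are resolved toward the shorter reference, which is one common convention, not necessarily the one implemented in mert.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <numeric>
#include <vector>

enum class RefLength { Shortest, Closest, Average };

// Effective reference length entering BLEU's brevity penalty for one
// hypothesis of length hypLen, given the lengths of its references.
double effectiveRefLength(const std::vector<int>& refLens, int hypLen,
                          RefLength mode) {
  assert(!refLens.empty());
  switch (mode) {
    case RefLength::Shortest:
      return *std::min_element(refLens.begin(), refLens.end());
    case RefLength::Closest: {
      int best = refLens.front();
      for (int r : refLens) {
        const int d = std::abs(r - hypLen);
        const int bestD = std::abs(best - hypLen);
        // Assumed tie-break: prefer the shorter reference.
        if (d < bestD || (d == bestD && r < best)) best = r;
      }
      return best;
    }
    case RefLength::Average:
      return std::accumulate(refLens.begin(), refLens.end(), 0.0) /
             static_cast<double>(refLens.size());
  }
  return 0.0;  // not reached: all enum values are handled above
}
```

For a 9-word hypothesis with references of 5, 8 and 12 words, the three options give 5, 8 and about 8.33 respectively.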

User guide

(More details will follow)

In trunk/mert/example there is a toy example

extraction of feature scores and the statistics of an error measure

 extractor --nbest NBEST
           --reference REF.0 REF.1 REF.2
           --sctype [BLEU4|PER]
           --ffile FEATSTAT.out
           --scfile SCORESTAT.out
           [--prev-ffile FEATSTAT.in]
           [--prev-scfile SCORESTAT.in]
           [--binary]

--binary: save the data in binary format

--prev-ffile: file with previously computed feature scores

--prev-scfile: file with previously computed error statistics

FEATSTAT.in and SCORESTAT.in can be in gzipped text, plain text, or binary format

NBEST can be in gzipped or plain text format

optimization of feature weights given the feature scores and the error statistics

 mert      --ffile FEATSTAT.in
           --scfile SCORESTAT.in
           -t Powell

FEATSTAT.in and SCORESTAT.in can be in gzipped text, plain text, or binary format
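Because the optimization algorithm is independent of the error measure, an optimizer only needs a black-box objective that maps a weight vector to a score. A minimal sketch of the dummy random search in that style; the Objective alias, the randomSearch signature, the sampling range [-1, 1] and the seed parameter are assumptions for illustration, not the mert implementation. The objective is maximized, so an error rate such as PER would be passed in negated.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using Objective = std::function<double(const std::vector<double>&)>;

// Dummy random search: samples weight vectors uniformly in [-1, 1]^dim and
// keeps the one with the highest objective. The objective is a black box,
// so BLEU, PER or any other measure can be plugged in without changes here.
std::vector<double> randomSearch(const Objective& objective, std::size_t dim,
                                 std::size_t trials, unsigned seed) {
  assert(dim > 0);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> uniform(-1.0, 1.0);
  std::vector<double> best(dim, 0.0);  // start from all-zero weights
  double bestScore = objective(best);
  for (std::size_t t = 0; t < trials; ++t) {
    std::vector<double> candidate(dim);
    for (double& w : candidate) w = uniform(rng);
    const double score = objective(candidate);
    if (score > bestScore) {
      bestScore = score;
      best = candidate;
    }
  }
  return best;
}
```

Powell's method or Simplex would replace only the body of such a function; the objective, and therefore the error measure, stays untouched.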

Formats

textual format for feature scores:
- a header "FEATURES_BEGIN i N_i R f1 f2 ... fR" reporting:
  - the string "FEATURES_BEGIN", identifying the type of the file
  - the index "i" of the utterance the nbest list refers to
  - the size "N_i" of the nbest list
  - the number "R" of features
  - the names of the R features
- one line for each nbest entry, reporting its R feature values
- a footer with the string "FEATURES_END", identifying the type of the file

 FEATURES_BEGIN 0 4 5 w_0 lm_0 lm_1 tm_0 tm_1
 0.1 2.1 3.1 0.4 -2.1
 2.3 3.1 1.1 -20.4 -2.0
 0.9 2.1 3.1 0.4 -2.1
 1.2 2.1 3.1 0.4 -2.1
 FEATURES_END
 FEATURES_BEGIN 1 3 5 w_0 lm_0 lm_1 tm_0 tm_1
 0.1 2.1 3.1 0.4 -2.1
 ....
 FEATURES_END
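The feature-score text format can be read block by block. A minimal sketch, assuming whitespace-separated tokens (line breaks are not checked) and reporting malformed input with an exception; FeatureBlock and readFeatureBlocks are hypothetical names, not part of the mert code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One "FEATURES_BEGIN ... FEATURES_END" block.
struct FeatureBlock {
  int utterance = 0;                        // index i
  std::vector<std::string> names;           // the R feature names
  std::vector<std::vector<double>> values;  // N_i rows of R values
};

// Reads all feature blocks from a text stream. Tokens are whitespace
// separated, so line breaks are not checked; malformed input throws.
std::vector<FeatureBlock> readFeatureBlocks(std::istream& in) {
  std::vector<FeatureBlock> blocks;
  std::string tag;
  while (in >> tag) {
    if (tag != "FEATURES_BEGIN")
      throw std::runtime_error("expected FEATURES_BEGIN, got " + tag);
    FeatureBlock block;
    std::size_t size = 0, numFeatures = 0;
    if (!(in >> block.utterance >> size >> numFeatures))
      throw std::runtime_error("malformed FEATURES_BEGIN header");
    block.names.resize(numFeatures);
    for (std::string& name : block.names)
      if (!(in >> name)) throw std::runtime_error("missing feature name");
    block.values.assign(size, std::vector<double>(numFeatures));
    for (std::vector<double>& row : block.values)
      for (double& v : row)
        if (!(in >> v)) throw std::runtime_error("missing feature value");
    if (!(in >> tag) || tag != "FEATURES_END")
      throw std::runtime_error("expected FEATURES_END");
    blocks.push_back(std::move(block));
  }
  return blocks;
}
```

A reader for the error-score format would follow the same pattern, with "SCORES_BEGIN i N_i S name" as the header.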

textual format for error scores:
- a header "SCORES_BEGIN i N_i S name" reporting:
  - the string "SCORES_BEGIN", identifying the type of the file
  - the index "i" of the utterance the nbest list refers to
  - the size "N_i" of the nbest list
  - the number "S" of score statistics
  - the name of the error measure (BLEU, NIST, PER)
- one line for each nbest entry, reporting its S score statistics
- a footer with the string "SCORES_END", identifying the type of the file

 SCORES_BEGIN 0 4 9 BLEU
 3 4 3 3 1 2 1 1 5
 2 5 1 4 1 3 1 2 6
 9 12 5 11 3 10 4 9 11
 6 21 4 20 3 19 2 18 20
 SCORES_END
 SCORES_BEGIN 1 3 9 BLEU
 3 4 3 3 1 2 1 1 5
 ...
 SCORES_END
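The page does not spell out what the nine BLEU statistics are. The example rows are consistent with the layout m1 t1 m2 t2 m3 t3 m4 t4 r (matched and total n-grams for n = 1..4, then the reference length): for instance "3 4 3 3 1 2 1 1 5" has 4, 3, 2 and 1 n-grams, as expected for a 4-word hypothesis. Assuming that inferred layout, corpus BLEU for one chosen candidate per sentence can be computed by summing the statistics and applying the usual formula, as in this sketch (BleuStats and bleuFromStats are hypothetical names; no smoothing is applied).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-sentence BLEU statistics, in the layout assumed here:
// m1 t1 m2 t2 m3 t3 m4 t4 r  (matched / total n-grams for n = 1..4,
// then the effective reference length).
using BleuStats = std::array<int, 9>;

// Corpus-level BLEU-4: sum the statistics of the selected candidates,
// then apply the brevity penalty to the geometric mean of the
// n-gram precisions. No smoothing: any zero count gives 0.
double bleuFromStats(const std::vector<BleuStats>& sentences) {
  std::array<long long, 9> sum{};
  for (const BleuStats& s : sentences)
    for (std::size_t k = 0; k < sum.size(); ++k) sum[k] += s[k];
  double logPrecisions = 0.0;
  for (std::size_t n = 0; n < 4; ++n) {
    if (sum[2 * n] == 0 || sum[2 * n + 1] == 0) return 0.0;
    logPrecisions += std::log(static_cast<double>(sum[2 * n]) /
                              static_cast<double>(sum[2 * n + 1]));
  }
  const double hypLen = static_cast<double>(sum[1]);  // total unigrams
  const double refLen = static_cast<double>(sum[8]);
  const double brevity =
      hypLen < refLen ? std::exp(1.0 - refLen / hypLen) : 1.0;
  return brevity * std::exp(logPrecisions / 4.0);
}
```

Summing the statistics before computing the score is what makes corpus-level BLEU differ from an average of sentence-level BLEU values, and it is why the extractor can store per-candidate statistics instead of scores.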

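the two files must be kept aligned: the k-th entry of block i in the feature file and the k-th entry of block i in the score file refer to the same nbest candidate
the header of the error scores file depends on the error measure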
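Keeping the two files aligned can be checked from the block headers alone: both files must list the same utterances in the same order with the same nbest sizes. A minimal sketch of such a consistency check; BlockHeader and checkAlignment are hypothetical names, and the check does not look at individual entries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal header information per block: the "i" and "N_i" fields of the
// FEATURES_BEGIN / SCORES_BEGIN lines.
struct BlockHeader {
  int utterance;
  std::size_t nbestSize;
};

// Returns an empty string when the two files are aligned, otherwise a
// description of the first mismatch found.
std::string checkAlignment(const std::vector<BlockHeader>& features,
                           const std::vector<BlockHeader>& scores) {
  if (features.size() != scores.size())
    return "different number of blocks: " + std::to_string(features.size()) +
           " vs " + std::to_string(scores.size());
  for (std::size_t k = 0; k < features.size(); ++k) {
    if (features[k].utterance != scores[k].utterance)
      return "block " + std::to_string(k) + ": utterance index mismatch";
    if (features[k].nbestSize != scores[k].nbestSize)
      return "utterance " + std::to_string(features[k].utterance) +
             ": nbest size mismatch";
  }
  return "";
}
```

For the two examples above, the headers (0, 4) and (1, 3) appear in both files, so the check passes.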

Page last modified on August 05, 2008, at 12:43 PM