For the labs, we can use the workstations kindly provided by the University of Trento in room 104.
To log in, please use the lab account you received for the MT Marathon. Please remember to log out of the machine every time you leave the room, and do not leave a workstation locked for long periods.
Machines are automatically switched off every evening at 7:30pm.
Software is pre-installed on the lab machines, mostly under /usr/local/smt_software/ (GIZA++ is in a home directory):
/home/nicola.bertoldi_5/software/giza-pp/giza-pp-v1.0.5
/usr/local/smt_software/irstlm/irstlm-r419
/usr/local/smt_software/randlm7/randlm-v0.2
/usr/local/smt_software/srilm/srilm-1.5.10
/usr/local/smt_software/moses/moses-r4163
When using EMS, set the following variables so that they point to the correct software:
moses-src-dir = /usr/local/smt_software/moses/moses-r4163
srilm-dir = /usr/local/smt_software/srilm/srilm-1.5.10/bin/i686
decoder = $moses-src-dir/bin/moses
ttable-binarizer = $moses-src-dir/bin/processPhraseTable
training-options = "-bin-dir=/home/nicola.bertoldi_5/software/giza-pp/giza-pp-v1.0.5/bin"
COLLINS-PARSER is NOT yet available for use with EMS, so please take care when running syntax-based experiments.
Use the following commands to set up an experiment:
cd
mkdir experiment
cd experiment
cp /usr/local/smt_software/moses/moses-r4163/scripts/ems/example/config.toy .
Now edit the following settings in config.toy:
working-dir = /home/sci-mtm(YOUR-USER-ID)/experiment
moses-src-dir = /usr/local/smt_software/moses/moses-r4163
moses-script-dir = $moses-src-dir/scripts
srilm-dir = /usr/local/smt_software/srilm/srilm-1.5.10/bin/i686
decoder = $moses-src-dir/bin/moses
ttable-binarizer = $moses-src-dir/bin/processPhraseTable
training-options = "-bin-dir=/home/nicola.bertoldi_5/software/giza-pp/giza-pp-v1.0.5/bin"
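If you prefer not to edit the file by hand, the settings above could also be applied with a small script. The following is only a sketch: the helper name apply_settings and the sample config text are made up for illustration, and it assumes that each setting sits on a single "key = value" line (possibly commented out with "#"), which may not hold for every line of the real config.toy.

```python
import re


def apply_settings(config_text, settings):
    """Return (new_text, missing) after rewriting `key = value` lines.

    A line matches a key if it starts with that key (optionally commented
    out with '#') followed by '='. Only the first match per key is
    rewritten; keys that never match are collected in `missing`.
    """
    lines = config_text.splitlines()
    missing = []
    for key, value in settings.items():
        pattern = re.compile(r"^\s*#?\s*" + re.escape(key) + r"\s*=")
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = f"{key} = {value}"
                break
        else:
            # No line for this key was found; report it instead of guessing
            # which section it belongs to.
            missing.append(key)
    return "\n".join(lines) + "\n", missing


# Hypothetical excerpt, NOT the real content of config.toy.
sample = (
    "[GENERAL]\n"
    "moses-src-dir = /some/old/path\n"
    "#decoder = /some/old/path/bin/moses\n"
)

new_text, missing = apply_settings(sample, {
    "moses-src-dir": "/usr/local/smt_software/moses/moses-r4163",
    "decoder": "$moses-src-dir/bin/moses",
    "srilm-dir": "/usr/local/smt_software/srilm/srilm-1.5.10/bin/i686",
})
print(new_text)
print("not found:", missing)
```

To use it on the real file, read config.toy into a string, pass the remaining settings from the list above, and write the result back. Check the list of missing keys and the edited file by hand before running the experiment.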
You are now able to run the experiment:
/usr/local/smt_software/moses/moses-r4163/scripts/ems/experiment.perl -config config.toy -exec
See the Moses documentation for EMS for more details.
We discussed IBM Model 1 in the lecture today. In this lab you will implement the EM algorithm for IBM Model 1 in your favorite programming language. Here are some data sets to train on:
Your program should output two different things:
Pseudo-code of IBM Model 1 as presented in the lecture:
initialize t(e|f) uniformly
do until convergence
  set count(e|f) to 0 for all e,f
  set total(f) to 0 for all f
  for all sentence pairs (e_s,f_s)
    set total_s(e) = 0 for all e
    for all words e in e_s
      for all words f in f_s
        total_s(e) += t(e|f)
    for all words e in e_s
      for all words f in f_s
        count(e|f) += t(e|f) / total_s(e)
        total(f) += t(e|f) / total_s(e)
  for all f
    for all e
      t(e|f) = count(e|f) / total(f)
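As a starting point, the pseudo-code can be sketched in Python roughly as follows. This is a minimal sketch, not a reference solution: the function name train_ibm_model1 is made up, a fixed number of iterations stands in for a real convergence test, there is no NULL word, and the three-sentence toy corpus is an illustrative assumption rather than one of the lab data sets.

```python
from collections import defaultdict


def train_ibm_model1(sentence_pairs, iterations=10):
    """Estimate t(e|f) with EM for IBM Model 1.

    sentence_pairs: list of (e_words, f_words), each a list of tokens.
    Returns a dict-like table mapping (e, f) to t(e|f).
    """
    e_vocab = {e for e_s, _ in sentence_pairs for e in e_s}
    uniform = 1.0 / len(e_vocab)
    # Uniform initialisation, created lazily on first lookup.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):  # assumption: fixed count, no convergence test
        count = defaultdict(float)
        total = defaultdict(float)
        for e_s, f_s in sentence_pairs:
            # Normalisation term for each e over all f in this sentence.
            total_s = {e: sum(t[(e, f)] for f in f_s) for e in e_s}
            # Collect fractional counts (E-step).
            for e in e_s:
                for f in f_s:
                    c = t[(e, f)] / total_s[e]
                    count[(e, f)] += c
                    total[f] += c
        # Re-estimate the probabilities (M-step).
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


if __name__ == "__main__":
    # Hypothetical toy corpus: (English words, foreign words).
    corpus = [
        ("the house".split(), "das Haus".split()),
        ("the book".split(), "das Buch".split()),
        ("a book".split(), "ein Buch".split()),
    ]
    table = train_ibm_model1(corpus, iterations=20)
    for (e, f), p in sorted(table.items()):
        print(f"t({e}|{f}) = {p:.4f}")
```

The table is keyed by word pairs and only stores pairs that co-occur in some sentence, which keeps memory manageable on larger data. For a real run you would read the sentence pairs from the training files and stop when the probabilities (or the perplexity) no longer change noticeably.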
See the slides from Maja Popovic
cp config.toy config.hierarchical
Change the following:
1. decoder = $moses-src-dir/bin/moses_chart
2. ttable-binarizer = "$moses-src-dir/bin/CreateOnDiskPt 1 1 5 100 2"
3. Delete or comment out the line
lexicalized-reordering = msd-bidirectional-fe
4. Uncomment the line
hierarchical-rule-set = true
5. weight-config = $working-dir/weight_hiero.ini [THIS HAS CHANGED!!]
6. decoder-settings = "-search-algorithm 3 -cube-pruning-pop-limit 5000 -s 5000"
[-search-algorithm 3 instead of 1]
cp /usr/local/smt_software/moses/moses-r4163/scripts/ems/example/data/weight.ini weight_hiero.ini
Edit weight_hiero.ini:
- remove the whole distortion block (including weights)
- add one weight (value 0.2) to the "# translation model weights" block
You are now able to run the experiment:
/usr/local/smt_software/moses/moses-r4163/scripts/ems/experiment.perl -config config.hierarchical -exec