Automatic Translation Error Analysis

Members

Mark Fishel, Dan Zeman, Maja Popović, Jan Berka, Ondřej Bojar, Joachim van den Bogaert, Suhel Jaber, Arianna Bisazza, Sabine Hunsicker, Martin Popel

Links

Project Guide

Addicter: Home, MTM Slides
Hjerson: Home, MTM Slides

Summary

  1. Cross-evaluate Addicter and Hjerson on each other's datasets
  2. Get hands dirty with applying both systems to a dataset of your choice
  3. Improve both systems

Some analysis

  • Hjerson is also evaluated on ranking error types and systems (less strict than prec/rec, still very useful)
  • Hjerson overuses lex (and reord); Addicter overuses miss and ext much more seriously, so Addicter needs a less restrictive alignment (and Hjerson's alignment could possibly be restricted a bit)

Result summary:

  • applying both to WMT'11 en-de (Sabine)
  • applying both to IWSLT'11 ar-en (Arianna, Suhel)
  • evaluating both on WMT'09 de-en
    • Berkeley alignment (Arianna)
    • greedy smarter alignment (Martin)
  • evaluating Hjerson on WMT'09 en-cz (Maja)
  • lemma-WER Hjerson (Maja)
  • friendlier Addicter (Dan)

De->En data set, overall accuracy, Addicter and Hjerson:


flexible human error analysis

MT system         jane       pbt        rpbt       mix
berk-addicter     41.1/41.0  48.0/48.4  46.8/46.8  45.3/45.2
berkwmt-addicter  51.2/51.2  55.4/56.8  55.0/54.9  53.9/54.2
greedy-addicter   49.7/50.7  55.9/55.9  54.4/53.9  53.3/53.4
hmm-addicter      48.9/48.8  55.4/53.2  54.4/51.1  52.9/51.0
default-hjerson   51.8/53.6  56.1/57.3  54.3/57.3  54.0/56.0
lemmawer-hjerson  51.7/53.7  56.2/57.7  53.7/57.3  53.9/56.1
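
The overall accuracy values are consistent with the share of tokens on which the automatic and the manual label agree, that is, the diagonal of the confusion matrix divided by its total, with the two numbers in each cell being the ref-side and the hyp-side value. This reading is an inference from the numbers, not something the page states. A minimal Python sketch (the name `overall_accuracy` is illustrative) applies it to the ref-side counts given for berk addicter under the detailed De->En results and arrives at the 45.3 shown in the mix column:

```python
def overall_accuracy(confusion):
    """Share of tokens whose automatic and manual labels agree (trace / total)."""
    total = sum(sum(row) for row in confusion)
    agree = sum(confusion[i][i] for i in range(len(confusion)))
    return agree / total


# berk addicter, ref side; rows = automatic, columns = manual,
# label order: -, infl, lex, miss, reord (counts as read from the detailed table)
berk_ref = [
    [1659, 9, 50, 34, 44],
    [53, 21, 1, 7, 2],
    [781, 5, 282, 148, 35],
    [764, 11, 180, 274, 45],
    [529, 2, 12, 74, 69],
]

print(f"{100 * overall_accuracy(berk_ref):.1f}")  # 45.3
```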

En->Cz data set, overall accuracy, Hjerson:


free human error analysis

MT system     bojar 1/2  tectomt 1/2  google 1/2  pctrans 1/2
standard WER  46.9/46.5  44.0/44.2    45.3/44.9   43.3/43.1
lemma WER     48.3/48.0  44.6/45.0    46.3/46.1   43.9/43.9

Detailed results

En->Cz data set, Hjerson:


Rank correlations (annotator1/annotator2/Hjerson)

system   infl         reord        miss         ext           lex           rho
bojar    333/320/459  72/66/474    149/134/313  147/142/240   379/309/1527  0.400/0.100
tectomt  310/319/450  85/71/450    122/108/312  207/166/226   612/528/1813  0.475/0.475
google   360/341/494  81/74/542    80/64/175    190/172/368   369/319/1523  0.700/0.500
pctrans  351/341/428  95/89/506    69/57/237    168/160/326   467/412/1786  0.700/0.700
rho      0.400/0.150  0.400/0.800  0.800/0.800  -0.200/0.400  0.800/1.000

lemma-WER:

system   infl         reord        miss         ext           lex           rho
bojar    333/320/459  72/66/424    149/134/374  147/142/296   379/309/1462  0.700/0.500
tectomt  310/319/450  85/71/447    122/108/367  207/166/278   612/528/1751  0.600/0.600
google   360/341/494  81/74/506    80/64/216    190/172/420   369/319/1473  0.700/0.500
pctrans  351/341/428  95/89/482    69/57/273    168/160/354   467/412/1741  0.700/0.700
rho      0.400/0.150  0.400/0.800  0.800/0.800  -0.200/0.400  0.800/0.600
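
The rho values look like Spearman rank correlations between an annotator's error counts and Hjerson's counts. The sketch below is an assumption about how they were obtained, not the project's script: it uses the simple formula 1 - 6*sum(d^2)/(n(n^2-1)) with tied values sharing their average rank, and the function names are illustrative. With the standard-WER counts for annotator 1 it reproduces the 0.400 of the bojar row and the 0.475 of the tectomt row.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation via the squared rank differences."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


# error counts in the order infl, reord, miss, ext, lex (standard WER table)
bojar_annotator1 = [333, 72, 149, 147, 379]
bojar_hjerson = [459, 474, 313, 240, 1527]
tectomt_annotator1 = [310, 85, 122, 207, 612]
tectomt_hjerson = [450, 450, 312, 226, 1813]  # infl and reord are tied

print(round(spearman_rho(bojar_annotator1, bojar_hjerson), 3))      # 0.4
print(round(spearman_rho(tectomt_annotator1, tectomt_hjerson), 3))  # 0.475
```

The same function applied column-wise (one error type, four systems) gives the bottom rho row, for example 0.400 for infl with annotator 1.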

Confusions (hypothesis only):

bojar 1/2    infl     reord  ext      lex      x
infl         154/135  14/6   8/7      9/10     282/253
reord        9/6      21/20  10/7     19/22    417/368
ext          20/20    8/8    21/24    47/35    149/139
lex          149/161  40/34  97/97    284/227  968/855
x            19/18    10/13  11/7     33/29    1527/1371

tectomt 1/2  infl     reord  ext      lex      x
infl         118/119  19/11  9/7      29/21    287/260
reord        9/10     26/22  18/13    30/28    397/350
ext          23/22    11/6   40/20    68/57    90/104
lex          168/178  48/49  129/117  459/409  1041/897
x            11/11    9/12   12/9     42/32    1235/1112

google 1/2   infl     reord  ext      lex      x
infl         170/145  17/9   13/9     9/13     297/274
reord        15/9     20/20  13/12    14/15    482/417
ext          29/30    10/10  41/37    59/55    237/215
lex          148/158  31/31  113/106  275/226  1034/904
x            17/16    14/14  10/8     32/25    1598/1433

pctrans 1/2  infl     reord  ext      lex      x
infl         120/125  16/11  10/12    10/12    280/239
reord        8/3      29/24  8/7      11/18    452/387
ext          38/33    10/12  46/38    55/48    181/176
lex          186/188  49/44  101/101  376/319  1117/978
x            9/11     15/19  4/2      27/29    1368/1215

lemma-WER:

bojar 1/2    infl     reord  ext      lex      x
infl         154/135  14/6   8/7      9/10     279/250
reord        5/5      20/18  8/7      16/20    377/325
ext          23/24    9/10   35/29    52/41    184/181
lex          145/156  39/32  82/92    277/221  922/800
x            24/20    11/15  14/7     38/31    1581/1430

tectomt 1/2  infl     reord  ext      lex      x
infl         118/119  19/11  9/7      29/21    287/260
reord        6/9      26/23  18/13    27/27    372/324
ext          20/23    12/6   40/29    77/60    127/133
lex          167/171  47/49  129/107  449/406  1001/866
x            18/18    9/11   12/10    46/33    1263/1140

google 1/2   infl     reord  ext      lex      x
infl         170/145  16/8   13/9     9/13     297/274
reord        13/9     21/20  11/9     13/17    450/385
ext          29/31    11/10  52/51    68/62    268/244
lex          145/156  34/31  110/90   265/217  991/862
x            22/17    14/15  14/13    34/25    1642/1478

pctrans 1/2  infl     reord  ext      lex      x
infl         119/125  16/11  10/12    10/11    279/238
reord        7/3      28/23  7/5      13/19    428/370
ext          28/28    6/10   56/50    64/49    199/195
lex          194/192  48/46  90/89    366/317  1094/953
x            13/12    16/20  6/4      26/30    1398/1239

Recall:

bojar 1, full/lemma    aut infl     aut reord    aut ext      aut lex      aut x
hum infl               43.87/43.87  2.56/1.42    5.70/6.55    42.45/41.31  5.41/6.84
hum reord              15.05/15.05  22.58/21.51  8.60/9.68    43.01/41.94  10.75/11.83
hum ext                5.44/5.44    6.80/5.44    14.29/23.81  65.99/55.78  7.48/9.52
hum lex                2.30/2.30    4.85/4.08    11.99/13.27  72.45/70.66  8.42/9.69
hum x                  8.44/8.35    12.47/11.28  4.46/5.50    28.96/27.58  45.68/47.29

bojar 2, full/lemma    aut infl     aut reord    aut ext      aut lex      aut x
hum infl               39.71/39.71  1.76/1.47    5.88/7.06    47.35/45.88  5.29/5.88
hum reord              7.41/7.41    24.69/22.22  9.88/12.35   41.98/39.51  16.05/18.52
hum ext                4.93/4.93    4.93/4.93    16.90/20.42  68.31/64.79  4.93/4.93
hum lex                3.10/3.10    6.81/6.19    10.84/12.69  70.28/68.42  8.98/9.60
hum x                  8.47/8.37    12.32/10.88  4.66/6.06    28.63/26.79  45.91/47.89

tectomt 1, full/lemma  aut infl     aut reord    aut ext      aut lex      aut x
hum infl               35.87/35.87  2.74/1.82    6.99/6.08    51.06/50.76  3.34/5.47
hum reord              16.81/16.81  23.01/23.01  9.73/10.62   42.48/41.59  7.96/7.96
hum ext                4.33/4.33    8.65/8.17    19.23/23.08  62.02/57.21  5.77/7.21
hum lex                4.62/4.62    4.78/4.30    10.83/12.26  73.09/71.50  6.69/7.32
hum x                  9.41/9.41    13.02/12.20  2.95/4.16    34.13/32.82  40.49/41.41

tectomt 2, full/lemma  aut infl     aut reord    aut ext      aut lex      aut x
hum infl               35.00/35.00  2.94/2.65    6.47/6.76    52.35/50.29  3.24/5.29
hum reord              11.00/11.00  22.00/23.00  6.00/6.00    49.00/49.00  12.00/11.00
hum ext                4.22/4.22    7.83/7.83    12.05/17.47  70.48/64.46  5.42/6.02
hum lex                3.84/3.84    5.12/4.94    10.42/10.97  74.77/74.22  5.85/6.03
hum x                  9.55/9.55    12.85/11.90  3.82/4.88    32.94/31.80  40.84/41.87

google 1, full/lemma   aut infl     aut reord    aut ext      aut lex      aut x
hum infl               44.85/44.85  3.96/3.43    7.65/7.65    39.05/38.26  4.49/5.80
hum reord              17.71/16.67  20.83/21.88  10.42/11.46  37.50/35.42  13.54/14.58
hum ext                6.84/6.50    6.84/5.50    21.58/26.00  59.47/55.00  5.26/7.00
hum lex                2.31/2.31    3.60/3.34    15.17/17.48  70.69/68.12  8.23/8.74
hum x                  8.14/8.14    13.21/12.34  6.50/7.35    28.34/27.17  43.80/45.01

google 2, full/lemma   aut infl     aut reord    aut ext      aut lex      aut x
hum infl               40.50/40.50  2.51/2.51    8.38/8.66    44.13/43.58  4.47/4.75
hum reord              10.71/9.52   23.81/23.81  11.90/11.90  36.90/36.90  16.67/17.86
hum ext                5.23/5.23    6.98/5.23    21.51/29.65  61.63/52.33  4.65/7.56
hum lex                3.89/3.89    4.49/5.09    16.47/18.56  67.66/64.97  7.49/7.49
hum x                  8.45/8.45    12.86/11.87  6.63/7.52    27.88/26.58  44.19/45.58

pctrans 1, full/lemma  aut infl     aut reord    aut ext      aut lex      aut x
hum infl               33.24/32.96  2.22/1.94    10.53/7.76   51.52/53.74  2.49/3.60
hum reord              13.45/14.04  24.37/24.56  8.40/5.26    41.18/42.11  12.61/14.04
hum ext                5.92/5.92    4.73/4.14    27.22/33.14  59.76/53.25  2.37/3.55
hum lex                2.09/2.09    2.30/2.71    11.48/13.36  78.50/76.41  5.64/5.43
hum x                  8.24/8.21    13.30/12.60  5.33/5.86    32.87/32.20  40.26/41.14

pctrans 2, full/lemma  aut infl     aut reord    aut ext      aut lex      aut x
hum infl               34.72/34.72  0.83/0.83    9.17/7.78    52.22/53.33  3.06/3.33
hum reord              10.00/10.00  21.82/20.91  10.91/9.09   40.00/41.82  17.27/18.18
hum ext                7.50/7.50    4.38/3.12    23.75/31.25  63.12/55.62  1.25/2.50
hum lex                2.82/2.58    4.23/4.46    11.27/11.50  74.88/74.41  6.81/7.04
hum x                  7.98/7.95    12.92/12.35  5.88/6.51    32.65/31.82  40.57/41.37

Precision:

bojar 1, full/lemma    hum infl     hum reord  hum ext      hum lex      hum x
aut infl               32.98/33.19  3.00/3.02  1.71/1.72    1.93/1.94    60.39/60.13
aut reord              1.89/1.17    4.41/4.69  2.10/1.88    3.99/3.76    87.61/88.50
aut ext                8.16/7.59    3.27/2.97  8.57/11.55   19.18/17.16  60.82/60.73
aut lex                9.69/9.90    2.60/2.66  6.31/5.60    18.47/18.91  62.94/62.94
aut x                  1.19/1.44    0.62/0.66  0.69/0.84    2.06/2.28    95.44/94.78

bojar 2, full/lemma    hum infl     hum reord  hum ext      hum lex      hum x
aut infl               32.85/33.09  1.46/1.47  1.70/1.72    2.43/2.45    61.56/61.27
aut reord              1.42/1.33    4.73/4.80  1.65/1.87    5.20/5.33    87.00/86.67
aut ext                8.85/8.42    3.54/3.51  10.62/10.18  15.49/14.39  61.50/63.51
aut lex                11.72/11.99  2.47/2.46  7.06/7.07    16.52/16.99  62.23/61.49
aut x                  1.25/1.33    0.90/1.00  0.49/0.47    2.02/2.06    95.34/95.14

tectomt 1, full/lemma  hum infl     hum reord  hum ext      hum lex      hum x
aut infl               25.54/25.54  4.11/4.11  1.95/1.95    6.28/6.28    62.12/62.12
aut reord              1.88/1.34    5.42/5.80  3.75/3.79    6.25/6.03    82.71/83.04
aut ext                9.91/7.04    4.74/4.23  17.24/16.90  29.31/27.11  38.79/44.72
aut lex                9.11/9.37    2.60/2.64  6.99/6.67    24.88/25.18  56.42/56.14
aut x                  0.84/1.33    0.69/0.67  0.92/1.11    3.21/3.40    94.35/93.49

tectomt 2, full/lemma  hum infl     hum reord  hum ext      hum lex      hum x
aut infl               28.47/28.47  2.63/2.63  1.67/1.67    5.02/5.02    62.20/62.20
aut reord              2.36/2.27    5.20/5.81  3.07/3.28    6.62/6.82    82.74/81.82
aut ext                10.53/9.16   2.87/2.39  9.57/11.55   27.27/23.90  49.76/52.99
aut lex                10.79/10.69  2.97/3.06  7.09/6.69    24.79/25.39  54.36/54.16
aut x                  0.94/1.49    1.02/0.91  0.77/0.83    2.72/2.72    94.56/94.06

google 1, full/lemma   hum infl     hum reord  hum ext      hum lex      hum x
aut infl               33.60/33.66  3.36/3.17  2.57/2.57    1.78/1.78    58.70/58.81
aut reord              2.76/2.56    3.68/4.13  2.39/2.17    2.57/2.56    88.60/88.58
aut ext                7.71/6.78    2.66/2.57  10.90/12.15  15.69/15.89  63.03/62.62
aut lex                9.22/9.39    2.24/2.20  7.04/7.12    17.12/17.15  64.38/64.14
aut x                  1.02/1.27    0.78/0.81  0.60/0.81    1.92/1.97    95.69/95.13

google 2, full/lemma   hum infl     hum reord  hum ext      hum lex      hum x
aut infl               32.22/32.29  2.00/1.78  2.00/2.00    2.89/2.90    60.89/61.02
aut reord              1.90/2.05    4.23/4.55  2.54/2.05    3.17/3.86    88.16/87.50
aut ext                8.65/7.79    2.88/2.51  10.66/12.81  15.85/15.58  61.96/61.31
aut lex                11.09/11.50  2.18/2.29  7.44/6.64    15.86/16.00  63.44/63.57
aut x                  1.07/1.10    0.94/0.97  0.53/0.84    1.67/1.61    95.79/95.48

pctrans 1, full/lemma  hum infl     hum reord  hum ext      hum lex      hum x
aut infl               27.52/27.42  3.67/3.69  2.29/2.30    2.29/2.30    64.22/64.29
aut reord              1.57/1.45    5.71/5.80  1.57/1.45    2.17/2.69    88.98/88.61
aut ext                11.52/7.93   3.03/1.70  13.94/15.86  16.67/18.13  54.85/56.37
aut lex                10.17/10.83  2.68/2.68  5.52/5.02    20.56/20.42  61.07/61.05
aut x                  0.63/0.89    1.05/1.10  0.28/0.41    1.90/1.78    96.13/95.82

pctrans 2, full/lemma  hum infl     hum reord  hum ext      hum lex      hum x
aut infl               31.33/31.49  2.76/2.77  3.01/3.02    3.01/2.77    59.90/59.95
aut reord              0.68/0.71    5.47/5.48  1.59/1.19    4.10/4.52    88.15/88.10
aut ext                10.75/8.43   3.91/3.01  12.38/15.06  15.64/14.76  57.33/58.73
aut lex                11.53/12.02  2.70/2.88  6.20/5.57    19.57/19.85  60.00/59.67
aut x                  0.86/0.92    1.49/1.53  0.16/0.31    2.27/2.30    95.22/94.94
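
The Recall and Precision tables appear to be the confusion counts normalised per human label and per automatic label respectively. The sketch below rests on that assumption, and on the further assumption that the confusion tables have Hjerson's labels as rows and the annotator's labels as columns (the page does not say so, but the numbers agree). Function names are illustrative. It uses the standard-WER bojar counts for annotator 1 and reproduces the first rows of the two "bojar 1" tables.

```python
LABELS = ["infl", "reord", "ext", "lex", "x"]

# bojar, annotator 1, standard WER; rows = automatic (Hjerson), columns = human
confusion = [
    [154, 14, 8, 9, 282],
    [9, 21, 10, 19, 417],
    [20, 8, 21, 47, 149],
    [149, 40, 97, 284, 968],
    [19, 10, 11, 33, 1527],
]


def precision_rows(matrix):
    """For each automatic label: percentage of its tokens per human label."""
    return [[100 * value / sum(row) for value in row] for row in matrix]


def recall_rows(matrix):
    """For each human label: percentage of its tokens per automatic label."""
    columns = list(zip(*matrix))  # transpose so each entry is one human label
    return [[100 * value / sum(col) for value in col] for col in columns]


precision = precision_rows(confusion)
recall = recall_rows(confusion)

print("aut infl:", [f"{p:.2f}" for p in precision[0]])
print("hum infl:", [f"{r:.2f}" for r in recall[0]])
```

For these counts the first line matches the "aut infl" row of the bojar 1 precision table (32.98, 3.00, 1.71, 1.93, 60.39) and the second matches the "hum infl" row of the bojar 1 recall table (43.87, 2.56, 5.70, 42.45, 5.41).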

De->En data set, detailed results (confusions/prec-s/rec-s):

Evaluating berk addicter (ref / hyp tables; left: auto / top: manual):

ref table:
           -     infl  lex   miss  reord
-          1659  9     50    34    44
infl       53    21    1     7     2
lex        781   5     282   148   35
miss       764   11    180   274   45
reord      529   2     12    74    69
precision  0.92  0.25  0.23  0.22  0.10
recall     0.44  0.44  0.54  0.51  0.35
f1-score   0.59  0.32  0.32  0.30  0.16

hyp table:
           -     ext   infl  lex   reord
-          1669  13    9     52    53
ext        638   70    6     93    40
infl       54    3     22    3     2
lex        859   57    13    262   45
reord      567   30    3     14    87
precision  0.93  0.08  0.26  0.21  0.12
recall     0.44  0.40  0.42  0.62  0.38
f1-score   0.60  0.14  0.32  0.32  0.19
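
The precision, recall and f1-score rows can be recomputed from the counts of the same table. A minimal Python sketch, taking rows as automatic labels and columns as manual labels as the heading indicates (left: auto / top: manual); the function name is illustrative, and returning 0.0 for an empty row or column is an assumption chosen to match the 0.00 entries in the hmm addicter tables:

```python
def per_class_scores(confusion):
    """Return (precision, recall, f1) lists, one value per label."""
    n = len(confusion)
    row_sums = [sum(row) for row in confusion]        # tokens per automatic label
    col_sums = [sum(col) for col in zip(*confusion)]  # tokens per manual label
    precision, recall, f1 = [], [], []
    for i in range(n):
        hit = confusion[i][i]
        p = hit / row_sums[i] if row_sums[i] else 0.0
        r = hit / col_sums[i] if col_sums[i] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(f)
    return precision, recall, f1


# berk addicter, ref side; label order: -, infl, lex, miss, reord
berk_ref = [
    [1659, 9, 50, 34, 44],
    [53, 21, 1, 7, 2],
    [781, 5, 282, 148, 35],
    [764, 11, 180, 274, 45],
    [529, 2, 12, 74, 69],
]

for name, scores in zip(("precision", "recall", "f1-score"), per_class_scores(berk_ref)):
    print(name, " ".join(f"{s:.2f}" for s in scores))
```

Rounded to two decimals, the output agrees with the three summary rows of the berk addicter ref table.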

Evaluating berkwmt addicter (ref / hyp tables; left: auto / top: manual):

ref table:
           -     infl  lex   miss  reord
-          1934  6     41    25    49
infl       93    28    6     4     2
lex        531   3     259   94    13
miss       655   9     211   337   24
reord      439   0     10    68    107
precision  0.94  0.21  0.29  0.27  0.17
recall     0.53  0.61  0.49  0.64  0.55
f1-score   0.68  0.31  0.36  0.38  0.26

hyp table:
           -     ext   infl  lex   reord
-          1931  3     9     47    65
ext        570   103   9     93    34
infl       96    3     29    1     3
lex        580   35    2     274   14
reord      475   22    1     8     111
precision  0.94  0.13  0.22  0.30  0.18
recall     0.53  0.62  0.58  0.65  0.49
f1-score   0.68  0.21  0.32  0.41  0.26

Evaluating default hjerson (ref / hyp tables; left: auto / top: manual):

ref table:
           -     infl  lex   miss  reord
-          1816  0     11    31    32
infl       59    42    7     9     5
lex        827   4     394   216   24
miss       205   0     97    202   17
reord      599   0     15    29    117
precision  0.96  0.34  0.27  0.39  0.15
recall     0.52  0.91  0.75  0.41  0.60
f1-score   0.67  0.50  0.40  0.40  0.25

hyp table:
           -     ext   infl  lex   reord
-          1854  20    0     21    38
ext        101   36    0     16    9
infl       61    1     45    6     9
lex        876   73    3     361   41
reord      589   20    2     19    130
precision  0.96  0.22  0.37  0.27  0.17
recall     0.53  0.24  0.90  0.85  0.57
f1-score   0.68  0.23  0.52  0.41  0.26

Evaluating greedy addicter (ref / hyp tables; left: auto / top: manual):

ref table:
           -     infl  lex   miss  reord
-          1941  8     36    23    48
infl       105   34    5     8     1
lex        553   2     258   92    3
miss       463   1     201   330   16
reord      663   1     26    98    127
precision  0.94  0.22  0.28  0.33  0.14
recall     0.52  0.74  0.49  0.60  0.65
f1-score   0.67  0.34  0.36  0.42  0.23

hyp table:
           -     ext   infl  lex   reord
-          1947  10    9     34    56
ext        398   77    0     95    14
infl       105   3     36    4     3
lex        591   42    2     262   8
reord      689   55    4     32    146
precision  0.95  0.13  0.24  0.29  0.16
recall     0.52  0.41  0.71  0.61  0.64
f1-score   0.67  0.20  0.36  0.39  0.25

Evaluating hmm addicter (ref / hyp tables; left: auto / top: manual):

ref table:
           -     infl  lex   miss  reord
-          1947  6     7     14    52
infl       111   35    8     9     2
lex        0     0     0     0     0
miss       1166  4     502   451   40
reord      304   1     11    20    101
precision  0.96  0.21  0.00  0.21  0.23
recall     0.55  0.76  0.00  0.91  0.52
f1-score   0.70  0.33  0.00  0.34  0.32

hyp table:
           -     ext   infl  lex   reord
-          1939  9     7     8     63
ext        1143  138   3     402   50
infl       116   2     37    5     4
lex        0     0     0     0     0
reord      310   2     3     12    110
precision  0.96  0.08  0.23  0.00  0.25
recall     0.55  0.91  0.74  0.00  0.48
f1-score   0.70  0.15  0.35  0.00  0.33

Evaluating lemmawer hjerson (ref / hyp tables; left: auto / top: manual):

ref table:
           -     infl  lex   miss  reord
-          1822  0     11    29    33
infl       59    42    7     9     5
lex        814   4     394   228   24
miss       214   0     97    189   17
reord      597   0     15    32    116
precision  0.96  0.34  0.27  0.37  0.15
recall     0.52  0.91  0.75  0.39  0.59
f1-score   0.67  0.50  0.40  0.38  0.24

hyp table:
           -     ext   infl  lex   reord
-          1861  20    0     20    38
ext        102   35    0     18    10
infl       61    1     45    6     8
lex        870   74    3     359   40
reord      587   20    2     20    131
precision  0.96  0.21  0.37  0.27  0.17
recall     0.53  0.23  0.90  0.85  0.58
f1-score   0.69  0.22  0.53  0.41  0.27

De->En data set, detailed results (error ranking):

Ranking evaluation for addicter, hmm (manual/automatic counts)

system    infl   reord   miss     ext     lex    rho
jane      17/69  98/173  109/695  67/645  173/0  -0.100
pbt       13/55  62/138  203/749  54/554  193/0  0.300
rpbt      16/57  35/132  175/719  29/537  158/0  0.300
sys rank  1.00   1.00    1.00     1.00    0.50

Ranking evaluation for addicter, greedy (manual/automatic counts)

system    infl   reord   miss     ext     lex      rho
jane      17/66  98/380  109/293  67/243  173/326  0.700
pbt       13/49  62/319  203/378  54/183  193/293  0.900
rpbt      16/54  35/305  175/340  29/158  158/308  1.000
sys rank  1.00   1.00    1.00     1.00    -0.50

Ranking evaluation for addicter, berk (manual/automatic counts)

system    infl   reord   miss     ext     lex      rho
jane      17/31  98/289  109/400  67/350  173/452  0.900
pbt       13/26  62/244  203/441  54/246  193/412  0.900
rpbt      16/28  35/228  175/433  29/251  158/409  0.900
sys rank  1.00   1.00    1.00     0.50    0.50

Ranking evaluation for addicter, berkwmt (manual/automatic counts)

system    infl   reord   miss     ext     lex      rho
jane      17/54  98/263  109/375  67/325  173/328  0.800
pbt       13/42  62/210  203/448  54/253  193/292  0.900
rpbt      16/47  35/210  175/413  29/231  158/299  0.900
sys rank  1.00   0.86    1.00     1.00    -0.50

Ranking evaluation for hjerson, default (manual/automatic counts)

system    infl   reord   miss     ext    lex      rho
jane      17/47  98/289  109/142  67/85  173/490  0.900
pbt       13/32  62/248  203/192  54/31  193/498  0.600
rpbt      16/43  35/223  175/187  29/46  158/477  0.700
sys rank  1.00   1.00    1.00     0.50   1.00

Ranking evaluation for hjerson, lemmawer (manual/automatic counts)

system    infl   reord   miss     ext    lex      rho
jane      17/47  98/289  109/140  67/86  173/491  0.900
pbt       13/32  62/246  203/194  54/33  193/494  0.700
rpbt      16/43  35/225  175/183  29/46  158/479  0.700
sys rank  1.00   1.00    1.00     0.50   1.00

Page last modified on September 12, 2011, at 08:45 AM