
Noise Experiments

newstest 2017, case-sensitive

Phrase-Based

Baseline [96-2], [96-48]: 24.0

| Noise type | 5% | 10% | 20% | 50% | 100% |
|---|---|---|---|---|---|
| wrong language source | [40] 24.0 -.0 | [41] 23.9 -.1 | [42] 23.9 -.1 | [43] 23.8 -.2 | |
| wrong language target | [36] 24.0 -.0 | [37] 23.9 -.1 | [38] 23.9 -.1 | [39] 23.7 -.3 | |
| wrong language source+target | [44] 24.0 -.0 | [45] 23.9 -.1 | [46] 23.8 -.2 | [47] 23.8 -.2 | |
| wrong language 2 source | [75] 24.0 -.0 | [76] 23.9 -.1 | [77] 23.9 -.1 | [78] 23.9 -.1 | [79] 23.8 -.2 |
| wrong language 2 target | [70] 24.0 -.0 | [71] 23.9 -.1 | [72] 23.8 -.2 | [73] 23.5 -.5 | [74] 23.4 -.6 |
| wrong language 2 source+target | [65] 24.0 -.0 | [66] 24.0 -.0 | [67] 23.9 -.1 | [68] 23.5 -.5 | [69] 23.6 -.4 |
| shuffle words source | [16] 24.0 -.0 | [17] 23.6 -.4 | [18] 23.9 -.1 | [19] 23.6 -.4 | [55] 23.7 -.3 |
| shuffle words target | [20] 24.0 -.0 | [21] 24.0 -.0 | [22] 23.4 -.6 | [23] 23.2 -.8 | [56] 22.9 -1.1 |
| shuffle words source+target | [32] 23.9 -.1 | [33] 23.5 -.5 | [34] 23.5 -.5 | [35] 23.1 -.9 | [57] 22.8 -1.2 |
| shuffle sentences | [24] 24.0 -.0 | [25] 23.9 -.1 | [26] 23.9 -.1 | [27] 23.6 -.4 | [54] 23.4 -.6 |
| domain mismatch | [28] 23.9 -.1 | [29] 24.0 -.0 | [30] 24.0 -.0 | [31] 24.1 +.1 | |
| domain mismatch 2 | [58] 24.8 +.8 | [59] 24.8 +.8 | [60] 25.1 +1.1 | [61] 25.6 +1.6 | [62] 26.0 +2.0 |
| copy source | [49] 23.8 -.2 | [50] 23.9 -.1 | [51] 23.8 -.2 | [52] 23.4 -.6 | [53] 21.1 -2.9 |
| copy target | [90] 23.9 -.1 | [91] 23.9 -.2 | [92] 23.6 -.4 | [93] 23.7 -.3 | [94] 23.5 -.5 |
| short (max 2) | [95] 24.1 +.1 | [96] 23.9 -.1 | [97] 23.8 -.2 | - | - |
| short (max 5) | [98] 24.2 +.2 | [99] 24.5 +.5 | [100] 24.5 +.5 | [101] 24.2 +.2 | - |
| paracrawl | [80] 24.2 +.2 | [81] 24.2 +.2 | [82] 24.4 +.4 | [83] 24.8 +.8 | [84] 25.2 +1.2 |
| paracrawl clean (>0) | [85] 25.0 +1 | [86] 25.3 +1.3 | [87] 25.8 +1.8 | [88] 26.5 +2.6 | [89] 27.0 +3 |

wrong language: Europarl cs-en, de-cs

wrong language 2: EU Bookstore fr-en, de-fr

domain mismatch: EMEA (41%), IT (9%), Acquis (50%)

domain mismatch 2: EMEA (10%), IT (2%), Acquis (13%), Subtitles 2016 (62%), Koran (7%)
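
The row labels name each kind of synthetic noise, but the page does not say how the noisy data was generated. The Python sketch below is one plausible reading of the "shuffle words", "shuffle sentences", "copy source" and "copy target" rows, not the scripts used for these experiments. All function names are hypothetical. Two things are assumed rather than stated on this page: that "copy source" means the target side is replaced by a copy of the source, and that the column percentages give the amount of noisy data added relative to the size of the clean corpus.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def shuffle_words(sentence: str, rng: random.Random) -> str:
    """Return the sentence with its whitespace-separated tokens in random order."""
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)


def shuffle_words_noise(pairs: List[Pair], side: str, rng: random.Random) -> List[Pair]:
    """Shuffle words on 'source', 'target', or 'source+target'."""
    if side not in ("source", "target", "source+target"):
        raise ValueError("unknown side: " + side)
    out = []
    for src, tgt in pairs:
        if "source" in side:
            src = shuffle_words(src, rng)
        if "target" in side:
            tgt = shuffle_words(tgt, rng)
        out.append((src, tgt))
    return out


def shuffle_sentences(pairs: List[Pair], rng: random.Random) -> List[Pair]:
    """Misalign the corpus: keep the sources in place and permute the targets."""
    targets = [tgt for _, tgt in pairs]
    rng.shuffle(targets)
    return [(src, tgt) for (src, _), tgt in zip(pairs, targets)]


def copy_source(pairs: List[Pair]) -> List[Pair]:
    """Assumed meaning of 'copy source': the target is a copy of the source."""
    return [(src, src) for src, _ in pairs]


def copy_target(pairs: List[Pair]) -> List[Pair]:
    """Assumed meaning of 'copy target': the source is a copy of the target."""
    return [(tgt, tgt) for _, tgt in pairs]


def add_noise(clean: List[Pair], noise: List[Pair], ratio: float) -> List[Pair]:
    """Append ratio * len(clean) noisy pairs (rounded), cycling through noise if needed."""
    n = round(ratio * len(clean))
    if n > 0 and not noise:
        raise ValueError("no noisy pairs to draw from")
    return clean + [noise[i % len(noise)] for i in range(n)]
```

Under these assumptions, the 50% "copy source" condition would correspond to something like `add_noise(clean, copy_source(other_pairs), 0.5)`, where `other_pairs` is a hypothetical set of sentence pairs held out from the clean corpus.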

Neural

Baselines

| Baseline | Score |
|---|---|
| ep-nc | 25.8 |
| ep-nc-rapid | 27.2 |
| ep-nc-rapid-cc | 30.0 |
| ep-nc-rapid-cc-eubook | 29.1 |
| ep-nc-rapid-cc-acquis | 30.0 |
| ep-nc-rapid-cc-subtitle2016 | 30.6 |
| ep-nc-rapid-cc-subtitle2016-paraclean100 | 31.2 |

Noise

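| Noise type | 5% | 10% | 20% | 50% | 100% | 150% | 200% | 300% |
|---|---|---|---|---|---|---|---|---|
| wrong language source | | | | 26.4 -0.7 | | | | |
| wrong language target | | | | 26.1 -1.1 | | | | |
| wrong language source+target | | | | 25.6 -1.6 | | | | |
| wrong language 2 source | 26.9 -0.3 | 26.8 -0.4 | 26.8 -0.4 | 26.8 -0.4 | 26.8 -0.4 | | | |
| wrong language 2 target | 26.7 -0.5 | 26.6 -0.6 | 26.7 -0.5 | 26.2 -1.0 | 25.0 -2.2 | | | |
| wrong language 2 source+target | | | | | 24.9 -2.3 | | | |
| shuffle words source | 26.9 -0.3 | 26.6 -0.6 | 26.4 -0.8 | 26.6 -0.6 | 25.5 -1.7 | | | |
| shuffle words target | 27.0 -0.2 | 26.8 -0.4 | 26.4 -0.8 | 26.7 -0.5 | 26.1 -1.1 | | | |
| shuffle words source+target | | | | 25.8 -1.4 | 25.1 -2.1 | | | |
| shuffle sentences | 26.5 -0.7 | 26.5 -0.5 | 26.3 -0.9 | 26.1 -1.1 | 25.3 -1.9 | | | |
| domain mismatch | | | | 26.7 -0.5 | | | | |
| domain mismatch 2 | | | | | 29.1 +1.9 | | | |
| short (max 2) | 27.1 -0.1 | 26.5 -0.7 | 26.7 -0.5 | | | | | |
| short (max 5) | 27.8 +0.6 | 27.6 +0.4 | 28.0 +0.8 | 26.6 -0.6 | | | | |
| copy source | 17.6 -9.8 | 11.2 -16.0 | 5.6 -21.6 | 3.2 -24.0 | 3.2 -24.0 | | | |
| copy target | 27.2 -0.0 | 27.0 -0.2 | 26.7 -0.5 | 26.8 -0.4 | 26.9 -0.3 | | | |
| paracrawl | 27.4 +0.2 | 26.6 -0.6 | 24.7 -2.5 | 20.9 -6.3 | 17.3 -9.9 | | | |
| paracrawl clean | 28.1 +0.9 | 28.5 +1.3 | 29.5 +2.3 | 30.3 +3.0 | 30.8 +3.5 | 30.1 +2.9 | 30.5 +3.2 | 29.9 +2.7 |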
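
Each cell in the noise tables pairs a score with a signed difference from a baseline. The phrase-based baseline is stated as 24.0. For the neural table the page does not name the reference system, but most deltas are consistent with the ep-nc-rapid baseline (27.2), so the sketch below assumes that value. The function names are hypothetical, and a few cells differ from the recomputed value by 0.1 to 0.2, possibly because the original deltas were taken from unrounded scores.

```python
def format_cell(score: float, baseline: float) -> str:
    """Format a score and its signed difference from the baseline, one decimal each."""
    return f"{score:.1f} {score - baseline:+.1f}"


def implied_baseline(score: float, delta: float) -> float:
    """Recover the baseline that a (score, delta) cell implies, to one decimal."""
    return round(score - delta, 1)


# Assumption: neural deltas are relative to ep-nc-rapid (27.2).
NEURAL_BASELINE = 27.2
# Stated on the page for the phrase-based system.
PHRASE_BASED_BASELINE = 24.0

neural_cell = format_cell(26.8, NEURAL_BASELINE)
phrase_cell = format_cell(23.4, PHRASE_BASED_BASELINE)
```

Running `implied_baseline` over every cell of a row is a quick way to spot entries whose score and delta disagree with the assumed baseline.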
Page last modified on January 03, 2018, at 10:12 PM