newstest 2017, case-sensitive
Baseline [96-2], [96-48]: 24.0
condition | 5% | 10% | 20% | 50% | 100% |
wrong language source | [40] 24.0 -.0 | [41] 23.9 -.1 | [42] 23.9 -.1 | [43] 23.8 -.2 | |
wrong language target | [36] 24.0 -.0 | [37] 23.9 -.1 | [38] 23.9 -.1 | [39] 23.7 -.3 | |
wrong language source+target | [44] 24.0 -.0 | [45] 23.9 -.1 | [46] 23.8 -.2 | [47] 23.8 -.2 | |
wrong language 2 source | [75] 24.0 -.0 | [76] 23.9 -.1 | [77] 23.9 -.1 | [78] 23.9 -.1 | [79] 23.8 -.2 |
wrong language 2 target | [70] 24.0 -.0 | [71] 23.9 -.1 | [72] 23.8 -.2 | [73] 23.5 -.5 | [74] 23.4 -.6 |
wrong language 2 source+target | [65] 24.0 -.0 | [66] 24.0 -.0 | [67] 23.9 -.1 | [68] 23.5 -.5 | [69] 23.6 -.4 |
shuffle words source | [16] 24.0 -.0 | [17] 23.6 -.4 | [18] 23.9 -.1 | [19] 23.6 -.4 | [55] 23.7 -.3 |
shuffle words target | [20] 24.0 -.0 | [21] 24.0 -.0 | [22] 23.4 -.6 | [23] 23.2 -.8 | [56] 22.9 -1.1 |
shuffle words source+target | [32] 23.9 -.1 | [33] 23.5 -.5 | [34] 23.5 -.5 | [35] 23.1 -.9 | [57] 22.8 -1.2 |
shuffle sentences | [24] 24.0 -.0 | [25] 23.9 -.1 | [26] 23.9 -.1 | [27] 23.6 -.4 | [54] 23.4 -.6 |
domain mismatch | [28] 23.9 -.1 | [29] 24.0 -.0 | [30] 24.0 -.0 | [31] 24.1 +.1 | |
domain mismatch 2 | [58] 24.8 +.8 | [59] 24.8 +.8 | [60] 25.1 +1.1 | [61] 25.6 +1.6 | [62] 26.0 +2.0 |
copy source | [49] 23.8 -.2 | [50] 23.9 -.1 | [51] 23.8 -.2 | [52] 23.4 -.6 | [53] 21.1 -2.9 |
copy target | [90] 23.9 -.1 | [91] 23.9 -.2 | [92] 23.6 -.4 | [93] 23.7 -.3 | [94] 23.5 -.5 |
short (max 2) | [95] 24.1 +.1 | [96] 23.9 -.1 | [97] 23.8 -.2 | - | - |
short (max 5) | [98] 24.2 +.2 | [99] 24.5 +.5 | [100] 24.5 +.5 | [101] 24.2 +.2 | - |
paracrawl | [80] 24.2 +.2 | [81] 24.2 +.2 | [82] 24.4 +.4 | [83] 24.8 +.8 | [84] 25.2 +1.2 |
paracrawl clean (>0) | [85] 25.0 +1.0 | [86] 25.3 +1.3 | [87] 25.8 +1.8 | [88] 26.5 +2.6 | [89] 27.0 +3.0 |
wrong language: Europarl cs-en, de-cs
wrong language 2: EU Bookstore fr-en, de-fr
domain mismatch: EMEA (41%), IT (9%), Acquis (50%)
domain mismatch 2: EMEA (10%), IT (2%), Acquis (13%), Subtitles 2016 (62%), Koran (7%)
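The row labels above name the kinds of data added to the baseline corpus, but not how each was produced. The following is a minimal Python sketch of how such synthetic noise could be generated, not the procedure actually used for these runs. It assumes that "shuffle words" permutes the tokens within a sentence, "shuffle sentences" misaligns pairs by permuting the target side, "copy source" and "copy target" duplicate one side onto the other, "short (max N)" keeps only pairs with at most N tokens per side, and that the percentage columns give the amount of added data relative to the clean corpus size. The names `make_noise` and `mix` are hypothetical.

```python
import random


def shuffle_words(sentence, rng):
    """Return the sentence with its whitespace-separated tokens permuted."""
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def make_noise(pairs, kind, rng, max_len=None):
    """Derive noisy (source, target) pairs from clean ones.

    `kind` mirrors the row labels of the table; the semantics are assumptions.
    """
    if kind.startswith("shuffle words"):
        side = kind[len("shuffle words"):]
        return [
            (
                shuffle_words(s, rng) if "source" in side else s,
                shuffle_words(t, rng) if "target" in side else t,
            )
            for s, t in pairs
        ]
    if kind == "shuffle sentences":
        # Misalign the corpus: keep sources in order, permute the targets.
        targets = [t for _, t in pairs]
        rng.shuffle(targets)
        return [(s, t) for (s, _), t in zip(pairs, targets)]
    if kind == "copy source":
        return [(s, s) for s, _ in pairs]
    if kind == "copy target":
        return [(t, t) for _, t in pairs]
    if kind == "short":
        # Keep only pairs where both sides have at most max_len tokens.
        return [
            (s, t)
            for s, t in pairs
            if len(s.split()) <= max_len and len(t.split()) <= max_len
        ]
    raise ValueError(f"unknown noise kind: {kind}")


def mix(clean, noise, percent, rng):
    """Add noisy pairs amounting to `percent` of the clean corpus size."""
    wanted = len(clean) * percent // 100
    picked = rng.sample(noise, min(wanted, len(noise)))
    return clean + picked
```

With a noise pool smaller than the requested amount, `mix` simply adds everything available, which may be one reason some cells (for example the 50% and 100% columns of the short rows) have no entry; that reading is a guess rather than something the table states.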
ep-nc | 25.8 |
ep-nc-rapid | 27.2 |
ep-nc-rapid-cc | 30.0 |
ep-nc-rapid-cc-eubook | 29.1 |
ep-nc-rapid-cc-acquis | 30.0 |
ep-nc-rapid-cc-subtitle2016 | 30.6 |
ep-nc-rapid-cc-subtitle2016-paraclean100 | 31.2 |
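Assuming each configuration name is a hyphen-joined list of corpora, so that dropping the last component gives the configuration it extends, the contribution of each added corpus can be read off as a difference of scores. A small sketch under that assumption (the helper name `gain_over_parent` is hypothetical; the scores are copied from the table above):

```python
SCORES = {
    "ep-nc": 25.8,
    "ep-nc-rapid": 27.2,
    "ep-nc-rapid-cc": 30.0,
    "ep-nc-rapid-cc-eubook": 29.1,
    "ep-nc-rapid-cc-acquis": 30.0,
    "ep-nc-rapid-cc-subtitle2016": 30.6,
    "ep-nc-rapid-cc-subtitle2016-paraclean100": 31.2,
}


def gain_over_parent(name, scores=SCORES):
    """Score change relative to the configuration without the last corpus.

    Returns None when the parent configuration is not in the table.
    """
    parent = name.rsplit("-", 1)[0]
    if parent == name or parent not in scores:
        return None
    return round(scores[name] - scores[parent], 1)


for name in SCORES:
    print(name, gain_over_parent(name))
```

Read this way, adding rapid gives +1.4 and cc +2.8, while on top of ep-nc-rapid-cc the eubook data costs 0.9, acquis changes nothing, and subtitle2016 adds 0.6.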
Baseline: 27.2 (ep-nc-rapid)
condition | 5% | 10% | 20% | 50% | 100% | 150% | 200% | 300% |
wrong language source | 26.4 -0.7 | |||||||
wrong language target | 26.1 -1.1 | |||||||
wrong language source+target | 25.6 -1.6 | |||||||
wrong language 2 source | 26.9 -0.3 | 26.8 -0.4 | 26.8 -0.4 | 26.8 -0.4 | 26.8 -0.4 | |||
wrong language 2 target | 26.7 -0.5 | 26.6 -0.6 | 26.7 -0.5 | 26.2 -1.0 | 25.0 -2.2 | |||
wrong language 2 source+target | 24.9 -2.3 | |||||||
shuffle words source | 26.9 -0.3 | 26.6 -0.6 | 26.4 -0.8 | 26.6 -0.6 | 25.5 -1.7 | |||
shuffle words target | 27.0 -0.2 | 26.8 -0.4 | 26.4 -0.8 | 26.7 -0.5 | 26.1 -1.1 | |||
shuffle words source+target | 25.8 -1.4 | 25.1 -2.1 | ||||||
shuffle sentences | 26.5 -0.7 | 26.5 -0.5 | 26.3 -0.9 | 26.1 -1.1 | 25.3 -1.9 | |||
domain mismatch | 26.7 -0.5 | |||||||
domain mismatch 2 | 29.1 +1.9 | |||||||
short (max 2) | 27.1 -0.1 | 26.5 -0.7 | 26.7 -0.5 | |||||
short (max 5) | 27.8 +0.6 | 27.6 +0.4 | 28.0 +0.8 | 26.6 -0.6 | ||||
copy source | 17.6 -9.8 | 11.2 -16.0 | 5.6 -21.6 | 3.2 -24.0 | 3.2 -24.0 | |||
copy target | 27.2 -0.0 | 27.0 -0.2 | 26.7 -0.5 | 26.8 -0.4 | 26.9 -0.3 | |||
paracrawl | 27.4 +0.2 | 26.6 -0.6 | 24.7 -2.5 | 20.9 -6.3 | 17.3 -9.9 | |||
paracrawl clean | 28.1 +0.9 | 28.5 +1.3 | 29.5 +2.3 | 30.3 +3.0 | 30.8 +3.5 | 30.1 +2.9 | 30.5 +3.2 | 29.9 +2.7 |
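Each populated cell pairs a BLEU score with its difference from the baseline; in the first table a bracketed run number comes first. The sketch below re-derives those differences so that cells where score and delta disagree can be spotted. It assumes a baseline of 24.0 for the first table (as stated there) and 27.2 for the second (inferred from the deltas, which match the ep-nc-rapid score). The function names are hypothetical. Since the printed scores are presumably rounded, a mismatch of 0.1 may be a rounding effect rather than an error.

```python
import re

# Optional "[run]" prefix, a score, then a signed delta such as "-.1", "+1" or "-0.7".
CELL = re.compile(r"^(?:\[(\d+)\]\s+)?(\d+(?:\.\d+)?)\s+([+-]\d*\.?\d+)$")


def parse_cell(cell):
    """Return (run_id or None, score, reported_delta), or None for empty/'-' cells."""
    m = CELL.match(cell.strip())
    if m is None:
        return None
    run_id = int(m.group(1)) if m.group(1) else None
    return run_id, float(m.group(2)), float(m.group(3))


def delta_mismatch(cell, baseline):
    """Recomputed delta minus reported delta, rounded to one decimal."""
    parsed = parse_cell(cell)
    if parsed is None:
        return None
    _, score, reported = parsed
    return round((score - baseline) - reported, 1)


def check_row(row, baseline):
    """List (label, cell, mismatch) for every cell of a row that disagrees."""
    label, *cells = [c.strip() for c in row.split("|")]
    out = []
    for cell in cells:
        diff = delta_mismatch(cell, baseline)
        if diff:  # skips None (unparseable) and 0.0 (consistent)
            out.append((label, cell, diff))
    return out
```

For example, `check_row("copy target | [90] 23.9 -.1 | [91] 23.9 -.2 |", 24.0)` flags only the `[91]` cell, with a mismatch of 0.1.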