Errata

Statistical Machine Translation

This list was compiled with help from Juan Antonio Pérez-Ortiz and Felipe Sánchez-Martínez (Universitat d'Alacant, Spain), Sebastien Bratieres (Cambridge University), Hideto Kazawa (Google), Jinyi He, Adam Przepiorkowski, Andrei Arsene Simion, Pierre Lison, Jon Dehdari, Lucas Avanço, Joerg Tiedemann, Selçuk Köprü, Lane Schwartz, Till Amelung, and Haotian Zhu.

Chapter 1

p. 21: Google has not used Systran since October 2007.

p. 25: Links to SAMT and METEOR (footnotes 12 and 14) are missing a tilde.

Chapter 2

p. 52: Text at the end of the first paragraph should read the rule VP -> VB NP in English as opposed to VP -> NP VB in German instead of the rule VP -> VB NP in English as opposed to VP -> NP VP in German. The same error is in the caption for Figure 2.11.

Chapter 3

p. 70: Text at the very top should read This rule defines a conditional probability distribution p(x|y) (also called the posterior) in terms of its inverse p(y|x) and the two elementary probability distributions p(x) (called the prior) and p(y) ... instead of This rule a conditional probability distribution p(x|y) expresses in terms of its inverse p(y|x) (called the posterior) and the two elementary probability distributions p(x) (called the prior) and p(y) ...
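For reference, the rule described in the corrected sentence is Bayes' rule. Written out with the labels used above (posterior for p(x|y), prior for p(x)), it reads as follows; the display is added here for illustration and is not copied from the book:

```latex
\underbrace{p(x|y)}_{\text{posterior}} = \frac{p(y|x)\,\overbrace{p(x)}^{\text{prior}}}{p(y)}
```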

p. 71: Text before Equation 3.17 should read variance ... is computed as the *arithmetic* mean of the *squared* difference between ... instead of variance ... is computed as the geometric mean of the difference between ...

p. 71: Final number in Equation 3.18 should read 2.04 instead of 2.4.
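The corrected definition of variance on p. 71 (the arithmetic mean of the squared differences from the mean) can be checked with a short sketch. The sample values below are made up for illustration; they are not the data behind Equation 3.18.

```python
def variance(values):
    """Variance as the arithmetic mean of squared differences from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


# Hypothetical sample, not taken from the book.
sample = [1, 2, 3, 4, 5]
print(variance(sample))
```

Note that this divides by the number of values, as the corrected wording implies, rather than by the number of values minus one.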

p. 72: Part of paragraph 2 should read one-in-a-million chance instead of one-in-a-million change.

p. 73: In caption to Figure 3.4, entropy should be H(X) instead of E(X).

p. 75: In caption to Figure 3.5, joint entropy should be H(X,Y) and conditional entropy should be H(Y|X). The figure itself is correct.

Chapter 4

p. 84: Text before Equation 4.5 should read at position j to a German output at position i. The indexes are erroneously flipped in the book.

p. 87: The result of the computation in Equation 4.8 should be $0.00029\epsilon$ instead of $0.0029\epsilon$ .

p. 90: Text before Equation 4.13 should read the same simplifications as in Equation (4.10). The book makes an erroneous reference to Equation 4.11.

p. 91: The two end for statements on lines 12 and 13 should be indented one level to the right.

p. 91: Left hand side of equation 4.14 should be $t(e|f)$ instead of $t(e|f;\text{\bf e},\text{\bf f})$ .

p. 93: The first sum should be a product.

p. 93: The result of the computation of Equation 4.20 should be $0.25\epsilon$ instead of $0.25$ .

p. 93: The text should clarify that in Equation 4.21 perplexity is computed over the whole corpus, and not the average per sentence.

p. 98: Equation 4.26 should be

$p({\bf e}|{\bf f}) = \cdots = \epsilon \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j | f_i) a(i|j,l_e,l_f)$ instead of $p({\bf e}|{\bf f}) = \cdots = \epsilon \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j | f_{a(j)}) a(a(j)|j,l_e,l_f)$
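A minimal sketch of the corrected Equation 4.26 may help to see why the sum runs over all input positions i rather than over a fixed alignment a(j). The dictionary-based tables and the toy probabilities are assumptions made for this illustration and are not values from the book.

```python
from math import prod


def model2_prob(e, f, t, a, epsilon=1.0):
    """epsilon * prod_{j=1..l_e} sum_{i=0..l_f} t(e_j|f_i) * a(i|j,l_e,l_f).

    e: output words e_1..e_{l_e}
    f: input words with the NULL token at position 0, so f_0..f_{l_f}
    t: dict (e_word, f_word) -> translation probability (hypothetical layout)
    a: dict (i, j, l_e, l_f) -> alignment probability (hypothetical layout)
    """
    l_e, l_f = len(e), len(f) - 1
    return epsilon * prod(
        sum(t.get((e[j - 1], f[i]), 0.0) * a.get((i, j, l_e, l_f), 0.0)
            for i in range(l_f + 1))
        for j in range(1, l_e + 1)
    )


# Toy tables, invented for illustration only.
f = ["NULL", "das", "haus"]
e = ["the", "house"]
t = {("the", "das"): 0.5, ("the", "haus"): 0.25,
     ("house", "das"): 0.25, ("house", "haus"): 0.5}
a = {(1, 1, 2, 2): 0.5, (2, 1, 2, 2): 0.5,
     (1, 2, 2, 2): 0.5, (2, 2, 2, 2): 0.5}
print(model2_prob(e, f, t, a))
```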

p. 98: Equation 4.27 should be

$c(e|f; {\bf e}, {\bf f}) = \sum_{j=1}^{l_e} \sum_{i=0}^{l_f} \frac{t(e|f) a(i|j,l_e,l_f) \delta(e,e_j) \delta(f,f_i)}{\sum_{i'=0}^{l_f} t(e|f_{i'}) a(i'|j,l_e,l_f)}$ instead of $c(e|f; {\bf e}, {\bf f}) =\sum_{j=1}^{l_e} \sum_{i=0}^{l_f} \frac{t(e|f) a(a(j)|j,l_e,l_f) \delta(e,e_j) \delta(f,f_i)}{\sum_{i'=0}^{l_f} t(e|f_{i'}) a(i'|j,l_e,l_f))}$

p. 99: Lines 21 and 24 of Figure 4.7 should read l_e and l_f instead of le and lf.

p. 99: Line 10 of Figure 4.7 should read length(e) instead of length(e).

p. 99: Equation 4.28 should be

$c(i|j, l_e, l_f; {\bf e}, {\bf f}) = \frac{t(e_j|f_i) a(i|j,l_e,l_f)}{\sum_{i'=0}^{l_f} t(e_j|f_{i'}) a(i'|j,l_e,l_f)}$ instead of $c(i|j, l_e, l_f; {\bf e}, {\bf f}) = \frac{t(e_j|f_i) a(a(j)|j,l_e,l_f)}{\sum_{i'=0}^{l_f} t(e_j|f_{i'}) a(i'|j,l_e,l_f))}$

p. 103: The conditional fertility distribution should be $n(\phi_i|f_i)$ instead of $n(\phi_i|e_i)$ . This occurs in Equations 4.32 and 4.33 and in the text between them.

p. 103: Equation 4.33 should be

$p({\bf e}, {\bf f}) = \cdots = \epsilon \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} {l_e - \phi_0 \choose \phi_0} \ \cdots$ instead of $p({\bf e}, {\bf f}) = \cdots = \epsilon \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \prod_{j=1}^{l_e} {l_e - \phi_0 \choose \phi_0} \ \cdots$

p. 106: Line 13 of the algorithm should make reference to the term $f_{a(j)}$ , not the term $f_{a}(j)$ .

p. 107: Line 3 of function neighboring should start with for instead of for for.

p. 108: In Figure 4.9, the words go and not should be swapped. The alignment links of these words are correct.

p. 108: Text should read each input word f_i instead of each input word f_j.

p. 109: Two lines before item (c) should read backward movement instead of forward movement.

p. 109: Equation 4.39 should be $d_1 (j - \odot_{i-1} | f_{[i-1]}, e_j)$ instead of $d_1 (j - \odot_{[i-1]} | f_{[i-1]}, e_j)$

p. 110: Equation 4.40 should be $d_1 (j - \odot_{i-1} | \cal{A}(f_{[i-1]}), \cal{B}(e_j))$ instead of $d_1 (j - \odot_{[i-1]} | \cal{A}(f_{[i-1]}), \cal{B}(e_j))$

p. 112: Figure 4.10 has the same alignment error as Figure 4.9.

p. 112: In Figure 4.10, the $\phi_{i,k}$ should start with index 0 instead of index 1.

p. 115: Equation 4.42 should be $\textrm{AER}(S, P; A) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}$ instead of $\textrm{AER}(S, P; A) = \frac{|A \cap S| + |A \cap P|}{|A| + |S|}$ .
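The corrected alignment error rate can be sketched in a few lines. Representing alignment points as (output position, input position) tuples and treating every sure point as also possible are assumptions of this sketch; the example alignments are invented.

```python
def alignment_error_rate(sure, possible, predicted):
    """AER(S, P; A) = 1 - (|A & S| + |A & P|) / (|A| + |S|)."""
    s = set(sure)
    p = set(possible) | s  # assumption: sure points count as possible too
    a = set(predicted)
    if not a and not s:
        raise ValueError("AER is undefined when both A and S are empty")
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Invented alignment points for illustration.
sure = {(1, 1), (2, 2)}
possible = {(2, 3)}
predicted = {(1, 1), (2, 3), (3, 3)}
print(alignment_error_rate(sure, possible, predicted))
```

With the leading "1 -" in place, a predicted alignment that reproduces the sure points exactly gets an error rate of 0, which is the behavior the uncorrected formula fails to give.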

Chapter 5

p. 129: Section 5.1.2: The value for α cannot be 0, so its value must be in the range (0;1], not [0;1].

p. 133: Line 4 of function extract in Figure 5.5 should also include the condition f_start <= f <= f_end .

p. 135: The sentence There are 45 distinct contiguous English phrases and 55 distinct contiguous German phrases should replace There are 36 distinct contiguous English phrases and 45 distinct contiguous German phrases.
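The corrected counts are consistent with n(n+1)/2 contiguous spans for a sentence of n tokens, with n = 9 for English and n = 10 for German; the replaced figures 36 and 45 equal n(n-1)/2 for the same lengths. The sentence lengths are inferred from the numbers and are an assumption of this sketch, which counts spans by position.

```python
def count_contiguous_spans(n_tokens):
    """Count all (start, end) spans with 0 <= start <= end < n_tokens."""
    return sum(1 for start in range(n_tokens)
               for end in range(start, n_tokens))


# Assumed sentence lengths: 9 English tokens, 10 German tokens.
print(count_contiguous_spans(9), count_contiguous_spans(10))
```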

p. 138: The second paragraph of Section 5.3.2 should be rewritten as: It may be that in the training data a rare long English phrase e exists that mistakenly gets mapped to a common foreign phrase f. In this case φ(f|e) is very high, maybe even 1. If we encounter the phrase f again in the test data, this erroneous phrase translation may be used to produce the highest probability translation: The translation model likes it --- high φ(f|e) --- and the language model may like it as well, if e is made up of common English n-grams.

p. 140: Caption to Figure 5.7 should not mention p_w.

p. 148: The second line of the third paragraph should read phrase instead of word.

Chapter 6

p. 162: In Figure 6.4, the hypothesis box for it should have the first word marked off as covered, instead of the second word.

p. 170: In Figure 6.9, line 7 of the pseudo-code should read in part cost(start, i+1) instead of cost(start, i).

p. 171: Some of the cells do not have the correct values; the table should read:

first word	future cost estimate for n words (from first)
first word	1	2	3	4	5	6	7	8	9
the	-1.0	-3.0	-4.5	-6.9	-8.3	-9.3	-9.6	-10.6	-10.6
tourism	-2.0	-3.5	-5.9	-7.3	-8.3	-8.6	-9.6	-9.6
initiative	-1.5	-3.9	-5.3	-6.3	-6.6	-7.6	-7.6
addresses	-2.4	-3.8	-4.8	-5.1	-6.1	-6.1
this	-1.4	-2.4	-2.7	-3.7	-3.7
for	-1.0	-1.3	-2.3	-2.3
the	-1.0	-2.2	-2.3
first	-1.9	-2.4
time	-1.6
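The recursion behind such a table can be sketched as a small dynamic program. This sketch uses its own 0-based, inclusive span indexing, which differs from the pseudo-code in Figure 6.9, and it treats the values as log-scores where higher is better; both are assumptions. The single-word scores are the first-column entries for the first three words of the table above; no multi-word translation options are supplied.

```python
def future_costs(n_words, option_score):
    """Best score for every span, from direct options or by combining sub-spans.

    option_score: dict (start, end) -> best score of a translation option
    covering words start..end inclusive (hypothetical input layout).
    """
    cost = {}
    for length in range(1, n_words + 1):
        for start in range(n_words - length + 1):
            end = start + length - 1
            best = option_score.get((start, end), float("-inf"))
            # Try every split of the span into two adjacent sub-spans.
            for i in range(start, end):
                best = max(best, cost[(start, i)] + cost[(i + 1, end)])
            cost[(start, end)] = best
    return cost


# "the", "tourism", "initiative": single-word entries from the table above.
options = {(0, 0): -1.0, (1, 1): -2.0, (2, 2): -1.5}
table = future_costs(3, options)
print(table[(0, 1)], table[(1, 2)], table[(0, 2)])
```

For these three words the combined values agree with the corresponding table cells (-3.0, -3.5 and -4.5). Cells further right in the table, such as -1.3 for the two words starting at "for", are better than the sum of the single-word entries, so they would need a multi-word entry in option_score.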

Chapter 7

p. 183: The third paragraph should read actual number instead of actually number.

p. 195: Equation 7.26 should still have the summation in the second and third lines.

p. 200: Equation 7.38 should be $p_{KN}(w) = \frac{N_{1+}(\bullet w)}{\sum_{w_i} N_{1+}(\bullet w_i)}$ instead of $p_{KN}(w) = \frac{N_{1+}(\bullet w)}{\sum_{w_i} N_{1+}(w_iw)}$ .
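The corrected Equation 7.38 normalizes the number of distinct histories of w by the number of distinct histories summed over all words. A minimal sketch, assuming the training data is given as a list of (history, word) bigram tokens; the toy bigrams are invented:

```python
from collections import defaultdict


def kneser_ney_unigram(bigrams):
    """p_KN(w) = N1+(. w) / sum over w_i of N1+(. w_i)."""
    histories = defaultdict(set)
    for prev, word in bigrams:
        histories[word].add(prev)  # repeated bigrams count only once
    total = sum(len(h) for h in histories.values())
    return {word: len(h) / total for word, h in histories.items()}


# Invented bigram tokens for illustration.
bigrams = [("san", "francisco"), ("san", "francisco"),
           ("the", "dog"), ("a", "dog"), ("my", "dog")]
print(kneser_ney_unigram(bigrams))
```

Here "francisco" occurs twice but after a single history, so it receives less probability mass than "dog", which follows three different histories.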

p. 201: Equation 7.39 should start with p(w_n|...) instead of p(w₁n|...).

p. 198: Equation 7.32 has a superfluous closing parenthesis in the α function.

p. 206: Equation 7.48 should start p_LM(w_n|w₁...) instead of p_LM(w_n|w₀...).

p. 207: The first sentence should read c(...) = 0 instead of c(...) > 0.

p. 207: Equation 7.49 should have a multiplication between the backoff distribution and the p₃ distribution instead of a summation. The summation within the exponential is correct.

p. 208: Equation 7.50 should have a multiplication between the backoff distributions and the p₂ distribution instead of a summation.

Chapter 8

p. 219: The citation should read Koehn and Monz [2006] instead of Koehn and Monz [2005].

p. 224: The last paragraph before Section 8.2.2 should read word-order insensitive instead of word-order sensitive.

p. 224: The figures in the table for System B are all wrong; the correct ones are: precision 100%, recall 86%, f-measure 92%.

p. 226: The definition of the brevity penalty in Equation 8.8 is wrong; it should be $\min\left(1,\;\exp\left(1-\frac{\text{reference-length}}{\text{output-length}}\right)\right)$ instead of $\min\left(1,\;\frac{\text{output-length}}{\text{reference-length}}\right)$ .

p. 226: The text before Equation 8.9 should read ... are typically set to $\frac{1}{4}$ , which simplifies ... instead of ... are typically set to 1, which simplifies ...

p. 226: Equation 8.9 should have an exponent around the product: $\left(\prod_{i=1}^4\text{precision}_i\right)^{\frac{1}{4}}$ .
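The three corrections to BLEU on p. 226 fit together as follows: the brevity penalty uses an exponential, and with weights of 1/4 the n-gram precisions are combined as a geometric mean. The function names and the example numbers are assumptions of this sketch, not part of the book.

```python
import math


def brevity_penalty(output_length, reference_length):
    """min(1, exp(1 - reference-length / output-length))."""
    return min(1.0, math.exp(1.0 - reference_length / output_length))


def bleu_from_precisions(precisions, output_length, reference_length):
    """Brevity penalty times the geometric mean of the n-gram precisions."""
    geometric_mean = math.prod(precisions) ** (1.0 / len(precisions))
    return brevity_penalty(output_length, reference_length) * geometric_mean


# Invented precisions for n = 1..4 and invented lengths.
print(bleu_from_precisions([0.8, 0.6, 0.4, 0.2],
                           output_length=18, reference_length=20))
```

An output at least as long as the reference gets a penalty of 1, so only outputs that are too short are penalized.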

p. 227: The third paragraph should read the English word "the" instead of the English world "the".

p. 238: TER is also known as translation edit rate and HTER is also known as human-targeted translation edit rate.

p. 241: See the comment for p. 238 above.

Chapter 9

p. 269: In Figure 9.11, lines 4-20 should be indented 2 more spaces and line 21 indented one more space. Line 19 should line up with line 16.

p. 270: Equation 9.27 should contain $\lambda_j$ instead of $\lambda_i$. The same mistake also occurs in the paragraph above the equation.

p. 271: The caption for Figure 9.12 should read the line between the worst point instead of the line between the best point.

p. 271: The last bulleted item should start If R is worse than worst, then instead of If R is worse that worst, then.

p. 280: The sixth paragraph should read agree on how instead of agree. On how.

Chapter 10

p. 294: In the first transducer in Figure 10.3, the probabilities 0.7 and 1.0 should be swapped on the path to circus. In the last paragraph of the figure, the probability for circa should be 0.063 instead of 0.027 and the probability for circus should be 0.00294 instead of 0.00126.

p. 299: The Turkish example sentence should be Sonuçlara dayanılarak bir ortaklık oluşturulacaktır instead of Sonuçlarına dayanılarak bir ortaklışı oluşturulacaktır. The morphological analysis is sonuç +lar +a daya +hnhl +arak bir ortaklık oluş +dhr +hl +acak +dhr.

p. 306: In Figure 10.8 the German word aushändigen is misspelled as aushandigen.

p. 309: The first rule PPER VAFIN -> PPER VAFIN does not reorder.

Chapter 11

p. 335: The second rule should be PPER/PP -> Ihnen | to you instead of PRO/PP -> Ihnen | to you.

p. 335: The second and third tree structures should contain PPER instead of PRO.

p. 339: Equation 11.2 should be $\forall f_j \in \bar{f}: (e_i, f_j) \in A \rightarrow e_i \in \bar{e}$ instead of $\forall f_j \in \bar{f}: (e_i, f_j) \in A \rightarrow e_i$ .

p. 340: Figure 11.4 should have the label PPER instead of PRO as the tag for Ihnen in the second extracted rule.

p. 341: The first example should have the label PPER instead of PRO as the tag for Ihnen.

p. 342: The last tree should contain RP instead of RB.

p. 343: The sixth paragraph should refer to Equation 11.3 instead of Equation 11.2.

p. 344: The second rule should contain VVINF instead of VAFIN.

p. 347: The last grammar rule in Figure 11.5 should contain VBZ - wants instead of NN - wants.

p. 348: The second tree structure should contain VBZ - wants instead of NN - wants.

p. 349: The second paragraph should read we have to instead of we have to have to.

p. 354: The first line should read DET and NN instead of DET and NP.

p. 355: The rule in the last line should be DET NN • instead of DET NN NN •.

p. 356: The two bottom right chart boxes of the top chart should contain the non-terminal NNP instead of the rule NNP -> NNP.

p. 356: The bottom chart of Figure 11.11 should contain NP: architect Frank Gehry instead of NN: architect Frank Gehry.

p. 356: The rule in the first line should be DET NN • instead of DET DET NN NN •.