Alignment of Subsentential Units

Sometimes the goal is not to align all the words, but to align specific kinds of words and phrases.

Alignment of Subsentential Units is the main subject of 42 publications. 33 are discussed here.


Instead of tackling the full word alignment problem, more targeted work focuses on terminology extraction, for instance the extraction of domain-specific lexicons (Resnik and Melamed, 1997), noun phrases (Kupiec, 1993; Eijk, 1993; Fung, 1995), collocations (Smadja et al., 1996; Echizen-ya et al., 2003; Orliac and Dillinger, 2003), non-compositional compounds (Melamed, 1997), named entities (Moore, 2003), technical terms (Macken et al., 2008), or other word sequences (Kitamura and Matsumoto, 1996; Ahrenberg et al., 1998; Martinez et al., 1999; Sun et al., 2000; Moore, 2001; Yamamoto et al., 2001; Baobao et al., 2002; Wang and Zhou, 2002). Translations of noun phrases may be learned by generating candidate translations automatically and checking them against frequency counts on the web (Robitaille et al., 2006; Tonoike et al., 2006).
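The idea of validating candidate translations against frequency counts can be illustrated with a minimal sketch. It is not the method of any of the cited papers: it assumes word-by-word candidate generation in source order, and the names best_translation, word_translations, and target_counts, as well as all counts, are hypothetical toy values standing in for a bilingual lexicon and web hit counts.

```python
from itertools import product


def best_translation(phrase, word_translations, target_counts):
    """Pick the candidate translation with the highest target-side frequency.

    phrase: tuple of source words.
    word_translations: dict mapping a source word to a list of target words
        (hypothetical toy lexicon); unknown words are copied unchanged.
    target_counts: dict mapping a target phrase to a frequency count
        (a stand-in for counts obtained from the web).
    """
    options = [word_translations.get(word, [word]) for word in phrase]
    # Assumption: candidates keep the source word order.
    candidates = [" ".join(choice) for choice in product(*options)]
    return max(candidates, key=lambda cand: target_counts.get(cand, 0))


# Toy data, invented for illustration only.
word_translations = {
    "maschinelle": ["machine", "mechanical"],
    "Übersetzung": ["translation", "transmission"],
}
target_counts = {
    "machine translation": 120,
    "mechanical transmission": 40,
    "mechanical translation": 3,
}

result = best_translation(
    ("maschinelle", "Übersetzung"), word_translations, target_counts
)
```

With this toy data, the candidate "machine translation" wins because it has the highest count among the four generated combinations.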
Many methods extract subtrees from a parallel corpus, aided either by a word-aligned corpus or by a bilingual lexicon combined with a heuristic to disambiguate alignment points. Such efforts can be traced back to work on the alignment of dependency structures by Matsumoto et al. (1993). Related to this are efforts to align syntactic phrases (Yamamoto and Matsumoto, 2000; Imamura, 2001; Imamura et al., 2003; Imamura et al., 2004), hierarchical syntactic phrases (Watanabe and Sumita, 2002; Watanabe et al., 2002), and phrase structure tree fragments (Groves et al., 2004), as well as methods to extract transfer rules of the kind used in traditional rule-based machine translation systems (Lavoie et al., 2001). The degree to which alignments are consistent with the syntactic structure may be measured by distance in the dependency tree (Nakazawa et al., 2007). Tinsley et al. (2007) align subtrees in a parallel corpus parsed on both sides with a greedy algorithm that relies on a probabilistic lexicon trained with the IBM models. Zhechev and Way (2008) compare it against a similar algorithm. Lavie et al. (2008) use symmetrized IBM model alignments for the same purpose and discuss the effects of alignment and parse quality.
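The general shape of a greedy, lexicon-driven subtree alignment can be sketched as follows. This is an illustration under stated assumptions, not the algorithm or scoring function of Tinsley et al. (2007): each subtree is reduced to the words it covers, the score is a simple two-directional average of best lexical probabilities, and the names span_score, greedy_align, and the probability table lex are hypothetical.

```python
from itertools import product


def directional_score(from_words, to_words, lex, flip=False):
    """Average over from_words of the best lexical probability in to_words."""
    total = 0.0
    for a in from_words:
        total += max(
            lex.get((b, a) if flip else (a, b), 0.0) for b in to_words
        )
    return total / len(from_words)


def span_score(src_words, tgt_words, lex):
    """Assumed score: the weaker of the two directional averages."""
    if not src_words or not tgt_words:
        return 0.0
    forward = directional_score(src_words, tgt_words, lex)
    backward = directional_score(tgt_words, src_words, lex, flip=True)
    return min(forward, backward)


def greedy_align(src_nodes, tgt_nodes, lex, threshold=0.1):
    """Greedily link tree nodes, best-scoring pair first, one link per node.

    src_nodes, tgt_nodes: dict mapping a node id to the tuple of words
        covered by the subtree rooted at that node.
    lex: dict mapping (source word, target word) to a probability.
    """
    candidates = []
    for (s_id, s_words), (t_id, t_words) in product(
        src_nodes.items(), tgt_nodes.items()
    ):
        score = span_score(s_words, t_words, lex)
        if score >= threshold:
            candidates.append((score, s_id, t_id))
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_src, used_tgt, links = set(), set(), []
    for _, s_id, t_id in candidates:
        if s_id in used_src or t_id in used_tgt:
            continue  # each node takes part in at most one link
        links.append((s_id, t_id))
        used_src.add(s_id)
        used_tgt.add(t_id)
    return links


# Toy example, invented for illustration: "the red house" / "das rote Haus".
src_nodes = {
    "NP": ("the", "red", "house"),
    "DT": ("the",),
    "JJ": ("red",),
    "NN": ("house",),
}
tgt_nodes = {
    "NP": ("das", "rote", "Haus"),
    "ART": ("das",),
    "ADJA": ("rote",),
    "NN": ("Haus",),
}
lex = {("the", "das"): 0.7, ("red", "rote"): 0.8, ("house", "Haus"): 0.9}

links = greedy_align(src_nodes, tgt_nodes, lex)
```

In this toy example each source node is linked to its natural counterpart (NP to NP, DT to ART, JJ to ADJA, NN to NN). Taking the minimum of the two directions penalizes pairs in which one subtree covers far more words than the other, which is one simple way to keep a single word from being linked to a whole phrase.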



New Publications

  • Pal et al. (2013)
  • Lardilleux and Lepage (2008)
  • Bryl and Genabith (2010)
  • Nakazawa and Kurohashi (2009)
  • Sun et al. (2010)
  • Lardilleux et al. (2012)
  • Pal and Bandyopadhyay (2012)
  • Sun et al. (2000)
  • Liu et al. (2004)