Marclator: marker-driven example-based machine translation

Marclator is a free/open-source example-based machine translation (MT) system based on the marker hypothesis. It comprises a marker-driven chunker, a collection of chunk aligners, and a simple proof-of-concept monotonic recombinator or "decoder". Marclator is composed largely of components from MaTrEx, the data-driven machine translation system designed by the Machine Translation group at the School of Computing of Dublin City University (Stroppa and Way 2006, Stroppa et al. 2006). A preliminary version of Marclator will be released just in time for the MT, with a rather long "to do" list (at the address http://computing.dcu.ie/~mforcada/marclator.html).
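
As a rough illustration of what a marker-driven chunker does, the following Python sketch splits a tokenized sentence at closed-class "marker" words (determiners, prepositions, conjunctions, pronouns), with the usual constraint that every chunk contains at least one non-marker word. It is not Marclator's code: the MARKERS table and the marker_chunk function are hypothetical names introduced here, and the tiny English word list stands in for the real marker files.

```python
# Hypothetical marker lexicon: a tiny English sample for illustration only.
# It is an assumption, not the content of Marclator's marker files.
MARKERS = {
    "the": "DET", "a": "DET", "an": "DET",
    "in": "PREP", "on": "PREP", "of": "PREP", "to": "PREP", "with": "PREP",
    "and": "CONJ", "or": "CONJ", "but": "CONJ",
    "he": "PRON", "she": "PRON", "it": "PRON",
}


def marker_chunk(tokens, markers=MARKERS):
    """Split tokens into chunks that start at marker words.

    A new chunk is opened at a marker word only if the current chunk
    already holds at least one non-marker (content) word, so consecutive
    markers such as "on the" stay in the same chunk.
    """
    chunks = []
    current = []
    has_content = False
    for tok in tokens:
        is_marker = tok.lower() in markers
        if is_marker and has_content:
            # The current chunk is complete; the marker opens a new one.
            chunks.append(current)
            current = []
            has_content = False
        current.append(tok)
        if not is_marker:
            has_content = True
    if current:
        if has_content or not chunks:
            chunks.append(current)
        else:
            # Trailing markers without a content word join the last chunk.
            chunks[-1].extend(current)
    return chunks


if __name__ == "__main__":
    sentence = "the dog sat on the mat with a bone".split()
    print(marker_chunk(sentence))
    # [['the', 'dog', 'sat'], ['on', 'the', 'mat'], ['with', 'a', 'bone']]
```

In an example-based system, chunks produced this way for the source and target sides of a parallel corpus would then be paired by a chunk aligner; that step is not shown here.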

Here are some examples of interesting tasks (many others are possible):

  • reviewing/improving the existing marker files (words included, assignment to classes and subclasses)
  • creating marker files for new languages from free/open-source lexical resources such as those found in Apertium or Freeling
  • extending the existing documentation
  • testing the different distance functions and chunk alignment modes provided
  • testing Marclator on different architectures
  • integrating inferred chunks in "phrase"-based statistical MT
  • using rule-based machine translation (e.g., Apertium) as a feature when aligning chunks

Latest news

  • Project members: Sandipan Dandapat, Mikel Forcada, Declan Groves, John Moran, Sergio Penkale
  • We are working hard to write a Makefile that works on MacOS
  • A couple of changes have been made to the code base to correct errors, and a new release of Marclator (0.2.1) is available at http://www.openmatrex.info
Page last modified on January 28, 2010, at 10:08 AM