Marclator: marker-driven example-based machine translation
Marclator is a free/open-source example-based machine translation (MT) system based on the marker hypothesis. It comprises a marker-driven chunker, a collection of chunk aligners, and a simple proof-of-concept monotonic recombinator or "decoder". Marclator is composed largely of components from MaTrEx, the data-driven machine translation system designed by the Machine Translation group at the School of Computing of Dublin City University (Stroppa and Way 2006; Stroppa et al. 2006). A preliminary version of Marclator will be released just in time for the MT, with a rather long "to do" list (at the address http://computing.dcu.ie/~mforcada/marclator.html).
Here are some examples of interesting tasks (many others are possible):
- reviewing/improving the existing marker files (words included, assignment to classes and subclasses)
- creating marker files for new languages from free/open-source lexical resources such as those found in Apertium or FreeLing
- extending the existing documentation
- testing the different distance functions and chunk alignment modes provided
- testing Marclator on different architectures
- integrating inferred chunks in "phrase" based statistical MT
- using rule-based machine translation (e.g., Apertium) as a feature when aligning chunks
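For readers new to the marker hypothesis, the following is a minimal sketch of marker-driven chunking, not Marclator's actual code. It assumes the commonly described rule that a new chunk starts at each marker (closed-class) word, provided the current chunk already contains at least one non-marker word. The MARKERS table and the marker_chunks function are hypothetical names invented for this illustration; the real marker files are organized into classes and subclasses and are much larger.

```python
# Hypothetical, tiny marker table (word -> marker class) for illustration only;
# these entries are NOT taken from Marclator's marker files.
MARKERS = {
    "the": "DET", "a": "DET", "an": "DET",
    "in": "PREP", "on": "PREP", "of": "PREP", "to": "PREP", "at": "PREP",
    "and": "CONJ", "or": "CONJ",
    "he": "PRON", "she": "PRON", "it": "PRON",
}


def marker_chunks(tokens, markers=MARKERS):
    """Split a token list into chunks that begin at marker words.

    Assumed rule: a marker word opens a new chunk only if the current
    chunk already holds at least one non-marker (content) word, so that
    consecutive markers stay together with the content that follows.
    """
    chunks = []
    current = []
    has_content = False
    for tok in tokens:
        is_marker = tok.lower() in markers
        if is_marker and has_content:
            # Close the current chunk and start a new one at this marker.
            chunks.append(current)
            current = []
            has_content = False
        current.append(tok)
        if not is_marker:
            has_content = True
    if current:
        if not has_content and chunks:
            # Trailing markers with no content word join the previous chunk.
            chunks[-1].extend(current)
        else:
            chunks.append(current)
    return chunks


if __name__ == "__main__":
    sentence = "the man looked at the dog in the park"
    for chunk in marker_chunks(sentence.split()):
        print(" ".join(chunk))
```

Under these assumptions, improving a marker file amounts to changing which words trigger a chunk boundary, which in turn changes the chunks offered to the aligners.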
Latest news
- Project members: Sandipan Dandapat, Mikel Forcada, Declan Groves, John Moran, Sergio Penkale
- We are working hard to write a Makefile that works on MacOS
- A couple of changes have been made to the code base to correct errors that were found, and a new release of Marclator (0.2.1) is available at http://www.openmatrex.info