First Meeting
Grand Themes in Machine Translation
Machine translation is a testing ground for computational models of language. Sometimes we can get away with surface-level methods, but for syntactically divergent and semantically distant languages, much heavy lifting is required.
- Ambiguity
- Sparse data
- Balancing sparse specific evidence against broad general evidence
- How much "supervision"? E.g., manual word alignments, morphological analyzers, syntactic parsers
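One way to picture the balance between sparse specific evidence and broad general evidence is linear interpolation of two relative-frequency estimates. The sketch below is only an illustration under stated assumptions: the function name `interpolated_prob`, the weight `lam`, and the toy counts are all hypothetical and are not taken from the course material.

```python
from collections import Counter

def interpolated_prob(target, source, pair_counts, source_counts, target_counts, lam=0.7):
    """Mix a specific estimate p(target | source) with a general estimate p(target).

    lam is a hypothetical interpolation weight; in practice it would be tuned
    on held-out data rather than fixed by hand.
    """
    total_targets = sum(target_counts.values())
    # General evidence: how often the target word occurs at all.
    general = target_counts[target] / total_targets if total_targets else 0.0
    # With no specific evidence for this source word, rely on the general estimate.
    if source_counts[source] == 0:
        return general
    # Specific evidence: how often this source word was paired with this target word.
    specific = pair_counts[(source, target)] / source_counts[source]
    return lam * specific + (1 - lam) * general

# Toy counts, invented for illustration only.
pair_counts = Counter({("maison", "house"): 3, ("maison", "home"): 1})
source_counts = Counter({"maison": 4})
target_counts = Counter({"house": 5, "home": 3, "building": 2})

p_seen = interpolated_prob("house", "maison", pair_counts, source_counts, target_counts)
p_unseen = interpolated_prob("home", "domicile", pair_counts, source_counts, target_counts)
```

Here a source word with observed pairings leans mostly on its own counts, while an unseen source word falls back entirely on the general distribution. Raising or lowering `lam` shifts trust between the two kinds of evidence.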
Some Major Current Research Directions
- Syntactic / semantic machine translation: a slow climb uphill
- Machine learning / parameter estimation: so many features, so little time
- Scaling up to huge data sets (trillions of words of monolingual data, billions of words of parallel data)
- Low resource challenges (e.g., exploiting comparable data)
- Integration with speech and information extraction
- Collaboration with human translators