mkcls: Training of word classes.
mkcls is a tool for training word classes using a maximum-likelihood
criterion. The resulting word classes are especially suited for
language models and statistical translation models. mkcls was written
by Franz Josef Och.
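To make "maximum-likelihood criterion" concrete, here is a minimal Python sketch. It assumes the usual class-based bigram model, p(w_i | w_{i-1}) = p(g(w_i) | g(w_{i-1})) * p(w_i | g(w_i)), where g maps each word to its class and every probability is a relative-frequency (maximum-likelihood) estimate. The function names and the handling of the first corpus position are our own assumptions. This illustrates the kind of objective being maximized, not mkcls's own code.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def _sum_n_log_n(counts: Counter) -> float:
    """Sum of n * log(n) over all counts."""
    return sum(n * math.log(n) for n in counts.values())


def class_bigram_log_likelihood(corpus: Sequence[str],
                                word_class: Dict[str, Hashable]) -> float:
    """Log-likelihood of corpus[1:] given corpus[0] under a class-bigram
    model with relative-frequency (maximum-likelihood) estimates:
        p(w_i | w_{i-1}) = p(c_i | c_{i-1}) * p(w_i | c_i)
    """
    classes = [word_class[w] for w in corpus]
    pair_counts = Counter(zip(classes, classes[1:]))  # n(c, c')
    pred_counts = Counter(classes[:-1])               # how often class c precedes a word
    succ_counts = Counter(classes[1:])                # how often class c' is predicted
    word_counts = Counter(corpus[1:])                 # how often word w is predicted
    # sum_i log p(c_i | c_{i-1}) = sum n(c,c') log n(c,c') - sum n(c) log n(c)
    # sum_i log p(w_i | c_i)     = sum n(w) log n(w)       - sum n(c') log n(c')
    return (_sum_n_log_n(pair_counts) - _sum_n_log_n(pred_counts)
            + _sum_n_log_n(word_counts) - _sum_n_log_n(succ_counts))
```

Training then amounts to searching for the mapping g that maximizes this value. The sum over n(w) log n(w) does not depend on g, so an optimizer can drop that term when comparing candidate class assignments.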
Usage of mkcls:
mkcls [-cnum] [-nnum] [-ptrain] [-Vfile] opt
-c number of word classes to train
-V filename to which the trained classes are written
-n number of optimization runs (Default: 1); more runs generally give better results
-p filename of training corpus (Default: 'train')
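One plausible reading of the -n option, assuming that each run starts from a different random class assignment and the best result is kept, is sketched below in Python. The greedy exchange procedure (move one word at a time to whichever class scores best, until no single move helps), the function names, and the pluggable `objective` (for example a corpus log-likelihood) are illustrative assumptions. mkcls implements several combinatorial optimization methods, and its actual procedure may differ.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Assignment = Dict[Hashable, int]
Objective = Callable[[Assignment], float]


def exchange_optimize(vocab: List[Hashable], num_classes: int,
                      objective: Objective,
                      rng: random.Random) -> Tuple[Assignment, float]:
    """Greedy exchange: start from a random assignment and move single words
    to the class that most increases the objective, until no single move
    helps (a local optimum). Recomputes the objective from scratch, which is
    simple but slow; real implementations update counts incrementally."""
    assignment = {w: rng.randrange(num_classes) for w in vocab}
    best = objective(assignment)
    improved = True
    while improved:
        improved = False
        for w in vocab:
            original = assignment[w]
            best_class = original
            for c in range(num_classes):
                if c == original:
                    continue
                assignment[w] = c  # tentatively move w to class c
                score = objective(assignment)
                if score > best:
                    best, best_class = score, c
            assignment[w] = best_class  # keep the best move, or undo
            if best_class != original:
                improved = True
    return assignment, best


def optimize_with_restarts(vocab: List[Hashable], num_classes: int,
                           objective: Objective, runs: int = 1,
                           seed: int = 0) -> Tuple[Assignment, float]:
    """Run the exchange optimizer `runs` times from different random starting
    points and return the best assignment found (cf. the -n option)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    rng = random.Random(seed)
    best_assignment: Optional[Assignment] = None
    best_score = float("-inf")
    for _ in range(runs):
        assignment, score = exchange_optimize(vocab, num_classes, objective, rng)
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score
```

Because a local search like this can get stuck in different local optima depending on its starting point, keeping the best of several runs can only match or improve on a single run, at the cost of proportionally more training time.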
Example:
mkcls -c80 -n10 -pkorpus -Vkats opt
(generates 80 classes for the corpus 'korpus' using 10 optimization runs and writes the classes to 'kats')
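Other tools can then load the class file written with -V. The following Python sketch assumes that each non-empty line holds a word followed by its class number, separated by whitespace (for example "cat<TAB>12"). Check this layout against a file produced by your own mkcls build before relying on it. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def read_classes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'word class' lines into a word -> class-number mapping.
    The two-column layout is an assumption about the -V output file."""
    word_to_class: Dict[str, int] = {}
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 'word class', got {line!r}")
        word, cls = fields
        word_to_class[word] = int(cls)
    return word_to_class


def words_by_class(word_to_class: Dict[str, int]) -> Dict[int, List[str]]:
    """Invert the mapping so that each class lists its member words, sorted."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for word, cls in word_to_class.items():
        groups[cls].append(word)
    return {cls: sorted(words) for cls, words in sorted(groups.items())}
```

For the example above, one would open 'kats' with the same encoding as the training corpus and pass the file object to read_classes.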
In order to compile mkcls you may need:
- a recent version of the GNU C++ compiler, g++ (2.95 or higher)
mkcls is released under the GNU General Public
License (GPL).
Citation:
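Franz Josef Och: »Maximum-Likelihood-Schätzung von
Wortkategorien mit Verfahren der kombinatorischen
Optimierung« [Maximum-likelihood estimation of word categories
using methods of combinatorial optimization]. Studienarbeit
(student research thesis), Universität Erlangen-Nürnberg, Germany, 1995.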
Franz Josef Och: »An Efficient Method for Determining
Bilingual Word Classes«. Ninth Conference of the European Chapter
of the Association for Computational Linguistics (EACL'99),
pp. 71-76, Bergen, Norway, June 1999.
Source code:
newest version on code.google.com
mkcls.2003-09-30.tar.gz
- now also compiles with gcc versions 2.95 to 3.3 and on Mac OS X
mkcls.2001-01-12.tar.gz
(old version)
Last updated: 12 January 2001,
och@isi.edu