Sparse feature functions in Moses allow for thousands of features that follow a specific pattern, typically lexical instantiations of a general feature function. Take for instance the target word insertion feature function, which allows the training of lexical indicators for any word (say, the or fish). Each lexicalized instantiation has its own feature weight, which is typically trained during tuning. Inserting a the should be fine, inserting the word fish not so much, and the learned feature weight should reflect this.
In Moses, all feature functions can contain sparse features and dense features. The number of dense features has to be specified in advance in the moses.ini file, e.g.,
KENLM num-features=1 ...
The decoder doesn't have to know whether a feature function contains sparse features. And by definition, the number of sparse features is not specified beforehand.
Sparse lexical features require a special weight file that contains the weight for each instantiation of a feature.
The weight file has to be specified in the moses.ini file:
[weight-file]
path/sparse-weights
This file may look like:
twi_fish -0.5
twi_of -0.001
[...]
By convention, the format for sparse features is
InstanceName_SparseFeatureName
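To make the file layout concrete, here is a minimal Python sketch that reads such a weight file into a dictionary. It assumes one whitespace-separated name/weight pair per line; the function name read_sparse_weights is hypothetical and not part of Moses.

```python
def read_sparse_weights(lines):
    """Parse 'name weight' lines into a dict; blank lines are skipped.

    Assumption: one feature per line, name and weight separated by whitespace.
    """
    weights = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace so the feature name stays intact.
        name, value = line.rsplit(None, 1)
        weights[name] = float(value)
    return weights


example = ["twi_fish -0.5", "twi_of -0.001"]
weights = read_sparse_weights(example)
print(weights["twi_fish"])
```

In practice the lines would come from the file named under [weight-file], for instance via open(path) instead of the in-memory list used here.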
Of course, you want to learn these feature weights during tuning, which requires the use of either PRO or kbMIRA - it does not work with plain MERT.
There are three types of lexical feature functions: target word insertion, source word deletion, and word translation.
moses.ini
The following lines need to be added to the configuration file:
[feature]
TargetWordInsertionFeature factor=FACTOR [path=FILE]
SourceWordDeletionFeature factor=FACTOR [path=FILE]
WordTranslationFeature input-factor=FACTOR output-factor=FACTOR \
  [source-path=FILE] [target-path=FILE] \
  simple=1 source-context=0 target-context=0
Note that there is no corresponding weight setting for these features.
The optional word list files (one token per line) restrict the feature function to the specified words. If no word list file is specified, then features for all words are generated.
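A word list in this one-token-per-line layout could be produced, for example, from the most frequent words of a corpus. The following Python sketch illustrates the idea; the helper top_words and the toy corpus are hypothetical, and this is not necessarily how the Moses tooling builds its lists.

```python
from collections import Counter


def top_words(sentences, n):
    """Return the n most frequent whitespace-separated tokens."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    return [word for word, _ in counts.most_common(n)]


# Toy corpus, for illustration only.
corpus = ["the fish swims", "the house", "the fish"]
word_list = top_words(corpus, 2)

# One token per line, the layout described for word list files.
word_list_text = "\n".join(word_list) + "\n"
print(word_list_text, end="")
```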
experiment.perl
Word translation features can be specified as follows:
TRAINING:sparse-features = \
  "target-word-insertion top 50, source-word-deletion top 50, \
   word-translation top 50 50"
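This specification includes target word insertion features for the 50 most frequent target words, source word deletion features for the 50 most frequent source words, and word translation features for the 50 most frequent source and target words.
Instead of top 50, you can also specify all when you do not want to have a restricted word list.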
Moreover, for the word translation feature, by specifying factor 1-2, you can change input and output factor for the feature. For the deletion and insertion features, there is only one factor to specify, e.g., factor 1.
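The phrase length feature function creates three features for each phrase pair: one for the source phrase length, one for the target phrase length, and one for the combination of both lengths.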
For instance, when the phrase ein Riesenhaus is translated into a giant house, then the three features pl_s2 (2 source words), pl_t3 (3 target words), and pl_2,3 (2 source words into 3 target words) are triggered.
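The naming scheme of these three features can be reproduced with a few lines of Python. This is only a sketch of the naming pattern given in the example; the function phrase_length_features is hypothetical, and phrase lengths are assumed to be counted in whitespace-separated tokens.

```python
def phrase_length_features(source_phrase, target_phrase):
    """Return the three phrase length feature names for a phrase pair."""
    s = len(source_phrase.split())  # number of source words
    t = len(target_phrase.split())  # number of target words
    return [f"pl_s{s}", f"pl_t{t}", f"pl_{s},{t}"]


features = phrase_length_features("ein Riesenhaus", "a giant house")
print(features)  # ['pl_s2', 'pl_t3', 'pl_2,3']
```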
moses.ini
The following lines need to be added to the configuration file:
[feature]
PhraseLengthFeature
experiment.perl
The inclusion of the phrase length feature is similar to the word translation feature:
TRAINING:sparse-features = "phrase-length"
In case of using both the phrase length feature and the word translation features, you will need to include them in the same line.
Domain features flag for each phrase pair in which domain (or more accurately: in which subset of the training data) it occurs.
moses.ini
Domain features are part of the phrase table; there is no specific support for this particular type of feature function. A sparse phrase table may include any other arbitrary features. Each line in the phrase table has to contain an additional field that lists the feature name and its log-probability value.
For example, the following phrase pair contains the domain feature flagging that the phrase pair occurred in the europarl part of the training corpus:
das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 \
  ||| 5000 5000 2500 ||| dom_europarl 1
If a phrase table contains sparse features, then this needs to be flagged in the configuration file by adding the word sparse after the phrase table file name.
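The additional sparse feature field can be read with a simple split on the ||| delimiter. The Python sketch below parses the example phrase pair (joined onto one line); it assumes the field order shown in that example and that the last field holds alternating feature names and values. The function parse_phrase_table_line is hypothetical, not a Moses API.

```python
def parse_phrase_table_line(line):
    """Split a phrase table line into its fields, including sparse features."""
    fields = [field.strip() for field in line.split("|||")]
    source, target, scores, alignment, counts = fields[:5]
    sparse = {}
    if len(fields) > 5:
        tokens = fields[5].split()
        # Assumption: the field is a sequence of "name value" pairs.
        for name, value in zip(tokens[0::2], tokens[1::2]):
            sparse[name] = float(value)
    return {
        "source": source,
        "target": target,
        "scores": [float(x) for x in scores.split()],
        "alignment": alignment.split(),
        "counts": [float(x) for x in counts.split()],
        "sparse": sparse,
    }


line = ("das Haus ||| the house ||| 0.8 0.5 0.8 0.5 2.718 ||| 0-0 1-1 "
        "||| 5000 5000 2500 ||| dom_europarl 1")
entry = parse_phrase_table_line(line)
print(entry["sparse"])  # {'dom_europarl': 1.0}
```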
experiment.perl
TRAINING:domain-features = "[sparse ](indicator|ratio|subset)"
There are various settings for domain adaptation features. They require a domain file that indicates which lines of the parallel corpus stem from which domain. By default, when used in experiment.perl, the domains correspond to the different [CORPUS] blocks, but a different domain-file can also be specified.
These features may be included as sparse features or as core features in the phrase table, depending on whether the parameter carries the prefix sparse.
There are three kinds of features: indicator, ratio, and subset.
The frequency of a phrase pair in the training data may be useful for determining its reliability. The count bin features are integrated into the phrase table, just like the domain features, so please check that documentation.
experiment.perl
The counts of phrase pairs get very sparse for frequent phrases. There are just not that many phrase pairs that occur exactly 634,343 times. Hence, we bin phrase pair counts, for instance into phrase pairs that occur once, twice, three to nine times, and more often.
In experiment.perl this is accomplished with an additional switch in score-settings. For the example above this looks like this:
TRAINING:score-settings = "--[Sparse]CountBinFeature 1 2 3 10"
Based on the values that are given, different indicator features are included, depending on the interval into which the count of the phrase pair falls, e.g., ]2;3] = third bin.
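The mapping from a count to its bin can be sketched in Python as follows. The sketch follows the interval notation ]2;3] used above, that is, it assumes each given value is an inclusive upper bound and that counts above the last value fall into one extra bin. The function count_bin is hypothetical and does not claim to mirror the actual scorer code.

```python
import bisect


def count_bin(count, boundaries):
    """Return the 1-based bin index for a phrase pair count.

    Assumption: bin i covers ]boundaries[i-2]; boundaries[i-1]], and counts
    above the last boundary go into one additional final bin.
    """
    return bisect.bisect_left(boundaries, count) + 1


boundaries = [1, 2, 3, 10]
print(count_bin(1, boundaries))  # 1
print(count_bin(3, boundaries))  # 3
```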
TODO
Models with target syntax require an exact match between nonterminals in a rule and the left-hand-side label of rules that can be substituted into it. With the following rules, a model could be used to decode 'she slept here', but not 'she slept on the floor'.
S --> she slept AVP1 ||| sie schlief AVP1
AVP --> here ||| hier
PP --> on the floor ||| auf dem boden
With soft matching, we can allow substitutions of nonterminals even if they do not match.
moses.ini
The following lines need to be added to the configuration file:
[feature]
SoftMatchingFeature path=FILE
with FILE containing a user-defined list of allowed substitutions. For the example above, the file needs to contain the following line:
PP AVP
Each substitution (even exact matches) triggers a sparse feature which can be used to prefer some substitutions over others.
The SoftMatchingFeature operates on the target-side labels and is not (yet) implemented for the Scope3 and OnDisk phrase tables.
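The effect of the substitution list can be illustrated with a small Python sketch. It assumes that each line of FILE holds two labels and that the first is the left-hand-side label of the substituted rule while the second is the nonterminal it may fill; this column order is an assumption made for illustration, and the functions load_soft_matches and can_substitute are hypothetical.

```python
def load_soft_matches(lines):
    """Read allowed substitutions as (rule_lhs, slot_label) pairs."""
    allowed = set()
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            # Assumption: first column = LHS of the substituted rule,
            # second column = nonterminal label in the parent rule.
            allowed.add((parts[0], parts[1]))
    return allowed


def can_substitute(rule_lhs, slot_label, allowed):
    """Exact matches are always allowed; others must be listed."""
    return rule_lhs == slot_label or (rule_lhs, slot_label) in allowed


allowed = load_soft_matches(["PP AVP"])
print(can_substitute("PP", "AVP", allowed))  # True
print(can_substitute("PP", "S", allowed))    # False
```

Under these assumptions, the PP rule for 'on the floor' may fill the AVP1 slot, so 'she slept on the floor' becomes decodable with the example rules.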