This is a design discussion document about the best way to incorporate sparse features into the Moses experimental pipeline. Any design for sparse features may need to trade off performance (both decoding and training speed) against ease of implementation and experimentation.
New features can be added to the decoder by implementing the FeatureFunction interface and adding appropriate initialisation in Moses' God class (StaticData). They can also be added to the phrase table during scoring (in score.cpp), but this is currently only possible for phrase-based Moses.
The information required for a feature function dictates where and how it is implemented.
What would be the ideal way of configuring extra features in EMS? Ideally, each one would be turned on with a single configuration setting, either in the EMS config file itself or in a separate configuration file. The advantage of having features configured in the EMS config file is that it makes it easier for EMS to know what to rebuild if the configuration changes.
Many of the feature functions depend on having certain other options switched on at other points in the training pipeline, or on having other files created that they can use. For example, some feature functions require alignments to be included in the phrase table, and some require a list of (say) the 50 most common source words. All the features require certain options to be switched on at tuning time to make sure that the sparse values are included in the n-best list, and they require an extra sparse-weights file. I think in principle this could all be taken care of by EMS, but it can be a headache keeping track of which information is required by which feature.
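As an illustration of the kind of auxiliary file such features depend on, here is a minimal sketch of building a list of the N most common source words from a tokenised corpus. The function name and the plain list-of-lines input are assumptions made for illustration; this is not the logic of any actual Moses script.

```python
from collections import Counter
from typing import Iterable, List


def most_common_words(lines: Iterable[str], n: int = 50) -> List[str]:
    """Return the n most frequent whitespace-separated tokens in a corpus.

    Ties are broken by first occurrence, because Counter.most_common
    orders elements with equal counts in the order they were first seen.
    """
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    corpus = ["das haus ist klein", "das haus ist gross", "das buch"]
    print(most_common_words(corpus, n=2))  # ['das', 'haus']
```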
The difficulty with EMS integration is that adding a feature may trigger several things to be added at several points, and EMS does not support this very well. Let's see what is required by each of the extra features:
- build-sparse-lexical-features.perl, handles building of vocab files etc.
- additional-ini to be passed in to create-config
- build-sparse-lexical-features.perl, so it would have to know about the wt configuration options
- report-sparse-features in the ini file - could use -additional-ini
- report-sparse-features to the ini file
So that gives the following EMS extension points for adding extra features:
- Changes to experiment.meta, and possibly a Perl function in experiment.perl
- Passing additional-ini to create-config
- Adding report-sparse-features to the ini file. This is always required for sparse features; for sparse features added in scoring, the value to add is "stm".
In addition, we must ensure that the correct steps are rerun if the feature configuration is changed. This is automatic if the configuration is inserted through the standard EMS mechanisms.
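One way to make reruns automatic is to record a fingerprint of the configuration values each step reads when the step runs, and to rerun the step whenever that fingerprint changes. The sketch below assumes each step declares the configuration keys it depends on; the step-to-key mapping is hypothetical, and this is not how EMS itself is implemented.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(config: Dict[str, str], keys: List[str]) -> str:
    """Hash only the configuration values that a step depends on."""
    relevant = {k: config.get(k) for k in sorted(keys)}
    blob = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def steps_to_rerun(
    config: Dict[str, str],
    step_keys: Dict[str, List[str]],
    previous: Dict[str, str],
) -> List[str]:
    """Return the steps whose relevant configuration changed since the last run.

    step_keys maps step name -> config keys it reads (hypothetical);
    previous maps step name -> fingerprint recorded last time. A step
    with no recorded fingerprint is treated as changed.
    """
    return [
        step
        for step, keys in step_keys.items()
        if previous.get(step) != fingerprint(config, keys)
    ]


if __name__ == "__main__":
    step_keys = {
        "score-phrases": ["domain-type", "domain-sparse"],
        "create-config": ["features"],
    }
    old = {"features": "domain", "domain-type": "indicator", "domain-sparse": "yes"}
    previous = {s: fingerprint(old, k) for s, k in step_keys.items()}
    new = {**old, "domain-type": "subset"}
    print(steps_to_rerun(new, step_keys, previous))  # ['score-phrases']
```

Hashing only the declared keys means that changing one feature's options does not force unrelated steps to rerun.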
The idea is to make most of the work in adding a new feature function to EMS declarative. So there would be a new file for experiment.perl to process, called features.meta. It would have a section for each feature, specifying (optionally) what needs to be added for the feature at each extension point mentioned in the previous section.
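A minimal sketch of what features.meta might boil down to once parsed: one section per feature, each optionally listing contributions at the three extension points, merged across the enabled features with duplicates dropped. The table contents (step names, report names, the additional-ini placeholder) are made up for illustration and are not real Moses settings.

```python
from typing import Dict, List

EXTENSION_POINTS = ("steps", "additional-ini", "report-sparse-features")

# Hypothetical parsed contents of features.meta; every value is illustrative.
FEATURES_META: Dict[str, Dict[str, List[str]]] = {
    "wordtranslation": {
        "steps": ["build-sparse-lexical-features.perl"],
        "report-sparse-features": ["wt"],
    },
    "phrase-pair": {
        "steps": ["build-sparse-lexical-features.perl"],
        "report-sparse-features": ["pp"],
    },
    "domain": {
        "additional-ini": ["<extra moses.ini lines for domain>"],
        "report-sparse-features": ["stm"],
    },
}


def collect(
    features_meta: Dict[str, Dict[str, List[str]]], enabled: List[str]
) -> Dict[str, List[str]]:
    """Merge each enabled feature's contributions per extension point,
    keeping first-seen order and dropping duplicates."""
    merged: Dict[str, List[str]] = {point: [] for point in EXTENSION_POINTS}
    for name in enabled:
        section = features_meta[name]
        for point in EXTENSION_POINTS:
            for item in section.get(point, []):
                if item not in merged[point]:
                    merged[point].append(item)
    return merged


if __name__ == "__main__":
    print(collect(FEATURES_META, ["wordtranslation", "domain", "phrase-pair"]))
```

Deduplication matters because two features may need the same preprocessing step, which should only be run once.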
In the config of experiment.perl, there would be an additional section (say, [EXTRA-FEATURES]), listing the features and their associated options. For example:

    [EXTRA-FEATURES]
    features = wordtranslation domain phrase-pair
    domain-type = subset
    domain-sparse = yes
    wordtranslation-factor = 1

In this case, there are three extra features added: word translation, domain and phrase pair. The domain feature receives the additional configuration {type=subset, sparse=yes}, and the word translation feature gets {factor=1}. The phrase pair feature gets the default configuration.
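The mapping from the [EXTRA-FEATURES] section to per-feature configurations can be sketched as follows, assuming the key = value lines have already been read into a dict. Each option is assigned to the listed feature whose name (plus a hyphen) is the longest prefix of the key; the longest match matters because feature names such as phrase-pair themselves contain hyphens. The function name is hypothetical.

```python
from typing import Dict


def parse_extra_features(section: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Split an [EXTRA-FEATURES] section into per-feature option dicts.

    'domain-type = subset' becomes {'type': 'subset'} for the domain feature.
    Features with no options get an empty dict (the default configuration).
    """
    features = section["features"].split()
    options: Dict[str, Dict[str, str]] = {name: {} for name in features}
    for key, value in section.items():
        if key == "features":
            continue
        owners = [f for f in features if key.startswith(f + "-")]
        if not owners:
            raise ValueError(f"option {key!r} matches no listed feature")
        owner = max(owners, key=len)  # longest feature-name prefix wins
        options[owner][key[len(owner) + 1:]] = value
    return options


if __name__ == "__main__":
    section = {
        "features": "wordtranslation domain phrase-pair",
        "domain-type": "subset",
        "domain-sparse": "yes",
        "wordtranslation-factor": "1",
    }
    print(parse_extra_features(section))
    # {'wordtranslation': {'factor': '1'},
    #  'domain': {'type': 'subset', 'sparse': 'yes'}, 'phrase-pair': {}}
```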
Why doesn't this work?
The problem is that a lot of the information is not really declarative. For example, the domain feature needs to construct arguments like --SparseDomainSubset. Also, figuring out how many phrase features to add requires counting the domains. There's also the problem of ensuring that the correct steps get rerun when the feature configuration changes.
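To show why this part needs code rather than a declaration, here is a hedged sketch of the two computations for the domain feature. It assumes that the scoring option is built as "Domain" plus the capitalised type, prefixed by "Sparse" for the sparse variant (consistent with --SparseDomainSubset for type=subset, sparse=yes); that the domain file has one "<line-number> <domain-name>" entry per line; and that the dense variant adds one score per domain for indicator and ratio, and one per non-empty subset of domains for subset. All of these are assumptions to check against score.cpp, and the function names are hypothetical.

```python
from typing import Iterable


def domain_score_option(domain_type: str, sparse: bool) -> str:
    """Build the scoring argument, e.g. ('subset', True) -> '--SparseDomainSubset'."""
    return "--" + ("Sparse" if sparse else "") + "Domain" + domain_type.capitalize()


def count_domains(domain_file_lines: Iterable[str]) -> int:
    """Count distinct domain names, assuming '<line-number> <name>' per line."""
    return len({line.split()[1] for line in domain_file_lines if line.strip()})


def dense_domain_feature_count(domain_type: str, n_domains: int) -> int:
    """Assumed number of dense phrase-table scores added by a non-sparse domain feature."""
    if domain_type == "subset":
        return 2 ** n_domains - 1  # one per non-empty subset of domains
    if domain_type in ("indicator", "ratio"):
        return n_domains  # one per domain
    raise ValueError(f"unknown domain type {domain_type!r}")


if __name__ == "__main__":
    lines = ["1000 news", "2500 web", "4000 news"]
    n = count_domains(lines)
    print(domain_score_option("subset", False), dense_domain_feature_count("subset", n))
```

The point is that the feature count depends on the training data, so it cannot be written down in a static features.meta entry.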
Ideally, what I'd like to be able to do is specify an "interface" which should be implemented for each new feature. However, it's not clear to me what the Perl idiom for this is.
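Perl has no built-in interface construct; a common approach is duck typing, with each feature as a package that provides an agreed set of methods (their presence can be checked with can()). For concreteness, the sketch below uses Python's abc module to show what such an interface might contain, with one method per extension point listed earlier. The method names, the WordTranslationFeature class and its "wt" report name are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ExtraFeature(ABC):
    """Hypothetical interface each extra feature implements for EMS."""

    def __init__(self, options: Dict[str, str]):
        self.options = options  # per-feature options from [EXTRA-FEATURES]

    def extra_steps(self) -> List[str]:
        """Steps to add to the pipeline (experiment.meta); none by default."""
        return []

    def score_options(self) -> List[str]:
        """Extra arguments for phrase scoring (score.cpp); none by default."""
        return []

    def additional_ini(self) -> List[str]:
        """Lines passed to create-config via additional-ini; none by default."""
        return []

    @abstractmethod
    def report_sparse_features(self) -> List[str]:
        """Names for report-sparse-features; every sparse feature must say."""


class WordTranslationFeature(ExtraFeature):
    def extra_steps(self) -> List[str]:
        # Needs the vocabulary files built by this script.
        return ["build-sparse-lexical-features.perl"]

    def report_sparse_features(self) -> List[str]:
        return ["wt"]  # assumed short name


if __name__ == "__main__":
    feature = WordTranslationFeature({"factor": "1"})
    print(feature.extra_steps(), feature.report_sparse_features())
```

Defaults in the base class mean a feature only overrides the extension points it actually uses, while the abstract method forces every feature to declare what it reports.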
There are really three types of information that get passed to the decoder at runtime. The division into these three types is debatable, but it is somewhat useful. The first type of configuration is what you set when you're designing the model. The second is what gets set during discriminative training. And the third is things you might vary in a trained model.
Distinguishing different types of configuration information is useful because they are generated and used differently, and maybe should be configured separately. In particular, should all the weights be stored in their own (separate) file? At the moment there's a distinction between core and sparse features in the way that they are configured and this makes handling the weights during tuning awkward. On the other hand (as Eva found) different types of weights do sometimes need to be treated differently.
Hieu has made some progress towards a common weight file in the mert-new branch, but this is now going to have to be merged with the sparse feature code. Moses used to support a weights file for core features, but it didn't work properly and was removed.
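A common weight file could be as simple as one "name value" pair per line for every feature, core and sparse alike. The minimal sketch below assumes that format, with core features identified by an explicit set of names and sparse features grouped by the prefix before their first underscore; both conventions (and the example names) are hypothetical, not the mert-new format. Grouping keeps the option of treating different kinds of weights differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_weights(lines: Iterable[str]) -> Dict[str, float]:
    """Parse 'name value' lines into a single weight map, skipping blank lines."""
    weights: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, value = line.split()
        weights[name] = float(value)
    return weights


def group_weights(
    weights: Dict[str, float], core: Set[str]
) -> Dict[str, Dict[str, float]]:
    """Split weights into a 'core' group and per-prefix sparse groups."""
    groups: Dict[str, Dict[str, float]] = defaultdict(dict)
    for name, value in weights.items():
        group = "core" if name in core else name.split("_", 1)[0]
        groups[group][name] = value
    return dict(groups)


if __name__ == "__main__":
    w = load_weights(["d 0.3", "lm 0.5", "pp_das~the 0.01", "wt_haus~house -0.02"])
    print(group_weights(w, core={"d", "lm"}))
```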
Feature initialisation is currently done by the increasingly omnipotent StaticData object. Really, feature management (and weight management?) should be offloaded to another class. In fact, TranslationSystem already contains pointers to all the feature functions, so maybe it could be co-opted (and renamed) for this purpose? Using extra feature functions interacts badly with the multiple models functionality (which TranslationSystem was added to support), but then perhaps it's time to retire the multiple models feature? There's much less need for it now that we have KenLM and memory mapping.
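To make the proposal concrete, here is a minimal sketch of a class that owns the feature functions and the weights together, so neither has to live in StaticData. It is written in Python for brevity (Moses itself is C++), treats a hypothesis as a plain string, and models a feature function as a callable returning named values, which covers dense and sparse features alike. All names are hypothetical.

```python
from typing import Callable, Dict, List

# A feature function maps a hypothesis to named feature values.
FeatureFunction = Callable[[str], Dict[str, float]]


class FeatureManager:
    """Owns the active feature functions and a single weight map."""

    def __init__(self, weights: Dict[str, float]):
        self.weights = dict(weights)
        self.features: List[FeatureFunction] = []

    def register(self, feature: FeatureFunction) -> None:
        self.features.append(feature)

    def score(self, hypothesis: str) -> float:
        """Weighted sum of all feature values; features without a weight count as 0."""
        total = 0.0
        for feature in self.features:
            for name, value in feature(hypothesis).items():
                total += self.weights.get(name, 0.0) * value
        return total


if __name__ == "__main__":
    manager = FeatureManager({"word_penalty": -1.0, "wt_house": 0.5})
    manager.register(lambda h: {"word_penalty": float(len(h.split()))})
    manager.register(lambda h: {"wt_" + w: 1.0 for w in h.split()})
    print(manager.score("the house"))  # -1.5
```

Keeping a single weight map here is one possible answer to the question of whether core and sparse weights should be stored together.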