Several papers have proposed a maximum entropy model for the selection of phrase translations, and the training side of this is pretty straight-forward: Look at the (source!) context of phrases and use it to train a maximum entropy classifier. Note this is done on the training corpus, so no online training. The harder part is an efficient representation of the model, since it will be larger than the traditional phrase-based model and hence requires to be stored on disk. So, the main task is to devise a on-disk data structure, its creation and its querying from the decoder.