It works 2008/05/17 @ 12:24 AM :)
Cube pruning is implemented and works!
http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder/branches/mtm2_cube_pruning/
Cube Pruning
- The cube pruning algorithm is implemented in two stages. For each stack (from left to right)
- first look forward to create new bitmap coverages reachable from all hyps in the current stack
- then use the cube-pruning k-best algorithm to enumerate the k-best for each coverage in the current stack
This is implemented in Manager.cpp, in processSentence.
We are still thinking about how to pull this out so that we can also allow the use of the original search algorithm. The idea is to have a manager-like class for each algorithm.
Moore and Quirk speed ups.
Moore and Quirk MT Summit 2007 showed that you can preload distortion costs, so that any costs to jump backwards are paid immediately when you jump forward (and not deferred until later).
We implemented this by changing the DummyScoreProducer::CalculateDistortionScore. We also added additional functionality to WordsRange and disabled setting the boolean parameters UseDistortionFutureCosts to true (probably implemented wrong anyway).
The early pruning stuff may be redundant with cube pruning, but I think there might still be something to being more aggressive with looking at the future distortion cost that will be incurred (see below).
Current Progress
- BitmapContainer class contains BackwardsEdges to "generating" BitmapContainers
- Any BackwardsEdge will contain the "pruning square", i.e. the k-best hypotheses (taken from the BitmapContainer that the Edge is pointing to) and the k-best translation options (which are derived from the phrase table using the "complementary" bitmap)
- We use a priority queue to perform the "square pruning". Payload is a pointer to a Hypothesis object and the coordinates in the "square" (which we will need to expand the frontier).
- The actual cube pruning will be done as follows:
- iterate over all BitmapContainers for a given Stack
- get the BackwardsEdge with the highest score in the priority queue, i.e. the best unconsumed new hypothesis.
- add this best hypothesis to the corresponding Stack and expand the two frontier hypotheses.
- put those two unconsumed hypotheses into the priority queue of the edge.
- continue with the (now) best edge (may be a different one)
- Frontier nodes: we are using a hash to block inserting an expanded hypothesis into the frontier priority queue (this can happen once when it is expanded by the node above it, and once when it is expanded by the node to its left). There is nothing in the 2007 paper's algorithm that shows a trick of how to do this without this memory.
- Distortion limits: We need to remember to check that distortion limits aren't violated. There were bugs in this check in the previous version of Moses. Hieu has put a "necessary" check when creating new bitmap coverages, which helps to limit how many bitmap coverages he needs to create. But we still have to block extension of hypotheses that violate the distortion limit. I think the right place to put this might be in BackwardsEdge::PushSuccessors. The distortion distance of all translationOptions for a particular hypothesis (in a particular square!) will be the same, so we only need to check this once per hypothesis and square combination.
- Using Distortion in the sorting function for the columns in Cube-Pruning. The distortion cost for extending one hypothesis to a particular coverage will be the same for all translationOptions in that coverage. If we add this distortion cost to the hypothesis TotalCost, we have a better estimate of what we will pay; we could consider resorting the hypotheses on this new cost but it is probably too expensive.
- Recombination: we want k-best hypotheses from each cube. Right now I think that means we are pulling k hypotheses off of the cube. But some of these can be recombined. So we need to not increment the k-counter if a hyp is recombined.
Random Thoughts (add your ideas or thoughts here, guys)
- cfedermann:
- Can we delete a BackwardsEdge once the priority queue is empty?
- We should add an attribute m_notExpanded to the BackwardsEdge class which tells us whether the edge has already been expanded or not...
- can C++ priority queues contain the same value more than once? If that is the case we have to take that into account when expanding frontier hypotheses...
- what the heck, we should just keep track of all "used" square positions, way simpler :)
- sbpoggel
- afraser
- emacs tabs: M-x set-variable RETURN tab-width RETURN 2
Papers & Slides
Debugging info
CubeDebug