The chart decoder is a recursive variant of CKY+ parsing/decoding which is able to process arbitrary context-free grammars with no limitations on the number of terminals or non-terminals in a rule. During decoding, all contiguous spans over the input sentence are filled with hypotheses (partial translations). Rules are stored in a prefix tree, which is processed incrementally, in a way akin to Earley parsing. Once rules are looked up, cube pruning is applied to pick off the most likely applicable rules and hypotheses from underlying spans.
The main loop of the decoding process fills the chart cells bottom-up: the outer loop runs over the width of the span, the inner loop over the starting position of the span.
for (size_t width = 1; width <= size; ++width) {
  for (size_t startPos = 0; startPos <= size-width; ++startPos) {
For each span, first the applicable rules are created and then the rules are applied.
m_transOptColl.CreateTranslationOptionsForRange(startPos, endPos);
ChartCell &cell = m_hypoStackColl.Get(range);
cell.ProcessSentence(m_transOptColl.GetTranslationOptionList(range),
                     m_hypoStackColl);
Processing a span concludes with pruning, cleaning, and sorting of the hypotheses that were placed into the span.
cell.PruneToSize();
cell.CleanupArcList();
cell.SortHypotheses();
ChartTranslationOptionCollection::CreateTranslationOptionsForRange
First, a reference to the (initially empty) translation option list for the span is obtained from the global rule collection:
ChartTranslationOptionList &chartRuleColl = GetTranslationOptionList(startPos, endPos);
Multiple rule tables may be consulted. In fact, in most setups there will be a main rule table and a rule table for glue rules. So, we need to consult each of them.
ChartRuleLookupManager &ruleLookupManager;
ruleLookupManager.GetChartRuleCollection(wordsRange, true, chartRuleColl);
ChartRuleLookupManager::GetChartRuleCollection
There are multiple implementations of the rule table. At the time of this writing, there are two: one that is kept entirely in memory and a second one that keeps the data on disk. There is an obvious RAM/speed trade-off here: the in-memory rule table is faster, but for some models there may not be sufficient RAM. Both are implemented very similarly, however.
For a given span, there may be many rules that could apply. Each rule may consume input words directly or build on any number of hypotheses that were generated for sub-spans. This leads to a combinatorial explosion of ways in which sub-spans and input words could be combined, whenever a matching rule exists in the rule table.
The implementation of rule lookup is inspired by Earley parsing, which allows for incremental lookup. Consider the English-German rule:
PP/NP -> of the JJ_1 problem of NP_2 ; des ADJ_1 Problems NP-GEN_2
This is a good time to clarify some terminology:

- PP is the source side parent non-terminal
- NP is the target side parent non-terminal
- JJ and NP are the source side child non-terminals
- of, the, problem, and of are the source side child terminals (or words)
- ADJ and NP-GEN are the target side child non-terminals
- des and Problems are the target side child terminals (or words)

Instead of child, the term right hand side is often used, and correspondingly, instead of parent, the term left hand side.
To check if the rule matches, we have to see if the English words of, the, problem, and of indeed occur in the input, and if the intervening spans carry the labels JJ and NP. In addition, the rule can only apply if we have hypotheses with the constituent labels ADJ and NP-GEN in the corresponding spans.
We store the rule in a prefix structure with the following nodes:
of -> the -> JJ/ADJ -> problem -> of -> NP/NP-GEN -> des ADJ_1 Problems NP-GEN_2
The key insight is the following: If there is such an applicable rule for the span under consideration, then a sub-span with the same start position matched part of this prefix path, namely:
of -> the -> JJ/ADJ -> problem -> of
If, by the time we build the prefix, we know all possible terminal or non-terminal extensions (such as NP/NP-GEN) that match the prefix and start at the end of the sub-span, we can recursively find all rule applications, visiting each prefix and sub-span exactly once. Note: it does not matter whether a translation rule is associated with the prefix path; all that matters is that it matches the sub-span.

We traverse the chart in a depth-first right-to-left order to ensure that all chart cells starting at the end position of the sub-span under consideration have been processed before (so that we can immediately find all applicable extensions of the prefix, such as NP/NP-GEN).
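As an illustration, the prefix structure can be sketched as a trie whose edges are labeled with terminals or source/target non-terminal pairs. This is a simplified standalone sketch; the names PrefixNode, Extend, and Find are ours, not Moses's:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of the rule prefix tree: each node maps the next
// source symbol (a terminal word or a source/target non-terminal pair
// such as "JJ/ADJ") to a child node; a node may also carry complete
// target sides for rules whose source side ends at this prefix.
struct PrefixNode {
    std::map<std::string, std::unique_ptr<PrefixNode>> children;
    std::vector<std::string> targetSides;  // rules ending at this prefix

    // Extend the prefix by one symbol, creating the child if needed.
    PrefixNode &Extend(const std::string &symbol) {
        std::unique_ptr<PrefixNode> &slot = children[symbol];
        if (!slot) slot.reset(new PrefixNode);
        return *slot;
    }

    // Follow an existing prefix path; returns nullptr if it is absent.
    const PrefixNode *Find(const std::vector<std::string> &path) const {
        const PrefixNode *node = this;
        for (const std::string &sym : path) {
            auto it = node->children.find(sym);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node;
    }
};

// Build the example rule: of the JJ_1 problem of NP_2 ; des ADJ_1 Problems NP-GEN_2
PrefixNode BuildExampleTree() {
    PrefixNode root;
    PrefixNode *n = &root;
    for (const std::string &sym :
         {"of", "the", "JJ/ADJ", "problem", "of", "NP/NP-GEN"})
        n = &n->Extend(sym);
    n->targetSides.push_back("des ADJ_1 Problems NP-GEN_2");
    return root;
}
```

The sub-span prefix "of the JJ/ADJ problem of" is a valid interior path even though no rule ends there, which is exactly the property the incremental lookup exploits.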
The code in ChartRuleLookupManagerMemory::GetChartRuleCollection implements the lookup that we just described. It is extensively commented, so it should be understandable without further explanation. From the outside, each time the function is called for a span, it updates its internal data structures for processed rules and returns the applicable rules for the span (with their target sides) as a ChartTranslationOptionList.
Let us take a closer look at the data structures used in rule lookup in the function ChartRuleLookupManagerMemory::GetChartRuleCollection.
Recall that processed rules are looked up using a prefix tree, for example:
of -> the -> JJ/ADJ -> problem -> of -> NP/NP-GEN -> des ADJ_1 Problems NP-GEN_2
To use the rule, we need to know how each of the nodes in this path matches the chart, in other words: how each node matches a span. This is encoded in a linked list of CoveredChartSpans.

Each CoveredChartSpan contains the start and end position in the span (stored in a WordsRange, which is essentially a pair of integers with additional utility functions), the source word or source span label that is matched (as a Word), and a back-pointer to the previous CoveredChartSpan (hence the linked list).

The linked list of CoveredChartSpans contains all the necessary information about applying the rule to the source side. Target side information is stored elsewhere. Note that target-side non-terminals are not stored here anymore, since they are contained in the target side information.

The core operations of the data structure are GetSourceWord, GetWordsRange, and GetPrevCoveredChartSpan. There is also the utility function IsNonTerminal, which checks if the last word in the linked list is a non-terminal.
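A reduced sketch of this linked list may help. The class name and accessors follow the Moses code, but the member types are simplified here (a WordsRange becomes a plain pair of positions, a Word becomes a string):

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified sketch of the CoveredChartSpan linked list. Each node records
// which span a source symbol matched, and points back to the node for the
// previous symbol of the rule's source side.
struct CoveredChartSpan {
    std::pair<size_t, size_t> range;   // start/end position in the input
    std::string sourceWord;            // matched word or span label
    bool nonTerminal;                  // span label vs. plain word
    const CoveredChartSpan *prev;      // back-pointer (the linked list)

    const std::string &GetSourceWord() const { return sourceWord; }
    const std::pair<size_t, size_t> &GetWordsRange() const { return range; }
    const CoveredChartSpan *GetPrevCoveredChartSpan() const { return prev; }
    bool IsNonTerminal() const { return nonTerminal; }
};
```

For the example rule, the list for the first three symbols would link "of" at position 0, "the" at position 1, and the non-terminal JJ over some sub-span starting at position 2.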
A ChartTranslationOptionList is a vector of ChartTranslationOptions, with additional utility functions. Rules are added to it in batches, since rules are stored in the prefix tree first by matching source words and child non-terminals, pointing to a list of target sides.

The list contains fully fledged rules with target sides (as opposed to the cube pruning algorithm described by Chiang, where rules are grouped modulo the target side words). The list also observes rule limits, i.e., the maximum number of rules considered for a span. It sorts the rules by the future score of the target side (weighted translation model costs and language model estimates) and prunes out the worst ones when the limit is exceeded. Internally, a score threshold is maintained, so that rules that would be pruned anyway are not even added.
Note that when adding a ChartTranslationOption, some additional processing ultimately has to be done --- the computation of the alignment between non-terminals by calling ChartTranslationOption::CreateNonTermIndex. This is done lazily, once the list is finalized and pruned.
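The rule-limit bookkeeping can be illustrated with a simplified sketch. RuleList, ScoredRule, and the doubled-buffer pruning policy are our own simplifications, not the Moses implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch of a rule list that observes a rule limit: rules are
// collected with an early score-threshold check, then sorted by future
// score and truncated. The real ChartTranslationOptionList adds batches of
// target sides and does similar bookkeeping incrementally.
struct ScoredRule {
    float futureScore;
};

class RuleList {
public:
    explicit RuleList(size_t limit) : m_limit(limit), m_threshold(-1e9f) {}

    void Add(const ScoredRule &rule) {
        if (rule.futureScore < m_threshold) return;  // would be pruned anyway
        m_rules.push_back(rule);
        if (m_rules.size() > 2 * m_limit) Prune();   // keep the buffer bounded
    }

    void Prune() {
        std::sort(m_rules.begin(), m_rules.end(),
                  [](const ScoredRule &a, const ScoredRule &b) {
                      return a.futureScore > b.futureScore;  // best first
                  });
        if (m_rules.size() > m_limit) {
            m_rules.resize(m_limit);
            m_threshold = m_rules.back().futureScore;  // bar for future adds
        }
    }

    const std::vector<ScoredRule> &Rules() const { return m_rules; }

private:
    size_t m_limit;
    float m_threshold;
    std::vector<ScoredRule> m_rules;
};
```

Keeping the threshold means that once the limit has been hit, hopeless rules are rejected in O(1) instead of being added and sorted away later.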
A ChartTranslationOption contains all the information about a rule and its application to a span. It contains a linked list of CoveredChartSpans, which details how the rule matches the input side, a TargetPhrase with the output, and the span it applies to (in a WordsRange). This information has to be specified at instantiation and can be queried with the functions GetLastCoveredChartSpan, GetTargetPhrase, and GetSourceWordsRange.
Once created, the mapping between the source and target non-terminals is computed by calling CreateNonTermIndex and can be queried with GetCoveredChartSpanTargetOrder. That information is already stored with the TargetPhrase, but it is reformatted here for easier querying (at the cost of a higher memory footprint).
This class implements the prefix tree that contains the rules.
Above, we described how all the rules that apply to the current span are retrieved. In fact, more than that is done: each rule is also annotated with how it applies to the span, in particular how its non-terminals match the sub-spans.
Applying a rule now only requires the selection of the hypotheses in the specified sub-spans that match the non-terminals in the rule.
To repeat our example:
PP/NP -> of the JJ_1 problem of NP_2 ; des ADJ_1 Problems NP-GEN_2
We have already identified which sub-spans the target non-terminals ADJ and NP-GEN apply to. For each of them, however, we may have multiple choices (note that there will be at least one for each, otherwise we would not consider the rule).
The term cube pruning derives from the fact that we explore for each rule a multi-dimensional space, with one dimension for each non-terminal (and, in the original Chiang algorithm, but not here, one dimension for the target phrase). This space is a cube only if there are three dimensions (two non-terminals in the Chiang algorithm), and even then it is not a true cube because the dimensions typically have differing lengths. And it is not technically a pruning algorithm (which would remove bad examples after the fact), but a greedy search for the best rule applications. But hey, what's in a name?
Given the cube, we sort each dimension, so that the most promising rule application is in the top left front corner. Most promising, because for each of the non-terminals, we use the matching hypothesis with the best score (and the target phrase with the best future cost estimate).
Finally recall that there are multiple cubes, one for each applicable rule.
The cube pruning algorithm is given two data structures: a ChartTranslationOptionList that contains all applicable rules (now called ChartTranslationOptions) and the ChartCellCollection that contains the chart as it has been filled in so far. The cube pruning algorithm is housed in the ChartCell that corresponds to the span we are now filling.
First, we have to build the RuleCubeQueue. Recall how each applicable rule has a cube. Well, we throw them all together into one big cube (not really a regular geometrical shape, since the dimensions differ for each rule application). Be that as it may, the first part of the algorithm loops through the ChartTranslationOptionList, creates a RuleCube for each entry, and adds it to the queue.

The RuleCubeQueue keeps the RuleCubes sorted, so that we can always pop off the most promising rule application with its most promising underlying sub-span hypotheses. For a specified number of times (staticData.GetCubePruningPopLimit()), we:

- pop off the most promising RuleCube (by calling ruleCubeQueue.Pop())
- create a new ChartHypothesis and calculate its score (hypo->CalcScore())
- add the hypothesis to the ChartCell
- add the neighbors of the RuleCube to the RuleCubeQueue (ruleCube->CreateNeighbors(ruleCubeQueue))
A chart cell contains the hypotheses that were created for a span. These hypotheses are grouped by their target side non-terminal.
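This grouping can be sketched with a minimal standalone example (Cell and Hyp are our own reduced stand-ins for ChartCell and ChartHypothesis), so that rule application can look up exactly the hypotheses carrying a required label:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Minimal sketch of a chart cell grouping hypotheses by target side
// non-terminal, e.g. all NP-GEN hypotheses for a sub-span at once.
struct Hyp {
    std::string targetLabel;
    float score;
};

class Cell {
public:
    void AddHypothesis(const Hyp &h) { m_groups[h.targetLabel].push_back(h); }

    // All hypotheses carrying a given label (empty if there are none).
    const std::vector<Hyp> &GetHypotheses(const std::string &label) const {
        static const std::vector<Hyp> empty;
        auto it = m_groups.find(label);
        return it == m_groups.end() ? empty : it->second;
    }

private:
    std::map<std::string, std::vector<Hyp>> m_groups;
};
```

With this layout, checking whether a rule with a NP-GEN gap can apply to a sub-span is a single map lookup rather than a scan over all hypotheses in the cell.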
RuleCubeQueue is a priority queue of RuleCubes (candidate rule applications). Initially it contains the top-left-front rule application for each ChartTranslationOption. When these are expanded, their neighbors are added to the RuleCubeQueue.

Note that the same neighbor might be reached in multiple ways. If the rule applications (0,0,0), (1,0,0), and (0,1,0) are popped off, then the latter two both point to (1,1,0). This is checked in RuleCubeQueue, which does not allow insertion of duplicates.
This class contains the cube for a rule. It holds information about the ChartTranslationOption and a list of underlying sub-span hypotheses for each non-terminal in the rule. The latter is represented as a vector of ChildEntrys, which are essentially ordered lists of hypotheses with additional utility functions.

When the RuleCube is created from a ChartTranslationOption, the vector of ChildEntrys is assembled from the information in the chart. Also, the estimated score of the top-left-front rule application is computed and stored. Note that this is an estimated score; it does not include the real language model cost.
A RuleCube always points to a particular rule application (i.e., particular sub-span hypotheses) in the cube. If it is picked to create a hypothesis, then its neighbors are added to the RuleCubeQueue --- this is implemented in the function CreateNeighbors. Consequently, for a particular ChartTranslationOption, there may be multiple RuleCubes in the RuleCubeQueue.
New hypotheses are built in the ChartCell::ProcessSentence function from a RuleCube.
ChartHypothesis *hypo = new ChartHypothesis(*ruleCube, m_manager);
hypo->CalcScore();
AddHypothesis(hypo);
A ChartHypothesis contains various types of information about an entry in the chart, i.e., a translation of the covered span.
- size_t m_id: hypothesis ID
- Manager& m_manager: reference to the manager
- WordsRange m_currSourceWordsRange: covered span
- ChartTranslationOption &m_transOpt: rule that created it
- vector<size_t> &m_coveredChartSpanTargetOrder: covered sub-spans
- ScoreComponentCollection m_scoreBreakdown: all scores
- ScoreComponentCollection m_lmNGram: language model scores
- ScoreComponentCollection m_lmPrefix: estimated language model scores for the prefix
- float m_totalScore: total weighted score
- Phrase m_contextPrefix: first words (not yet fully LM-scored)
- Phrase m_contextSuffix: last words (affect later attached words)
- size_t m_numTargetTerminals: length of the phrase (number of words)
- ChartHypothesis *m_winningHypo: points to the superior hypothesis if recombined away
- ArcList *m_arcList: all arcs that end at the same trellis point as this hypothesis
- vector<const ChartHypothesis*> m_prevHypos: underlying hypotheses
When a hypothesis is created, the book-keeping and information relevant for recombination and later use is set.
Scores are computed by the function CalcScore, which adds up the scores of the underlying hypotheses, the rule application, and the language model scoring of the resulting phrase. Language model scoring (in the function CalcLMScore) is a bit complex, since we do not want to re-compute any of the language model scores that we already computed for the underlying hypotheses. See the documented code for details.
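The reason only boundary n-grams need rescoring can be shown with a toy example of our own (not Moses code): with a bigram model, joining two already-scored segments adds exactly one new bigram, the one spanning the seam. We use a dummy "LM" that scores every bigram as -1 so the arithmetic is easy to check:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Dummy bigram "language model": every bigram costs -1. A real model would
// look the pair up in a table; the bookkeeping below is unchanged.
float ScoreBigram(const std::string &, const std::string &) { return -1.0f; }

// Score all bigrams inside one segment (done once when it is first built).
float InternalScore(const std::vector<std::string> &words) {
    float total = 0.0f;
    for (size_t i = 1; i < words.size(); ++i)
        total += ScoreBigram(words[i - 1], words[i]);
    return total;
}

// Combine two already-scored segments: only the seam bigram is new, formed
// from the suffix of the left segment and the prefix of the right one.
float CombinedScore(const std::vector<std::string> &left, float leftScore,
                    const std::vector<std::string> &right, float rightScore) {
    float seam = ScoreBigram(left.back(), right.front());
    return leftScore + rightScore + seam;
}
```

This is what the stored m_contextPrefix and m_contextSuffix make possible: the decoder keeps just enough boundary words to score the seam, instead of re-scoring the whole target string.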
Hypothesis recombination is handled by ChartHypothesisCollection. The function Add is called to check if the hypothesis is recombinable with anything already in the collection (the class ChartHypothesisRecombinationOrderer handles state matching and calls ChartHypothesis::LMContextCompare). AddHypothesis calls Add, and handles the recombination by potentially replacing the existing hypothesis and setting arcs (ChartHypothesis::AddArc).
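The recombination bookkeeping can be sketched as follows. This is a simplification of ChartHypothesisCollection: the LM-relevant state is reduced to a single string key, and the names Hypo and HypoCollection are ours:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Sketch of recombination: hypotheses whose LM-relevant state matches are
// interchangeable for all future scoring, so only the best one stays
// active; the loser is recorded as an arc for n-best list extraction.
struct Hypo {
    std::string lmState;             // stand-in for the context comparison
    float score;
    std::vector<const Hypo *> arcs;  // recombined-away hypotheses
};

class HypoCollection {
public:
    // Returns the surviving hypothesis for h's state.
    Hypo *Add(Hypo *h) {
        auto it = m_byState.find(h->lmState);
        if (it == m_byState.end()) {
            m_byState[h->lmState] = h;   // first hypothesis with this state
            return h;
        }
        Hypo *best = it->second;
        if (h->score > best->score) {    // newcomer wins: roles swap
            h->arcs.push_back(best);
            it->second = h;
            return h;
        }
        best->arcs.push_back(h);         // newcomer recombined away
        return best;
    }

private:
    std::map<std::string, Hypo *> m_byState;
};
```

The arc list is what later allows the n-best extraction to recover translations that went through a recombined-away hypothesis.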