ccb: I'm taking notes while Chris D is coding up the interface. He's not writing comments so I'm taking them down here.
FinishingAgenda
The FinishingAgenda stores Vertices that are not fully formed; they may not make the cut and be added to the Hypergraph. It contains a set of grouped vertices (for instance, in a phrase-based decoder the groups of vertices could represent the beam stacks).
ExplorationAgenda
Is an object which does not need to be subclassed for a particular search strategy. It provides methods for getting the next active vertex to be processed, and adding pairs of active and passive vertices which you will explore (called a traversal in the parsing lit / Klein paper).
It's up to the user to put things on the ExplorationAgenda properly. For instance, if something results in both an active and passive state the user must put both on. As with S -> NP VP *, S -> NP VP * PP.
VertexGroup
This represents the equivalence classes used by the FinishingAgenda. It maintains a heap of the vertices within it, keeping them sorted by whatever comparator functions they define. Because the VertexGroups themselves should be comparable, it has to define this method. For instance, it could keep track of the score of the vertex at the top of its heap. Alternately it could first sort by some feature of the Signature like the number of source words covered in translation.
It should contain the information necessary to determine the compatible passive edges.
The VertexGroup may by a singleton set if there aren't equivalence classes associated with a search method.
VertexSignatureCreator
Maps from Strings to Objects. The String is a query about some particular aspect of the Vertex, such as "language model state" or "coverage vector".
SearchStrategy
Three methods:
This class will also keep all of the basic expansions like the translation options.
Decoder
Initializes a finishing agenda. While the finishing agenda or the exploration agenda still have elements, it gets the next pair of Vertexes (a traversal) from the ExplorationAgenda. It retrieves the combinable vertices from the SearchStrategy
(Lane just asked "So, CKY is an inherently parallelizable algorithm will this support parallelization". Answer: "Yes, and you don't have to change the search strategy; just the general code").
Seemingly noteworthy sleep deprived quotes:
Today we are revising the design of our Hypergraph prototype code. We have decided to have a Hyperarc replace the AndNode.
Here's the outline of a Hypergraph for the case of parsing.
Hypergraph pseudo code:
Parse for(f_vertex in finishing) { consume(e_edge in explore) { if(!finishing.vertexExists(e_edge) { finishing.addNewVertex(e_edge); } else { // in the viterbi case we will collapse // and keep the best one... combine(e_edge) { } HG.addNode(f_vertex) // expand if(f_vertex.isActive) { for(passive) { if(combine(active, passive)) // in the phrase-based case an active // edge is a coverage vector... explore.add(new AHA(active, passive)) } } }
With inspiration from the Klein and Manning 2004 paper.
Evening note on terminology problem. In chart parsing we talk about active and passing edges. In hypergraphs these are vertices.
Algorithms that we want our abstract Hypergraph to be able to compute:
Some useful papers
Toy example:
OrNode
AndNodes
State
Terminal case: an OrNode with no children Axiom?: basic information (rule in Heiro, phrase translation in Moses)
For any particular application, you'll have to write your own specific AndNode implementation, and your own Decoder which operates on them, although the phrase-based and CKY will both operate identically.
Objective for tonight: two groups, both prototyping the HyperGraph interface, tomorrow will try doing phrase-based v. cky