Second Machine Translation Marathon | Main / GenericDecodingProject browse

Meeting at ACL

Who will be at ACL and when?

Adam Lopez -- Sunday through Friday (also Coling in August)
Chris D -- Sunday through Friday
Phil B -- Sunday through Friday
CCB -- unfortunately missing ACL because of DARPA CS Study Panel (will be at CoLing & EMNLP)
Josh -- Monday through Thursday
Lane -- Monday through Friday

Followup Discussion

Some thoughts on the prototype -- the overall data structure design is reasonable, but I'm less certain about the Klein & Manning algorithm (which is basically the Dijkstra/Knuth algorithm).

It requires all traversals to be non-negative, something that may be hard to ensure with typical linear models used in MT
I'm also a little concerned that some of the details of the K&M algorithm are a little too specific to CFG parsing, e.g. the idea of a "fundamental rule". Is there a fundamental rule for more complicated models like Model 4? Or, for a real challenge, can you identify one in these complicated TAG parsing algorithms?

In any case their algorithm is just a particular method of traversing the hypergraph under certain semirings / assumptions, so in general as long as the data structures work with other semirings and assumptions that we might reasonably make, this is fine.

Relevant Papers

Here are some pointers to relevant papers that give background on the problem. This is surely not a complete list, so feel free to correct any omissions, errors, or misattributions.

Algorithm schemata and data structures in syntactic processing. Kay, 1980.: Unifies top-down and bottom-up CFG parsing algorithms under the notion of chart parsing (I couldn't fid this online but it's in the local library where I'm currently located and worth a read).
Parsing as Deduction. Pereira & Warren, ACL 1983.: This paper points out the connection between chart parsing and logic programming (using Prolog), showing how parsers can be specified as deductive systems. Note that we're just talking about parse existence at this point, not probabilistic parsing.
Principles and implementation of deductive parsing. Shieber, Schabes, & Pereira, Journal of Logic Programming, 24(1-2):3-36, 1995.: Expands on Pereira & Warren (1983), giving deductive systems for a wide variety of parsing algorithms and grammars, and discussing quite a bit of implementation detail (in Prolog).
Semiring Parsing. Goodman, CL 1999.: Unifies probabilistic and non-probabilistic parsing under the deductive framework. Essentially this shows that probabilistic and non-probabilistic algorithms are identical by factoring out the computed quantity into a semiring. Additionally, it describes several novel semirings for computing, e.g. inside probabilities, derivation forests, etc., showing that these algorithms are essentially the same as well. This is a nice theoretical paper but there are some details that are not addressed.

Efficiency -- the algorithm given assumes exhaustive search where every item is the space is computed. This is especially bad for probabilistic models. However, the search strategy is underspecified, so this can be fixed (and is discussed in some of the papers below).
Non-local features -- if there are features that are essentially useful only for the probabilistic model, and not for the structure, this bends the formalism quite a bit.

That being said, there is a lot of good stuff in this paper, and the observation that many different quantities can be computed under essentially the same algorithm is very useful. The key issue that's hidden is that search strategy is actually dependent on the semiring -- this is addressed in some of the papers below.

Generalized Parsers for Machine Translation, Melamed & Wang, unpublished: Extends Goodman's formalism somewhat, and recognizes that a search strategy is a key element of a complete implementation. Unfortunately not a lot of additional detail is given. Much of the other stuff in the paper is specific to a specific synchronous grammar formalism.
Compiling Comp Ling. Eisner, Goldlust, & Smith, HTL-EMNLP 2005: Describes the implementation of Dyna, a very high-level programming language that aims to be an implementation of Goodman's formalism, and then some. This paper is very ambitious and a number of things are not described in a lot of detail, but there are a lot of good hints and references, particularly for implementation of tricky things such as infinite cycles. Interesting note: in Dyna, the search strategy can be specified by the user as a C++ function.
Parsing and Hypergraphs. Klein & Manning, IWPT 2001.: Identifies one of the weaknesses in the Goodman model by focusing on the search strategy for probabilistic semirings. This is connected to hypergraphs, using Dijkstra's algorithm to obtain efficient exact search for this particular semiring. This procedure was subsequently extended to the A* Algorithm in a 2003 NAACL paper by the same authors.
An Efficient A* Search Algorithm for Statistical Machine Translation. Och, Ueffing, & Ney, 2001.: This describes an empirical heuristic function for nearly-A* search. The basic idea is to decode, then compute an A* heuristic for every possible search node based on the empirical costs seen in the remainder of the graph. Then redecode with the new heuristic, which is admissible w.r.t. the explored space. This process can be iterated to convergence to reduce search errors. It seems like this is actually a really general algorithm.
A Search in the Forest: Efficient Algorithms for Parsing and Machine Translation based on Packed-Forests. Liang Huang, dissertation proposal: Identifies a number of key search problems for particular semirings, including k-best, and approximate search with non-local features (i.e. cube pruning and forest rescoring). The proposal includes all of the major results from Huang's recent papers, including the k-best paper, the forest rescoring paper, and the upcoming ACL 2008 (best) paper on parsing with non-local features. It also serves as a nice primer on dynamic programming, to boot, and points out that hypergraphs, and-or graphs, deductive systems are all essentially different views on the same problem.
An Efficient Two-Pass Approach to Synchronous-CFG Driven Statistical MT. Venugopal, Zollman, and Vogel, NAACL 2007.: Describes a two-pass decoding strategy that serves as an alternative to the forest rescoring method of Chiang. This paper is a bit difficult to read -- it goes through too many contortions to fit the algorithm into the deductive framework (and ends up breaking it in the process). The algorithm itself is less obviously general than forest rescoring, and doesn't have any provably optimal forms as the former does. That being said, the main takeaway message -- that it is possible to parse synchronous grammars and pay for the language modeling expense only one side (rather than two as in cube pruning) -- is useful. I think it might be worth trying to see if this algorithm can be generalized in some way or if the LM trick can be thought of in terms of the "hook trick" for parsing.

Page last modified on June 13, 2008, at 12:09 AM