We propose FailFinder, a debugger for MT decoders and syntactic models of translation. It includes a graphical analysis tool for the Joshua (or perhaps cdec) decoder that tells the user why specific hypotheses were not chosen, given a reference. A reference can be (1) a token sequence, (2) a sequence of target phrases and their source spans, or (3) a target parse tree and its source spans. Further, we can let the user create custom desired hypotheses on the fly by selecting a partial hypothesis in a given cell and substituting it into the reference for that span. This increases the chance that the new reference is still reachable, since it is composed of pieces already in the hypergraph. A failure occurs if the reference is not the topbest hypothesis according to vanilla decoding. We would like to classify each failure into one of the following categories:
is lower than topbest hypothesis from model decoding
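The classification could be sketched roughly as follows. This is only an illustration under stated assumptions, not FailFinder's actual logic: it assumes the three categories are reachability, search, and scoring errors, that higher model scores are better, and that forced decoding returns no score when the reference cannot be derived. The names `Failure`, `classify_failure`, `forced_score`, and `vanilla_score` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    NONE = "no failure"
    REACHABILITY = "reachability error"
    SEARCH = "search error"
    SCORING = "scoring error"


def classify_failure(reference_is_topbest: bool,
                     forced_score: Optional[float],
                     vanilla_score: float) -> Failure:
    """Classify one sentence, assuming higher model scores are better.

    forced_score is the model score of the reference under forced decoding,
    or None if the reference could not be derived at all (assumption).
    vanilla_score is the model score of the topbest vanilla hypothesis.
    """
    if reference_is_topbest:
        return Failure.NONE
    if forced_score is None:
        # The grammar cannot produce the reference.
        return Failure.REACHABILITY
    if forced_score > vanilla_score:
        # The model prefers the reference, but search did not find it.
        return Failure.SEARCH
    # The reference is reachable, but the model scores it no higher.
    return Failure.SCORING
```

For instance, under these assumptions `classify_failure(False, -9.0, -10.0)` would be labeled a search error, because the reference outscores the hypothesis that vanilla decoding returned.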
Once a failure is classified, FailFinder can provide more information in each case. For a reachability error, FailFinder could suggest which derivation rule(s) could be used to reach the reference (giving preference to simpler rules). For a search error, FailFinder could identify the cell(s) at which a desirable partial hypothesis was pruned in the vanilla decode even though it was present in the forced decode, and show the difference between the feature vectors of the force-decoded and vanilla-decoded partial hypotheses. For a scoring error, we can display the difference in the feature vectors to highlight which features vary most.
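The search-error and scoring-error diagnostics described above might look something like the following. This is a minimal sketch and not tied to Joshua or cdec: it assumes a chart can be viewed as a mapping from source spans to sets of hypothesis identifiers, and a feature vector as a mapping from feature names to values. The function names and the feature names in the usage example are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) source span of a chart cell


def pruned_cells(forced_chart: Dict[Span, Set[str]],
                 vanilla_chart: Dict[Span, Set[str]]) -> Dict[Span, Set[str]]:
    """Return, per cell, the hypotheses present in the forced decode
    but missing from the vanilla decode (i.e. presumably pruned)."""
    missing: Dict[Span, Set[str]] = {}
    for span, hyps in forced_chart.items():
        lost = hyps - vanilla_chart.get(span, set())
        if lost:
            missing[span] = lost
    return missing


def feature_diff(forced: Dict[str, float],
                 vanilla: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (feature, forced - vanilla) pairs, largest absolute
    difference first; a feature absent from one vector counts as 0.0."""
    names = set(forced) | set(vanilla)
    deltas = [(n, forced.get(n, 0.0) - vanilla.get(n, 0.0)) for n in names]
    deltas.sort(key=lambda pair: (-abs(pair[1]), pair[0]))
    return deltas


# Usage with made-up values:
forced_chart = {(0, 2): {"h1", "h2"}, (2, 5): {"h3"}}
vanilla_chart = {(0, 2): {"h1"}, (2, 5): {"h3", "h4"}}
print(pruned_cells(forced_chart, vanilla_chart))  # {(0, 2): {'h2'}}

forced_feats = {"lm": -12.5, "tm": -4.0, "word_penalty": -3.0}
vanilla_feats = {"lm": -10.0, "tm": -4.5, "word_penalty": -3.0}
print(feature_diff(forced_feats, vanilla_feats))
# [('lm', -2.5), ('tm', 0.5), ('word_penalty', 0.0)]
```

Sorting by absolute difference puts the features that vary most at the top, which is the view a user would likely want first when inspecting a scoring error.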
Notes:
tool can help you know if it's even worth fixing the search errors or if fixing them will eventually just result in another type of error
oracle decoding, making it problematic here
actual hypotheses
Features already in FailFinder:
Proposed work:
Subtasks for group members: