Tuning the Selection of Correction Candidates for Garbled Tokens using Error Dictionaries

Full Text: ranlpCandselecterrordic.pdf

In previous work, we introduced a method for efficiently selecting, from a background dictionary, suitable correction candidates for a malformed token of a given input text. To select small and meaningful candidate sets, we used refinements of the Levenshtein distance with restricted sets of substitutions, merges and splits. In those experiments, the subset of possible substitutions, merges and splits was determined via training, using ground truth data representing corrected parts of the input text. Here we show that an appropriate set of possible substitutions, merges and splits for the input text can be retrieved without any ground truth data. In the new approach, we compute an error profile of the erroneous input text in a fully automated way, using error dictionaries. From this profile, suitable sets of substitutions, merges and splits are derived. Error profiling with error dictionaries is simple and very fast. We obtain an adaptive form of candidate selection which is very efficient, does not need ground truth data and leads to small candidate sets with high recall.
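
The idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and it does not attempt to reproduce the efficient selection the abstract refers to: it scans the dictionary word by word with a plain dynamic program. The function names, the layout of the error dictionary (garbled token mapped to correct word, operation kind and pattern), the `min_count` threshold, and the choice to allow unrestricted single-character insertions and deletions are all assumptions made for illustration.

```python
from collections import Counter

def error_profile(tokens, error_dict):
    """Count the edit operations of all tokens found in the error dictionary."""
    profile = Counter()
    for tok in tokens:
        if tok in error_dict:
            _correct, kind, pattern = error_dict[tok]
            profile[(kind, pattern)] += 1
    return profile

def allowed_ops(profile, min_count=1):
    """Keep the operations seen at least min_count times (hypothetical rule)."""
    ops = {"sub": set(), "merge": set(), "split": set()}
    for (kind, pattern), count in profile.items():
        if count >= min_count:
            ops[kind].add(pattern)
    return ops

def restricted_distance(correct, garbled, subs, merges, splits):
    """Minimal number of allowed edits turning `correct` into `garbled`."""
    inf = float("inf")
    n, m = len(correct), len(garbled)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:                      # deletion (assumed unrestricted)
                best = min(best, d[i - 1][j] + 1)
            if j > 0:                      # insertion (assumed unrestricted)
                best = min(best, d[i][j - 1] + 1)
            if i > 0 and j > 0:
                a, b = correct[i - 1], garbled[j - 1]
                if a == b:                 # match, no cost
                    best = min(best, d[i - 1][j - 1])
                elif (a, b) in subs:       # restricted substitution
                    best = min(best, d[i - 1][j - 1] + 1)
            # restricted merge: two correct characters became one
            if i > 1 and j > 0 and (correct[i - 2:i], garbled[j - 1]) in merges:
                best = min(best, d[i - 2][j - 1] + 1)
            # restricted split: one correct character became two
            if i > 0 and j > 1 and (correct[i - 1], garbled[j - 2:j]) in splits:
                best = min(best, d[i - 1][j - 2] + 1)
            d[i][j] = best
    return d[n][m]

def select_candidates(token, dictionary, bound, ops):
    """Dictionary words within `bound` restricted edits of the garbled token."""
    return [w for w in dictionary
            if restricted_distance(w, token, ops["sub"], ops["merge"],
                                   ops["split"]) <= bound]

# Toy data, invented for this example.
error_dict = {
    "tbe": ("the", "sub", ("h", "b")),
    "govemment": ("government", "merge", ("rn", "m")),
    "rnore": ("more", "split", ("m", "rn")),
}
tokens = ["tbe", "govemment", "tbe", "rnore"]
ops = allowed_ops(error_profile(tokens, error_dict))
candidates = select_candidates("rnodem", ["modern", "modem", "model"], 2, ops)
print(candidates)  # ['modern', 'modem']
```

In this toy run, "modern" is reached with one split (m to rn) and one merge (rn to m), while "model" is rejected because no profiled operation turns "l" into "m". Restricting the operations to those seen in the profile is what keeps the candidate set small.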

Citation

S. Mihov, P. Mitankin, A. Gotscharek, U. Reffle, K. Schulz, C. Ringlstetter. "Tuning the Selection of Correction Candidates for Garbled Tokens using Error Dictionaries". Finite-State Techniques and Approximate Search, February 2008.

Keywords: machine learning
Category: In Workshop

BibTeX

@inproceedings{Mihov+al:FSTAS08,
  author = {Stoyan Mihov and Petar Mitankin and Annette Gotscharek and Ulrich
    Reffle and Klaus Schulz and Christoph Ringlstetter},
  title = {Tuning the Selection of Correction Candidates for Garbled Tokens
    using Error Dictionaries},
  booktitle = {Finite-State Techniques and Approximate Search},
  year = 2008,
}

Last Updated: February 02, 2008
Submitted by Nelson Loyola
