Wednesday, 8 March 2017

CS51 Spellchecker - Final Project 2015

Catherine Tuntiserirat, Erika Puente, Mick Kajornrattana, Billie Wei

We built a spell-checking program that takes typed input from the user's keyboard and, line by line, returns a list of top suggestions for each misspelled word. The program is written in OCaml and stores the words of its dictionary in a Burkhard-Keller tree (BK-tree), a tree-based data structure designed for quickly finding near-matches to a string: for example, returning "seek" and "peek" for "aeek" using edit distance.

To index and search the dictionary, we need an algorithm that calculates the edit distance between two strings. We implemented three versions of a Levenshtein distance calculator, one of which unfortunately cannot be used with the BK-tree, for reasons to be discussed later. The distance function takes two strings and returns the minimum number of insertions, deletions and replacements (plus transpositions in Damerau-Levenshtein) required to transform one string into the other. Each node of the tree can have an arbitrary number of children (it is an n-ary tree), and each edge is weighted by the Levenshtein distance between the two words it connects.

As an extension feature, we improved how the closest strings are chosen. Our method combines the Levenshtein distance with a probability derived from how frequently each word is used. For example, if the input is "thew", then "the" should be ranked ahead of "thaw" because "the" is more commonly used. We obtained the frequency values from Project Gutenberg and used them to narrow the list of suggested spellings returned to the user.
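
To make the description above more concrete, here is a minimal sketch of the three pieces in OCaml: a plain Levenshtein distance, a BK-tree with insertion and tolerance-based search, and a frequency-aware ranking step. It is not our project code. The names `levenshtein`, `bktree`, `insert`, `search`, `rank` and `toy_freq` are hypothetical and chosen for illustration, and the frequency counts are placeholders rather than real Project Gutenberg figures. The ranking rule (sort by distance first, then by frequency) is only one assumed way of combining the two signals. The search step prunes subtrees with the triangle inequality, which holds for plain Levenshtein distance.

```ocaml
(* Plain Levenshtein distance using two rows of the dynamic-programming table. *)
let levenshtein (s : string) (t : string) : int =
  let m = String.length s and n = String.length t in
  (* prev.(j) = distance between the first i-1 chars of s and the first j chars of t *)
  let prev = Array.init (n + 1) (fun j -> j) in
  let curr = Array.make (n + 1) 0 in
  for i = 1 to m do
    curr.(0) <- i;
    for j = 1 to n do
      let cost = if s.[i - 1] = t.[j - 1] then 0 else 1 in
      curr.(j) <-
        min (min (prev.(j) + 1) (curr.(j - 1) + 1)) (prev.(j - 1) + cost)
    done;
    Array.blit curr 0 prev 0 (n + 1)
  done;
  prev.(n)

(* A node holds a word and a list of (edge weight, subtree) pairs. *)
type bktree =
  | Empty
  | Node of string * (int * bktree) list

(* Insert a word: follow the edge whose weight equals the distance to the
   current node's word, or create that edge if it does not exist yet. *)
let rec insert (tree : bktree) (word : string) : bktree =
  match tree with
  | Empty -> Node (word, [])
  | Node (w, children) ->
    let d = levenshtein w word in
    if d = 0 then tree (* word already present *)
    else
      let rec go = function
        | [] -> [ (d, Node (word, [])) ]
        | (k, sub) :: rest when k = d -> (k, insert sub word) :: rest
        | c :: rest -> c :: go rest
      in
      Node (w, go children)

(* Return every (word, distance) pair within tol of the query. By the triangle
   inequality, only edges with weight in [d - tol, d + tol] can lead to matches. *)
let rec search (tree : bktree) (query : string) (tol : int) : (string * int) list =
  match tree with
  | Empty -> []
  | Node (w, children) ->
    let d = levenshtein w query in
    let here = if d <= tol then [ (w, d) ] else [] in
    let below =
      List.concat
        (List.map
           (fun (k, sub) ->
             if abs (k - d) <= tol then search sub query tol else [])
           children)
    in
    here @ below

(* Assumed ranking rule: smaller distance first, ties broken by higher frequency. *)
let rank (freq : string -> int) (matches : (string * int) list) : string list =
  matches
  |> List.sort (fun (w1, d1) (w2, d2) ->
         if d1 <> d2 then compare d1 d2 else compare (freq w2) (freq w1))
  |> List.map fst

(* Toy dictionary and placeholder frequency counts, for illustration only. *)
let dict = List.fold_left insert Empty [ "the"; "thaw"; "seek"; "peek" ]
let toy_freq = function "the" -> 1000 | "thaw" -> 1 | _ -> 0
let suggestions = rank toy_freq (search dict "thew" 1)

(* Prints "the" and then "thaw", one per line. *)
let () = List.iter print_endline suggestions
```

With this toy dictionary, `search dict "aeek" 1` finds "seek" and "peek", and the "thew" query ranks "the" ahead of "thaw" because both are at distance 1 and "the" has the larger placeholder count.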
