Tuesday, August 17, 2010

Spelling suggestions with MarkLogic

 

When you search for a word on YouTube and YouTube considers it misspelled, it returns a “Did you mean: suggested keyword” suggestion. MarkLogic also provides spelling suggestions that you can use in your application.

MarkLogic provides two functions for suggesting spelling corrections.

1) spell:suggest() – returns a sequence of suggested spellings for the word entered.

2) spell:suggest-detailed() – returns a sequence of elements describing each suggestion, including the suggested word, the distance, the key distance, the word distance, and the Levenshtein distance.

Use spell:suggest() when you only need the suggested words. Use spell:suggest-detailed() when you want to compare the properties of the suggested words.
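As a minimal sketch of the second function, the query below asks for the detailed suggestions for a single misspelled word. The dictionary URI and the word are assumptions taken from the example later in this post; run it against your own dictionary and look at the returned elements to see the distance values.

```xquery
xquery version "1.0-ml";

(: Assumption: a dictionary document exists at this URI in the database. :)
declare variable $dictionary := "/dictionary/large-dictionary.xml";

(: Returns elements describing the suggestions for "compter". :)
spell:suggest-detailed($dictionary, "compter")
```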

To get suggested spellings for a given word, use

spell:suggest("spellingDictionary.xml","keyword")


It returns the dictionary words that are close to the keyword entered.


Here spellingDictionary.xml is the URI of a dictionary document that contains the known correct spellings.


The format of the dictionary is:


<dictionary xmlns="http://marklogic.com/xdmp/spell">
  <metadata>
  </metadata>
  <word></word>
  <word></word>
  ......
</dictionary>



You may either create your own dictionary or download one of the dictionaries provided by MarkLogic at http://github.com/marklogic/dictionaries/tree/master/dictionaries/


You may also edit a dictionary provided by MarkLogic, adding or deleting <word></word> elements as needed.
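If you build your own dictionary, it has to end up as a document in the database before spell:suggest() can use it. The sketch below shows one way this might be done with the spelling library module. The import path, the dictionary URI, and the word list are assumptions for illustration, so check the module location and the spell:insert and spell:make-dictionary signatures against the documentation for your MarkLogic version.

```xquery
xquery version "1.0-ml";

(: Assumption: this import path matches your MarkLogic version. :)
import module namespace spell = "http://marklogic.com/xdmp/spell"
  at "/MarkLogic/spell.xqy";

(: Hypothetical URI and word list, for illustration only. :)
spell:insert(
  "/dictionary/my-dictionary.xml",
  spell:make-dictionary(("welcome", "computer", "marklogic"))
)
```

Once the document is inserted, its URI can be passed as the first argument to spell:suggest().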


For the search query “welcme compter”, you can use the XQuery below to return spelling suggestions.


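---------

xquery version "1.0-ml";

(: The misspelled search terms, one <word> element per term. :)
declare variable $spellWords := <words><word>welcme</word><word>compter</word></words>;

<spellcheckerResult>
{
  (: Keep only the closest suggestion for each word. :)
  for $word in $spellWords//*:word
  return spell:suggest("/dictionary/large-dictionary.xml", $word)[1]
}
</spellcheckerResult>

----------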


RESULT


<spellcheckerResult>welcome computer</spellcheckerResult>


spell:suggest returns a sequence of suggested words. The first word returned is the closest match, so use “[1]” to return only that word.
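One case the “[1]” predicate does not cover is a word for which the dictionary offers no suggestion at all: the expression then yields an empty sequence and the word silently drops out of the result. A minimal sketch of a fallback, assuming the same dictionary URI as above and a hypothetical $query variable holding the raw search string, is to put the original word second in a sequence and take the first item:

```xquery
xquery version "1.0-ml";

(: Assumptions: the dictionary URI from the example above and a
   hypothetical raw search string. :)
declare variable $dictionary := "/dictionary/large-dictionary.xml";
declare variable $query := "welcme compter";

<spellcheckerResult>
{
  (: Split the query on whitespace and correct each term separately. :)
  for $word in fn:tokenize($query, "\s+")[. ne ""]
  (: Closest suggestion if there is one, otherwise the word as typed. :)
  return (spell:suggest($dictionary, $word)[1], $word)[1]
}
</spellcheckerResult>
```

This keeps the corrected query the same length as the original, which matters if you plan to show it back to the user as a “Did you mean” link.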


If a word is already spelled correctly, the same word is returned. For example, with


declare variable $spellWords := <words><word>welcome</word><word>compter</word></words>;


the result would be


<spellcheckerResult>welcome computer</spellcheckerResult>
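Because correct words come back unchanged, the result alone does not tell you which terms were actually misspelled. MarkLogic also has a built-in spell:is-correct() check that can be used for this. The sketch below is one possible way to list only the misspelled terms together with their closest suggestion. The $query variable and the output element names are hypothetical, and the dictionary URI is the one assumed in the example above.

```xquery
xquery version "1.0-ml";

(: Assumptions: dictionary URI from the example above, hypothetical query. :)
declare variable $dictionary := "/dictionary/large-dictionary.xml";
declare variable $query := "welcome compter";

<misspelledWords>
{
  for $word in fn:tokenize($query, "\s+")[. ne ""]
  (: Skip terms that the dictionary already accepts. :)
  where fn:not(spell:is-correct($dictionary, $word))
  return <word suggestion="{spell:suggest($dictionary, $word)[1]}">{$word}</word>
}
</misspelledWords>
```

An application could show the “Did you mean” prompt only when this element has at least one child.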


