Wednesday, June 25, 2008

Enabling Arabic Search in MOSS

Reference: http://blogs.technet.com/aqa/archive/2007/05/30/enabling-arabic-search-in-moss-2007.aspx



Here are some facts to summarize the above:


  • The word breaker is the component of MOSS Search that performs stemming.

  • Word stemming (morphology) has two parts: morphological analysis and morphological generation.

  • In turn, both analysis and generation can be either inflectional or derivational.
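To make the analysis/generation and inflectional/derivational distinctions concrete, here is a toy sketch in Python. Everything in it is hypothetical: the tiny hand-written lexicon, the function names, and the assumption that query-time stemming can be pictured as "analyze the term, then generate its forms". It is not how the MOSS word breaker is implemented.

```python
# Toy lexicon, hand-written for illustration only (hypothetical data,
# not MOSS's lexicon): lemma -> its inflectional forms.
INFLECTIONS = {
    "كتاب": ["كتاب", "الكتاب", "كتب", "الكتب"],  # book, the book, books, the books
}

# Toy derivational family: root letters -> words derived from that root.
DERIVATIONS = {
    "ك-ت-ب": ["كتاب", "كاتب", "مكتبة"],  # book, writer, library
}


def analyze(word: str) -> str:
    """Inflectional analysis: map an inflected form back to its lemma."""
    for lemma, forms in INFLECTIONS.items():
        if word in forms:
            return lemma
    return word  # words missing from the lexicon are left unchanged


def generate(lemma: str) -> list[str]:
    """Inflectional generation: produce every known form of a lemma."""
    return INFLECTIONS.get(lemma, [lemma])


def expand_query_term(term: str) -> list[str]:
    """Hypothetical query-time stemming: analyze the term, then generate its forms."""
    return generate(analyze(term))


def derived_words(root: str) -> list[str]:
    """Derivational lookup: words built from the same root."""
    return DERIVATIONS.get(root, [])


if __name__ == "__main__":
    # A query for "the books" would also match "book", "the book" and "books".
    print(expand_query_term("الكتب"))
```

Note that the derivational family (writer, library) is kept separate from the inflectional forms: expanding a query with derived words changes its meaning far more than expanding it with inflections.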


Word breakers for the supported languages ship with MOSS 2007 itself. They are NOT part of the language packs.


Stemming is off by default for Arabic (and some other languages). You need to enable query-time stemming from the Search Center. [see below]
Stemming for a specific language is triggered by the language configured in the client browser. [see below]

Mike Taghizadeh's blog covers MOSS 2007 search capabilities. Two excellent posts on word stemming are worth reading:

Moss Search Word Stemming Part I
Moss Search Word Stemming Part II


Enabling word stemming in MOSS 2007 - Search Center:


  • As the owner or administrator of the site, go to the Search Center results page (issue a query to get there, or navigate to the results.aspx page).

  • Edit the page: Site Actions -> Edit Page

  • Go to the Core Results Web Part and choose to modify it.

  • In the Web Part settings panel, check the option that reads "Enable word stemming …" under the Results Query options.

  • Click OK, and save the page or publish it.

Your search is now ready to apply word stemming at query time.


Testing Arabic word stemming by setting Arabic as the default language in IE:
MOSS 2007 chooses the word breaker according to the language of the client browser. The browser sends its default language to MOSS in HTTP_ACCEPT_LANGUAGE, and MOSS in turn invokes the word breaker for that language.
So to trigger the Arabic word breaker, change the IE language settings (Tools -> Internet Options -> Languages). If Arabic is set as the default (at the top of the list), MOSS will invoke the Arabic word breaker and perform stemming at query time. See the multilingual whitepaper above for more details.
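If you want to test without changing IE settings, a client can send the same language preference itself. The Python sketch below builds (but does not send) a GET request for the results page with an Arabic Accept-Language header, which is the header the server exposes as HTTP_ACCEPT_LANGUAGE. The site URL, the `k` query-string parameter for the keywords, and the function name are assumptions for illustration. A real MOSS site will also require authentication, which is not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Search Center address; replace it with your own site.
SEARCH_RESULTS_URL = "http://moss.example.com/SearchCenter/Pages/results.aspx"


def build_arabic_search_request(query: str, base_url: str = SEARCH_RESULTS_URL) -> Request:
    """Build (but do not send) a results-page request that prefers Arabic.

    The Accept-Language value mimics a browser with Arabic at the top of its
    language list. Using "k" as the keyword parameter is an assumption.
    """
    url = base_url + "?" + urlencode({"k": query})  # UTF-8 percent-encoding
    return Request(url, headers={"Accept-Language": "ar-SA,ar;q=0.9,en;q=0.5"})


if __name__ == "__main__":
    req = build_arabic_search_request("كتاب")
    print(req.full_url)
    # urllib stores header names capitalized, hence "Accept-language".
    print(req.get_header("Accept-language"))
```

Sending the request (for example with `urllib.request.urlopen` plus whatever authentication your farm uses) and comparing the results against the same query sent with an English Accept-Language is one way to see whether the Arabic word breaker is being invoked.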
