Friday, June 18, 2010

SharePoint Search 101: Why do I need to extract content?

A while ago, I started writing search vignettes for the MSFT Enterprise Search product team, but the project stalled, so I decided (with MSFT permission) to start the “SharePoint Search 101” series of mini articles on my blog. All content relates to SharePoint Search 2010, FAST Search for SharePoint 2010 (FS4SP), or both. These mini articles are intended to be small, easily digestible snippets that answer the What, How and Why of a given enterprise search feature of SharePoint Server 2010 and/or FAST Search Server 2010 for SharePoint. The How portion is usually covered by a short demo or a list of steps.

Why do I need to extract content?

(Entity Extraction, Managed Properties, Refiners)

Entity extraction is a way of pulling meaningful information out of content that end users might not otherwise have explicitly defined as metadata. Managed properties that are created through entity extraction and surfaced in the search interface as refiners give users an alternative structure, or an alternative visual presentation of structure, for narrowing down the search result set.

When you present refiners based on managed properties created from entities extracted from content or metadata, users can easily filter the result set by the values of those extracted entities. Refiners can be shallow or deep: shallow refiners are computed from only the top 50 results returned by the query, whereas deep refiners are computed over all matching results and show an exact count of results for each value.
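To make the shallow vs. deep difference concrete, here is a minimal sketch in plain Python. It is not the SharePoint or FAST API; the `refiner_counts` function, the `company` property and the sample result set are all made up for illustration. It only shows how counting over the top 50 results differs from counting over the whole result set.

```python
from collections import Counter


def refiner_counts(results, prop, top_n=None):
    """Count the values of `prop` across results.

    top_n=50   -> "shallow": only the first 50 ranked results are inspected.
    top_n=None -> "deep": every matching result is inspected, so counts are exact.
    """
    scope = results if top_n is None else results[:top_n]
    return Counter(r[prop] for r in scope if prop in r)


# Hypothetical ranked result set of 120 hits (illustrative data only).
results = [
    {"title": f"Doc {i}", "company": "Contoso" if i % 3 == 0 else "Fabrikam"}
    for i in range(120)
]

shallow = refiner_counts(results, "company", top_n=50)  # based on top 50 only
deep = refiner_counts(results, "company")               # based on all 120 hits

print("shallow:", dict(shallow))
print("deep:   ", dict(deep))
```

In this sketch the shallow counts can never add up to more than 50, no matter how many documents actually match, which is why a shallow refiner only gives a rough picture of the full result set.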

Managed properties are available in SharePoint Search 2010 out of the box (OOTB), but you can only map them from crawled properties that are exposed in list and library metadata.

FAST Search for SharePoint has built-in entity extractors such as People, Companies and Locations. You can also define your own list of terms to be extracted from the content by building a dictionary, or you can create a content processing stage that extracts entities based on a specific business rule or need, or by matching them against regular expressions. For example, you can extract client names from documents where this information is not available as metadata and expose them as a managed property.
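The two custom approaches mentioned above, dictionary matching and regular-expression matching, can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how an FS4SP extractor or content processing stage is actually implemented; the function names, the client list, the `PRJ-nnnn` project code pattern and the property names are all hypothetical.

```python
import re


def extract_by_dictionary(text, terms):
    """Return the dictionary terms that occur in text (case-insensitive, whole words)."""
    found = []
    for term in terms:
        if re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE):
            found.append(term)
    return found


def extract_by_regex(text, pattern):
    """Return the distinct substrings of text that match pattern, sorted."""
    return sorted(set(re.findall(pattern, text)))


# Hypothetical dictionary of client names and a made-up project code format.
CLIENTS = ["Contoso", "Fabrikam", "Northwind Traders"]
PROJECT_CODE = r"\bPRJ-\d{4}\b"

doc = (
    "Kickoff notes for PRJ-1042: contoso and Northwind Traders "
    "agreed on scope. See also PRJ-0007."
)

# The extracted values are what would end up in managed properties
# (property names here are illustrative).
managed_properties = {
    "client": extract_by_dictionary(doc, CLIENTS),
    "projectcode": extract_by_regex(doc, PROJECT_CODE),
}

print(managed_properties)
```

Neither the client names nor the project codes were entered as metadata by the author of the document, yet both become values you can refine on once they are extracted.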

Note: Deep refiners are available only in FAST Search for SharePoint 2010.

Enjoy :-)
