I recently had to troubleshoot content being indexed by FAST and/or SharePoint Search, and I want to share the experience and some collected wisdom.
Here is the case, and it is a very common one: the client had a branded master page that included navigation menu items, and these items were being indexed. As a result, searching for “Vacation form” returned every page that used this master page, simply because a link to the vacation form was included in the navigation menu on the master page. After the client added “noindex”, which works for FAST as well as for SharePoint Search, the problem seemed to be solved. A few weeks later, when I came onsite, I noticed that the Search SSAs for both SharePoint and FAST had their content sources misconfigured, and without knowing it the client was using just SharePoint Search for everything. At that point I reconfigured the content sources so that only “people” search is served by SharePoint Search and the rest of the content is searched by FAST. This is when we noticed the same problem again: menu items were being indexed without respecting the NOINDEX tag. Once we purged the index and reindexed everything, it all worked.
Here are the tags and the explanation of how they work:
1) <meta name="robots" content="noindex" /> : Supported with both crawlers, although handled differently. The FAST crawler drops these items itself. The SP crawler does not drop them; they are dropped later by the FAST pipeline. The result is the same for both.
2) <noindex> This text will not be indexed </noindex> : Not supported regardless of crawler.
3) <div class="noindex"> This text will not be indexed </div> : Supported. This is transparent to both crawlers and is filtered out in the FAST pipeline.
4) <span class="noindex"> : Supported by the FAST Search back end but not by SharePoint Search.
5) SP crawler fix (KB 2276336) to ensure that <meta name="xxx" content="noindex" /> is passed to the FAST pipeline. The fix is part of the August 2010 CU.
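To make option 3 concrete for the navigation menu case described above, here is a minimal sketch of how the menu could be wrapped in the master page. The element ids, link targets, and menu entries are placeholders I made up for illustration, not the client's actual markup, and a real master page would normally render the menu through server controls rather than static HTML.

```html
<!-- Hypothetical master page fragment: ids, URLs and menu entries are placeholders. -->

<!-- Everything inside this wrapper should be left out of the index (option 3). -->
<div class="noindex">
  <ul id="topNavigation">
    <li><a href="/forms/vacation-form.aspx">Vacation form</a></li>
    <li><a href="/forms/expense-report.aspx">Expense report</a></li>
  </ul>
</div>

<!-- Page content sits outside the wrapper and stays searchable. -->
<div id="mainContent">
  <p>Page-specific content goes here.</p>
</div>
```

Keep in mind that pages crawled before the wrapper was added keep their old entries until they are crawled again, which is why purging and reindexing was needed in the case above.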
Hope this helps others.