Wednesday, June 23, 2010

Siteworx Makes Finding Answers Easy

How do you search over 30 years’ worth of data located in multiple content repositories on server farms around the world? Siteworx faced this fascinating challenge and their solution generated much attention from the Sitecore Outstanding Solutions committee (see Sitecore Recognizes Outstanding CMS Projects for more information).

Siteworx’ approach used the Solr search server (built on Lucene). Through Solr, Siteworx indexes content from the Sitecore solution, a legacy WordPress blog and even Google Analytics data. By using Solr with a very content-rich site, Siteworx provides essential end-user facilities such as search, faceted navigation and rich metadata options.
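To give a feel for what faceted navigation looks like at the Solr level, here is a minimal sketch of building a faceted query string. The `q`, `rows`, `facet`, `facet.field` and `facet.mincount` parameters are standard Solr query parameters; the function name and the `topic` and `source` field names are hypothetical and not taken from the Siteworx solution.

```python
from urllib.parse import urlencode


def build_faceted_query(keywords, facet_fields, rows=10):
    """Return a Solr query string that asks for results plus facet counts."""
    params = [
        ("q", keywords),          # the visitor's search terms
        ("rows", rows),           # number of results per page
        ("facet", "true"),        # turn faceting on
        ("facet.mincount", 1),    # hide facet values with no matching documents
    ]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


if __name__ == "__main__":
    # "topic" and "source" are assumed field names for illustration only.
    print(build_faceted_query("water quality", ["topic", "source"]))
```

The resulting string would be appended to a Solr select URL; the response then carries both the matching documents and the counts per facet value that drive the navigation.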

What’s the story with indexing Google Analytics data? It’s a really cool idea: Solr pulls in the top 5,000 records from the last 24 hours using the Google Analytics API. This associates Google’s browsing data with Sitecore content items. With this data in the index, Siteworx provides a powerful “Most Popular” search facet on the web site.
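A rough sketch of the association step might look like the following. It assumes the analytics rows have already been fetched as (page path, pageviews) pairs and that a lookup from page path to Sitecore item ID is available; the function name, the `pageviews_24h` field and the lookup are all hypothetical. The `{"set": ...}` form is Solr's atomic-update syntax, which newer Solr versions support; whether the original solution updated documents this way is not stated.

```python
def build_popularity_updates(rows, path_to_item_id, limit=5000):
    """Turn analytics rows into Solr update documents for the most viewed items.

    rows: iterable of (page_path, pageviews) tuples from the last 24 hours.
    path_to_item_id: dict mapping a page path to a content item ID (assumed).
    """
    totals = {}
    for path, views in rows:
        item_id = path_to_item_id.get(path)
        if item_id is None:
            continue  # no matching content item for this page, skip it
        totals[item_id] = totals.get(item_id, 0) + int(views)

    # Highest view counts first; ties broken by item ID for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

    # "pageviews_24h" is a hypothetical field name.
    return [
        {"id": item_id, "pageviews_24h": {"set": views}}
        for item_id, views in ranked
    ]
```

Sorting or faceting on a field like this is one plausible way to back a "Most Popular" option in the search interface.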

The WordPress integration is also interesting. The Solr index is populated using a push approach. Siteworx developed a WordPress plug-in that pushes content to the Solr index when a new blog post is published. This allows the blog content to appear alongside Sitecore content when a visitor performs a search.
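The actual plug-in would be written in PHP and its code is not shown here, so the following Python sketch only illustrates the shape of a push: on publish, build one document and POST it to Solr's JSON update handler. The function name, the document field names, the `wp-` ID prefix and the example URL are assumptions.

```python
import json
import urllib.request


def build_solr_push(solr_update_url, post):
    """Build (but do not send) an HTTP request that adds one blog post to Solr.

    post: dict with "id", "title", "content" and "permalink" keys (assumed shape).
    """
    doc = {
        "id": "wp-%d" % post["id"],   # prefix avoids clashing with other sources' IDs
        "source": "blog",             # lets searches tell blog posts apart
        "title": post["title"],
        "body": post["content"],
        "url": post["permalink"],
    }
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        solr_update_url + "?commit=true",   # commit so the post is searchable right away
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Calling `urllib.request.urlopen()` on the returned request, with a URL such as `http://localhost:8983/solr/content/update` (a hypothetical core name), would perform the push. Because the publishing system initiates the update, the index does not need to poll the blog for changes.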

Another key piece of the search puzzle was metadata. Siteworx’ client had a complex site taxonomy with 500 available metadata elements. The challenge here was making sure that content was properly tagged using this extremely rich taxonomy. Before content is published, editors can click a “Regenerate” command in the Ribbon that parses the text of an article and identifies keyword matches. These keywords are then associated with the article content and actually appear as inline tags in the article.
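The matching logic behind a command like this is not described in detail, but a simple version can be sketched as a case-insensitive, whole-word scan of the article text against the taxonomy terms. The function name and the sample terms are hypothetical, and a real implementation would likely also handle stemming and synonyms.

```python
import re


def match_taxonomy_terms(text, terms):
    """Return the taxonomy terms that occur in the text as whole words or phrases."""
    matches = []
    for term in terms:
        # Lookarounds stop "Cost" from matching inside "costs"
        # and "Sol" from matching inside "Solar".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            matches.append(term)
    return matches


if __name__ == "__main__":
    article = "Solar panels reduce energy costs; solar energy policy matters."
    print(match_taxonomy_terms(article, ["Energy Policy", "Solar", "Sol", "Cost"]))
```

The matched terms could then be stored with the article and indexed as metadata, which is what makes them usable as facets at search time.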

Overall, what I admire here is Siteworx’ ingenuity in tackling some challenging requirements. The site they were building had 30,000 substantive articles. Visitors to the site depend on being able to find relevant articles in the topic areas they are researching. Through content aggregation using Solr and automated, rich meta-tagging, they were able to develop a highly usable site with outstanding searchability.

You can learn more about Siteworx at http://www.siteworx.com.
