Posted by Chris Keating, Senior Manager (Natural Search)
Google, Yahoo!, and Microsoft recently announced that their spiders will recognize a tag that enables them to select one authoritative page from a collection of identical pages. This tag should be considered a valuable tool, but not a miracle cure.
Duplicate content is a fundamental issue that needs to be addressed for effective optimization. The existence of duplicate content results in uncertain indexation. When a search engine spider confronts duplicate content, it does one of four things: it keeps all of it, throws it all away, picks a few pages to crawl, or simply stops crawling the domain. Duplicate content is also a concern because it dilutes a Web site's content across multiple URLs, which affects relevancy. Finally, if other sites link to the same content hosted on different URLs, the combined effect of those links is diminished, which affects PageRank.
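As a hypothetical illustration (the domain and parameters below are invented), a single product page can often be reached at several different URLs, all serving identical content:

```
http://www.example.com/product?id=123
http://example.com/product?id=123
http://www.example.com/product?id=123&sessionid=abc123
```

Each variant may be indexed separately, and any inbound links are split across three addresses instead of strengthening one.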
The canonical tag is important because it is a quick way to let the spiders know which page should be crawled. The tag is easy to insert into the head section of a page, and the spider acknowledges it as a recommendation. In a sense, the canonical tag is like a lighthouse: it reveals a primary page and saves the spider from the depths of duplicate content. This can improve indexation and consolidate the power of your inbound links.
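As a sketch (using a hypothetical example.com URL), the tag is a single link element placed in the head section of each duplicate page, pointing at the version you want the engines to treat as authoritative:

```html
<!-- Placed in the <head> of every duplicate version of the page -->
<link rel="canonical" href="http://www.example.com/product?id=123" />
```

Because the engines treat this as a hint rather than a directive, it is worth verifying in your index reports that the preferred URL is actually the one being kept.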
The canonical tag is only one of many tools to help the spider navigate through duplicate content. The robots.txt file, noindex directives, nofollow link attributes, and 301 redirects are other tools at our disposal.
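For illustration (the paths, domain, and filenames below are hypothetical), here is roughly what each of those tools looks like in practice:

```
# robots.txt — keep spiders out of a duplicate-prone directory
User-agent: *
Disallow: /print/

<!-- noindex meta tag, placed in the <head> of a page to exclude it -->
<meta name="robots" content="noindex">

<!-- nofollow attribute on an individual link -->
<a href="http://www.example.com/print/page.html" rel="nofollow">Printer-friendly version</a>

# 301 redirect in an Apache .htaccess file
Redirect 301 /old-page.html http://www.example.com/new-page.html
```

Unlike the canonical tag, which is only a recommendation, a 301 redirect permanently consolidates duplicates at a single URL.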
Every site is unique, and each of these tools is effective in different situations. The key to addressing duplicate content problems is to understand a site's specific needs and apply the right combination of techniques and tools.