If you are developing any of your domains, you may already be aware of the developer’s nightmare: duplicate content. Now Google, Yahoo and Microsoft have come together to try to help address this issue by introducing a new attribute that web developers can use. This attribute allows web developers to declare a canonical URL in the head section of the page and the search engines have said they will take this as a strong suggestion. This canonical URL would then become the indexed URL for that page in most cases. Let me break this down further because it is an important development.
Duplicate content
Duplicate content arises when you have two or more URLs that serve the same content. This happens more often than you might think, and it creates significant issues for web sites and search engines. A simple example: you have a web page at the URL http://www.example.com/brilliant.html, but you also make a printer friendly version of this page available at the URL http://www.example.com/brilliant-printer.html.
To a search engine, these are two different URLs with the same content. What usually happens is that the search engine chooses one of the two pages to index and ignores the second.
So why do you care? The problem for you comes when people link to your website. If 10 people link to the first URL and 20 people link to the second URL, then the second URL is likely to get indexed. But worse, you are now ranking in the search engine for 20 links and not for 30 links which you could have got if there was only one URL to link to.
Spreading incoming links over multiple duplicate pages is called link dilution. The indexed page will have a lower score and therefore a lower ranking in the search engine. SEO experts spend a lot of time tracking down duplicate content problems to help improve web sites' rankings. So what is this tag and how does it help?
The canonical attribute
The attribute allows you to specify the URL you would like to be indexed no matter which duplicate page the search engine finds. The link is simple:
<link rel="canonical" href="http://www.example.com/brilliant.html"/>
The attribute goes in the head section of your web page, which starts with <head> and ends with </head>. The search engines recommend that you use absolute URLs rather than relative ones. An absolute URL spells out the complete address, including the protocol and the domain name. The example above is an absolute URL.
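If you want to check that a page really declares its canonical URL inside the head section, and that the URL is absolute, a small script can do it. This is only a rough sketch using Python's standard library. The names CanonicalFinder, find_canonical and is_absolute are my own for illustration, and this is not how any search engine actually reads the attribute.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> seen inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attributes = dict(attrs)
            # rel may hold several space-separated values, so split before comparing.
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.canonical = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html):
    """Return the canonical URL declared in the head section, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def is_absolute(url):
    """An absolute URL has both a scheme (http) and a host (www.example.com)."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


# The printer friendly page from the example, pointing back at the main page.
PAGE = """<html>
<head>
<title>Brilliant (printer friendly)</title>
<link rel="canonical" href="http://www.example.com/brilliant.html"/>
</head>
<body><p>Same content as the main page.</p></body>
</html>"""

url = find_canonical(PAGE)
print(url, is_absolute(url))
```

A link placed in the body instead of the head is deliberately ignored here, since the attribute is meant to live in the head section.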
Now the search engine will know which URL to index and can count all incoming links to all duplicate pages towards that one canonical URL. This could help alleviate the biggest problem with duplicate content.
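To make the arithmetic concrete, here is a toy sketch of the 10 and 20 link example from earlier. The link counts and the consolidate function are hypothetical, and real ranking is far more involved than adding up links, especially since the search engines only treat the attribute as a strong suggestion.

```python
from collections import Counter

# Hypothetical inbound link counts, taken from the example earlier in this post.
inbound_links = {
    "http://www.example.com/brilliant.html": 10,
    "http://www.example.com/brilliant-printer.html": 20,
}

# The canonical URL each page declares. Both point at the main page.
declared_canonical = {
    "http://www.example.com/brilliant.html": "http://www.example.com/brilliant.html",
    "http://www.example.com/brilliant-printer.html": "http://www.example.com/brilliant.html",
}


def consolidate(links, canonical_of):
    """Credit each page's inbound links to its canonical URL.

    A page with no declared canonical URL keeps its own links.
    """
    totals = Counter()
    for url, count in links.items():
        totals[canonical_of.get(url, url)] += count
    return dict(totals)


# Without the attribute the links stay split across the two URLs.
print(consolidate(inbound_links, {}))

# With the attribute all 30 links count towards the one canonical URL.
print(consolidate(inbound_links, declared_canonical))
```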
Problems with this attribute?
There is some risk of potential abuse of this attribute but it is reported that the search engines have put safeguards in place as they have done for 301 redirects. If there are conflicts between attributes, then the search engine will sort those conflicts using the rules they have used prior to the new attribute.
If you are going to be developing web sites, then you need to know this attribute. Scripts and content management systems are especially prone to duplicate content but plugins are already appearing to help add this attribute to pages.
Joost de Valk seems to be the first out of the gate with the WordPress canonical plugin.
I should caution you that I have not tried out these plugins yet.
{ 4 comments }
So you’re telling me that people with content-scraping sites can just easily claim the content as their own and get the search engine ranking they don’t deserve just by placing that new code in the heading of their HTML web pages?
That’s great.
NOT.
There really needs to be some way to verify ownership of original unique content; this is going to kill new websites that have high quality unique content if existing websites can abuse this code and just copy/paste the unique content onto theirs!
Good move.
Canonical errors have caused true disasters, so it's really great that they have united to come up with a solution.
More detailed information at:
http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps
Justice,
No, that's not what happens. The canonical attribute has to point to a URL on the same domain. So for example:
htt-p://scraper1.com/mypage cannot put htt-p://example.com/mypage and claim that content.
Francois,
It makes it a lot easier for many people. Wow, that article you linked to is overkill. It's not that complicated.