Robots.txt Blogspot Optimization: Avoiding Duplicate Content
Robots.txt optimization for Blogger blogs (that is, Blogspot blogs) is a little tricky. Unlike WordPress (which is the god of SEO for free blog hosting), Blogger blogs are more likely to be flagged by Google as having duplicate content. Why is this so, and how can you optimize the robots.txt for Blogger or Blogspot blogs?
If you look around the internet, there are not too many guides on how to optimize your robots.txt file for Blogspot blogs, for one main reason: Blogger does not allow you to upload any .txt file to its root domain. In other words, you can’t add just any file to your Blogspot blog, even if you have your own custom domain. However, you can get the same result with a robots meta tag in your template, and that is what this Blogspot optimization relies on.
Optimizing Robots.txt for Blogspot or Blogger: Why this is important
It’s important to optimize your robots settings if you are hosting your blog on Blogspot because if Google sees that you have duplicate content, your rankings can suffer. As a consequence, your PageRank may go down, you may not come up among the top search results, and Google just might think of your blog as a spam blog.
This happens because Blogger has an archive. Remember how Blogger archives your blog according to month or week? For instance, in this blog, my blog posts are archived according to month. For example, all my blog posts from last January are archived under this page: http://www.lifeandfever.com/2012_01_01_archive.html
The downside of this is that the content of ALL your blog posts for that month will be indexed TWICE: FIRST, under its own URL (for instance, my blog post on how blogging is more fun in the Philippines is under the URL http://www.lifeandfever.com/2012/01/blogging-more-fun-in-philippines.html), and AGAIN as part of the month’s archive (under the URL http://www.lifeandfever.com/2012_01_01_archive.html, since this URL shows all my posts for January).
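To make the two URL shapes concrete, here is a minimal Python sketch that sorts Blogger URLs into individual posts and monthly archives based on the path alone. The patterns are inferred only from the two example URLs above, and the function name classify_blogger_url is made up for illustration; any other kind of Blogger URL simply falls into "other".

```python
import re
from urllib.parse import urlparse

# Path shapes inferred from the two example URLs in this post (an assumption,
# not an official Blogger specification).
POST_PATH = re.compile(r"^/\d{4}/\d{2}/[^/]+\.html$")               # /2012/01/some-post.html
ARCHIVE_PATH = re.compile(r"^/\d{4}_\d{2}_\d{2}_archive\.html$")    # /2012_01_01_archive.html


def classify_blogger_url(url: str) -> str:
    """Return 'post', 'archive', or 'other' based on the URL path alone."""
    path = urlparse(url).path
    if ARCHIVE_PATH.match(path):
        return "archive"
    if POST_PATH.match(path):
        return "post"
    return "other"


if __name__ == "__main__":
    for url in (
        "http://www.lifeandfever.com/2012/01/blogging-more-fun-in-philippines.html",
        "http://www.lifeandfever.com/2012_01_01_archive.html",
    ):
        print(classify_blogger_url(url), url)
```

Running a list of your indexed URLs through a check like this gives a rough idea of how many archive pages are sitting next to your posts in the index.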
How To Optimize Robots.txt for Blogspot or Blogger: Keep Crawlers from Indexing Your Archive
Forget about customizing and generating your own robots.txt file IF you are hosting your blog on Blogspot. Instead, read the easy steps below on how to add a robots meta tag to your Blogger template. That way, your archive pages do not get marked as duplicate content by Google.
- Log on to your Blogger account. Then go to Dashboard > Design > Edit HTML. Download your full template BEFORE you start messing around with your HTML to make sure you can revert to your original code IF something goes awfully wrong later.
- Find this script on your Blogger blog’s HTML:
and below it (still inside the <head> section of the template), copy and paste this code:
<b:if cond='data:blog.pageType == "archive"'>
<meta content='noindex,follow' name='robots'/>
</b:if>
- Click on the Preview button before saving your template to make sure your HTML has been properly parsed.
And you’re done! To keep your Blogspot archive out of Google’s index, all you have to do is paste that small snippet into your HTML. Strictly speaking, this is a robots meta tag and not a robots.txt file: crawlers can still visit your archive pages and follow the links on them (that is the “follow” part), but they will no longer index those pages (the “noindex” part), so your archives no longer count as duplicate content! Give it a couple of days or so and try searching Google for all the indexed pages of your blog by typing “site:www.yourdomainname.com”. (Just get rid of the quotes and replace “yourdomainname” with your own domain. For instance, I would type “site:www.lifeandfever.com” but without the quotes to find out which pages on my blog were indexed by Google.)
Hope this very simple walkthrough has helped you optimize your Blogspot blog and avoid duplicate content with a robots meta tag! If you are having any problems or if you think the script I provided is incorrect, please share your suggestions and comments below. If you know of a better way to keep crawlers from indexing the Blogspot archive to prevent duplicate content, do share it below as well.
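If you don’t want to wait for the site: search to catch up, you can also check the HTML of a page directly. Below is a minimal Python sketch, using only the standard library, that looks for a robots meta tag containing “noindex” in a page’s HTML. The names RobotsMetaFinder and is_noindexed are made up for this example, and the sample HTML strings are stand-ins for what your archive and post pages would actually return.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> also end up here by default.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def is_noindexed(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


if __name__ == "__main__":
    # Hypothetical page sources: an archive page with the tag, a post without it.
    archive_html = "<html><head><meta content='noindex,follow' name='robots'/></head></html>"
    post_html = "<html><head><title>My post</title></head></html>"
    print("archive noindexed:", is_noindexed(archive_html))
    print("post noindexed:", is_noindexed(post_html))
```

To test a live page, you could download its source first (for example with urllib.request.urlopen) and pass the decoded text to is_noindexed. The archive URL should come back as noindexed, while your individual post URLs should not.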