Robots.txt BIG Mistake – Blocking CSS and JS

When running larger sites, especially ones built on platforms such as Magento and WordPress, it’s good practice for the webmaster to give search engines a little help with crawling and indexing the site. As well as some other advanced tactics, robots.txt and sitemap.xml files are common ways to do this.

What is robots.txt?

A file called robots.txt placed in the root of your website’s folder structure is basically the gatekeeper to your site that search engines need to pass through. It’s a file where you can list instructions to search engines as to what they can (and, more importantly, can’t) look at.
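As a minimal sketch, a robots.txt file might look like this (the paths and domain are hypothetical examples, not rules you should copy as-is):

```
# Served from https://www.example.com/robots.txt
User-agent: *                  # these rules apply to all crawlers
Disallow: /checkout/           # don't crawl anything under /checkout/

Sitemap: https://www.example.com/sitemap.xml
```

Each `User-agent` line starts a group of rules, `Disallow` lists paths the matching crawlers shouldn’t fetch, and the optional `Sitemap` line points crawlers at your sitemap.xml.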

Why Block Search Engines?

When a search engine crawls a site it’s making certain judgements… what the site’s about, whether it’s full of useful information, whether it’s an eCommerce site, etc. For this reason you want to make sure the information it sees is relevant to what you do or offer, and you don’t want this signal clouded with irrelevant noise. When using systems like Magento, lots of pages are generated; a site with just a few products could have over 1,000 URLs (pages) listed in Google’s index. Only 30 of these URLs might be the real ‘meat’ of the site. In this instance it could make sense to look at some of these indexed URLs and block search engines from seeing them any more, thus conveying a more concentrated view of your site.
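On a Magento site, the duplicate URLs usually come from search results, layered navigation and sorting parameters. A hedged example of blocking that noise might be (the exact paths and parameters are hypothetical and vary per install):

```
User-agent: *
# Internal search result pages
Disallow: /catalogsearch/
# Near-duplicate category pages created by sort/limit parameters
Disallow: /*?dir=
Disallow: /*?limit=
# Account and cart pages that have no search value
Disallow: /customer/
Disallow: /checkout/
```

The `*` wildcard in the middle of a path is an extension to the original robots.txt spec, but Google and Bing both honour it.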

Why Blocking JavaScript and CSS is a Mistake

As well as blocking URLs, webmasters will also commonly block access to directories that contain scripts. The intent might be good… this information is useless to Google, right?

Wrong!

JavaScript and CSS play a big part in how your website is presented to a user. If you block search engines from seeing these files you are denying them the ability to render pages as the user would see them. With responsive designs becoming more common, this is a bigger issue than ever. If you have a great site that scales perfectly to a mobile device, Google will want to know, so it can give your site a boost in the rankings when someone searches from a mobile device.

Block these important files and it’s only prudent to think Google will presume the worst: either your site contains no layout and style information at all, or, even worse, you’re up to something shady that you’re trying to hide. Either way it will have a negative impact on your rankings.
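The fix is either to remove the offending Disallow lines, or to add explicit Allow rules for script and style assets. A sketch, assuming hypothetical Magento-style directories:

```
User-agent: *
Disallow: /js/
Disallow: /skin/

# Let crawlers fetch the CSS and JS they need to render pages,
# even inside otherwise-blocked directories
Allow: /*.css$
Allow: /*.js$
```

Google resolves conflicts between Allow and Disallow by the most specific (longest) matching rule, so these Allow lines win for .css and .js files. `Allow` and the `$` end-of-URL anchor are, like `*`, extensions supported by the major search engines rather than part of the original spec.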

We recently did an audit on a suffering Magento site; we unblocked the JavaScript and CSS folders and, hey presto, rankings dramatically improved within 48 hours.

So our advice is basically to be very careful with what you block. If it’s important to a user then it’s more than likely just as important to a search engine.

Resources:

The Google help files are a good place to start for more information on robots.txt.

There are lots of features in Google Webmaster Tools that will help you see how Google views your site. One in particular is ‘Fetch as Google’ in the ‘Crawl’ section.
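Alongside ‘Fetch as Google’, you can also sanity-check your rules programmatically. A minimal sketch using Python’s standard-library robots.txt parser (the rules and URLs below are hypothetical; note this parser follows the original spec, so it’s a rough check rather than an exact replica of Google’s longest-match behaviour):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Parse rules inline; rp.set_url(...) + rp.read() would fetch a live file instead.
rp.parse("""\
User-agent: *
Disallow: /js/
Disallow: /skin/
""".splitlines())

# Blocked: a script file under a disallowed directory
print(rp.can_fetch("Googlebot", "https://example.com/js/app.js"))      # False
# Allowed: an ordinary content page
print(rp.can_fetch("Googlebot", "https://example.com/product.html"))   # True
```

If `can_fetch` returns False for your CSS or JS assets, crawlers honouring those rules can’t render your pages properly.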

3 Comments

  • matias says:

    Mark, I got here following your comment on Inchoo’s blog. This is a great article and it helped us a lot. Thanks for sharing.

  • JH says:

    Hi Mark

    I have a site which is currently deindexed completely (intentionally) apart from the homepage and 1 other page.

    so its:
    User-agent: *
    Disallow: /
    Allow: /$
    Allow: /indexed-page-url

    The problem is that the CSS & Javascript on the indexed page are blocked by robots so it’s rendering poorly in fetch as google. Is it ok to then add:
    Allow: /css/

    Thanks!

    • Mark says:

      Yes, I can’t see why allowing spiders to crawl your CSS and JavaScript would be an issue. It’s not like search engines are going to actually index the files… they’ll just be used to render the pages that are already in their indexes. If in doubt test your robots.txt using the ‘robots.txt Tester’ in Google’s ‘Webmaster Tools’.

I can see why webmasters might have blocked these files in the past… by trying to be too precise in assisting spiders to only crawl the ‘content’, probably an evolution of ‘PageRank sculpting’. But IMO it was always overkill.
