Your Robots.txt file is what tells search engines which pages to access and index on your website and which pages not to. So if you specify in your Robots.txt file that you don't want the search engines to be able to access your forum page, for example, that page won't show up in the search results and web users won't be able to find it. Keeping the search engines from accessing certain pages on your site is essential both for the privacy of your site and for your SEO strategy.
Search engines send out tiny programs called spiders or robots to crawl your site and bring information back to the search engines so that the pages of your site can be indexed in the search results and found by web users. Your Robots.txt file instructs these programs not to crawl the pages on your site that you designate using a "disallow" command.
To demonstrate, the following Robots.txt command:
User-agent: *
Disallow: /forum
This would block all search engine robots from visiting the following page on your website:
http://www.yoursite.com/forum
Notice that before the disallow command, you have the command:
User-agent: *
The "User-agent:" part specifies which robot you want to block and could also read as follows:
User-agent: Googlebot
This command would only block Google's robots, while other robots would still have access to the page:
http://www.yoursite.com/forum
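For example, pairing that user-agent line with the same disallow rule from above would give you a rule that applies only to Google's crawler:
User-agent: Googlebot
Disallow: /forum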
However, by using the "*" character, you're specifying that the commands below it apply to all robots. Your Robots.txt file should be placed in the main directory of your site.
For example:
http://www.yoursite.com/robots.txt
There are three reasons why you might want to block a page using the Robots.txt file. First, if you have a page on your site which is a duplicate of another page, you don't want the robots to index it, because that would result in duplicate content, which can hurt your SEO.
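For instance, if you kept a printer-friendly copy of a page in its own directory, you could block it with a rule like the one below (the /print-version/ path is only a made-up example, not a real location on your site):
User-agent: *
Disallow: /print-version/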
Second, you may have a page on your site which you don't want users to be able to access unless they take a specific action. For example, if you have a forum page where users get access to specific information only after giving you their email address, you wouldn't want people to be able to find that page through a Google search.
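In that case, a rule like the following would keep the robots away from the gated page (again, the /members-forum/ path is only a placeholder for wherever that page actually lives on your site):
User-agent: *
Disallow: /members-forum/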
And most important, you'll want to block pages or files when you need to protect private files on your site, such as your cgi-bin:
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
In all of these cases, you'll need to include a command in your Robots.txt file that tells the search engine spiders not to access that page, not to index it in the search results, and not to send visitors to it.
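Putting it all together, a Robots.txt file that covers all three cases might look something like this sketch, where every path except /cgi-bin/ is a stand-in for the actual pages on your own site:
User-agent: *
Disallow: /print-version/
Disallow: /members-forum/
Disallow: /cgi-bin/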