Robots.txt file

Have you heard of a robots.txt file? Do you have one on your website? Search engine spiders and similar robots look for a robots.txt file in your main web directory. It’s a very simple file, and it lets you control which parts of your site those robots crawl, which can affect how your site shows up in the search engines.

What is a robots.txt file?

A robots.txt file is a small text file that you place in the root directory of your web site. It is used to fence off robots from sections of your web site, so they won’t read files in those areas. Search engines often call their robots spiders, and send them out to look for pages to include in their search results.
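Because the file always sits in the root directory, its address can be worked out from any page on the same site. Here is a minimal Python sketch of that idea; the function name robots_txt_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; drop the page's path, query and fragment,
    # because robots.txt lives at the root of the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address, used only as an example.
print(robots_txt_url("https://www.example.com/blog/post.html?page=2"))
```

A robot visiting any page on that site would look for the file at the address this prints.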

How do I create a robots.txt file?

Using a text editor such as Notepad, start with the following line:

User-agent: *

This specifies the robot we are referring to. The asterisk addresses all of them. You can be more specific by entering a bot’s name instead, but in most cases you would use the asterisk.

The next line tells the robot which parts of your website to omit from its crawl:

User-agent: *
Disallow: /cgi-bin/

This fences off any path on your website that starts with the string /cgi-bin/.
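If you want to see the "starts with" rule in action, Python’s standard library includes a robots.txt parser, urllib.robotparser. The sketch below feeds it the two-line file above and asks about a few paths. The paths are invented for illustration, and real search engine robots may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt file shown above.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical paths, chosen only to illustrate prefix matching.
SAMPLE_PATHS = ["/cgi-bin/form.pl", "/cgi-bin/sub/tool.cgi", "/cgi-bin", "/index.html"]

for path in SAMPLE_PATHS:
    # can_fetch() returns True when the rules allow the robot to read the path.
    status = "allowed" if parser.can_fetch("*", path) else "blocked"
    print(path, status)
```

Note that /cgi-bin without the trailing slash does not start with the full string /cgi-bin/, so this parser treats it as allowed.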

Multiple paths can be added using additional disallow lines:

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /test/

This robots.txt file tells all robots that any files in the directories /cgi-bin/, /private/ and /test/ are off limits.
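The same idea extends to rules for one named robot alongside the asterisk rules. The following is a rough sketch using Python’s urllib.robotparser; "ExampleBot" and "SomeOtherBot" are made-up robot names, the helper blocked_paths is hypothetical, and the sample paths are invented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: "ExampleBot" is a made-up robot name with its own rules.
RULES = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /test/
"""


def blocked_paths(rules: str, agent: str, paths: list[str]) -> list[str]:
    """Return the paths that the given robot is told not to read."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Invented paths, one per directory plus the home page.
PATHS = ["/cgi-bin/form.pl", "/private/notes.html", "/test/draft.html", "/index.html"]

print("ExampleBot:", blocked_paths(RULES, "ExampleBot", PATHS))
print("SomeOtherBot:", blocked_paths(RULES, "SomeOtherBot", PATHS))
```

In this parser, a robot that finds a section with its own name follows that section and ignores the asterisk section, so ExampleBot is kept out of /private/ only, while every other robot is kept out of all three directories.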

If you want search engines to index everything in your site, you don’t need a robots.txt file (not even an empty one).

Using meta tags to block access to your site.

You can also use a meta tag to a similar effect, one page at a time.

To prevent search engines from indexing a page on your site, place the following meta tag into the header of the page:

<meta name="robots" content="noindex">
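To check whether a page actually carries this tag, you could scan its HTML. Below is a minimal sketch using Python’s built-in html.parser module; the class name RobotsMetaParser and the function is_noindex are invented for this example, and it only looks at the generic "robots" name, not at tags aimed at one particular robot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content value may list several directives separated by commas.
            for item in attr_map.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def is_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source, used only as an example.
PAGE = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(is_noindex(PAGE))
```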

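Both the robots.txt file and the meta tag are requests rather than locks, and not all robots honor them. Keep in mind that a robot which obeys a Disallow rule never fetches the blocked page, so it cannot see a meta tag on that page. To keep a single page out of the search results, leave it open to crawling and rely on the meta tag.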

You can learn more about robots.txt files here.