Harnessing the Power of Robots.txt
By: Bruce Hearder


Once we have a website up and running, we need to make sure that all visiting search engines can access all the pages we want them to look at.

Sometimes, we may not want search engines to index certain parts of the site, or we may even want to ban some search engines from the site altogether.

This is where a simple text file called robots.txt, often only two lines long, comes in.

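Robots.txt resides in your website's main (root) directory, so that robots find it at the top level of your domain, for example http://www.example.com/robots.txt. On many Linux hosting accounts this is your /public_html/ directory. The file looks something like the following: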

User-agent: *
Disallow:

The first line names the "bot" (robot) the rules apply to, where * means every robot. The second line lists the parts of the site that bot is not allowed to visit; left empty, as here, it allows access to everything.
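To see how a robot reads these two lines, here is a minimal sketch that uses Python's standard-library urllib.robotparser as a stand-in for a real crawler. Search engines use their own parsers, and the agent names and page path below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line "allow everything" file shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "*" matches any robot, and an empty Disallow blocks nothing,
# so every agent may fetch every path.
for agent in ("Googlebot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, "/private/page.html"))
```

Both agents are reported as allowed, which is exactly what the empty Disallow line is meant to express.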

If you want to handle multiple "bots", simply repeat these two lines for each one, leaving a blank line between the groups. For example:

User-agent: googlebot
Disallow:

User-agent: askjeeves
Disallow: /

This allows Google (user-agent name Googlebot) to visit every page and directory, while banning Ask Jeeves from the site completely, because "Disallow: /" blocks everything under the root directory.
For a reasonably up-to-date list of robot user-agent names, visit http://www.robotstxt.org/wc/active/html/index.html
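The same parser can confirm what the two-group file does. Again, this is only a sketch with Python's urllib.robotparser, not how Google or Ask Jeeves actually evaluate the file; the page path and the "OtherBot" agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The two-group example from the article.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow:

User-agent: askjeeves
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "/any/page.html"))  # True: empty Disallow
print(parser.can_fetch("askjeeves", "/any/page.html"))  # False: "/" blocks all
print(parser.can_fetch("OtherBot", "/any/page.html"))   # True: no group matches
```

A robot that matches no group, such as OtherBot here, may crawl everything, because this file has no "User-agent: *" group to fall back on.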

Even if you want to allow every robot to index every page of your site, it's still very advisable to put a robots.txt file on your site. Otherwise, every robot that requests the missing file generates a "file not found" (404) entry, and your error logs fill up with them.
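If you look after several sites, creating that allow-all file can be scripted. A minimal sketch, assuming your document root is a local directory you can write to; the public_html path and the ensure_robots_txt name are hypothetical, and an existing robots.txt is left untouched.

```python
from pathlib import Path

# Hypothetical document root; on many Linux hosts this is ~/public_html.
DOCROOT = Path("public_html")

# The two-line file that lets every robot visit every page.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def ensure_robots_txt(docroot: Path) -> Path:
    """Create an allow-all robots.txt in docroot unless one already exists."""
    docroot.mkdir(parents=True, exist_ok=True)
    robots = docroot / "robots.txt"
    if not robots.exists():
        # Never overwrite a robots.txt that someone has already written.
        robots.write_text(ALLOW_ALL, encoding="ascii")
    return robots
```

Calling ensure_robots_txt(DOCROOT) once per site is enough; running it again does nothing if the file is already there.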

For more information on robots.txt, see the full list of resources at http://www.websitesecrets101.com/robotstxt-further-reading-resources/

Bruce Hearder owns and runs http://www.WebsiteSecrets101.com, a site packed full of tips and tricks for getting the most out of your existing website. Sign up for the WebSiteSecrets101 newsletter today and squeeze more out of your website.
