Robots.txt ~tutorial

lair360


Robots.txt ~tutorial

Introduction:
This tutorial will help you block a specific robot, or multiple robots, from indexing your files, folders, documents, and other private content. It will also reduce the risk of private data being seen or collected by search engines.

"Robots.txt" is a regular text file. It also has special meanings to the majority of "honourable" robots on the web. By defining a few rules in the text file, you can instruct or command robots to stop crawling and indexing certain files, directories within your site, or none at all. For example, you may not want "Google" to crawl the "/images" directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that...

Notes: before you create a regular text file called "robots.txt", make sure it's named exactly as written, all in lowercase! This file must also be uploaded to the root (publicly accessible) directory of your site, not a subdirectory...

Example: http://www.mysite.com/robots.txt but NOT http://www.mysite.com/sub_folder/robots.txt
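
As a quick sketch of the "/images" case mentioned above, the whole file can be as small as the two lines below (the directory name is just an illustration; Googlebot is Google's crawler token).
---Copy Source Code---
Code:
User-agent: Googlebot
Disallow: /images/
---End Source Code---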

Syntax
----------------------------------------


User-agent - the robot(s) the rules that follow apply to...
Disallow - the URL path you want to block...

----------------------------------------
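
Robots.txt also accepts comment lines starting with "#", so you can document your rules in the file itself. Here is a minimal sketch combining both directives (the "/private/" path is just a placeholder):
---Copy Source Code---
Code:
# Keep all robots out of one placeholder directory.
User-agent: *
Disallow: /private/
---End Source Code---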

1.] To block all robots from crawling any part of your website, use the following code.
---Copy Source Code---
Code:
User-agent: *
Disallow: /
---End Source Code---

2.] To block a directory and everything in it, use the following code.
---Copy Source Code---
Code:
User-agent: *
Disallow: /random-directory-one/
Disallow: /random-directory-one/random-directory-two/
---End Source Code---

3.] To block individual pages, just list each page that you want to block.
---Copy Source Code---
Code:
User-agent: *
Disallow: /private_file.html
Disallow: /random-directory-one/style.css
---End Source Code---

4.] To remove specific images from an image search engine, target its image crawler. For Google Images, that is Googlebot-Image, as in the following code.
---Copy Source Code---
Code:
User-agent: Googlebot-Image
Disallow: /image1.gif
Disallow: /random-directory-one/image2.png
---End Source Code---

5.] To remove all of your site's images from search results, keep them in one folder and block that folder, as in this example.
---Copy Source Code---
Code:
User-agent: *
Disallow: /image_folder/
---End Source Code---

6.] To block files with a specific extension, use this example. (The "*" and "$" wildcards used here are extensions to the original robots.txt standard; major crawlers such as Googlebot honour them, but not every robot does.)
---Copy Source Code---
Code:
User-agent: *
Disallow: /*.gif$
Disallow: /*.jpeg$
Disallow: /image_folder/*.png$
Disallow: /image_folder/*.jpeg$
---End Source Code---

7.] To block a directory for every robot while still letting one specific robot crawl it, pair a general "Disallow" with a robot-specific "Allow" ("Allow" is another extension recognised by the major engines). Because a robot obeys only the most specific User-agent group that matches it, Googlebot follows its own group here and ignores the general block...
---Copy Source Code---
Code:
User-agent: *
Disallow: /folder1/

User-agent: Googlebot
Allow: /folder1/
---End Source Code---

8.] To match a sequence of characters, use an asterisk (*). For example, to block access to all subdirectories that begin with "file_directories":
---Copy Source Code---
Code:
User-agent: Googlebot
Disallow: /file_directories*/
---End Source Code---

9.] To match the end of a URL, use the "$" symbol. For instance, to block any URL that ends with .zip...
---Copy Source Code---
Code:
User-agent: Googlebot 
Disallow: /*.zip$
---End Source Code---

10.] You can target different robots with different rules in "robots.txt". For instance, to block all search engines but let Googlebot crawl your whole website except the "/cgi-bin/" and "/privatedir/" directories, use:
---Copy Source Code---
Code:
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
---End Source Code---

11.] To block multiple extensions, you can use this example...
---Copy Source Code---
Code:
User-agent: *
Disallow: /*.xls$
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.pdf$
Disallow: /*.rar$
Disallow: /*.zip$
---End Source Code---
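
Once your rules are written, it is worth testing them before you rely on them. Below is a minimal sketch using Python's built-in urllib.robotparser module; the site URL, paths, and the "ExampleBot" agent name are placeholders, and note that this module implements the original standard only, so it will not understand the "*" and "$" wildcards from examples 6 to 11.
---Copy Source Code---
Code:
from urllib.robotparser import RobotFileParser

# Plain prefix rules, as in example 2 (the directory name is a placeholder).
rules = """\
User-agent: *
Disallow: /random-directory-one/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The blocked directory is off limits to any crawler...
print(parser.can_fetch("ExampleBot", "http://www.mysite.com/random-directory-one/page.html"))  # False

# ...but the rest of the site may still be fetched.
print(parser.can_fetch("ExampleBot", "http://www.mysite.com/index.html"))  # True
---End Source Code---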
Copyright 2008 ~Lair360
 

DarkDragonLord

Great tutorial!

+rep

RRJJMM

Thanks for the information. This is good stuff for us control freaks that like to "pull the shades" every now and then.

Cheers,
 

lair360

RRJJMM said:
Thanks for the information. This is good stuff for us control freaks that like to "pull the shades" every now and then.

Cheers,

Thank you very much for your feedback! Do keep in mind, though, that "robots.txt" is very powerful, so you'll have to be careful about the orders you give the robots...
 