Version: 15.2
Revision: 65 Build 32
Robots.txt ~tutorial
Introduction: this tutorial will help you block a specific robot, or multiple robots, from indexing your files, folders, documents and other private content. It will also reduce the risk of private data being seen or collected by search engines.
"Robots.txt" is a regular text file. It also has special meanings to the majority of "honourable" robots on the web. By defining a few rules in the text file, you can instruct or command robots to stop crawling and indexing certain files, directories within your site, or none at all. For example, you may not want "Google" to crawl the "/images" directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that...
Notes: before you create a regular text file called "robots.txt", make sure it's named exactly as written! The file must also be uploaded to the root (publicly accessible) directory of your site, not a subdirectory...
Example: http://www.mysite.com/robots.txt but NOT http://www.mysite.com/sub_folder/robots.txt
Syntax
----------------------------------------
User-agent - the robot(s) the following rules apply to...
Disallow - the URL path you want to block...
----------------------------------------
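For instance, a minimal "robots.txt" that combines both directives might look like this (the "/private/" directory name is only a placeholder - replace it with your own):
---Copy Source Code---
# Keep every robot out of the /private/ directory
User-agent: *
Disallow: /private/
---End Source Code---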
1.] To block all robots from looking at or crawling anything on your website, use the following code.
---Copy Source Code---
User-agent: *
Disallow: /
---End Source Code---
2.] To block a directory and everything in it, use the following code.
---Copy Source Code---
User-agent: *
Disallow: /random-directory-one/
Disallow: /random-directory-one/random-directory-two/
---End Source Code---
3.] To block a single page or file, just list the path that you want to block.
---Copy Source Code---
User-agent: *
Disallow: /private_file.html
Disallow: /random-directory-one/style.css
---End Source Code---
4.] To remove a specific image from Google Image Search, add the following code.
---Copy Source Code---
User-agent: Googlebot-Image
Disallow: /image1.gif
Disallow: /random-directory-one/image2.png
---End Source Code---
5.] To remove all images on your site, block the folder where they are stored. Use this source code as an example.
---Copy Source Code---
User-agent: *
Disallow: /image_folder/
---End Source Code---
6.] To block files with a specific extension, just use this example. (Wildcard patterns such as * and $ are extensions to the robots.txt standard; they are recognized by Google and most major search engines, but not by every robot.)
---Copy Source Code---
User-agent: *
Disallow: /*.gif$
Disallow: /*.jpeg$
Disallow: /image_folder/*.png$
Disallow: /image_folder/*.jpeg$
---End Source Code---
7.] To prevent all robots from crawling certain pages while still allowing Google to crawl them, you'll need to use this example (the "Allow" directive is another extension recognized by Google, but not by every robot)...
---Copy Source Code---
# Block the folder for every robot...
User-agent: *
Disallow: /folder1/

# ...but let Googlebot crawl it
User-agent: Googlebot
Allow: /folder1/
---End Source Code---
8.] To match a sequence of characters, use an asterisk (*). For example, to block access to all subdirectories that begin with "file_directories":
---Copy Source Code---
User-agent: Googlebot
Disallow: /file_directories*/
---End Source Code---
9.] To match the end of a URL, use the $ symbol. For instance, to block any URLs that end with .zip...
---Copy Source Code---
User-agent: Googlebot
Disallow: /*.zip$
---End Source Code---
10.] You can target multiple robots with different rules in "robots.txt". For instance, you may want to block all search engines while allowing only Google to index and crawl your website, except for "cgi-bin" and "privatedir".
---Copy Source Code---
# Block every robot from the whole site...
User-agent: *
Disallow: /

# ...but let Googlebot crawl everything except these two folders
User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
---End Source Code---
11.] To block multiple file extensions, you can use this example...
---Copy Source Code---
User-agent: *
Disallow: /*.xls$
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.pdf$
Disallow: /*.rar$
Disallow: /*.zip$
---End Source Code---
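Finally, here is a sketch that pulls several of the rules above into one "robots.txt" file; every directory and file name in it is only a placeholder. Keep in mind that a robot obeys only the most specific group that matches it, so any rule you still want to apply to that robot must be repeated inside its own group.
---Copy Source Code---
# Default rules for all robots
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /*.zip$

# Extra rules for Google Image Search (it follows only this group,
# so the rules above are repeated here before adding the image folder)
User-agent: Googlebot-Image
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /*.zip$
Disallow: /image_folder/
---End Source Code---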
Copyright 2008 ~Lair360