robots.txt ??

mikel2k3

New Member
Heya,

How many of you have a robots.txt file for your websites?

And do I need one? I seem to be getting a lot of robots visiting, and I'm confused about it.

I have no idea what to write in the .txt file, so if anybody could tell me or help me out with that, it would be great.

Thanks, Mike
 

Chris Z

Active Member
I'm not really sure about the syntax for the file. But I'm pretty sure that it's just a file that allows or disallows the specified robots.
 

t2t2t

New Member
Here's my robots.txt for one of my sites:

Code:
User-agent: * 
Disallow: /admin/ 
Disallow: /contrib/ 
Disallow: /doc/ 
Disallow: /lib/ 
Disallow: /modules/ 
Disallow: /plugins/ 
Disallow: /scripts/ 
Disallow: /tmp/
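
If you want to see what rules like these actually do, here is a minimal sketch using Python's standard `urllib.robotparser` module. It parses the file posted above and checks two example paths. The agent name `ExampleBot` and the two paths are made up for illustration, and this only shows how a well-behaved parser reads the rules, not what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# The rules posted above, copied as text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /contrib/
Disallow: /doc/
Disallow: /lib/
Disallow: /modules/
Disallow: /plugins/
Disallow: /scripts/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Return True if `agent` (a hypothetical bot name) may fetch `path`."""
    return parser.can_fetch(agent, path)


print(is_allowed("/index.html"))       # True: no rule matches this path
print(is_allowed("/admin/login.php"))  # False: it sits under /admin/
```

Each `Disallow` line is a path prefix, so anything below `/admin/` is covered by the single `/admin/` rule.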

Robots.txt syntax
 

Cubeform

New Member
Here is a good page on how to write robots.txt files and what they are:
http://www.robotstxt.org/wc/robots.html

Note that you have to place it in the root directory of your site (the public_html folder).

The CMS I use comes with a robots.txt. It's quite long, so here's part of it:
Code:
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt

# I've cut off the rest of the file from this point forward #
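
As a rough check of what the `Crawl-delay` line and the file rules mean, here is a small sketch with Python's standard `urllib.robotparser`. It uses a shortened copy of the file above, and `ExampleBot` is a made-up agent name. Keep in mind that `Crawl-delay` is a non-standard extension, so not every crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the CMS file above (only a few of its rules).
CMS_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /includes/
Disallow: /modules/
# Files
Disallow: /cron.php
Disallow: /install.php
"""

cms_parser = RobotFileParser()
cms_parser.parse(CMS_ROBOTS_TXT.splitlines())

# Seconds a compliant bot is asked to wait between requests.
delay = cms_parser.crawl_delay("ExampleBot")
# Individual files can be disallowed just like directories.
cron_allowed = cms_parser.can_fetch("ExampleBot", "/cron.php")
front_allowed = cms_parser.can_fetch("ExampleBot", "/")

print(delay, cron_allowed, front_allowed)  # 10 False True
```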
 

mikel2k3

New Member
OK, thanks for the help.

What I'd really like to know is: do I really need a robots.txt file?
 

Chris Z

Active Member
You don't absolutely need one. All it does is limit what the robots can read, so if you want them to be able to index all of your directories, you can leave it out or delete it.
 

Micro

Retired staff (11-12-2008)
Be warned, though, that some bots do not comply with (or even read) the robots.txt file, so don't rely on it for security against bots.
 

dest581

New Member
robots.txt isn't useful for anything but controlling what search engines index. Beyond that, it's useless.
 

Cubeform

New Member
dest581 said: robots.txt isn't useful for anything but controlling what search engines index. Beyond that, it's useless.

robots.txt doesn't really control anything, either. Like Micro pointed out, it is only a suggestion.

For those who do take the suggestion, it's great if a particular search bot starts bombarding your site with requests; you can just block it with robots.txt.
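
A minimal sketch of that kind of block, assuming the troublesome crawler identifies itself as `BadBot` (a made-up name; you would use whatever name shows up in your logs). The robots.txt text turns away that one bot and leaves everyone else unrestricted, and Python's standard `urllib.robotparser` is used to confirm how a compliant bot would read it.

```python
from urllib.robotparser import RobotFileParser

# Block one named bot, allow all others. "BadBot" is a hypothetical name.
BLOCK_ONE_BOT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

blocker = RobotFileParser()
blocker.parse(BLOCK_ONE_BOT.splitlines())

print(blocker.can_fetch("BadBot", "/page.html"))     # False: whole site disallowed
print(blocker.can_fetch("Googlebot", "/page.html"))  # True: falls under the "*" group
```

An empty `Disallow:` means nothing is disallowed for that group. This only helps with bots that read and respect the file.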
 

mikel2k3

New Member
I just used a robots.txt generator, and it came up with this:

Code:
User-agent: Googlebot
Disallow: 
User-agent: Googlebot-Image
Disallow: 
User-agent: MSNBot
Disallow: 
User-agent: Slurp
Disallow: 
User-agent: Teoma
Disallow: /
User-agent: Gigabot
Disallow: /
User-agent: Scrubby
Disallow: 
User-agent: Robozilla
Disallow: /
User-agent: Nutch
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: baiduspider
Disallow: /
User-agent: yahoo-mmcrawler
Disallow: /
User-agent: psbot
Disallow: /
User-agent: asterias
Disallow: /
User-agent: yahoo-blogs/v3.9
Disallow: /
User-agent: *
Disallow: 
Crawl-delay: 5
Disallow: /cgi-bin/
Disallow: /

Is all this OK, or is there something simpler I could use?

And would it be a bad thing if I just disallowed ALL robots?
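
One way to answer both questions is to feed the file to a parser and ask it. The sketch below uses Python's standard `urllib.robotparser` on a trimmed copy of the generated file (two named groups plus the final `*` group) and on a hypothetical two-line file that disallows everything. `OtherBot` is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the generated file: two named groups plus the "*" group.
GENERATED = """\
User-agent: Googlebot
Disallow:
User-agent: Teoma
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 5
Disallow: /cgi-bin/
Disallow: /
"""

# Hypothetical simpler file that turns away every compliant robot.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def load(text):
    """Parse robots.txt text and return the parser."""
    p = RobotFileParser()
    p.parse(text.splitlines())
    return p


generated = load(GENERATED)
block_all = load(BLOCK_ALL)

print(generated.can_fetch("Googlebot", "/"))         # True: empty Disallow
print(generated.can_fetch("Teoma", "/"))             # False: Disallow: /
print(generated.can_fetch("OtherBot", "/cgi-bin/x")) # True here, see note below
print(block_all.can_fetch("Googlebot", "/"))         # False
```

The `*` group in the generated file contradicts itself: it has an empty `Disallow:` (allow everything) and a `Disallow: /` (block everything). Python's parser applies the first matching rule, so it lets `OtherBot` in, but a crawler that picks the most specific rule instead may read the same group as blocking the whole site. Keeping only the lines you actually mean avoids that ambiguity. As for disallowing all robots, the second file does exactly that for compliant bots, which includes the search engines, so the likely cost is that your pages stop being crawled for search results.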
 

dest581

New Member
I'm not sure, but blocking the AdSense bot might mess up AdSense, if you use it.
 