Robots.txt

Status
Not open for further replies.

dale.black

Cpanel Username: dale
Url: http://good.dale.be or daleblack.x10hosting.com

Google Webmaster Tools warned me that my site has "URLs restricted by robots.txt". According to Google, the text of http://good.dale.be/robots.txt is:
User-agent: *
Disallow: /
Disallow: /ispcp/

I checked the robots.txt at http://good.dale.be/robots.txt myself, and it was different from the one Google showed. It seems that Google couldn't see the real robots.txt. Googlebot-Mobile, Googlebot-Image and Mediapartners-Google were all blocked, but Adsbot-Google could see the real robots.txt.
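
One way to check whether the server itself hands out different files to different clients is to request the same URL with several User-Agent headers and group the clients by the body they get back. The following is only a rough sketch: the User-Agent strings, the USER_AGENTS table and both function names are illustrative assumptions, and a spoofed header cannot reproduce everything about a real Googlebot request (its IP address, for instance).

```python
import urllib.request

# Illustrative User-Agent strings (assumptions, not an official list).
USER_AGENTS = {
    "browser": "Mozilla/5.0",
    "googlebot": "Googlebot/2.1 (+http://www.google.com/bot.html)",
}

def fetch_robots(url, user_agent, timeout=10):
    """Download url while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")

def group_by_content(url, agents, fetch=fetch_robots):
    """Group agent names by the exact body each one receives."""
    groups = {}
    for name, user_agent in agents.items():
        body = fetch(url, user_agent)
        groups.setdefault(body, []).append(name)
    return list(groups.values())

# Example call (needs network access):
# print(group_by_content("http://good.dale.be/robots.txt", USER_AGENTS))
```

If every name ends up in one group, the server is sending the same file to everyone and the difference is more likely on the crawler's side.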

I have also tried changing the public_html directory to fancy index in the Index Manager in cPanel, but Google was still blocked.

Can anyone help? Thanks in advance.

ADD: Have a look at post #3, please.
 

zero5854

I'm pretty sure that when there's a Disallow directive you also have to have an Allow directive.
Right now that file is telling me that you are blocking your root folder, which pretty much means all folders, so you should add an Allow line for the folder where your site is.
 

dale.black

Thank you zero5854. You can have a look at the one at http://good.dale.be/robots.txt. It is different from the one that Google sees. I don't know why this happens.
Did you mean that you saw the same file as Google when you opened that URL?
 

YamiKaitou

I think that saying "Disallow: /" is basically saying "don't come to my site".
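
For what it's worth, the original robots exclusion convention does not require an Allow line; a file containing only Disallow rules is valid. A minimal sketch with Python's standard urllib.robotparser, fed the three lines Google reported, suggests how a compliant crawler would read them. The "Googlebot" agent name, the is_allowed helper and the /node/1 path are just illustrative choices.

```python
from urllib.robotparser import RobotFileParser

# The three-line file that Google Webmaster Tools reported.
REPORTED = """\
User-agent: *
Disallow: /
Disallow: /ispcp/
"""

def is_allowed(robots_text, agent, url):
    """Return True if robots_text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)

# "Disallow: /" matches every path, so the whole site is off limits.
print(is_allowed(REPORTED, "Googlebot", "http://good.dale.be/"))
print(is_allowed(REPORTED, "Googlebot", "http://good.dale.be/node/1"))

# Without "Disallow: /", everything outside /ispcp/ is crawlable,
# even though the file has no Allow line at all.
ONLY_ISPCP = "User-agent: *\nDisallow: /ispcp/\n"
print(is_allowed(ONLY_ISPCP, "Googlebot", "http://good.dale.be/"))
```

The first two checks print False and the last prints True, so the blocking comes from "Disallow: /" itself, not from a missing Allow line.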
 

dale.black

Thank you all for the help!
The problem is still not solved. Could something be wrong with Google's Webmaster Tools? Can anyone tell me what my robots.txt (http://good.dale.be/robots.txt) looks like on your end? I don't know why Google thinks that my robots.txt has only three lines that disallow the indexing of all pages. The robots file loads fine on my end, and it should look like this:
Code:
# $Id: robots.txt,v 1.7 2007/01/08 12:02:18 dries Exp $
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used:    http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

User-agent: *
#Crawl-delay: 10
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
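
As a rough sanity check of what these rules block, the sketch below runs a handful of them through Python's standard urllib.robotparser. It uses only an excerpt of the file, and the blocked_paths helper and the sample paths (/node/1, /admin/settings) are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A few rules copied from the file above; the rest are omitted for brevity.
DRUPAL_EXCERPT = """\
User-agent: *
Disallow: /modules/
Disallow: /cron.php
Disallow: /admin/
Disallow: /user/login/
"""

def blocked_paths(robots_text, host, paths, agent="*"):
    """Return the subset of paths that robots_text forbids for agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, host + p)]

# Hypothetical sample paths on the site.
paths = ["/", "/node/1", "/admin/settings", "/cron.php"]
print(blocked_paths(DRUPAL_EXCERPT, "http://good.dale.be", paths))
```

With this excerpt only /admin/settings and /cron.php come back as blocked; the front page and ordinary content pages stay crawlable.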

Thank you for your time!
 

Chris Z

I get the same as you, but one question: why are you disallowing so many directories and files? I don't think I've ever seen a robots.txt list this extensive.
 

dale.black

Thank you Chris Z! I am running the Drupal CMS, and that is the default robots.txt file it provides. If you are interested in Drupal, visit drupal.org. I love Drupal! Feel free to visit my personal website, Dale Black Online.

I think my robots.txt is all right. It might be something wrong on Google's side, but I am not sure. Thank you!
 

lambada

There's your problem. As Google is checking dale.be rather than good.dale.be, it only sees the three-line file. A robots.txt is only read from the root of the domain being checked, which is why Google sees the three-liner rather than the full file.
 

dale.black

You got it lambada! Thank you.

I registered the domain (good.dale.be) for free at afraid.org. I am afraid that the only thing I can do is try to contact the owner of dale.be.

So, robots.txt must be placed under a top-level domain? I think it's time for me to buy a TLD...
 

lambada

They can be under subdomains, as long as you tell Google Webmaster Tools to look at the subdomain and NOT the domain, i.e. you tell it your site is good.dale.be. So on the first page of Google Webmaster Tools, where it gives you the option to add a site, add good.dale.be, not dale.be. That should then work. I haven't used Webmaster Tools in a while, so it may not.
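
To make the per-host rule concrete: a crawler derives the robots.txt location from the scheme and host of the page it wants, so good.dale.be and dale.be each have their own file. A rough sketch (the helper name robots_url and the /node/1 path are made up for illustration):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL that governs page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# The subdomain and the parent domain map to two different files.
print(robots_url("http://good.dale.be/node/1"))
print(robots_url("http://dale.be/"))
```

This prints http://good.dale.be/robots.txt and then http://dale.be/robots.txt, so rules published on dale.be should not apply to pages on good.dale.be.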
 

dale.black

Have a look at the screenshot in the attachment. You can see that the site I was managing was Dale Black Online. The robots.txt URL was http://good.dale.be/robots.txt, but the content was dale.be's. Google usually seems so smart, but this is really a stupid mistake to make. Their scripts need to be updated.

Thank you lambada! I am in P.R. China, where .cn domains are really cheap: 1 CNY for the first year and 55 CNY/year to renew (1 USD = 7.6 CNY). I think I will buy one as soon as I finish testing my website. It's a pity that dale.cn is already in use, so I have to think of another name. I hope that it can be short and easy to remember.
 

Attachments

  • sreenshot.jpg (190.3 KB)

lambada

Now that IS weird. I'm really not sure why it's doing that. May I suggest contacting Google or looking through their help documents? As this is definitely a problem on their end, I can't think of much that we can do.
 