
Wanted: Service spiders, how to block?

Status
Not open for further replies.

stevey

DNF Regular
Legacy Exclusive Member
Joined
Aug 23, 2004
Messages
679
Reaction score
0

kokopelli

Level 8
Legacy Platinum Member
Joined
Jul 21, 2004
Messages
1,001
Reaction score
1
I'm not sure your suggested robots.txt syntax is correct. I created a test robots.txt file as per your suggestion, ran it through an online validator, and this is what I got:
Disallow: http://www.mysite.com/index.php?url=*
The "*" wildchar in file names is not supported by (all) the user-agents addressed by this block of code. You should use the wildchar "*" in a block of code exclusively addressed to spiders that support the wildchar (Eg. Googlebot).
You can't use an absolute URL. Please remove the "http://" and the domain name and insert just a file/directory full path, starting from the root directory (Example: /pagename.html).
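To illustrate the validator's two complaints, here is a minimal sketch that turns an absolute URL with a trailing wildcard into a root-relative Disallow value. The function name `to_relative_rule` is hypothetical, and the approach (strip the scheme and host, then drop any trailing "*") is only an assumption about what the validator is asking for, not part of any standard tool.

```python
from urllib.parse import urlsplit


def to_relative_rule(value):
    """Turn an absolute URL into a root-relative Disallow value (hypothetical helper)."""
    parts = urlsplit(value)
    # Keep only the path, falling back to the root if there is none.
    relative = parts.path or "/"
    # Re-attach the query string, which urlsplit separates from the path.
    if parts.query:
        relative += "?" + parts.query
    # Drop trailing "*" characters, since not every spider supports wildcards.
    return relative.rstrip("*")


print(to_relative_rule("http://www.mysite.com/index.php?url=*"))
```

For the line the validator flagged, this prints `/index.php?url=`, which is a root-relative path with no wildcard.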

The Disallow field has an inherent wildcard nature: its value is matched as a prefix of the URL path. The standard dictates that Disallow: /bob would disallow both /bob.html and /bob/index.html (both the file bob and files in the bob directory will not be indexed). Another example: Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.
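The prefix matching described above can be sketched in a few lines. This is only an illustration of the matching rule, not how any particular spider implements it; the function name `is_disallowed` is made up for the example.

```python
def is_disallowed(path, disallow_rules):
    """Return True if the path starts with any non-empty Disallow value."""
    # An empty Disallow value means "allow everything", so it is skipped.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# Disallow: /help blocks the file and everything in the directory.
print(is_disallowed("/help.html", ["/help"]))         # True
print(is_disallowed("/help/index.html", ["/help"]))   # True

# Disallow: /help/ blocks only the directory contents.
print(is_disallowed("/help.html", ["/help/"]))        # False
print(is_disallowed("/help/index.html", ["/help/"]))  # True
```

By the same rule, a value of /index.php? would match /index.php?url=anything but not a plain /index.php, because the plain path does not start with the full prefix including the question mark.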

So perhaps your robots.txt file should simply read:
User-agent: *
Disallow: /index.php?
