2003-04-15 09:03:17.659455-07 by Dan Lyke 2 comments
I finally figured it out! Weblogs.com blocks wget. The error message is entirely non-intuitive:
"Your crawler is hitting our servers too hard. Please slow down, it's hurting the service we provide to our customers. Thanks."
I'd been trying to figure out why I was getting the error sometimes and not others, and from different IP addresses. Sigh. Oh well, lacking a clue bat, I guess I'll have to code up something simple in Perl that sends a personalised client name.
Comments in ascending chronological order:
#Comment made: 2003-04-15 09:36:51.456214-07 by: Mark A. Hershberger
Thanks! I was getting this from doc.weblogs.com
Shouldn't you be able to cloak with "-U"?
#Comment made: 2003-04-15 10:06:11.091444-07 by: Dan Lyke
It was simple enough to use LWP::UserAgent, and that's probably the right way to do things anyway, so I didn't bother making my wget command line any more complex. Oddly, the LWP::UserAgent "GET" command worked just fine with the default user agent string, but now I've got a unique user agent string and I don't hit the server more often than once every 2 hours.
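For anyone wanting to try the same thing, here is a minimal sketch of the approach described in this comment: a unique user agent string plus a two-hour minimum between requests. It assumes LWP::UserAgent (from libwww-perl) is installed. The agent string, the stamp file path, and the helper names `build_agent`, `due_for_fetch`, and `fetch_if_due` are all made up for illustration; this is not the actual script used here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# All of these values are illustrative assumptions, not from the original post.
my $AGENT_NAME   = 'ExamplePoller/0.1 (contact: webmaster@example.com)';
my $MIN_INTERVAL = 2 * 60 * 60;                  # two hours, in seconds
my $STAMP_FILE   = '/tmp/example-poller.stamp';  # hypothetical path

# Return 1 when at least $MIN_INTERVAL seconds separate the two times.
sub due_for_fetch {
    my ($last_fetch, $now) = @_;
    return ($now - $last_fetch) >= $MIN_INTERVAL ? 1 : 0;
}

# Build a client that identifies itself with our own agent string
# instead of the libwww-perl default.
sub build_agent {
    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->agent($AGENT_NAME);
    return $ua;
}

# Fetch $url unless the stamp file shows a fetch within the last two hours.
# Returns the page body, or nothing when the request was skipped.
sub fetch_if_due {
    my ($url) = @_;
    my $last_fetch = -e $STAMP_FILE ? (stat $STAMP_FILE)[9] : 0;
    return unless due_for_fetch($last_fetch, time);

    # Record the attempt before the request so failures are throttled too.
    open my $fh, '>', $STAMP_FILE or die "Cannot write $STAMP_FILE: $!";
    close $fh;

    my $response = build_agent()->get($url);
    die 'Fetch failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Run only when invoked directly with a URL argument.
if (!caller && @ARGV) {
    my $body = fetch_if_due($ARGV[0]);
    print defined $body ? $body : "Skipped: fetched less than 2 hours ago\n";
}
```

Saved under a name of your choosing and run with a URL as its only argument, it prints the page body, or a "Skipped" line if the last attempt was under two hours ago. Using the stamp file's modification time as the clock keeps the throttle working across separate cron runs without any other state.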