Flutterby™! : virtual hosts and bots


virtual hosts and bots

2000-10-06 12:18:42+00 by Dan Lyke 2 comments

Dave Winer asks about search engines: "About web crawlers, they are getting vicious. Tens of thousands of hits a day on our servers. We have a theory that they don't know about virtual domains. When they decide to go back to a server, they should use the IP address, not the domain name."

Setting aside that I think the addition of the "Host:" field in HTTP 1.1 was evil and destroyed the 'net, and that I don't know what real patterns Dave's seeing, I don't understand how this would work. They're looking for files accessible on that virtual domain, right? I mean, maybe there's some load balancing to be done, but I don't want Flutterby, out of the 5 or so virtual domains hosted on this box, to be treated better than the rest. When you set up 10,000 virtual domains, you're saying "I want you to think that this is 10,000 different computers", and the search engines are treating it as such. Anything else seems like just "you should read my mind, not my actions" posturing.
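To make the "files accessible on that virtual domain" point concrete, here's a rough sketch of how a name-based virtual host setup might pick a site from the "Host:" field. It isn't any particular server's implementation; the host names, directory paths, and function names are all made up for illustration. The thing to notice is that a request addressed to the bare IP address, or one with no "Host:" field at all, gives the server nothing to choose with, so it falls through to whatever the default site is.

```python
# Hypothetical mapping of virtual host names to document roots.
# Every name and path here is an illustrative assumption.
VHOSTS = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
DEFAULT_ROOT = "/var/www/default"


def parse_host(raw_request):
    """Return the lowercased Host header value without its port, or None.

    Simplified: does not handle IPv6 literals or folded header lines.
    """
    lines = raw_request.split("\r\n")
    for line in lines[1:]:          # skip the request line
        if line == "":              # blank line ends the headers
            break
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            host = value.strip().lower()
            return host.split(":")[0] or None
    return None


def select_root(raw_request):
    """Pick a document root from the Host header, else the default site."""
    host = parse_host(raw_request)
    return VHOSTS.get(host, DEFAULT_ROOT)
```

With this sketch, a request carrying "Host: blog.example.com" lands in the blog's directory, while the same request sent with an IP address in the "Host:" field (or an HTTP 1.0 request with no such field) gets the default site, whichever of the domains on the box that happens to be.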

[ related topics: Dave Winer ]

comments in ascending chronological order:

#Comment made: 2002-02-21 05:30:23+00 by: anser

It makes no sense whatsoever to expect search engines to just use IP addresses. Named virtual hosting is a powerful feature, and that's what users will see if they visit. If Winer's servers are misconfigured, that's his problem. Any config person worth his salt could fix an excessive-hits condition in an hour or two.

#Comment made: 2002-02-21 05:30:24+00 by: Dan Lyke

I did a little checking on the HTTP headers that *.editthispage.com sites were sending, and sent an email to Dave based on that. It appears to be fixed today, but the problem still exists at www.weblogs.com, i.e.:
> bash-2.03$ HEAD www.weblogs.com
> 200 OK
> Connection: close
> Date: Sun, 08 Oct 2000 04:56:22 GMT
> Server: UserLand Frontier/7.0b26-WinNT
> Content-Length: 18031
> Content-Type: text/html
> Expires: Mon, 01 Jan 1990 01:00:00 GMT
> Client-Date: Sun, 08 Oct 2000 04:54:08 GMT
> Client-Peer:
> Title: Weblogs.Com Home
> bash-2.03$
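
For anyone who wants to repeat that kind of check without the HEAD command-line tool, here's a minimal sketch using Python's standard http.client module. The function name and its defaults are my own assumptions, not part of any tool mentioned above, and it won't add the Client-Date or Client-Peer lines, which come from the client library rather than the server.

```python
import http.client


def head(host, path="/", port=80, timeout=10):
    """Send a HEAD request and return (status, reason, list of headers).

    http.client fills in the Host header from the `host` argument,
    so a name-based virtual host is selected the usual way.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.getheaders()
    finally:
        conn.close()


# Example use (needs network access, so it is left commented out):
# status, reason, headers = head("www.weblogs.com")
# print(status, reason)
# for name, value in headers:
#     print(f"{name}: {value}")
```

Comparing the Date and Expires lines in the result is a quick way to see what a crawler or cache is being told about how long a page stays fresh.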