Security Basics mailing list archives

Re: IIS Logfile


From: Miles Stevenson <miles () mstevenson org>
Date: Tue, 26 Oct 2004 01:27:26 -0400

Hello mfernandez,

<snip>
2004-10-25 04:16:46 64.246.165.10 - W3SVC1 FILESERVER xxx.xxx.xxx.xxx 80
GET /robots.txt - 401 5 0 www.mydomain.com SurveyBot/2.3+(Whois+Source)
http://www.whois.sc/
2004-10-25 04:16:46 64.246.165.10 - W3SVC1 FILESERVER xxx.xxx.xxx.xxx 80
GET / - 401 5 0 www.mydomain.com SurveyBot/2.3+(Whois+Source)
http://www.whois.sc/mydomain.com
<snip>

First I'll explain what this means, then I'll answer your questions:

These log entries were generated by requests from a script identifying itself as 
"SurveyBot/2.3". Scripts like this are web "crawlers" or "robots": they follow 
links throughout the internet, grabbing pages and indexing the information for 
search engine use. Expect to be hit regularly by the bigger search engines such 
as Google and Yahoo, especially if your site is large and popular. If you want 
to find out more about this particular bot, then Google is your friend (and 
mine too: http://www.whois.sc/info/webmasters/surveybot.html).
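If you want to pick these entries apart yourself, a few lines of script will do 
it. Here is a minimal sketch (mine, not anything official); the field order is 
inferred from the snippet above, so check the #Fields: header at the top of your 
own IIS log before trusting it:

```python
# Sketch: split one IIS W3C extended log line into named fields.
# The field list below is an ASSUMPTION based on the log snippet in this
# post -- verify it against the "#Fields:" header of your own log file.
FIELDS = [
    "date", "time", "c-ip", "cs-username", "s-sitename", "s-computername",
    "s-ip", "s-port", "cs-method", "cs-uri-stem", "cs-uri-query",
    "sc-status", "sc-substatus", "sc-win32-status", "cs-host",
    "cs(User-Agent)", "cs(Referer)",
]

def parse_line(line):
    """Return a dict mapping field names to values for one log line.

    IIS encodes embedded spaces (e.g. in the user-agent) as '+', so a
    plain whitespace split is safe here.
    """
    return dict(zip(FIELDS, line.split()))

line = ("2004-10-25 04:16:46 64.246.165.10 - W3SVC1 FILESERVER "
        "xxx.xxx.xxx.xxx 80 GET /robots.txt - 401 5 0 www.mydomain.com "
        "SurveyBot/2.3+(Whois+Source) http://www.whois.sc/")
entry = parse_line(line)
print(entry["c-ip"], entry["cs-uri-stem"], entry["cs(User-Agent)"])
```

From there it is easy to count hits per user-agent or per client IP and see who 
is crawling you.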

You will notice that the first file it asks for is called "robots.txt". This 
means that this particular "crawler" (SurveyBot) is playing nice and behaving 
the way it should. The "robots.txt" file is the standard way for website 
administrators to tell these "robots" about particular pages that you do NOT 
want indexed. For example, if you didn't want "network-info.html" to end up in 
search engines, you would list that page in your "robots.txt", and the crawlers 
that behave nicely will ignore it.
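For illustration, such a file lives in the web root and looks something like 
this (the paths here are made up for the example):

```
User-agent: *
Disallow: /network-info.html
Disallow: /admin/
```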

Take note, though, that the bad guys will NOT honor this (duh!). In fact, the 
bad guys know that any URL you put in your "robots.txt" file is a page you 
don't want lots of people to see, which is exactly why they want to see it. 
Lots of "newbie" blackhats will scan for robots.txt files, looking for 
interesting web pages (there are plenty of automated scripts that do this for 
the "kiddies"). Of course, every skilled administrator should know better than 
to put sensitive material on publicly accessible web pages, robots.txt or no 
robots.txt!

Moral of the story: If you don't want people to see it, don't make it public, 
and you won't need to worry about it in the first place.

Here is a fun trick that my company uses (I wish I could take credit for the 
idea, but I can't) and finds very effective: use the robots.txt concept as a 
"honeytoken". Here is what you do:

1. Set up a dummy HTML page, publicly accessible on your site, and give it an 
interesting but hard-to-guess filename, such as "admin44687-secret.html". You 
don't even need to put any info on the page; you can leave it blank. But you 
should have it call a script (we'll get to that script in step 3).

2. Add this page to your robots.txt file. You now know that anyone who accesses 
"admin44687-secret.html" is trying to look at something they KNOW they are not 
supposed to. There are NO false positives here (hence "honeytoken"). Anyone who 
accesses this page is BAD. Period. Valid web crawlers will ignore it since you 
listed it in robots.txt. When the kiddies DO go to this page, your script is 
called:

3. Your magic script gets the bad guy's source IP address and automatically 
adds it to a temporary "blacklist". Maybe he gets blocked at your firewall for 
a week, a month, whatever you want (although it sounds like you are a Windows 
shop, which limits your flexibility quite a bit. I wouldn't even begin to know 
how to accomplish this with Windows; anyone else on the list care to make a 
suggestion?). You could even have your fake "admin" page display a message 
along the lines of:

 "You are now blocked from our site. If you were just screwing around and 
don't want to be blocked, send us an email and we MIGHT let you back in by 
our good graces."
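The steps above can be sketched as a small CGI script. This is just my own 
minimal illustration, not a production tool: it assumes a CGI-capable web 
server that sets REMOTE_ADDR, and it only appends offending IPs to a text file; 
the blacklist path is made up, and the piece that actually feeds the file to 
your firewall is left as a separate job:

```python
#!/usr/bin/env python
# Honeytoken sketch: called when someone requests the fake "admin" page.
# It appends the visitor's IP (with a timestamp) to a blacklist file; a
# separate cron/firewall job (not shown) would read that file and block
# the addresses. The BLACKLIST path is a hypothetical example.
import os
import time

BLACKLIST = "/var/lib/honeytoken/blacklist.txt"

def record_offender(environ, blacklist_path=BLACKLIST):
    """Log the client IP from the CGI environment to the blacklist file."""
    ip = environ.get("REMOTE_ADDR", "unknown")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(blacklist_path, "a") as f:
        f.write("%s %s\n" % (stamp, ip))
    return ip

# Only run the CGI response when actually invoked by a web server.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    ip = record_offender(os.environ)
    print("Content-Type: text/plain\n")
    print("You are now blocked from our site (%s). If you were just "
          "screwing around and don't want to be blocked, send us an "
          "email and we MIGHT let you back in." % ip)
```

On a Unix box the firewall side could be as simple as a cron job that feeds the 
file to iptables; on Windows, as I said, I honestly don't know what the 
equivalent would be.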

This is fun stuff!

And now for your question (which you can probably answer yourself by now if 
you've been paying attention):

I understand that some "whois" site is checking my server, but is this
dangerous? Should I block this IP?

Dangerous: No, not really. Not unless you are actually putting VALID pages in 
your robots.txt file that you really DON'T want others to see. Remember, you 
shouldn't be doing this. If you don't want people to see it, don't put it on 
the Internet! Otherwise, treat this as normal web traffic.

Should you block the IPs? No, not really. Most of these are valid web crawlers 
like the Googlebot. You DO want people to be able to find your site via Google 
and Yahoo and all the others, don't you? Again, if you WANT to get fancy and 
set up a "honeytoken" with this, it can be a lot of fun, but it seems to me 
that this would be difficult or near impossible on Windows platforms. And while 
fun, this is definitely NOT a "necessary" defense tool. It's more like icing on 
the cake, a cake built on a very solid foundation of effective security 
measures. Concentrate on the fundamental stuff first: strong firewall filters, 
good network design, system hardening, patching, anti-virus, and all the other 
REALLY important stuff that really boring and geeky security people (like me) 
keep trying to drill into the public. Don't get fancy until you are really good 
at the fundamentals, because that is where your biggest "bang for your buck" 
is. The fancy stuff on top yields much smaller gains.

Have fun.
-- 
Miles Stevenson
miles () mstevenson org
PGP FP: 035F 7D40 44A9 28FA 7453 BDF4 329F 889D 767D 2F63
