Vulnerability Development mailing list archives

RE: Possible DOS against search engines?


From: "jasonk" <jasonk () swin edu au>
Date: Tue, 4 Feb 2003 11:28:29 +1100


Responses inline...

jasonk

-----Original Message-----
From: Rob Shein [mailto:shoten () starpower net]
Sent: Tuesday, 4 February 2003 10:45 AM
To: 'Philip Stoev'; vuln-dev () securityfocus com
Subject: RE: Possible DOS against search engines?

I see a few problems here.  Problems are listed below each concept,
for clarity, and assume a decent webcrawler.


1. You create a generator for fake web pages, whose purpose
is to spit out HTML containing a huge number of (pseudo)
random _non-existing_ words, as well as links to other pages
within the generator;

I doubt this would make even a slight dent in things.  Seeing as how
webcrawlers already walk the entire Internet, with its various
languages, enormous expanse, and endless misspellings, I think
anything you could create would end up being a drop in the bucket.


Agreed; I imagine most other "words" would already be indexed as
initials, abbreviations, and so on.

2. You place that generator somewhere and submit the URL to
search engines for crawling;

3. The search engines then crawl the site, possibly reaching
their pre-defined maximum crawling depth (or, if badly
broken, crawl the site indefinitely, jumping from one freshly
generated page to another);

But they don't crawl indefinitely.  What do they do if they hit two
sites that link to each other?  They notice this, and move on.

This can be addressed by a dynamic generator:
http://www.evilserver.com/dynamicwordgenerator/adsf97erncv

This page would link to randomly generated series of characters, all
under /dynamicwordgenerator/, so the server just replies to anything
in /dynamicwordgenerator/ with another dynamically generated load of
rubbish and a few more randomly generated links.
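
A minimal sketch of such a generator, in Python (the hostname above
and the path prefix are illustrative; a real one could live behind
any URL):

import random
import string
from http.server import BaseHTTPRequestHandler, HTTPServer

# Any request under the prefix gets a page of random "words" plus
# links to more random paths under the same prefix, so a crawler
# never runs out of fresh pages to follow.
PREFIX = "/dynamicwordgenerator/"

def fake_word(n=8):
    return "".join(random.choices(string.ascii_lowercase, k=n))

class Generator(BaseHTTPRequestHandler):
    def do_GET(self):
        if not self.path.startswith(PREFIX):
            self.send_error(404)
            return
        words = " ".join(fake_word(random.randint(4, 12))
                         for _ in range(500))
        links = " ".join(
            '<a href="%s%s">%s</a>' % (PREFIX, fake_word(), fake_word())
            for _ in range(10))
        body = "<html><body><p>%s</p><p>%s</p></body></html>" % (words,
                                                                 links)
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    HTTPServer(("", 8000), Generator).serve_forever()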

4. Upon adding the gathered words to the search engine's
index, the index becomes heavily overloaded with the newly
added words, as they are outside of the real-language words
already present in the index. The following should be
theoretically possible:

But who would search on them?

Irrelevant; if the search engines are that heavily overloaded,
searches will take some time to trawl through huge databases.  But as
said above, it will be nothing more than a drop in the ocean.
 
    - craft fake words so that they attack a specific hash
function. Make a bunch of fakes that hash to the same value
as a legitimate word in the English language. This will
possibly impact the performance of search engines using that
particular hash function when they try to look up the
legitimate words that are being targeted.

I don't understand this one?

This would be noticed by the search engine long before it became a
real problem, and it would be addressed.  This is how they deal with
many things, including people who try to influence their ranking
using various means.
Yep.
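
In other words, the idea seems to be collision flooding: if you know
(or guess) the hash function behind the index, you can brute-force
fake words that land in the same bucket as a popular real word, so
lookups of that word degrade into long chain scans.  A toy Python
illustration against a deliberately weak hash (a real engine's
function would be unknown and much harder to target):

import random
import string

def weak_hash(word, buckets=65536):
    # Stand-in for whatever hash function an engine might use;
    # deliberately small and simple so collisions are easy to find.
    h = 0
    for c in word:
        h = (h * 31 + ord(c)) % buckets
    return h

target = weak_hash("linux")
collisions = []
while len(collisions) < 5:
    w = "".join(random.choices(string.ascii_lowercase, k=8))
    if weak_hash(w) == target:
        collisions.append(w)

# every one of these shares a bucket with "linux"
print(collisions)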

    - craft fake words so that they unbalance a B-tree
index, if one is used. I am not entirely sure, however it
appears to me that it is possible to craft words in such a
way as to alter the shape of the B-tree and thus impact the
performance of lookups where it is used.
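
Worth noting that B-trees rebalance themselves on node splits, so
shaping one adversarially is harder than it sounds; the effect is
much easier to see against an unbalanced structure.  A toy Python
demonstration using a plain binary search tree (an analogy only, not
a real B-tree): feeding it words in sorted order degrades it toward a
linked list.

import random
import string

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    # Iterative insert into an unbalanced binary search tree.
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
        else:
            break  # duplicate; ignore
    return root

def depth(root):
    # Iterative max-depth to avoid recursion limits on the bad case.
    best, stack = 0, [(root, 1)]
    while stack:
        node, d = stack.pop()
        if node is not None:
            best = max(best, d)
            stack.append((node.left, d + 1))
            stack.append((node.right, d + 1))
    return best

words = ["".join(random.choices(string.ascii_lowercase, k=6))
         for _ in range(1000)]

random_tree = None
for w in words:                 # arbitrary order: depth ~ 2*log2(n)
    random_tree = insert(random_tree, w)

crafted_tree = None
for w in sorted(words):         # adversarial order: depth ~ n
    crafted_tree = insert(crafted_tree, w)

print("random order depth: ", depth(random_tree))
print("crafted order depth:", depth(crafted_tree))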

    - craft fake words randomly so that the index just grows.
To the best of my understanding, most search engines will
index and retain keywords that are only seen on one web page
in the entire Internet. However, I think the capacity of the
search engines to keep track of such one-time non-English
letter sequences is limited and can eventually be exhausted.

It is my belief that, again, they will notice the impact on their
database and quickly address the issue.  What about a bit of code
that says: if more than 5% of the words on a page are unique in the
database, the page is dropped?
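
That heuristic is only a few lines.  A sketch, where known_vocabulary
stands in for a membership test against the engine's existing index
(the names and the threshold are illustrative, not anyone's real
code):

import re

THRESHOLD = 0.05

def should_drop(page_text, known_vocabulary):
    # Drop the page if more than 5% of its distinct words have
    # never been seen in the index before.
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    if not words:
        return False
    unseen = sum(1 for w in words if w not in known_vocabulary)
    return unseen / len(words) > THRESHOLD

vocab = {"the", "quick", "brown", "fox", "jumps", "over", "lazy",
         "dog"}
print(should_drop("the quick brown fox", vocab))          # False
print(should_drop("xkqzj vpwrt the mlnbv qqoxz", vocab))  # True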

If the above-mentioned things are feasible, then one can even
construct a worm of some sort that auto-installs such
fake page generators on valid sites, thus increasing the
traffic to the crawler even more. Writing a short Apache
handler meant to be silently installed in httpd.conf at
root-kit installation should not be that difficult. When was
the last time you reviewed your Apache module list?
Would you spot a malicious module if it were called
mod_ip_vhost_alias, loaded in between two other modules
that you never knew were vital or not?
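
For reference, hiding such a module would be a one-line change among
the dozens of LoadModule lines a stock httpd.conf already carries
(mod_ip_vhost_alias is the hypothetical name from above; the other
two are genuine Apache modules):

LoadModule vhost_alias_module    modules/mod_vhost_alias.so
# the impostor, one line among many:
LoadModule ip_vhost_alias_module modules/mod_ip_vhost_alias.so
LoadModule rewrite_module        modules/mod_rewrite.so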

No, but I'd notice an abrupt lack of space on my web server.  And the
sudden oddly-named URLs in my logs.  And the corresponding
oddly-named pages on my site.  And if I didn't notice, my hosting
provider would.

Dynamic.  No lack of space, and no oddly-named pages.  If it were an
old-vuln-based worm such as the recent SQL worm, I doubt that many of
those admins would be looking at their logs...

Please note that the setup described differs from the
practice of generating fake pages containing a lot of real
(mostly adult) keywords. After all, such real-language words
already exist in the index, whereas I suggest bombing the
index with a huge number of not-previously-existing
freshly-generated random letter sequences. Also, please note
that the purpose of the attack is to damage the index, and
not to make the crawler consume bandwidth by going into an
endless loop or something like that (though the crawler has
to scan the pages first so that the generated keywords are
ultimately delivered to the index).

I will appreciate any and all thoughts on the issue.

As you said, you'd have to have bandwidth -- though I don't see it
having the same effect on the Internet as the SQL worm did -- but as
spiders and the like are (and if they're not, they should be)
deliberately rate-limited in the requests they make, there should be
little issue.

Another option, to get around the made-up-words issue, is to use a
dictionary and just pump random real words in; this will still clog
the databases.  Though you'd have to do it *mighty* quickly for them
not to notice.  I think Google takes over a month before it gets back
to indexing the same site.
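
The generator sketched earlier would need only a small change for
that: draw from a word list instead of random letters.  A sketch,
assuming the usual Unix word list (its location varies by system):

import random

with open("/usr/share/dict/words") as f:
    DICTIONARY = [w.strip().lower() for w in f if w.strip().isalpha()]

def dictionary_page(n_words=500):
    # Real words won't trip a "too many unique words" filter.
    return "<html><body><p>%s</p></body></html>" % " ".join(
        random.choices(DICTIONARY, k=n_words))

print(dictionary_page(20))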

Maybe, since it's a worm, you'd have the 'source' web server
installation send its worm code to the 'destination' web server.
Each time you get a successful infection, that address is added to
the list of servers, and you can use that address to generate bad
pages as well.  So as well as www.evilserver.com generating pages,
you've got an increasing number of servers doing so... maybe as a
side effect you'd increase the "backlog" of sites needing to be
indexed.  Again, I doubt it'd be long before they noticed this.

Philip Stoev




