Vulnerability Development mailing list archives

Re: Another new worm??? (technical)


From: pierre () DATARESCUE COM (Pierre Vandevenne)
Date: Thu, 22 Jun 2000 18:02:13 +0200


Hoping purely technical comments are ok :-)

On Wed, 21 Jun 2000 09:12:49 -0400, Bennett Todd wrote:

databases publicly available in a publicly-documented format, for

It would be useless - let me try to explain why

You have developed filters. That is great for the moment, and the best
thing to do.

At some point, you'll notice that when you have to apply 100 or 1000
filtering rules to each and every message - there will be a performance
hit on your server. This is exactly what happened with conventional
signature-based anti-virus scanning. Let's not even talk
about a situation where you have to apply 10.000 filtering rules to
10.000 e-mails each day. Don't forget that most harmless e-mails will
by definition have to go through the entire set of rules.
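
To make the cost concrete, here is a minimal Python sketch of the naive
approach. The rule list and the function name are made up for
illustration and are not taken from any real filter; the only point is
that a harmless message gets compared against every rule before it is
let through.

```python
# Hypothetical rule set: substrings that flag a subject line as a worm.
RULES = ["ILOVEYOU", "Love Letter", "Very Funny"]


def naive_check(subject, rules=RULES):
    """Return (matched_rule_or_None, number_of_comparisons_made)."""
    comparisons = 0
    lowered = subject.lower()
    for rule in rules:
        comparisons += 1
        if rule.lower() in lowered:
            # Infected mail can stop early, at the first matching rule.
            return rule, comparisons
    # A clean message only gets here after every rule has been tried.
    return None, comparisons
```

With 10.000 rules and 10.000 mostly harmless messages a day, that is
on the order of 10.000 x 10.000 = 100.000.000 substring comparisons.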

Could the situation become bad enough that it requires that many rules?

I believe it could

1) that is what happened with simple and then complex DOS-based
infectors
2) that is what happened with macro-viruses
3) it is extremely easy to implement simple variations, using an array
of topics for example (if you scan the whole message, the performance
problem worsens).

What can be done about it ?

The best way to improve performance is to use some kind of hashing.
Calculating a hash on the incoming topic and checking if the hash
collides with the hashes of the known worms is much faster. That is how
a-v scanners worked with conventional viruses (and the binary
representation of macro-viruses). That is why a scanner hunting 40.000
viruses in 10.000 files is not 400 times slower than a scanner hunting
1000 viruses in 1000 files. That is also why the output of a virus
generator could often be recognized globally by one hash. But that is
more complex than taking a string from the topic. It takes more time
and more checks for false alerts. More work and sweat in prospect.
Anyway, let's assume it can be done.
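
A minimal sketch of the hashing idea, in Python. Everything here is an
assumption made for illustration: the subject list is invented, and
SHA-256 is used only because it ships with the standard library (a
real scanner would likely pick a much cheaper checksum).

```python
import hashlib


def subject_hash(subject):
    """Hash the exact subject string (illustrative choice: SHA-256)."""
    return hashlib.sha256(subject.encode("utf-8")).hexdigest()


# Hypothetical list of known worm subjects, hashed once at start-up.
KNOWN_WORM_SUBJECTS = ["ILOVEYOU", "Love Letter"]
KNOWN_HASHES = {subject_hash(s) for s in KNOWN_WORM_SUBJECTS}


def is_known_worm(subject):
    # One hash plus one set lookup, however many subjects are known.
    return subject_hash(subject) in KNOWN_HASHES
```

The lookup cost on average no longer depends on the number of known
worms, which is the whole gain. The price is that the match is exact:
is_known_worm("Love Letter") is true, is_known_worm("Love _ Letter")
is not.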

End of the story ?

Not at all

"Love Letter", "Lover  Letter", "Love _ Letter", "Re:Love Letter",
"FWD: Love _Letter"

are different, yet sufficiently close to be virtually identical for a
human being. We have the basics of polymorphism here, which means that
you have to adapt your hashing algorithm to take that kind of variation
into account. For example, virus scanners keep a map of the location of
the constant bytes along with the hash. Now this really begins to look
like a major project. Consider that if the pattern matching method is
totally open source, worm writers will also understand how it works and
will try to make it complex or slow for the filter writer.... Now, you
really have a second job, have to hire people, pay them and may
consider starting an anti-virus company :-)
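
As a rough illustration of adapting the hash to such variations, the
Python sketch below normalizes the subject before hashing it. The
normalization rules (drop Re:/Fwd: prefixes, ignore case, keep only
letters and digits) are assumptions chosen for this example, not a
description of how any actual scanner works.

```python
import hashlib
import re

# Leading reply/forward markers such as "Re:", "FW:" or "FWD: Re:".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize(subject):
    """Reduce a subject line to its 'constant' characters."""
    stripped = _PREFIX.sub("", subject)
    # Keep only lowercase letters and digits; spaces, underscores and
    # punctuation are treated as variable filler.
    return re.sub(r"[^a-z0-9]", "", stripped.lower())


def normalized_hash(subject):
    return hashlib.sha256(normalize(subject).encode("utf-8")).hexdigest()


# Hypothetical database holding a single known worm subject.
KNOWN = {normalized_hash("Love Letter")}


def matches(subject):
    return normalized_hash(subject) in KNOWN
```

With these rules "Love Letter", "Love _ Letter", "Re:Love Letter" and
"FWD: Love _Letter" all reduce to "loveletter" and hit the same hash,
while "Lover  Letter" still slips through. Anyone who can read the
normalization rules can pick a variation they do not cover.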

If we have a public table of virus detection "strings", anyone can
consult it and purposefully avoid detection in a new virus or modify an
existing one.

In the case of executable viruses, advanced polymorphism was such a
problem that it could only be solved through emulation - the fact that
the virus writer community as a whole never quite understood its fine
points, strengths and weaknesses helped decide the war - a war which the
anti-virus side clearly won. Advanced polymorphism is hard to implement
in e-mail worms and the papers I have seen recently on the topic have
left me less than convinced.... but who knows.

What I described above is just one way of tackling the problem. Some
will prefer this hashing method, others will use this mapping method,
etc... These issues will have to be addressed.

Even if they wanted, anti-virus companies could not release a database,
because each anti-virus relies on a different set of

* simple string matching
* procedural detection
* hash based method or similar applied to data maps
* emulation
* hints : when to emulate, when to stop, what to do with the result
etc...
* ole parsing methods

etc...

You might find it interesting to know that in the late 80s and the
early 90s there was an anti-virus program that did just that - have a
public set of detection strings - the program and its set of strings
can still be found on file archives through search engines (look for
HTSSCAN). The Dutch guy and the community of people who developed it
dropped out of the race at the second stage (many viruses, early
polymorphism). They just could not keep up. They were competent, they
were given virus samples, but they had started their project on a set
of assumptions that later developments proved to be wrong.

Maybe the time is right for an open source project, maybe it will work
because more people will join it, maybe the evolution will be
different... but it will not be a simple "give me the sample and I will
add a line to my filter" story.


---
http://www.datarescue.com/idabase/ida.htm
IDA Pro 4.1 - Yes, we have done it again !


