WebApp Sec mailing list archives
Re: What are we trying to "Benchmark" anyway? Report color, length, number of red exclamation points....
From: Eoin Keary <eoinkeary () gmail com>
Date: Thu, 6 Oct 2005 08:47:34 +0000
Hiya Arian,

If you choose a "test" application such as Foundstone's Hackmebooks to run a number of tools against, the benchmark is a bake-off between the tools. We can evaluate which tools produced the most false positives and, even worse, false negatives (the biggest problem IMHO).

I did a bake-off between ScanD* and AppSca* before, and we compared the results to a manual pen test performed by a "good" tester. What we got was a comparison between the two tools and also a comparison between the tools and a human. This was done on a number of "real" applications picked for complexity and size. Without the third control (the manual test), how do we know if the results are accurate for either tool?

If we don't have a manual aspect to compare against, we can approach the bake-off by testing an application with known vulnerabilities, such as the apps mentioned before. SP* Dy**** give you the option to test against their web app, but this is useless, as the app is probably tailored to the tool (which is easier than the tool being tailored to the app).

In general, tools are for "the easy stuff": they do not understand the workflow or the logic of the app, and they cannot handle dynamic URLs. One app I tested recently with a number of tools got no results at all, but after testing manually it had many XSS issues. These were exploited only by understanding the workflow.

I don't have any "big words" to explain this carry-on, but blame the marketing staff, who know much about very little and very little about much.

Eoin (OWASP - Ireland).

On 05/10/05, Evans, Arian <Arian.Evans () fishnetsecurity com> wrote:
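The bake-off Eoin describes, scoring each tool's findings against a manual pen test used as ground truth, can be tallied along these lines (a minimal sketch; the tool names, URLs, and findings below are entirely hypothetical):

```python
# Score scanner findings against a manual pen test used as ground truth.
# All tools, locations, and findings here are invented for illustration.

# A finding is a (location, vulnerability class) pair.
manual_baseline = {
    ("/login", "XSS"),
    ("/search", "XSS"),
    ("/account", "SQLi"),
}

tool_findings = {
    "tool_a": {("/login", "XSS"), ("/comments", "XSS")},
    "tool_b": {("/login", "XSS"), ("/search", "XSS")},
}

def score(findings, baseline):
    """Return (true positives, false positives, false negatives)."""
    tp = findings & baseline   # confirmed by the manual test
    fp = findings - baseline   # reported, but not real
    fn = baseline - findings   # real, but missed -- the worst case
    return len(tp), len(fp), len(fn)

for tool, findings in sorted(tool_findings.items()):
    tp, fp, fn = score(findings, manual_baseline)
    print(f"{tool}: {tp} TP, {fp} FP, {fn} FN")
```

Without the manual baseline (the "third control"), the same arithmetic can only compare the tools to each other, which says nothing about what both of them missed.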
"In the computer industry, there are three kinds of lies: lies, damn lies, and benchmarks."
-- The Jargon File, 4.4.7, Eric S. Raymond
http://www.catb.org/~esr/jargon/html/B/benchmark.html

I agree entirely with Ofer's first paragraph below, but the example in the second paragraph isn't a "benchmark". If you want to know the "quality" of a web application security tool (scanner, WAF, source code analyzer, pixie dust, "enterprise anomaly detector"), you mean the term 'quality' in regard to the ability of that tool to accomplish some goal. My goal in evaluating automated testing tools may be roughly defined as: "the ability to accurately identify software defects that lead to specific security implications of the kind that a tool claims to detect."

I /am/ interested in a global catalogue of software security defects that lead to security implications, and in evaluating tools against the kind and degree of catalogue defects they detect. However, at this stage in the industry there is no global catalogue and no consensus, and I am more interested in evaluating whether or not a vendor that claims to detect/block XSS attacks actually does so.

There are several types of "benchmarks" that have value:

1) What does a widget do on the applications you need to use it on?
2) What does a widget do in relation to its claims?
3) What does a widget do in relation to its competitive landscape?

"Additionally, you will not know in advance what the security problems are (but then this is the reason to choose this method: neither will the tool makers)."

Downloading an open-source (src='random') package and running a tool against it to analyze the results provides a benchmark of the type:

4) What unquantified results can this widget produce from random input?

Unless you go through that application manually and know *exactly* what it is you are looking for, I find limited value in those results. Again, though, we need to define the target more clearly before we go off shooting holes in random source.
note: I think most of us on this list agree here, but certain vendors are still performing their tautological magic shows on pre-built apps, or pulling the (src=$random) stunts for the uninitiated buyers.

-ae

-----Original Message-----
From: Ofer Shezaf [mailto:Ofer.Shezaf () breach com]
Sent: Tuesday, October 04, 2005 5:12 PM
To: Eoin Keary; Peine,Holger
Cc: webappsec () securityfocus com
Subject: RE: Good benchmark application for web security testing tools?

Any single application that you select, especially a well-known benchmark application, would achieve biased results, as it is VERY easy to make testing software work fine with a specific application. A somewhat better solution would be to select (yourself) a web application on SourceForge (neither the most popular nor the least popular) and test against it. This approach has its problems. For example, you will probably find a PHP application. Additionally, you will not know in advance what the security problems are (but then this is the reason to choose this method: neither will the tool makers).

~ Ofer

Ofer Shezaf
OWASP Israel Chair
http://www.owasp.org/local/israel.html
CTO, Breach Security
Phone (US): +1 (760) 268.1924 ext. 702
Phone (Israel): +972 (9) 956.0036 ext. 212
Cell: +972 (54) 443.1119
ofers () breach com
http://www.breach.com

-----Original Message-----
From: Eoin Keary [mailto:eoinkeary () gmail com]
Sent: Tuesday, October 04, 2005 5:39 PM
To: Peine,Holger
Cc: webappsec () securityfocus com
Subject: Re: Good benchmark application for web security testing tools?

hackmebank or hackmebooks from Foundstone?

On 04/10/05, Peine,Holger <Holger.Peine () iese fraunhofer de> wrote:

The idea of reviewing the available (free or commercial) web application security testing tools has been mentioned several times on this list. However, what would a good benchmarking application for these tools be, i.e. a "typical" web application with a number of known vulnerabilities? Initially I was thinking of WebGoat, which at least has a nice variety of vulnerabilities, but WebGoat's structure is not very representative of your typical web application's structure and workflow (and apart from that, WebGoat is somewhat small, too). So, what application would you suggest?

Thanks for your opinion,
Holger Peine

--
Dr. Holger Peine, Security and Safety
Fraunhofer IESE, Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany
Phone +49-631-6800-2134, Fax -1299 (shared)
www.iese.fraunhofer.de/Staff/peine -- PGP key on request or via http://pgp.mit.edu
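Arian's benchmark type (2), a widget measured against its own claims, could be scored against an application with deliberately seeded, known vulnerabilities roughly as follows (a sketch only; the claimed classes and all counts are invented for illustration):

```python
# Benchmark type (2): does the tool deliver on what it claims to detect?
# The claims, seeded-vulnerability counts, and detections are invented.

claimed = {"XSS", "SQL Injection", "CSRF"}

# Vulnerabilities seeded in the test app, by class, and how many of each
# the tool actually reported.
seeded = {"XSS": 5, "SQL Injection": 3, "CSRF": 2, "Logic Flaw": 4}
found  = {"XSS": 4, "SQL Injection": 3, "CSRF": 0, "Logic Flaw": 0}

def detection_rates(claimed, seeded, found):
    """Map each claimed class to the fraction of seeded instances detected."""
    return {c: found.get(c, 0) / seeded[c] for c in claimed if c in seeded}

for vclass, rate in sorted(detection_rates(claimed, seeded, found).items()):
    print(f"claim '{vclass}': {rate:.0%} of seeded instances detected")
```

Note that the unclaimed "Logic Flaw" row never counts against the tool in this scoring; that gap is exactly the workflow problem Eoin raises, and it is why the manual control still matters.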
Current thread:
- What are we trying to "Benchmark" anyway? Report color, length, number of red exclamation points.... Evans, Arian (Oct 05)
- Re: What are we trying to "Benchmark" anyway? Report color, length, number of red exclamation points.... Eoin Keary (Oct 06)
- <Possible follow-ups>
- RE: What are we trying to "Benchmark" anyway? Report color, length, number of red exclamation points.... Evans, Arian (Oct 07)