Dailydave mailing list archives

Re: AI


From: Anton Chuvakin <anton () chuvakin org>
Date: Thu, 31 Mar 2016 13:14:41 -0700

.... but don't you guys [used generally] agree that ML and friends bring
up the challenge of "non-verifiability" to our domain [that I whine about
here, if you are curious
http://blogs.gartner.com/anton-chuvakin/2015/03/03/killed-by-ai-much-a-rise-of-non-deterministic-security/].
Specifically, "because ML" argument is sometimes made not just by the
marketing droid [eh...I guess we can't use the word "droid" anymore,
because...hey...what if that marketing "person" is actually a narrow AI?],
but legitimately due to the algorithms/models pointing at a particular
outcome [like say "this binary is soooo bad"] without any explanation. So,
whether you are in ElJefe/GRR/MIG + free ML library camp or in
super-uber-hyper-expensive EDR product camp, the result is the same: the
system is telling me something and I don't understand why.....

On Wed, Mar 30, 2016 at 10:58 AM, Smoak, Christopher <
Christopher.Smoak () gtri gatech edu> wrote:

Sven,

I definitely understand your point. Approaching the "when you have a
hammer…" phenomenon is most certainly an issue in the machine learning
field, especially to your .fit() point below. As sure as I am that such an
issue exists, I also think there's room for, improperly phrased,
"non-traditional" applications of these types of techniques in order to
achieve some goal. I just don't want to make the blanket statement that "if
it isn't an image, <insert technique>" won't work. I realize that's not
necessarily your point, but I wanted to add some conversation fodder to
what I consider to be a really interesting thread.

Agreed 100% on the "because ML" argument; I see it way too often. Frankly,
it hurts all "legitimate" (used liberally here) uses of ML in that
everything gets wrapped up in the jargon/marketing lingo and people can't
see beyond it. We seem to live in an industry fraught with those types of
things. My point is simply that I don't want to punish the terminology so
much that we devalue the real contributions that can be made to the field
using ML, as an example. Employed carefully, there are definitely
ways to use it for great justice. :)

Anyway, just wanted to get some more thoughts going on this topic, as I
think it's worth a longer discussion, albeit a slight digression.

Regards,

Chris Smoak
Georgia Tech Research Institute

From: Sven Krasser <sven () crowdstrike com>
Date: Wednesday, March 30, 2016 at 1:31 PM
To: Christopher Smoak <Christopher.Smoak () gtri gatech edu>, dave aitel <
dave () immunityinc com>, "dailydave () lists immunityinc com" <
dailydave () lists immunityinc com>
Subject: Re: [Dailydave] AI

Hey Chris,

Carefully phrased, I am very skeptical that transforming your instances
into images and then using CNNs will give you an out-of-the-box performance
bump over other traditional techniques. To me this looks like a classic
“When you have a hammer every problem looks like a nail” approach. Can we
develop representations of input data that will allow deep architectures to
successfully learn the instance space? Yes, I’m sure we can — but that will
require more work than downloading TF and running it over the data as Dave
described in his email.

As far as technology in commercial products goes, my point is that
primarily it is important that a product performs to a specific objective
standard, regardless of the technologies used. Explaining why something
performs is indeed important, but the answer to this cannot simply be
“because Machine Learning” as we see presently (and which I assume prompted
Dave to send his initial email). Everyone with rudimentary Python knowledge
can go download sklearn right now and call .fit() on the Iris dataset.
Congratulations, you just used Machine Learning. That doesn’t make for a
compelling product, however.
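
For reference, the exercise described above really is only a few lines. A minimal sketch, assuming scikit-learn is installed; the choice of LogisticRegression, the 70/30 split, and the `fit_iris` helper name are illustrative assumptions, not anything specified in this thread:

```python
# Minimal sketch: "call .fit() on the Iris dataset" with scikit-learn.
# The classifier and split parameters are arbitrary illustrative choices.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def fit_iris(seed=0):
    """Train a classifier on Iris and return (model, held-out accuracy)."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    _, accuracy = fit_iris()
    print(f"held-out accuracy: {accuracy:.2f}")
```

The held-out accuracy tells you the model fits a small, clean benchmark; it says nothing about whether a product built around it meets an objective standard.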

Best,
-Sven

--
Sven Krasser, Ph.D.
Chief Scientist, CrowdStrike, Inc.
http://www.crowdstrike.com | http://tinyurl.com/cs-svenk

From: "Smoak, Christopher" <Christopher.Smoak () gtri gatech edu>
Date: Wednesday, March 30, 2016 at 10:03 AM
To: Sven Krasser <sven () crowdstrike com>, dave aitel <dave () immunityinc com>,
"dailydave () lists immunityinc com" <dailydave () lists immunityinc com>
Subject: Re: [Dailydave] AI

Sven,

Your general point is well taken; however, I'd contend that while most
problems in security don't boil down to simple image classification tasks,
there are certainly valid ways of using the unique spatial nature of CNNs
to apply to security problems. Namely, mapping data that is not
traditionally visual in nature to that of an image representing that data
(e.g. binary -> png) can—and in my experience, has—yielded very promising
results. Granted, it's debatable whether it's better to utilize a technique
more suited to the original data set in lieu of transforming it into an
image, but that's a conversation for another day. The bottom line is
finding a model that consistently gives good results in the context of the
question being answered.
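
As a rough illustration of the binary-to-image mapping mentioned above: one simple convention is to treat each byte as a grayscale pixel and wrap the byte stream into fixed-width rows. This sketch uses only the standard library and writes a PGM image rather than a PNG to avoid an imaging dependency. The function names, the default width of 256, and the zero padding are assumptions for illustration, not the method used in the work described here.

```python
# Hypothetical sketch: map raw bytes to a 2D grayscale grid (one byte = one pixel).
# Width, zero padding, and the PGM output format are illustrative assumptions.

def bytes_to_grid(data: bytes, width: int = 256) -> list[list[int]]:
    """Wrap a byte string into rows of `width` pixels, zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def grid_to_pgm(grid: list[list[int]]) -> bytes:
    """Serialize the grid as a binary PGM (P5) image, 8 bits per pixel."""
    height = len(grid)
    width = len(grid[0]) if grid else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(bytes(row) for row in grid)


if __name__ == "__main__":
    sample = bytes(range(10))  # stand-in for a real binary's contents
    grid = bytes_to_grid(sample, width=4)
    print(grid)
    print(len(grid_to_pgm(grid)))
```

The resulting grid could be handed to an image model as-is. Whether the row width or padding choice helps or hurts the classifier is exactly the kind of representation question raised in this thread.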

On the point of just caring about the results and not about the
technology/process involved, I'm not sure I agree. When we get into
extremely complex technologies that give us binary, "good/bad" answers to
not-so-simple questions, I think it's imperative to understand the basis
upon which the technology arrived at the answer. It may not be feasible
with commercial (read: intellectual property) solutions but is nonetheless
important. An example can be found in dynamic malware analysis systems,
where understanding the perspective from which data is collected helps
frame the efficacy of the result with respect to potential detection by
malware.

Just some food for thought.

Chris Smoak
Georgia Tech Research Institute

From: <dailydave-bounces () lists immunityinc com> on behalf of Sven Krasser
<sven () crowdstrike com>
Date: Wednesday, March 30, 2016 at 10:49 AM
To: dave aitel <dave () immunityinc com>, "dailydave () lists immunityinc com" <
dailydave () lists immunityinc com>
Subject: Re: [Dailydave] AI

Hey Dave,

You got some things right and some things wrong. In security, most
problems are not image classification related and do not benefit at the
same level from the recent advances in Convolutional Neural Networks. Also,
TensorFlow is not the first freely available Deep Learning library nor is
it the first freely available Machine Learning classification library by a
long shot. Take a look at e.g. some of the presentations that the MLSec
Project made available. ML has been in security products for decades (and I
worked on shipping products with it back in the day at CipherTrust, before
people cared what technology stopped the threats as long as they were
stopped). What’s new is that Machine Learning now also appears in
marketing materials. So the question one should ask oneself is whether you
still have a product once the ML hype wears off.

Best,
-Sven

--
Sven Krasser, Ph.D.
Chief Scientist, CrowdStrike, Inc.
http://www.crowdstrike.com | http://tinyurl.com/cs-svenk

From: <dailydave-bounces () lists immunityinc com> on behalf of dave aitel <
dave () immunityinc com>
Date: Wednesday, March 30, 2016 at 5:56 AM
To: "dailydave () lists immunityinc com" <dailydave () lists immunityinc com>
Subject: [Dailydave] AI

There are only a few real computers in the world, and I think we are just
beginning to feel their influence. For example, here is a sample project I
am working on now that image classification is a solved problem.

Like many of you on this list, I dabble in Brazilian jiu-jitsu. In fact,
in a week we are doing an open mat at INFILTRATE for both newcomers who've
always wanted to try to choke me out and people in the community who are
already very good at choking people.

Like many sports, BJJ is typically scored according to a ruleset based on
the different positions you end up in. Being on top is usually better.
Being able to get on top after you are on the bottom is worth 2 points.
Being able to completely mount someone is worth three points. Getting on
their back is four points. Generally a tournament will hire judges and they
will award points based on their understanding of the rules and their
personal feelings towards the contestants and whatever other factors are
floating in their heads.

What I'm working on is collecting a set of images of BJJ, then annotating
them as to what positions the different people are in. This essentially
maps every image into a vector space - and after training a neural network
using modern techniques you can have a program that looks at an image and
then outputs "Blue is in top mount".

Part of the key here is that you don't have to tell it that the picture is
BJJ. Every picture that program sees is two people doing BJJ. All it has to
do is output what positions they are in.

And in the end, by assigning point values to transitions between
positions, you will have an automatic BJJ judge. I've applied for a
TensorFlow API key from Google since although this is not a hard problem by
ML standards I want to do it the right way and get good scalable results on
video later.
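
The scoring step described above can be sketched independently of the classifier. This assumes the image model already emits one position label per frame from Blue's perspective. The label names, the `score_blue` helper, and the rule of awarding points only when a new position is entered are hypothetical simplifications. The point values follow the ones given earlier in this email (2 for getting on top from the bottom, 3 for mount, 4 for the back).

```python
# Hypothetical sketch: score a sequence of per-frame position labels for one
# competitor ("Blue"). Labels and transition rules are illustrative
# assumptions; only the point values (2, 3, 4) come from the description above.

BOTTOM, TOP, MOUNT, BACK = "bottom", "top", "mount", "back"

# Points awarded for entering a position, keyed by (previous, current) label.
TRANSITION_POINTS = {
    (BOTTOM, TOP): 2,    # getting on top after being on the bottom
    (BOTTOM, MOUNT): 3,
    (TOP, MOUNT): 3,     # fully mounting the opponent
    (BOTTOM, BACK): 4,
    (TOP, BACK): 4,      # taking the back
    (MOUNT, BACK): 4,
}


def score_blue(labels):
    """Sum points over consecutive label changes; repeated frames score nothing."""
    total = 0
    for previous, current in zip(labels, labels[1:]):
        if previous != current:
            total += TRANSITION_POINTS.get((previous, current), 0)
    return total


if __name__ == "__main__":
    frames = [BOTTOM, BOTTOM, TOP, TOP, MOUNT, MOUNT, BACK]
    print(score_blue(frames))  # 2 + 3 + 4 = 9
```

Real per-frame classifier output would likely flicker between labels, so some smoothing over neighboring frames would probably be needed before scoring transitions this way.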

And of course, the same thing is true for the process information El Jefe
<https://eljefe.immunityinc.com/> will give you. All those "behavioral
analysis machine learning intrusion detection" startups are about to be
crushed by simple open source projects that use Google and MS and Amazon's
exported Machine Learning APIs.

-dave



_______________________________________________
Dailydave mailing list
Dailydave () lists immunityinc com
https://lists.immunityinc.com/mailman/listinfo/dailydave




-- 
Dr. Anton Chuvakin
Site: http://www.chuvakin.org
Twitter: @anton_chuvakin <https://twitter.com/anton_chuvakin>
Work: http://www.linkedin.com/in/chuvakin