Dailydave mailing list archives

Re: AI

From: Sven Krasser <sven () crowdstrike com>
Date: Thu, 31 Mar 2016 21:00:01 +0000

Your article has it right: we will need to ask ourselves in which cases black box detections are 
desirable. In very general terms, in some cases there’s value in bringing instances to an analyst’s attention without 
further reasoning (specifically if there is context available, e.g. in the form of more forensic information that can be 
accessed). In other cases interpretability is important (a very common problem in e.g. FinTech), and there are ML 
algorithms that allow some level of human interpretation.
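
As a rough illustration of what "some level of human interpretation" can look like, a shallow decision tree can 
print the rules it learned. This is only a sketch under assumptions that are not from the thread: the Iris dataset 
stands in for real security data, and the depth limit of 2 and the variable names are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data (assumption): Iris, not security telemetry.
iris = load_iris()

# A shallow tree keeps the learned rule set small enough to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders the learned splits as nested threshold rules,
# so an analyst can see which feature drove a given classification.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

The printed rules are threshold tests on named features, which is the kind of output an analyst can question. 
Deeper or ensemble models generally trade this readability away.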
-Sven
-- 
Sven Krasser, Ph.D.
Chief Scientist, CrowdStrike, Inc.
http://www.crowdstrike.com | http://tinyurl.com/cs-svenk

From:  <anton.chuvakin () gmail com> on behalf of Anton Chuvakin <anton () chuvakin org>
Date:  Thursday, March 31, 2016 at 1:14 PM
To:  "Smoak, Christopher" <Christopher.Smoak () gtri gatech edu>
Cc:  Sven Krasser <sven () crowdstrike com>, dave aitel <dave () immunityinc com>, "dailydave () lists immunityinc com" 
<dailydave () lists immunityinc com>
Subject:  Re: [Dailydave] AI

.... but don't you guys [used generally] agree that ML and friends bring up the challenge of "non-verifiability" in 
our domain [that I whine about here, if you are curious: 
http://blogs.gartner.com/anton-chuvakin/2015/03/03/killed-by-ai-much-a-rise-of-non-deterministic-security/]? 
Specifically, the "because ML" argument is sometimes made not just by the marketing droid [eh...I guess we can't use the 
word "droid" anymore, because...hey...what if that marketing "person" is actually a narrow AI?], but legitimately, because 
the algorithms/models point at a particular outcome [like say "this binary is soooo bad"] without any 
explanation. So, whether you are in the ElJefe/GRR/MIG + free ML library camp or in the super-uber-hyper-expensive EDR 
product camp, the result is the same: the system is telling me something and I don't understand why.....

On Wed, Mar 30, 2016 at 10:58 AM, Smoak, Christopher <Christopher.Smoak () gtri gatech edu> wrote:
Sven,

I definitely understand your point. Approaching the "when you have a hammer…" phenomenon is most certainly an issue in 
the machine learning field, especially to your .fit() point below. As sure as I am that such an issue exists, I also 
think there's room for, improperly phrased, "non-traditional" applications of these types of techniques in order to 
achieve some goal. I just don't want to make the blanket statement that "if it isn't an image, <insert technique> 
won't work." I realize that's not necessarily your point, but I wanted to add some conversation fodder to what I 
consider to be a really interesting thread.

Agreed 100% on the "because ML" argument; I see it way too often. Frankly, it hurts all "legitimate" (used liberally 
here) uses of ML, in that everything gets wrapped up in the jargon/marketing lingo and people can't see beyond it. We 
seem to live in an industry fraught with those types of things. My point is simply that I don't want to punish the 
terminology so much that we devalue the real contributions that ML, as an example, can make to the field. 
Employed carefully, there are definitely ways to use it for great justice. :)

Anyway, just wanted to get some more thoughts going on this topic, as I think it's worth a longer discussion, albeit a 
slight digression.

Regards,

Chris Smoak
Georgia Tech Research Institute

From: Sven Krasser <sven () crowdstrike com>
Date: Wednesday, March 30, 2016 at 1:31 PM
To: Christopher Smoak <Christopher.Smoak () gtri gatech edu>, dave aitel <dave () immunityinc com>, "dailydave () lists 
immunityinc com" <dailydave () lists immunityinc com>
Subject: Re: [Dailydave] AI

Hey Chris,

Carefully phrased, I am very skeptical that transforming your instances into images and then using CNNs will give you 
an out-of-the-box performance bump over other traditional techniques. To me this looks like a classic “When you have a 
hammer every problem looks like a nail” approach. Can we develop representations of input data that will allow deep 
architectures to successfully learn the instance space? Yes, I’m sure we can — but that will require more work than 
downloading TF and running it over the data as Dave described in his email.

As far as technology in commercial products goes, my point is that primarily it is important that a product performs to 
a specific objective standard, regardless of the technologies used. Explaining why something performs is indeed 
important, but the answer to this cannot simply be “because Machine Learning” as we see presently (and which I assume 
prompted Dave to send his initial email). Everyone with rudimentary Python knowledge can go download sklearn right now 
and call .fit() on the Iris dataset. Congratulations, you just used Machine Learning. That doesn’t make for a 
compelling product, however.
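
For anyone who has not tried it, the .fit() remark amounts to roughly the following. It is only a sketch: the choice 
of LogisticRegression, the 70/30 split, and the variable names are assumptions for illustration, since the email only 
says "download sklearn and call .fit() on the Iris dataset".

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the bundled Iris dataset: 150 flowers, 4 measurements each.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows so the score is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The classifier choice is arbitrary (assumption); any sklearn
# estimator exposes the same .fit() call.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

That this takes a dozen lines is the point being made: the hard part of a product is everything around the call, 
not the call itself.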

Best,
-Sven

From: "Smoak, Christopher" <Christopher.Smoak () gtri gatech edu>
Date: Wednesday, March 30, 2016 at 10:03 AM
To: Sven Krasser <sven () crowdstrike com>, dave aitel <dave () immunityinc com>, "dailydave () lists immunityinc com" 
<dailydave () lists immunityinc com>
Subject: Re: [Dailydave] AI

Sven,

Your general point is well taken; however, I'd contend that while most problems in security don't boil down to simple 
image classification tasks, there are certainly valid ways of applying the unique spatial nature of CNNs to 
security problems. Namely, mapping data that is not traditionally visual in nature to an image representing 
that data (e.g. binary -> png) can, and in my experience has, yielded very promising results. Granted, it's debatable 
whether it's better to utilize a technique more suited to the original data set in lieu of transforming it into an 
image, but that's a conversation for another day. The bottom line is finding a model that consistently gives good 
results in the context of the question being answered.
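
One simple way to do the "binary -> image" mapping is to treat each byte as one grayscale pixel and wrap the byte 
stream into fixed-width rows. The sketch below is an assumption about how such a mapping might look, not the method 
used in the work described above. It writes PGM rather than PNG so that it needs only the standard library, and the 
function names and the default row width of 16 are hypothetical.

```python
import math


def bytes_to_rows(data, width=16):
    """Lay bytes out row by row as 0-255 grayscale pixels.

    The last row is zero-padded so every row has the same width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data.ljust(height * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def rows_to_pgm(rows):
    """Serialize pixel rows as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(bytes(row) for row in rows)


# Example: five bytes wrapped into rows of four pixels.
sample_rows = bytes_to_rows(b"MZ\x90\x00\x03", width=4)
sample_pgm = rows_to_pgm(sample_rows)
```

The row width is a real design choice here: it decides which bytes end up vertically adjacent, and therefore what 
spatial structure a CNN could pick up on.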

On the point of caring only about the results and not about the technology/process involved, I'm not sure I agree. When we 
get into extremely complex technologies that give us binary, "good/bad" answers to not-so-simple questions, I think 
it's imperative to understand the basis upon which the technology arrived at the answer. It may not be feasible with 
commercial (read: intellectual property) solutions but is nonetheless important. An example can be found in dynamic 
malware analysis systems, where understanding the perspective from which data is collected helps frame the efficacy of 
the result with respect to potential detection by malware.

Just some food for thought.

Chris Smoak
Georgia Tech Research Institute

From: <dailydave-bounces () lists immunityinc com> on behalf of Sven Krasser <sven () crowdstrike com>
Date: Wednesday, March 30, 2016 at 10:49 AM
To: dave aitel <dave () immunityinc com>, "dailydave () lists immunityinc com" <dailydave () lists immunityinc com>
Subject: Re: [Dailydave] AI

Hey Dave,

You got some things right and some things wrong. In security, most problems are not image classification related and do 
not benefit at the same level from the recent advances in Convolutional Neural Networks. Also, TensorFlow is not the 
first freely available Deep Learning library, nor is it the first freely available Machine Learning classification 
library by a long shot. Take a look at e.g. some of the presentations that the MLSec Project made available. ML has 
been in security products for decades (and I worked on shipping products with it back in the day at CipherTrust, 
before people cared what technology stopped the threats as long as they were stopped). What’s new is that Machine 
Learning now also appears in marketing materials. So the question one should ask oneself is whether you still have a 
product once the ML hype has worn off.

Best,
-Sven

From: <dailydave-bounces () lists immunityinc com> on behalf of dave aitel <dave () immunityinc com>
Date: Wednesday, March 30, 2016 at 5:56 AM
To: "dailydave () lists immunityinc com" <dailydave () lists immunityinc com>
Subject: [Dailydave] AI

There are only a few real computers in the world, and I think we are just beginning to feel their influence. For 
example, here is a sample project I am working on now that image classification is a solved problem.

Like many of you on this list, I dabble in Brazilian jiu-jitsu. In fact, in a week we are doing an open mat at 
INFILTRATE for both newcomers who've always wanted to try to choke me out and people in the community who are already 
very good at choking people.

Like many sports, BJJ is typically scored according to a ruleset based on the different positions you end up in. Being 
on top is usually better. Being able to get on top after you are on the bottom is worth 2 points. Being able to 
completely mount someone is worth three points. Getting on their back is four points. Generally a tournament will hire 
judges and they will award points based on their understanding of the rules and their personal feelings towards the 
contestants and whatever other factors are floating in their heads.

What I'm working on is collecting a set of images of BJJ, then annotating them as to what positions the different 
people are in. This essentially maps every image into a vector space - and after training a neural network using modern 
techniques you can have a program that looks at an image and then outputs "Blue is in top mount". 

Part of the key here is that you don't have to tell it that the picture is BJJ. Every picture that program sees is two 
people doing BJJ. All it has to do is output what positions they are in.

And in the end, by assigning point values to transitions between positions, you will have an automatic BJJ judge. I've 
applied for a TensorFlow API key from Google since, although this is not a hard problem by ML standards, I want to do it 
the right way and get good scalable results on video later.
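
The scoring half of that idea can be sketched without any ML at all. The code below assumes the classifier has 
already produced one position label per frame for a single competitor. The label names ("bottom", "top", "mount", 
"back") and the function name are hypothetical, and the point values are the ones quoted earlier in this email 
rather than an official ruleset.

```python
# Points for arriving in a dominant position (values from the email).
ARRIVAL_POINTS = {"mount": 3, "back": 4}
# Points for getting on top after being on the bottom.
SWEEP_POINTS = 2


def score_positions(positions):
    """Sum points over transitions between consecutive frame labels."""
    total = 0
    for prev, cur in zip(positions, positions[1:]):
        if cur == prev:
            continue  # holding the same position earns nothing new
        if prev == "bottom" and cur == "top":
            total += SWEEP_POINTS
        total += ARRIVAL_POINTS.get(cur, 0)
    return total


# Example: swept from bottom, then took mount, then took the back.
frames = ["bottom", "bottom", "top", "top", "mount", "back"]
blue_score = score_positions(frames)  # 2 + 3 + 4
```

A real judge would also need to smooth out frame-level misclassifications before counting transitions, since a 
single wrong label would otherwise award points twice.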

And of course, the same thing is true for the process information El Jefe will give you. All those "behavioral analysis 
machine learning intrusion detection" startups are about to be crushed by simple open source projects that use Google 
and MS and Amazon's exported Machine Learning APIs. 

-dave



_______________________________________________
Dailydave mailing list
Dailydave () lists immunityinc com
https://lists.immunityinc.com/mailman/listinfo/dailydave




-- 
Dr. Anton Chuvakin
Site: http://www.chuvakin.org
Twitter: @anton_chuvakin
Work: http://www.linkedin.com/in/chuvakin
