Dailydave mailing list archives

Re: Machine Learning and Dimensions and stuff


From: William Kupersanin <wkupersa () gmail com>
Date: Sat, 22 Nov 2014 10:46:08 -0500

Yes. Sorry. I'll try to elaborate with an example. It's not really
supervised or unsupervised learning, but I think it gets to the point. Say,
for example, that one of my indicators is low-probability parent-child
process relationships that lead to potentially high-risk applications such
as powershell or reg. I haven't tried this one, but I am willing to bet that
the intersection of these conditions has a fairly high signal-to-noise ratio.
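
A minimal sketch of what such an indicator could look like, assuming a
baseline of (parent, child) process-creation pairs is available. The process
names, the counts, the HIGH_RISK set, and the 0.05 threshold are all
hypothetical values chosen for illustration, not anything measured:

```python
from collections import Counter

# Hypothetical set of "high risk" child applications (an assumption).
HIGH_RISK = {"powershell.exe", "reg.exe"}

def build_baseline(events):
    """Count (parent, child) pairs and per-parent totals from past events."""
    pair_counts = Counter(events)
    parent_counts = Counter(parent for parent, _ in events)
    return pair_counts, parent_counts

def pair_probability(parent, child, pair_counts, parent_counts):
    """Empirical estimate of P(child | parent); 0.0 for a never-seen parent."""
    total = parent_counts[parent]
    if total == 0:
        return 0.0
    return pair_counts[(parent, child)] / total

def flag(event, pair_counts, parent_counts, threshold=0.05):
    """Flag only the intersection: high-risk child AND low-probability pair."""
    parent, child = event
    if child not in HIGH_RISK:
        return False
    return pair_probability(parent, child, pair_counts, parent_counts) < threshold

# Made-up baseline data, for illustration only.
history = (
    [("explorer.exe", "chrome.exe")] * 50
    + [("explorer.exe", "powershell.exe")] * 10
    + [("services.exe", "svchost.exe")] * 40
    + [("winword.exe", "splwow64.exe")] * 20
)
pair_counts, parent_counts = build_baseline(history)

for event in [("winword.exe", "powershell.exe"),
              ("explorer.exe", "powershell.exe")]:
    print(event, flag(event, pair_counts, parent_counts))
```

With this toy baseline, powershell launched from winword.exe is flagged
because that pair never occurs in the history, while powershell launched
from explorer.exe is common enough to pass.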

If I understood Halvar's comment, as this becomes well known, the adversary
will take measures to avoid "tripping" this analytic. Thinking about how
they might approach this, they could inject code into common processes in
order to avoid the odd parent-child relationship, or they could bring their
own tools to avoid using powershell or reg.

My response is twofold. First, either of these scenarios is also detectable,
and it becomes an arms race of sensors and analytics versus techniques.
Event Tracing for Windows and various COTS tools already give you some of
the capabilities needed to detect them. Second, the point that I explained
poorly earlier is that making the adversary work harder to stay on our
systems is game-changing. I'd argue that we don't do a lot in this area and
that the old techniques for credential access, exfiltration, and command and
control keep working. When we can begin to challenge the adversary and make
our systems contested space, we raise their cost of information. Finally.

Hope this makes more sense,
--Willie

On Nov 22, 2014 4:30 AM, "shadown [at] gmail" <shadown () gmail com> wrote:

Willie, could you elaborate?
I'm interested in details; from vague statements we don't learn anything
new. Please remember this is not the physical world, and very different
rules apply.

Cheers,
  Sergio

On 21.11.2014, at 22:19, William Kupersanin <wkupersa () gmail com> wrote:


The implication, though, is that even if the adversary adapts, the ML
analytic forces the adversary to operate in a smaller space to avoid
appearing anomalous. I consider anything that can shift the balance of cost
from the defender to the adversary to be wildly successful.

--Willie

On Thu, Nov 20, 2014 at 5:25 PM, Halvar Flake <HalVar () gmx de> wrote:

Hey all,

thanks for the link, and it is indeed a fun talk :-)

An important detail that many people in "machine learning for security"
neglect is that the vast majority of ML algorithms were not designed for
(and will not function well in) an adversarial model. Normally, one is
trying to model an unknown statistical process based on past observables;
the concept that the statistical process may adapt itself with the intent
of fooling you isn't really of interest when you try to recognize faces /
letters / cats / copyrighted content programmatically.

For entertainment, I think everyone who plays with statistics / curve
fitting / machine learning in our field should have a look at two things:

    http://cvdazzle.com/ - people trying crazy makeup / hair styles to
    screw with face detection.
    http://blaine-nelson.com/research/pubs/Huang-Joseph-AISec-2011 - a
    riot of a paper that introduces "Adversarial Machine Learning"
This doesn't mean that you can't have huge successes temporarily using ML
/ curve fitting / statistics; attackers haven't felt the need to adapt to
anything but AV signatures and DNS blacklisting yet, so relatively simple
ML will have big gains initially. I suspect, though, that a really
important part of using ML for defense in any form is "not becoming an
oracle" - which is often counter to commercial success. It may be that the
only good, long-term ML-based defense is one that can't be bought.

Cheers,
Halvar




*Sent:* Thursday, 20 November 2014 at 19:16
*From:* "Dave Aitel" <dave () immunityinc com>
*To:* dailydave () lists immunityinc com
*Subject:* [Dailydave] Machine Learning and Dimensions and stuff
https://vimeo.com/112322888

Dmitri pointed me at the above talk which is essentially a good
specialized 101-level lecture on how machine learning works in the
security space.

There's not much to criticize in the talk! (It has a lot of the features
of El Jefe!) They use a real graph database to run their algorithms
against process trees - but if you wanted to heckle you'd ask "Doesn't
the CreateProcess() system call also take 'parent process' as an
argument? What IS the rate of false positives? Because if you can't get
it down to basically 0 then you are essentially wasting your time, etc."
:>
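
To make the false-positive heckle concrete, here is a rough base-rate
sketch. Every number in it (event volume, fraction of malicious events,
detection and false-positive rates) is a made-up assumption for
illustration, not a figure from the talk:

```python
def alert_stats(events_per_day, malicious_fraction,
                true_positive_rate, false_positive_rate):
    """Return (expected false alerts per day, precision of the alerts)."""
    malicious = events_per_day * malicious_fraction
    benign = events_per_day - malicious
    true_alerts = malicious * true_positive_rate
    false_alerts = benign * false_positive_rate
    total = true_alerts + false_alerts
    precision = true_alerts / total if total else 0.0
    return false_alerts, precision

# Hypothetical environment: one million process events per day, of which
# one in 100,000 is malicious; the analytic catches 90% of the malicious
# events and misfires on 0.1% of the benign ones.
false_alerts, precision = alert_stats(
    events_per_day=1_000_000,
    malicious_fraction=1e-5,
    true_positive_rate=0.9,
    false_positive_rate=1e-3,
)
print(f"false alerts/day: {false_alerts:.0f}, precision: {precision:.4f}")
```

Under these assumptions a 0.1% false-positive rate still buries roughly
nine real detections under about a thousand false alerts a day, which is
the reason for asking whether the rate can be pushed to basically 0.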

But again, nobody asked any hard questions - and while the talk nibbled
around the edges of the tradeoffs with using machine learning techniques
on this kind of data, it didn't go into any depth at all about which
ones they've tried and failed at. It's a technical talk, but it's not a
DETAILED talk in the sense of "Here's some outliers that show us where
we fail and where we succeed and perhaps why".

That said, if you don't have a plan to do this sort of thing, then
you're probably failing at some level, so worth a watch. :>

-dave


_______________________________________________
Dailydave mailing list
Dailydave () lists immunityinc com
https://lists.immunityinc.com/mailman/listinfo/dailydave
