Dailydave mailing list archives

Re: Asymmetry


From: Josh Saxe <josh.saxe () invincea com>
Date: Mon, 11 Apr 2016 14:04:37 -0400

I figured I'd chime in as someone who builds security machine learning
models as part of his day job.  A few hopefully not-too-incongruous
observations:

1) Most security problems are not machine learning problems.  Like
encryption, two-factor authentication, taint analysis, or hand-crafted
IOCs, machine learning is just one of many security tools.  But somehow
people outside of machine learning seem to think a) machine learning can be
applied everywhere and replace every other approach or b) machine learning
can be applied nowhere, always underperforms, and is marketing snake oil.
The people who believe a) are bound to be disappointed and the people who
believe b) are bound to be blindsided when they wake up and realize machine
learning has become an important ingredient in the network defense
landscape.

2) For a working security data scientist, much of the ingenuity in
developing a successful machine learning product lies in picking problems
that *are* good machine learning problems and not going down the rabbit
hole of problems that aren't.  Unsupervised clustering of malware to help
identify new malware families or link threat actors -- that's a good
problem, and systems that do this are currently deployed to good effect,
but can probably be improved upon.  Detecting and classifying malware is
another good one that's already been productized but merits continued
research.  Setting firewall policy or predicting which users on a network
will commit treason or sell your trade secrets is not a good machine
learning problem and probably won't be in the foreseeable future, even
though I'll bet there are products on the market that claim to do these
things.
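
To make the clustering use case concrete, here's a rough sketch of what it
looks like in code (Python with scikit-learn; the token lists and
parameters below are invented for illustration, not anything from a
production pipeline):

# Minimal malware-clustering sketch (illustrative only).
# Assumes you've already extracted token sets per sample (imported API
# names, printable strings, etc.) -- the data below is made up.
from sklearn.feature_extraction import FeatureHasher
from sklearn.cluster import DBSCAN

samples = {
    "sha256_aaa": ["CreateRemoteThread", "VirtualAllocEx", "beacon.dll"],
    "sha256_bbb": ["CreateRemoteThread", "VirtualAllocEx", "beacon.dll"],
    "sha256_ccc": ["RegSetValueExA", "InternetOpenUrlA", "dropper.pdb"],
}

# Hash variable-length token sets into fixed-width sparse vectors.
hasher = FeatureHasher(n_features=2**12, input_type="string")
X = hasher.transform(samples.values())

# Density-based clustering: no need to fix the number of families up
# front, and outliers come back labeled -1 (candidate novel families).
labels = DBSCAN(eps=0.5, min_samples=2, metric="cosine").fit_predict(X)
for sha, label in zip(samples, labels):
    print(sha, "-> cluster", label)

The appeal of a density-based method here is exactly the point above: you
don't know how many families exist in advance, and the unclustered
leftovers are where the hunt for new families starts.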

3) For a problem to be a good security machine learning problem, you need a
continuously replenished source of good data, because security models go
out of date as adversaries evolve unless the models evolve along with
them.  If you don't have good data at scale (and this includes *ground
truth* for that data), machine learning is the wrong approach.
For example, because we don't have thousands of examples of employees going
rogue and selling trade secrets (at least I don't) a machine learning
approach to detecting such employees doesn't make sense.

4) To echo what Sven said, the main work of a security data scientist is
custom modeling for a given security application: mostly feature
engineering, or crafting deep learning models that automate a portion of
the feature engineering process.  In my
experience, wholesale adoption of approaches from other fields never
works.  For one, the statistics of the problem are totally different: in
the detection use case, we tend only to care about the performance of a
model in the extremely low false positive rate region, which makes the
modeling goals quite different from those of many non-security
applications.  And secondly, security
is just different from computer vision, text mining, etc., and in my
experience requires custom solutions to perform well.
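
To put a number on "extremely low false positive rate": in practice you
pick the score threshold that pins FPR at something like 0.1% and report
detection rate there, rather than accuracy or whole-curve AUC. A quick
sketch (Python with scikit-learn; the scores are simulated stand-ins for
real model output):

# Sketch: evaluate a detector at a fixed, very low false positive rate.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
# Simulated validation set: 100k benign, 1k malicious (stand-in data).
y_true = np.r_[np.zeros(100_000), np.ones(1_000)]
y_score = np.r_[rng.normal(0.0, 1.0, 100_000),   # benign scores
                rng.normal(2.5, 1.0, 1_000)]     # malicious scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Operating point at FPR <= 0.1% -- the only region that matters here.
target_fpr = 1e-3
idx = np.searchsorted(fpr, target_fpr, side="right") - 1
print(f"threshold={thresholds[idx]:.2f}  "
      f"FPR={fpr[idx]:.4%}  detection rate={tpr[idx]:.1%}")

A model that wins on overall AUC can still lose badly at that operating
point, which is why metrics borrowed wholesale from other fields mislead
here.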

Best,
Josh


On Fri, Apr 1, 2016 at 9:59 PM, <Robin.Lowe () forces gc ca> wrote:

Good day all,



Just a couple of things I thought of while reading the earlier discussion
on AI and this follow-up email; just some, as Chris so eloquently put it
earlier, conversation fodder.



I think one thing we have to keep in mind is that the underlying framework
behind machine learning is still a machine. An issue I can see with this
is: who is accountable if it fails? If we’re talking about national
security, how much risk will someone be willing to take on to prove that
their new machine learning intrusion detection system works 100% of the
time? The number of hours required just to amass the data needed to seed
the system would be substantial on its own.



There’s also the possibility of false positives being generated by
erroneous data. Sure, a listening Meterpreter shell on port 4444 is pretty
damn obvious, but what about, say, Cobalt Strike’s Beacon? Will the
people developing the IDS need to spend thousands of dollars throwing all
of these expensive network auditing programs at it in order to generate the
data necessary to make it accurate even 90% of the time?
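
As a back-of-the-envelope illustration (all numbers invented), even a
system that’s “accurate 90% of the time” mostly produces false alarms
when real intrusions are rare:

# Base-rate sketch with made-up numbers: a "90% accurate" IDS still
# buries analysts in false alarms when real intrusions are rare.
events_per_day = 1_000_000    # events scored per day (assumed)
intrusion_rate = 1e-5         # fraction that are real intrusions (assumed)
tpr, fpr = 0.90, 0.10         # "accurate 90% of the time"

true_alerts = events_per_day * intrusion_rate * tpr            # ~9
false_alerts = events_per_day * (1 - intrusion_rate) * fpr     # ~100,000
precision = true_alerts / (true_alerts + false_alerts)
print(f"{false_alerts:,.0f} false alarms/day; "
      f"{precision:.3%} of alerts are real")

Which is part of why the personnel cost below doesn’t go away just because
the detection is automated.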



Also, the budget just for personnel would be pretty high. You’d need
people for R&D, maintenance, actually checking flagged intrusion attempts,
etc.



One last thing before I start in on the possible positives: the machine
itself might be prone to exploitation. Similar to how getting into a
domain controller or hypervisor is pretty much an endgame state, what if
you broke into the IDS itself and started messing with its signatures?
Seems like a few things to think about.



However, one cost-reducing factor is that it’s always looking, and faster
than a person can. Sure, there are some blue teams that are basically
machines at this point, but I can definitely see a time when machines take
over that facet of security.



You don’t have to pay it a salary; just keep the machine happy with
electricity and known behaviours and it’ll chug along.



Kind of starting to sound like an antivirus program, but one that looks at
networks instead of files.



I’m new to this sort of thing, so sorry if I mentioned something that would
be considered common knowledge or just plain nonsense.



Cheers,



Leading Seaman/Matelot de 1re classe Robin Lowe



Naval Communicator, HMCS EDMONTON

Department of National Defence / Government of Canada

Robin.Lowe () forces gc ca / Tel: 250-363-7940





“The quieter you are, the more you are able to hear.”



From: dailydave-bounces () lists immunityinc com On Behalf Of Dave Aitel
Sent: April-01-16 11:36 AM
To: dailydave () lists immunityinc com
Subject: [Dailydave] Asymmetry



One possible long-lasting cause of the "asymmetry" everyone talks about is
that US defenders get quite high salaries compared to Chinese attackers (I
assume; not being a Chinese attacker, it's hard to know for sure).



Just in pure "dollars spent vs dollars spent" terms, it seems like it would
be three times cheaper to be a Chinese attacker at that rate?



But I think it's still an open question whether machine learning
techniques make surveillance cheaper than intrusion as a rule. What if it
does? What would that change about our national strategy? (And if it
DOESN'T, then why bother?)



-dave



_______________________________________________
Dailydave mailing list
Dailydave () lists immunityinc com
https://lists.immunityinc.com/mailman/listinfo/dailydave


