BreachExchange mailing list archives

Practical applications of machine learning in cyber security


From: Audrey McNeil <audrey () riskbasedsecurity com>
Date: Fri, 15 May 2015 13:19:03 -0600

http://www.net-security.org/article.php?id=2286

As more and more organizations are targeted by cyber criminals,
questions are being raised about their planning, preparedness, and
investment in cyber security to handle such incidents. The
adoption of cloud technologies and the invasion of social media platforms
into the workspace have added to the problem. Experts believe that most
organizations’ cyber-security programs are not a match for the attackers’
persistence and skills. Does the answer to this problem lie in machine
learning and artificial intelligence?

Why traditional approaches are failing

Traditional security systems are passive, and a small code change by the
attackers can lead to even the most secure networks being breached. Even
when a threat is detected, the valuable and prompt alert sent by these
systems is often just one among hundreds of false ones generated on a daily
basis. In the majority of security breaches, post-attack analysis carried
out by cyber security experts reveals that the attackers only had to tweak
the malware code slightly to get past the organization’s cyber defenses.

The problem lies in the fact that most of the current security systems rely
primarily on static knowledge. They are designed to detect malware, spot
intrusions, and discover data theft, but only based on signatures present
in their database. Of course, this signature database can (and should) be
updated regularly, but for all that, it will still only contain signatures
for known malware. Given the sophistication of modern multi-vector
attacks, we need cyber security solutions based on emerging technologies
such as machine learning, which has attracted considerable interest among
cyber security experts in recent years.
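
To make the weakness of static signatures concrete, here is a minimal
sketch in Python. It assumes, purely for illustration, that a "signature"
is the SHA-256 digest of a whole payload; real products typically use
richer byte patterns and heuristics. The names KNOWN_BAD and
is_known_malware and the sample payloads are hypothetical.

```python
import hashlib

# Hypothetical signature database: digests of payloads already known to be bad.
KNOWN_BAD = {hashlib.sha256(b"evil_payload_v1").hexdigest()}

def is_known_malware(payload: bytes) -> bool:
    """Return True only if the payload's digest is already in the database."""
    return hashlib.sha256(payload).hexdigest() in KNOWN_BAD

original = b"evil_payload_v1"
tweaked = b"evil_payload_v1 "  # the attacker appends a single byte

print(is_known_malware(original))
print(is_known_malware(tweaked))
```

The tweaked payload behaves the same for the attacker but no longer
matches anything in the database, which is the failure mode described
above.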

How cyber security and machine learning intersect

The fundamental principle of machine learning is to recognize patterns that
emerge from past experience and make predictions based on them. This
means reacting to a new, unseen threat based on past knowledge, i.e., a
known data set. Past experience can be a pre-defined set of examples, or
“training data”, from which a program “learns” and develops the ability to
react to new, unknown data.
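
As a rough illustration of "learning" from training data, the sketch below
trains a nearest-centroid classifier, one of the simplest possible models,
and applies it to samples it has not seen. The two features (failed logins
per minute and distinct ports probed) and all the numbers are made up for
the example.

```python
from statistics import mean

# Hypothetical training data:
# (failed_logins_per_min, distinct_ports_probed) -> label
training_data = [
    ((1.0, 2.0), "benign"),
    ((0.0, 1.0), "benign"),
    ((2.0, 3.0), "benign"),
    ((40.0, 60.0), "malicious"),
    ((55.0, 80.0), "malicious"),
    ((35.0, 70.0), "malicious"),
]

def train(samples):
    """Learn one centroid (per-feature mean) for each label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in by_label.items()}

def predict(model, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))

model = train(training_data)
print(predict(model, (48.0, 65.0)))  # an unseen sample near the malicious cluster
print(predict(model, (1.0, 1.0)))    # an unseen sample near the benign cluster
```

Neither test sample appears in the training data; the model places each
one by its similarity to what it has seen before.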

Still, any quality solution has to incorporate predictive modeling with
expert input and data mining. It’s unwise to believe that machine learning
can entirely replace the human element, but it can be very effective in
narrowing down the threats so that network analysts can focus on analyzing
only the serious ones.
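
One simple way to picture this narrowing-down is score-based triage. The
sketch below assumes a model has already assigned each alert a threat
score between 0 and 1; the alert records, the triage function, and the 0.8
threshold are all hypothetical.

```python
# Hypothetical alerts, each with a model-assigned threat score in [0, 1].
alerts = [
    {"id": "a1", "score": 0.12},
    {"id": "a2", "score": 0.97},
    {"id": "a3", "score": 0.45},
    {"id": "a4", "score": 0.88},
]

def triage(alerts, threshold=0.8):
    """Keep alerts at or above the threshold, most severe first."""
    serious = [a for a in alerts if a["score"] >= threshold]
    return sorted(serious, key=lambda a: a["score"], reverse=True)

for alert in triage(alerts):
    print(alert["id"], alert["score"])
```

The analyst still makes the final call; the model only decides what
reaches the top of the queue.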

Real-world application of machine learning in cyber security

An organization’s networks can be compromised through a variety of attacks.
The most common and serious network security threats are brute-force
attacks, intrusions, and DDoS attacks. How, for example, can machine
learning be used against this last type of attack? In a research project
carried out by the Internetwork Research Department at BBN Technologies,
the task was divided into three steps:

1) Detect network traffic flows that may be part of the botnet command and
control infrastructure,

2) Group the traffic flows from the same botnet by correlating them with
each other, and

3) Identify the command and control host, which should help to identify the
attack host.
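
The three steps above can be sketched as a small pipeline. This is not the
BBN implementation: the flow records, the "c2_score" field (standing in
for a classifier's output), and the crude grouping by shared destination
are assumptions made only to show how the steps fit together.

```python
from collections import defaultdict

# Hypothetical flow records; "c2_score" stands in for a classifier's output.
flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.9", "dport": 6667, "c2_score": 0.93},
    {"src": "10.0.0.8", "dst": "203.0.113.9", "dport": 6667, "c2_score": 0.90},
    {"src": "10.0.0.7", "dst": "198.51.100.4", "dport": 443, "c2_score": 0.05},
    {"src": "10.0.0.9", "dst": "203.0.113.9", "dport": 6667, "c2_score": 0.88},
]

def detect(flows, threshold=0.8):
    """Step 1: keep flows that look like command and control traffic."""
    return [f for f in flows if f["c2_score"] >= threshold]

def group(flows):
    """Step 2: correlate flows; here, crudely, by shared destination and port."""
    groups = defaultdict(list)
    for f in flows:
        groups[(f["dst"], f["dport"])].append(f)
    return groups

def find_controller(groups):
    """Step 3: pick the endpoint contacted by the most distinct sources."""
    (dst, _port), _ = max(groups.items(),
                          key=lambda kv: len({f["src"] for f in kv[1]}))
    return dst

controller = find_controller(group(detect(flows)))
print(controller)
```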

Machine learning techniques were used to identify the command and control
traffic of IRC (Internet Relay Chat)-based botnets. The task was split into
two stages: (I) distinguishing between IRC and non-IRC traffic, and (II)
distinguishing between botnet IRC traffic and real IRC traffic. In stage I,
the Naïve Bayes classifier was found to perform best, with low false
negative and false positive rates. In stage II, telltales of hosts were
used to label the traffic as suspicious or non-suspicious.
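
A two-stage classifier of this kind might look like the sketch below. It
uses a small hand-written Gaussian Naïve Bayes model for both stages; that
is an assumption for illustration, since the article names Naïve Bayes
only for the first stage. The two flow features (mean bytes per packet,
packets per second) and every number are invented.

```python
import math
from collections import defaultdict

class GaussianNB:
    """Minimal Gaussian Naive Bayes for small numeric feature vectors."""

    def fit(self, X, y):
        rows = defaultdict(list)
        for features, label in zip(X, y):
            rows[label].append(features)
        self.stats = {}
        for label, samples in rows.items():
            prior = len(samples) / len(X)
            params = []
            for col in zip(*samples):
                mu = sum(col) / len(col)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-6
                params.append((mu, var))
            self.stats[label] = (prior, params)
        return self

    def predict(self, features):
        def log_posterior(label):
            prior, params = self.stats[label]
            total = math.log(prior)
            for v, (mu, var) in zip(features, params):
                total += (-0.5 * math.log(2 * math.pi * var)
                          - (v - mu) ** 2 / (2 * var))
            return total
        return max(self.stats, key=log_posterior)

# Hypothetical features: (mean bytes per packet, packets per second).
stage1 = GaussianNB().fit(
    [(90, 0.5), (110, 0.4), (100, 0.6), (1200, 50), (1400, 40), (1000, 60)],
    ["irc", "irc", "irc", "non-irc", "non-irc", "non-irc"],
)
stage2 = GaussianNB().fit(
    [(95, 0.10), (105, 0.12), (100, 0.09), (100, 0.9), (90, 1.1), (110, 1.0)],
    ["botnet-irc", "botnet-irc", "botnet-irc",
     "real-irc", "real-irc", "real-irc"],
)

def classify(flow):
    """Stage I filters out non-IRC flows; stage II splits botnet from real IRC."""
    if stage1.predict(flow) != "irc":
        return "non-irc"
    return stage2.predict(flow)

print(classify((100, 0.1)))
print(classify((95, 1.0)))
print(classify((1300, 45)))
```

Running the cheap first stage on all traffic and the second stage only on
IRC flows keeps the harder decision confined to a much smaller set.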

The results of the research indicated that machine learning techniques can
indeed distinguish the subtle differences in the IRC flows. However, one of
the challenges in using this technique is the availability of an accurately
labelled sample data set for training and testing. The research largely
demonstrated the applicability of machine learning techniques for
identifying compromised hosts.
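
The dependence on accurate labels is easier to see in how such a
classifier is scored: both error rates are computed against the labels, so
mislabelled samples distort them directly. The sketch below uses a
hypothetical error_rates helper and made-up labels and predictions.

```python
def error_rates(y_true, y_pred, positive="botnet"):
    """Return (false_positive_rate, false_negative_rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    negatives = sum(t != positive for t in y_true)
    positives = sum(t == positive for t in y_true)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

# Hypothetical ground-truth labels and classifier output for eight flows.
y_true = ["botnet", "botnet", "real", "real", "real", "botnet", "real", "real"]
y_pred = ["botnet", "real", "real", "botnet", "real", "botnet", "real", "real"]

fpr, fnr = error_rates(y_true, y_pred)
print(f"false positive rate: {fpr:.2f}")
print(f"false negative rate: {fnr:.2f}")
```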

This research is based only on predictive modeling. A production-ready
machine learning solution should combine predictive modeling with expert
input. Companies can use these technologies to detect imminent risks and
alert IT administrators before a breach happens.

Conclusion

Traditional cyber security applications are built on rules, signatures, and
fixed algorithms, and can act only based on the “knowledge” that has been
fed to them. In the event of a new, previously undetected threat, these
applications may fail to spot it. Machine learning applications, on the
other hand, are based on “learning” algorithms, which learn from a
continually growing data set.

Machine learning-based applications can also be used to ward off insider
threats. They can collect data from an employee’s system and study it to
find anomalous behavior. As more and more companies each year fall victim
to security breaches, it’s time for enterprises to adopt next-gen security
solutions based on machine learning to strengthen their cyber security
defenses.
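
As a minimal sketch of the insider-threat idea, the code below flags a
day's activity that falls far outside an employee's own history, using a
simple z-score. The metric (files accessed per day), the baseline numbers,
and the threshold of 3 standard deviations are assumptions; a real system
would use far richer behavioral data.

```python
from statistics import mean, stdev

# Hypothetical daily counts of files accessed by one employee.
baseline = [42, 38, 45, 40, 41, 39, 44, 43, 40, 42]

def is_anomalous(value, history, z_threshold=3.0):
    """Flag a value more than z_threshold sample standard deviations from the mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold

print(is_anomalous(43, baseline))   # an ordinary day
print(is_anomalous(310, baseline))  # a sudden bulk download
```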
_______________________________________________
Dataloss Mailing List (dataloss () datalossdb org)
Archived at http://seclists.org/dataloss/
