Educause Security Discussion mailing list archives

Re: How much host data collected?


From: "Bridges, Robert A." <bridgesra () ORNL GOV>
Date: Mon, 30 Apr 2018 15:51:59 +0000

Alan,



* What's "host security data?"

I think we are interested in data that can be used to diagnose both security-related incidents (intrusions,
breaches) and misconfigurations. We're interested in understanding any system logs, host-based IPS logs, etc.



* Are you breaking things out by service?

No, but our general understanding has been that workstation monitoring differs from server monitoring (more
IPs, less data per IP, different kinds of data).



* Are you considering the differences in OSes?

This has been a necessity. Folks we talk to generally collect larger sets of system logs for Windows
workstations than for other OSes. Is that true in your operations (and for everyone else out there)?



* Is compressibility a factor?

I haven't considered this. We've been defaulting to the resource impact of the data in whatever format it is
in: memory and disk I/O on the host device, plus disk space to store the logs, if you store them.
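
That said, a quick way to check whether compressibility should factor in is just to compress a sample. Here is a minimal Python sketch; the log path is a placeholder, so point it at whatever sample you have:

    import gzip
    import os

    SAMPLE_LOG = "/var/log/syslog"  # hypothetical sample; substitute a real log

    raw_size = os.path.getsize(SAMPLE_LOG)
    with open(SAMPLE_LOG, "rb") as f:
        compressed_size = len(gzip.compress(f.read()))

    # A high ratio suggests text logs whose stored footprint is much smaller
    # than the raw byte count; binary formats often sit closer to 1:1.
    print(f"raw: {raw_size} B, gzip: {compressed_size} B, "
          f"ratio: {raw_size / compressed_size:.1f}:1")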



* Are you interested in event counts or raw byte counts for data?

We are interested in bytes per IP per day and in the number of IPs. Ideally we can estimate the cost of
security using cloud pricing (for each host we need x amount of memory to run the AV, y amount of disk
space, ...), which translates directly to dollars.
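
As a rough illustration of the translation we have in mind (every rate and figure below is a made-up placeholder, not a measurement):

    # Back-of-envelope cost of security per host per month, priced at
    # hypothetical cloud rates. All numbers here are assumptions.
    BYTES_PER_HOST_PER_DAY = 50 * 1024**2  # assumed 50 MB of logs per day
    AV_MEMORY_GB = 0.5                     # assumed memory held by the AV agent
    STORAGE_PRICE_GB_MONTH = 0.023         # placeholder $/GB-month for log storage
    MEMORY_PRICE_GB_MONTH = 5.0            # placeholder $/GB-month for host memory

    storage_gb = BYTES_PER_HOST_PER_DAY * 30 / 1024**3
    monthly_cost = (storage_gb * STORAGE_PRICE_GB_MONTH
                    + AV_MEMORY_GB * MEMORY_PRICE_GB_MONTH)
    print(f"~${monthly_cost:.2f} per host per month")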





Overall, our goals are to understand what host data is collected and how much (in terms of bytes per host per
day); we're informing future research efforts. We'd like to know the cost of security (how much memory is used
on the host? how much disk space is used to store the data? ...) and then whether we can find ways to lower the
cost while increasing the signal (e.g., only collecting high-fidelity data after some alert has tripped).



If anyone has more information about what and how much data you collect, we'd be interested. We'd also welcome
ideas for next-generation tools that research could pursue, e.g., turning on audit logs only after some event
(a rough sketch of that idea follows below).
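
To make that concrete, here is a rough Python sketch of flipping to high-fidelity collection only after an alert trips. The alert source is a placeholder, and the toggle assumes Linux auditd (auditctl -e 1/0); other OSes would need their own mechanism:

    import subprocess
    import time

    HIGH_FIDELITY_SECONDS = 15 * 60  # assumed window of full auditing per alert

    def set_audit(enabled: bool) -> None:
        # On Linux with auditd, `auditctl -e 1` / `auditctl -e 0` toggles auditing.
        subprocess.run(["auditctl", "-e", "1" if enabled else "0"], check=True)

    def wait_for_alert() -> None:
        # Placeholder: block until the IDS/IPS raises an alert, e.g. by
        # tailing an alert log or subscribing to a message queue.
        raise NotImplementedError

    while True:
        wait_for_alert()
        set_audit(True)                   # collect high-fidelity data...
        time.sleep(HIGH_FIDELITY_SECONDS)
        set_audit(False)                  # ...then drop back to the cheap baseline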



Similarly, if anyone can share the cost of an intrusion, that would help us estimate the opposite side of the
coin, i.e., the cost when security is insufficient.



Thanks,

Bobby



--

Robert A. Bridges, PhD, Research Mathematician, Cyber & Information Science Research Group, Oak Ridge National 
Laboratory

On 4/26/18, 3:37 PM, "Alan Amesbury" <amesbury () oitsec umn edu> wrote:



    On Apr 19, 2018, at 20:32 , Bridges, Robert A. <bridgesra () ORNL GOV> wrote:



    > What is the average amount of host security data your SOC collects per host, per day?

    [snip]



    It's hard to say without knowing the full extent of what "security data" entails.  Some questions that come to mind 
include:



        * What's "host security data?"  There's a great deal of overlap between "security"
          and "operations" as far as I'm concerned.  For example, log data generated by the
          latter domain will almost certainly contain information the former domain finds
          interesting.  However, others might consider system logs to not be "security data."



        * Are you breaking things out by service?  I'm also not sure whether "average" will
          suffice as a reasonable measure, given that a web server's logs are likely to be
          very different from logs from another kind of server, e.g., mail, DHCP, LDAP,
          domain controller, etc.  Workstations (i.e., users' hosts) are also an entirely
          different category (maybe multiple ones?), too.



        * Are you considering the differences in OSes?  Different OSes also log at
          significantly different levels depending on their settings.  Windows hosts, for
          example, can produce MASSIVE amounts of data when compared to a Unix host.



        * Is compressibility a factor?  Some log formats are binary, which may not compress
          very well.  Text formatted logs may compress *extremely* well, at better than 10:1.



        * Are you interested in event counts or raw byte counts for data?  There's a vast
          difference between storing 1000 events and storing 1000 bytes of event data.





    Data can generally be stored pretty cheaply.  Filesystems like ZFS can provide
    transparent data compression and scale to pretty large sizes while maintaining data
    integrity (it checksums the data, checksums the metadata, and then checksums the
    checksums, if I recall correctly, and can use distributed parity to reconstruct
    corrupted data).  If you're talking about being able to *use* the data, then costs
    tend to go up.  Available tools range from roughly zero software cost to thousands
    or millions of dollars, depending on scale, ease of use, and a host of other factors.



    That said, I might be able to give you a rough idea of what we see in terms of event
    counts from several different sources, although it might make more sense to discuss
    those specifics off list.





    --

    Alan Amesbury

    University Information Security

    http://umn.edu/lookup/amesbury






