Educause Security Discussion mailing list archives

Re: How much host data collected?


From: Alan Amesbury <amesbury () OITSEC UMN EDU>
Date: Thu, 26 Apr 2018 14:37:27 -0500

On Apr 19, 2018, at 20:32 , Bridges, Robert A. <bridgesra () ORNL GOV> wrote:
 
What is the average amount of host security data your SOC collects per host, per day?
[snip]

It's hard to say without knowing the full extent of what "security data" entails.  Some questions that come to mind 
include:

        * What's "host security data"?  There's a great deal of overlap between "security" and
          "operations" as far as I'm concerned.  For example, log data generated by the latter
          domain will almost certainly contain information the former domain finds interesting.
          However, others might consider system logs not to be "security data."

        * Are you breaking things out by service?  I'm also not sure whether "average" will
          suffice as a reasonable measure, given that a web server's logs are likely to be
          very different from the logs of another kind of server, e.g., mail, DHCP, LDAP,
          domain controller, etc.  Workstations (i.e., users' hosts) are an entirely
          different category (maybe multiple categories), too.

        * Are you considering the differences in OSes?  Different OSes also log at
          significantly different levels depending on their settings.  Windows hosts, for
          example, can produce MASSIVE amounts of data when compared to a Unix host.

        * Is compressibility a factor?  Some log formats are binary, which may not compress
          very well.  Text-formatted logs may compress *extremely* well, at better than 10:1
          (a rough sketch of measuring that follows this list).

        * Are you interested in event counts or raw byte counts for data?  There's a vast
          difference between storing 1000 events and storing 1000 bytes of event data.
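
As a rough illustration of the compressibility point, here's a quick sketch that measures a compression ratio with 
Python's zlib.  The syslog-style sample line and the repetition count are made up for illustration; real logs repeat 
less exactly, so real ratios will usually be lower, but well-behaved text logs still compress very well:

    import zlib

    # Made-up syslog-style line, repeated to stand in for a day of text logs.
    # Real logs vary more from line to line, so this overstates the ratio.
    sample_line = (
        "Apr 26 14:37:27 host1 sshd[2231]: Accepted publickey for alice "
        "from 192.0.2.10 port 52514 ssh2\n"
    )
    text_log = (sample_line * 10_000).encode("utf-8")

    compressed = zlib.compress(text_log, 6)
    ratio = len(text_log) / len(compressed)
    print(f"raw: {len(text_log):,} bytes  "
          f"compressed: {len(compressed):,} bytes  "
          f"ratio: {ratio:.1f}:1")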


Data can generally be stored pretty cheaply.  Filesystems like ZFS can provide transparent compression and scale to 
very large sizes while maintaining data integrity (ZFS checksums the data, checksums the metadata, and then checksums 
the checksums, if I recall correctly, and can use distributed parity to reconstruct corrupted data).  If you're 
talking about being able to *use* the data, then costs tend to go up.  Available tools range from essentially zero in 
software cost to thousands or millions of dollars, depending on scale, ease of use, and a host of other factors.
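
To make the event-count-versus-byte-count distinction concrete, here's a back-of-the-envelope sketch of per-host and 
fleet-wide storage.  None of the numbers are measurements; the event rate, event size, compression ratio, and host 
count are all placeholder assumptions:

    # All of these figures are placeholder assumptions, not measurements.
    EVENTS_PER_HOST_PER_DAY = 50_000   # assumed average event count per host
    AVG_EVENT_SIZE_BYTES = 300         # assumed average raw event size
    COMPRESSION_RATIO = 10.0           # assumed text-log compression (10:1)
    HOSTS = 5_000                      # assumed fleet size

    raw_per_host = EVENTS_PER_HOST_PER_DAY * AVG_EVENT_SIZE_BYTES
    stored_per_host = raw_per_host / COMPRESSION_RATIO
    fleet_per_day = HOSTS * stored_per_host

    print(f"raw per host/day:    {raw_per_host / 2**20:.1f} MiB")
    print(f"stored per host/day: {stored_per_host / 2**20:.1f} MiB compressed")
    print(f"fleet per day:       {fleet_per_day / 2**30:.1f} GiB compressed")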

That said, I might be able to give you a rough idea of what we see in terms of event counts from several different 
sources, although it might make more sense to discuss those specifics off list.


-- 
Alan Amesbury
University Information Security
http://umn.edu/lookup/amesbury

