BreachExchange mailing list archives

Modelling data breaches and collecting data

From: Luther Martin <martin () voltage com>
Date: Thu, 28 Jan 2010 18:00:28 -0800

I recently got a few minutes to crunch some more of the data on data breaches and found something interesting. I've
mentioned before that the sizes of breaches appear to follow a lognormal distribution. What I've now found is that the
fit gets better the further back in time you go: it's very good for 2006, good but not quite as good for 2007, and so
on.
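
One way to make "the fit gets better for older years" concrete is to fit a lognormal to each year's breach sizes and
measure the fit with a Kolmogorov-Smirnov (KS) distance D, where a smaller D means a better fit. The Python sketch below
is only an illustration, not necessarily how the numbers above were computed. It assumes the data is a list of
(year, records exposed) pairs with at least one record per breach, and the function names are hypothetical. Because the
parameters are estimated from the same data, the usual KS p-values don't apply (the Lilliefors setting), so D is best
used to compare years against each other rather than as a formal test.

```python
import math
import random
import statistics
from statistics import NormalDist


def fit_lognormal(sizes):
    """Maximum-likelihood lognormal fit: mean and population std. dev. of log(size)."""
    logs = [math.log(s) for s in sizes]
    return statistics.fmean(logs), statistics.pstdev(logs)


def ks_statistic(sizes, mu, sigma):
    """Kolmogorov-Smirnov distance D between the empirical CDF and the fitted lognormal.

    log() is monotone, so comparing log(size) against Normal(mu, sigma) gives the
    same D as comparing size against the lognormal directly.
    """
    dist = NormalDist(mu, sigma)
    logs = sorted(math.log(s) for s in sizes)
    n = len(logs)
    d = 0.0
    for i, x in enumerate(logs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def fit_by_year(records):
    """records: (year, records_exposed) pairs with records_exposed >= 1 (hypothetical format).

    Each year needs at least two distinct sizes so that sigma > 0.
    Returns {year: (mu, sigma, D)} in year order.
    """
    by_year = {}
    for year, size in records:
        by_year.setdefault(year, []).append(size)
    result = {}
    for year, sizes in sorted(by_year.items()):
        mu, sigma = fit_lognormal(sizes)
        result[year] = (mu, sigma, ks_statistic(sizes, mu, sigma))
    return result


if __name__ == "__main__":
    # Synthetic demo data only: one lognormal year and one deliberately non-lognormal year.
    rng = random.Random(1)
    demo = [(2006, rng.lognormvariate(8, 2)) for _ in range(1000)]
    demo += [(2007, rng.uniform(1, 1e6)) for _ in range(1000)]
    for year, (mu, sigma, d) in fit_by_year(demo).items():
        print(year, round(mu, 2), round(sigma, 2), round(d, 3))
```

Working on log(size) keeps the arithmetic in the well-behaved normal scale, which matters when breach sizes span
several orders of magnitude.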

My theory is that this is due to the way in which the data is collected. I'd guess that lots of the big breaches are
widely reported, but the smaller breaches, which might involve only a few records, don't show up until later, maybe after
someone files a FOIA request to get information about all the breaches that took place in a particular state, for example.

If this is true, I'd expect the fit for recent years to get better and better over time as more data is collected.
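
The reporting-lag theory can be checked with a small simulation. The sketch below makes purely hypothetical
assumptions, none of them taken from real breach data: breach sizes are lognormal with mu = 8 and sigma = 2 on the log
scale, breaches smaller than the median size are reported after an exponentially distributed delay averaging 5 years,
and larger breaches are reported immediately. For a given cutoff date it keeps only the breaches reported by then,
refits a lognormal to each breach year, and returns the KS distance D. All names and parameter values are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical model parameters: assumptions for illustration, not fitted to real data.
MU, SIGMA = 8.0, 2.0            # lognormal parameters of breach size (log scale)
SMALL_THRESHOLD = math.exp(MU)  # "small" breach = below the median size
SMALL_MEAN_LAG = 5.0            # mean reporting delay in years for small breaches
# Large breaches are assumed to be reported immediately (zero delay).


def simulate_breaches(years, per_year, rng):
    """Return (breach_year, size, report_time) triples under the assumed lag model."""
    breaches = []
    for year in years:
        for _ in range(per_year):
            size = rng.lognormvariate(MU, SIGMA)
            occurred = year + rng.random()  # uniform time within the breach year
            if size < SMALL_THRESHOLD:
                lag = rng.expovariate(1.0 / SMALL_MEAN_LAG)  # rate = 1 / mean delay
            else:
                lag = 0.0
            breaches.append((year, size, occurred + lag))
    return breaches


def ks_lognormal_fit(sizes):
    """Refit a lognormal by maximum likelihood and return the KS distance D."""
    logs = sorted(math.log(s) for s in sizes)
    dist = NormalDist(statistics.fmean(logs), statistics.pstdev(logs))
    n = len(logs)
    d = 0.0
    for i, x in enumerate(logs, start=1):
        f = dist.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def fit_as_of(breaches, cutoff):
    """D per breach year, using only breaches reported by the cutoff time."""
    by_year = {}
    for year, size, reported in breaches:
        if reported <= cutoff:
            by_year.setdefault(year, []).append(size)
    return {year: ks_lognormal_fit(sizes) for year, sizes in sorted(by_year.items())}


if __name__ == "__main__":
    breaches = simulate_breaches(range(2006, 2010), 20000, random.Random(2010))
    for cutoff in (2010.0, 2040.0):
        fits = fit_as_of(breaches, cutoff)
        print(cutoff, {year: round(d, 3) for year, d in fits.items()})
```

Under this model, with a cutoff at the start of 2010 the most recent breach year is missing most of its small breaches,
so its D should come out noticeably larger than the one for 2006. With a much later cutoff nearly every small breach has
been reported and D for each year drops toward sampling noise, which is the "better and better fit over time" pattern
the theory predicts.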

Any thoughts on that?
_______________________________________________
Dataloss-discuss Mailing List (dataloss-discuss () datalossdb org)
Archived at http://seclists.org/dataloss/

