Educause Security Discussion mailing list archives

Re: NSF Data Management Plans

From: Joe St Sauver <joe () OREGON UOREGON EDU>
Date: Thu, 5 Aug 2010 14:31:33 -0700
Steve Brukbacher <sab2 () UWM EDU> mentioned (in connection with new required 
NSF data sharing plans):

#The changes are designed to address trends and needs in the modern era 
#of data-driven science. "Science is becoming data-intensive and 
#collaborative," notes Ed Seidel, acting assistant director for NSF's 
#Mathematical and Physical Sciences directorate. "Researchers from 
#numerous disciplines need to work together to attack complex problems; 
#openly sharing data will pave the way for researchers to communicate and 
#collaborate more effectively."
#
#We're looking at how to assist researchers with this.  Has anyone 
#established any security strategies related to this new requirement?

I could see a number of different aspects to that question. Are you 
primarily concerned about:

-- research dataset provenance and integrity (in a nutshell, how do I 
   know that the dataset I think I just retrieved is the one that I 
   think it is, and that it hasn't been accidentally or intentionally
   altered since it was created? Yes, you could checksum the file, but
   is that enough?)

-- dataset documentation (getting a dump of a dataset doesn't do much
   good if you don't know how the data was collected and coded,
   including any inherent limitations to the data, etc. -- in the bad
   old days when I was providing statistical research support for
   faculty members and grad students, I have to admit that I 
   occasionally saw datasets received from offsite that suffered from 
   woefully incomplete and insufficient documentation, presumably
   because some researchers are like bad programmers, deferring
   documentation until they "get a couple of minutes," only to never
   have that quiet time actually turn up)

-- or was it more a matter of ensuring you simultaneously protect any 
   sensitive data elements (such as human subjects data) while also 
   meeting the NSF's new open access requirements (e.g., questions
   about how to handle data anonymization schemes, access control 
   and logging, or related sorts of things)?

-- other sites might be interested in monitoring data assets for
   abuse and misuse (conceptually imagine a dataset released 
   for non-profit research use (only), which subsequently gets
   commercially exploited without permission) -- obviously it
   can be tricky to find and prove these sorts of things, although
   reportedly some information providers have been known to "salt"
   things like maps with harmless but non-existent features 
   they've made up -- if you have the bad luck to blindly copy
   the fictitious feature, well, they arguably have you dead to
   rights

-- potentially some research data might be export controlled, and
   some sites might want to ensure that they don't inadvertently
   allow proscribed foreign nationals access to export controlled
   information

-- librarians and archivists take a unique long term view, worrying 
   about accessibility and usability of information assets decades or 
   even centuries in the future, and have been known to insist on
   multiple distributed copies of information assets for redundancy 
   and survivability in the event of adverse events (whether that's 
   fire, flood, institutions going out of business, people getting
   rid of their last 9 track tape drives or 8" floppy drives, spinning
   media crashing or non-archival magnetic media deteriorating over 
   time, etc.)

-- Or is your query specific to system and network security-related 
   datasets that your researchers may be working with? (If the latter, 
   I'd mention that we'll be having the 2nd Data Driven Collaborative 
   Security Workshop for High Performance Networks later this month,
   and as you might expect from the title, methodological and 
   substantive data-driven collaborative sharing issues relating to 
   security data will likely be "center stage" during those sessions, 
   as they were for the first DDCSW last year, see 
   http://security.internet2.edu/ddcsw/ )
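
On the first bullet's question of whether a checksum is enough: a bare
checksum catches accidental corruption, but someone who can alter the
file can usually also alter a checksum published next to it. Below is a
minimal Python sketch of one way to go a step further, pairing a
SHA-256 digest with an HMAC tag computed under a key shared out of
band. The function names, the shared-key arrangement, and the chunk
size are illustrative assumptions, not anything specified by NSF or in
this thread.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sign_digest(hex_digest, key):
    """Compute an HMAC-SHA256 tag over the digest.

    `key` is a bytes secret assumed to be shared out of band
    (a hypothetical arrangement for this sketch).
    """
    return hmac.new(key, hex_digest.encode("ascii"), hashlib.sha256).hexdigest()


def verify_dataset(path, published_digest, published_tag, key):
    """Check that the file matches the published digest and that the
    digest itself carries a valid tag from the key holder."""
    actual = sha256_of_file(path)
    if not hmac.compare_digest(actual, published_digest):
        return False  # file contents differ from what was published
    expected_tag = sign_digest(actual, key)
    return hmac.compare_digest(expected_tag, published_tag)
```

The publisher would run sha256_of_file() and sign_digest() once and
post both values alongside the dataset; a recipient holding the key
calls verify_dataset(). A site that doesn't want to distribute a
shared secret might prefer public-key signatures instead, and neither
approach says anything about provenance beyond "whoever holds the key
vouched for this file."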

Anyhow, I'd love to hear more about the specific areas related to this 
topic that you or others may be particularly interested in... I think
it's a fascinating (but potentially immense) topic, so narrowing in
on the particular aspects you're most interested in would probably
be a key first step.

Regards,

Joe St Sauver (joe () oregon uoregon edu or joe () internet2 edu)
Internet2 Security Programs Manager
http://www.uoregon.edu/~joe/