Educause Security Discussion mailing list archives
Re: Has anyone looked at digital archiving?
From: Brad Judy <Brad.Judy () COLORADO EDU>
Date: Thu, 13 Apr 2006 12:52:53 -0600
It's funny that you mention the Rosetta Stone since the Rosetta Project (http://www.rosettaproject.org/) was faced with the challenge of a near-permanent (i.e. thousands of years) archive of all of the languages of the world. They selected micro-engraving in a metal disk since optical magnification will likely be a technology that any society would have or create. It would be interesting to hear what they are doing with their electronic language database.

Of course, I doubt any of us have that kind of retention goal. :)

Brad Judy
-----Original Message-----
From: Stewart, Ian [mailto:istewart () UMASSP EDU]
Sent: Thursday, April 13, 2006 8:06 AM
To: SECURITY () LISTSERV EDUCAUSE EDU
Subject: Re: [SECURITY] Has anyone looked at digital archiving?

Don't forget the Rosetta Stone and film.

-----Original Message-----
From: stanislav shalunov [mailto:shalunov () INTERNET2 EDU]
Sent: Wednesday, April 12, 2006 6:14 PM
To: SECURITY () LISTSERV EDUCAUSE EDU
Subject: Re: [SECURITY] Has anyone looked at digital archiving?

Jim,

Just some random notes first:

* Digital signatures with today's security levels will likely be quite useless in 40 years.
* Error-correcting codes are a storage space optimization technique (you can always just have multiple copies), and storage space typically is not a problem, so they have little or nothing to contribute here.
* No electronic storage medium is known to physically retain the bits for 40 years. CD-Rs last for 2--5 years after being burned. DVD-Rs are shorter-lived. Tapes vary widely and depend on the tape type. Regular hard disks might last about 10 years on the shelf.
* The only format that exists today that has a claim on having been around for 40 years is plain text. Even there, the changes have been no less significant proportionally to the complexity of the format than in other formats; it's just that the format is so trivial (just a sequence of letters, each represented with a fixed-sized block of bits, with special values for space and newline and so forth) that recoding is just as trivial (character set changes, byte size changes, newline and other special character encoding, etc.).
My best solution for being able to read a Word document in 40 years would be to print it out in a few copies on a black-and-white laser printer, making sure the paper is alkaline (you can test that with a $5 device---I use something called an Abbey pH pen, which is just a marker with a solution of chlorophenol red instead of a dye), the fusing is done at a high enough temperature, and the finish is compatible (that's easy to check with a regular eraser: if adhesion is poor, you'll be able to make the characters fainter or even erase parts of them; with good adhesion, an eraser will have no effect until it starts ripping paper). Then store it in your library and in another library.

A simple and cost-effective way to store a collection of documents in another good library (but only assuming the documents are not meant to be highly proprietary) is to register your copyright and submit a bound copy of the collection to the Library of Congress (the fee for indefinite storage is about $30---a real bargain).

If for some strange and irrational reason it is desirable to keep archival documents exclusively electronic, then one would have to keep multiple copies and keep them in constant rotation, updating media and formats and never letting anything sit for more than a few years. Keeping all the versions (e.g., Word 6, Word 7, Word 8, ... Word 55, etc.) is probably advisable.

Using the simplest possible formats is also good. Plain text beats anything else for simplicity. The worst would be proprietary, frequently and incompatibly changing formats such as DOC or PDF. Things like DVI, HTML, and XML would be in the middle. I would not trust PDF/A at all (too young, far too complex, and not at all implemented).
For integrity checking, one might compute one-way cryptographically secure hashes using the strongest technology today (so, SHA512 for now and probably something else, perhaps substantially algorithmically different, 20 years from now) and keep files of these hashes around for each directory. I'd really, really want to print all these out, but conceptually, one might compute a SHA512 on those files and keep higher level files around and so forth and only keep secure copies (written down, printed out, memorized, whatever) of a few topmost levels of the hierarchy.

--
Stanislav Shalunov http://www.internet2.edu/~shalunov/

Just my 0.086g of Ag.
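The per-directory hash files described in the quoted message, with each higher-level file covering the hash files below it, might be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not anything posted in the thread: the manifest file name SHA512SUMS and the function names write_manifests and verify_tree are hypothetical, and the sketch skips symbolic links and does not detect files added after the manifests were written.

```python
import hashlib
from pathlib import Path

MANIFEST = "SHA512SUMS"  # hypothetical name for the per-directory hash file


def sha512_file(path: Path) -> str:
    """Return the hex SHA-512 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifests(root: Path) -> str:
    """Write one hash file per directory, bottom-up.

    Each directory's hash file lists its regular files plus the hash
    files of its subdirectories. Returns the digest of the topmost hash
    file, which is the only value that has to be kept somewhere safe.
    """
    lines = []
    for entry in sorted(root.iterdir(), key=lambda p: p.name):
        if entry.name == MANIFEST or entry.is_symlink():
            continue  # skip our own hash file; symlinks are out of scope
        if entry.is_dir():
            child_digest = write_manifests(entry)
            lines.append(f"{child_digest}  {entry.name}/{MANIFEST}")
        elif entry.is_file():
            lines.append(f"{sha512_file(entry)}  {entry.name}")
    manifest = root / MANIFEST
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sha512_file(manifest)


def verify_tree(root: Path, expected: str) -> bool:
    """Check a tree against the digest of its topmost hash file."""
    manifest = root / MANIFEST
    if not manifest.is_file() or sha512_file(manifest) != expected:
        return False
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line:
            continue  # an empty directory produces an empty hash file
        digest, name = line.split("  ", 1)
        target = root / name
        if name.endswith("/" + MANIFEST):
            # The entry covers a subdirectory's hash file; recurse into it.
            if not verify_tree(target.parent, digest):
                return False
        elif not target.is_file() or sha512_file(target) != digest:
            return False
    return True
```

Calling write_manifests(Path("archive")) on a hypothetical archive directory returns a single 128-character hex digest. That one value is what would be printed or written down, and verify_tree(Path("archive"), digest) later rechecks every file beneath it.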