Educause Security Discussion mailing list archives

Re: Has anyone looked at digital archiving?


From: Graham Toal <gtoal () UTPA EDU>
Date: Thu, 13 Apr 2006 08:32:05 -0500

> If for some strange and irrational reason it is desirable to
> keep archival documents exclusively electronic, then one
> would have to keep multiple copies, keep them in constant
> rotation, updating media and formats and never letting
> anything sit for more than a few years.
> Keeping all the versions (e.g., Word 6, Word 7, Word 8, ... Word 55,
> etc.) is probably advisable.  Using the simplest possible [...]

Just FYI, I still have every email I ever sent, going back to the
day I wrote my first email program in 1976.  There was a tough
period between '76 and '81 when I had to rely on my university's
mainframe for storage, but once personal machines became
available, disk space increased so fast *every* year that it
has *always* been possible to copy the entire contents of
my hard disk to the next hard disk.

I don't think it's an unreasonable strategy to store on hard
disk and transfer to new disks every time there's, say, a
doubling of capacity (with suitable geographical redundancy, etc.).
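
For what it's worth, the copy-and-verify step of such a migration
is easy to script.  Here's a minimal sketch in Python; the mount
points and the choice of SHA-256 are my own assumptions, purely
for illustration (geographical redundancy would be a second copy
of the new disk kept offsite):

    import hashlib
    import os
    import shutil

    def sha256_of(path, bufsize=1 << 20):
        """Stream a file through SHA-256 so big files don't fill memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:
                    break
                h.update(chunk)
        return h.hexdigest()

    def migrate(src_root, dst_root):
        """Copy every file from the old disk to the new one, then
        verify each copy against the original before trusting it."""
        for dirpath, _dirnames, filenames in os.walk(src_root):
            for name in filenames:
                src = os.path.join(dirpath, name)
                rel = os.path.relpath(src, src_root)
                dst = os.path.join(dst_root, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)   # copy2 preserves timestamps
                if sha256_of(src) != sha256_of(dst):
                    raise IOError("verification failed: %s" % rel)

    migrate("/mnt/old_disk", "/mnt/new_disk")   # hypothetical paths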

Another somewhat off-the-wall solution to keeping old software
alive is to preserve the runtime environment 100% and rely
on emulation to invoke the previous runtime.  Over a long
period this can create a chain of emulators: I have some
PDP7 code which runs under an ICL7502 emulator, which runs
under an Interdata32 emulator, which runs on any Linux box
with a C compiler, such as the handheld "GP2X" videogame
machine I happen to have in my pocket right now, which has
more power than *every* computer that Edinburgh University
owned in 1976 put together :-)  Yes, machines get larger and
faster at the same pace as disk storage, if not more so.
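
To make the chaining idea concrete, here's a toy model in Python.
This is purely my own illustration, not any of the actual emulators
above; the point is just that an emulator is itself a program
running on the machine below it, so the layers nest naturally:

    # Hypothetical machines standing in for the real chain
    # (PDP7 -> ICL7502 -> Interdata32 -> C on Linux).

    class Machine:
        def __init__(self, name, host=None):
            self.name = name
            self.host = host          # machine this one is emulated on

        def run(self, program):
            if self.host is None:
                return program()      # real hardware: just execute
            # On an emulated machine, "running a program" means running
            # the emulator that carries it on the host underneath.
            print("%s is emulated on %s" % (self.name, self.host.name))
            return self.host.run(program)

    linux     = Machine("Linux/C")
    interdata = Machine("Interdata32", host=linux)
    icl       = Machine("ICL7502", host=interdata)
    pdp7      = Machine("PDP7", host=icl)

    pdp7.run(lambda: print("original PDP7 code runs here"))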

A paper dump is OK for text but not for code or just about
any other form of data.  And OCR will probably *never* be
good enough, just as voice recognition and handwriting
recognition never will be.  They're hard problems that aren't
helped by bigger, faster machines.  They're only going to be
solved by smarter programmers, and in today's environment I
have to say that I think the state of the art in coding is
actually regressing.

By the way, this is an issue I think about a lot, because one
of my projects is keeping alive software from the '60s.  There's
a gap in the early days when original software was not preserved
on removable media.  After about 1990 it stopped being a problem,
as just about everything ever written has remained around
*somewhere* online.  But we're at great risk of losing our
computing heritage if we don't preserve these old sources
now ... you'd be amazed how many old paper tapes and DECtapes
turn up in people's attics when you ask them to find their old
code.  But many of these early pioneers won't be with us for
much longer, so the archiving effort has to start now.

Sorry to drift off topic; this is quite a big issue for me...

IMHO.  $0.02.  IANALB.  etc.

Graham
PS Yes, we have probably had our biggest successes recovering old
source code by simply retyping paper listings.  OCR at present
*does not work* well enough to do it automatically; we have tried
very hard with lots of programs.  We are, however, archiving high-res
scans just in case OCR improves enough some time in the future.
http://history.dcs.ed.ac.uk/
