Educause Security Discussion mailing list archives

Re: Has anyone looked at digital archiving?


From: Valdis Kletnieks <Valdis.Kletnieks () VT EDU>
Date: Fri, 14 Apr 2006 17:46:18 -0400

On Wed, 12 Apr 2006 18:14:00 EDT, stanislav shalunov said:

* The only format that exists today that has a claim on having been
  around for 40 years is plain text.  Even there, the changes have
  been no less significant proportionally to the complexity of the
  format than in other formats; it's just that the format is so
  trivial (just a sequence of letters, each represented with a
  fixed-sized block of bits, with special values for space and newline
  and so forth) that recoding is just as trivial (character set
  changes, byte size changes, newline and other special character
  encoding, etc.).

Remember - 40 years ago there wasn't much that talked ASCII.  30 years ago,
so many different storage formats for ASCII existed (six 6-bit, four 9-bit, or
five 7-bit characters in a 36-bit word, 8-bit bytes, and the totally bizarre
variable-width bytes of the DEC KL10/20 processors) that the Internet RFCs
referred to 'octets' rather than 'bytes'.
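
To make the 36-bit case concrete, here is a rough sketch of the "five 7-bit
characters in a word" layout.  It assumes the characters are left-justified
and the lowest bit of the word is left unused; the function names are made up
for illustration, not taken from any real system.

```python
def pack_5x7(text: str) -> int:
    """Pack up to five 7-bit ASCII characters into one 36-bit word.

    Assumed layout: characters are left-justified, so 5 * 7 = 35 bits
    are used and the lowest bit of the word stays zero.
    """
    if len(text) > 5:
        raise ValueError("at most five characters fit in one 36-bit word")
    codes = text.encode("ascii").ljust(5, b"\0")  # pad short strings with NUL
    word = 0
    for code in codes:
        word = (word << 7) | code
    return word << 1  # shift past the unused low bit


def unpack_5x7(word: int) -> str:
    """Inverse of pack_5x7: recover the characters from a 36-bit word."""
    word >>= 1  # drop the unused low bit
    codes = [(word >> shift) & 0x7F for shift in (28, 21, 14, 7, 0)]
    return bytes(codes).rstrip(b"\0").decode("ascii")
```

Anyone reading such a file today has to know both the word size and this
packing rule before a single character comes out right.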

And woe unto them who have data currently stored in iso8859-* format, without
any tagging of which '*' charset.  iso-2022 encoding is only slightly less
messy.
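
A small sketch of the untagged iso8859-* problem, using Python's built-in
codecs.  The sample bytes and the helper name are arbitrary choices for
illustration; the point is only that the same bytes decode cleanly, and
differently, under every part of the standard.

```python
RAW = b"\xe0\xe1\xe2"  # three bytes with the high bit set, no charset tag


def decode_as(raw: bytes, parts=(1, 2, 5, 7)) -> dict:
    """Decode the same bytes under several ISO 8859 parts.

    Returns a mapping from codec name to the decoded string.  None of
    the decodes fails, so nothing signals which reading was intended.
    """
    return {f"iso8859-{n}": raw.decode(f"iso8859-{n}") for n in parts}


if __name__ == "__main__":
    for name, text in decode_as(RAW).items():
        print(name, text, [hex(ord(ch)) for ch in text])
```

The bytes come out as Latin letters with accents under parts 1 and 2, as
Cyrillic under part 5, and as Greek under part 7, all without any error.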

Transliterating "plain text" is a lot harder than you might think, once you
realize that there exists "plain text" that isn't ANSI-Standard 7-bit ASCII....
