Bugtraq mailing list archives

Re: Fwd: CERT Advisory CA-2000-02


From: lbudney-lists-bugtraq () NB NET (Len Budney)
Date: Tue, 8 Feb 2000 08:24:44 -0500


Byron Alley <liondios () UVIC CA> wrote:

> Some web sites use an implementation based on this idea of a subset
> of HTML.  You don't even need to use real HTML - just take the most
> useful functions, like bold, italics - and build a sub-language.
> In at least one case I recall, a site used a format with []'s: [B]
> instead of <B>, etc.

Sigh. Sounds like some people are reinventing the wheel, badly. HTML
is already an application of SGML; there are even DTDs out there which
conform pretty closely to the nightmare which is handled by Netscape
and IE. SGML was _designed_ for creating customized-syntax markup.

Using existing technology, you can easily document the allowed subset of
HTML tags, by modifying one of those DTDs. Then you can use parsers to
validate compliance automatically, normalizers to fill in missing end
tags, and filters to do things like reformatting and indexing.
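
As a rough illustration of the filter-plus-normalizer idea, here is a
minimal Python sketch. It is not a DTD-driven SGML parser: it swaps in
the standard library's html.parser as a stand-in, and the ALLOWED_TAGS
set and the filter_subset name are hypothetical choices made for the
example. It keeps only the allowed tags, drops their attributes, escapes
bare characters in the text, and fills in missing end tags.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowed subset; a real deployment would derive this from its DTD.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


class SubsetFilter(HTMLParser):
    """Keep allowed tags, drop all others, and close unclosed tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments
        self.open_tags = []  # stack of allowed tags still open

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes are dropped entirely (no onclick=, style=, ...).
            self.out.append(f"<{tag}>")
            self.open_tags.append(tag)
        # Disallowed start tags are silently removed.

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close everything opened since this tag, then the tag itself.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        # Text is re-escaped, so bare '<', '>' and '&' become entities.
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any text the parser is still buffering
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def filter_subset(text):
    parser = SubsetFilter()
    parser.feed(text)
    return parser.result()


print(filter_subset('<b onclick="x()">bold <i>both</b> tail'))
```

The text inside a removed tag is kept as escaped character data; a
stricter filter might discard it instead.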

> This way you can safely remove any kind of tags, translate >'s to &gt;
> entities, etc.

SGML parsers are never confused about when '<' begins a tag, and when
it is an (illegal) bare character. The normalizers I mentioned above
can automatically fix this and many other errors, such as:

 &   -> &amp;
 <   -> &lt;
 >   -> &gt;
(c)  -> &#169;                  # The ASCII symbol, not the glyph I used

They can even fix those annoying dashes---the ones which Netscape and lynx
ignore---and the fancy `quotes' which things like FrontPage use, also
ignored by Netscape and lynx.
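
The character fixes listed above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: it
treats the whole input as character data (a real normalizer escapes
only the characters that are not part of valid markup), it assumes the
troublesome dashes and quotes are Windows-1252 bytes, and the names
ENTITY_FIXES, CP1252_FIXES and normalize_text are made up for the
example.

```python
# Bare characters that must become entities in character data.
ENTITY_FIXES = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

# Assumption: the "annoying" punctuation is Windows-1252 (bytes 0x97,
# 0x91, 0x92), mapped here to plain ASCII stand-ins.
CP1252_FIXES = {"\u2014": "---", "\u2018": "`", "\u2019": "'"}


def normalize_text(raw: bytes) -> str:
    """Decode Windows-1252 bytes and emit 7-bit-safe character data."""
    text = raw.decode("cp1252", errors="replace")
    out = []
    for ch in text:
        if ch in ENTITY_FIXES:
            out.append(ENTITY_FIXES[ch])
        elif ch in CP1252_FIXES:
            out.append(CP1252_FIXES[ch])
        elif ord(ch) > 127:
            # Any other non-ASCII character becomes a numeric reference,
            # e.g. the copyright sign becomes &#169;
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


print(normalize_text(b"AT&T <1> \xa9"))
print(normalize_text(b"dashes\x97the \x91quotes\x92"))
```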

> Naive users may not even know HTML anyways, and advanced users will
> find it intuitive.

Teaching them (a subset of) the right way is just as easy, and less
wasteful.

> It's questionable whether there is real usefulness in allowing a full
> range of HTML tags.  This solution fits.

Agreed. But why not use existing, 20-year-old technology, instead of
rolling your own clunky syntax for your own ad hoc parsers?

Len.

PS Help stamp out   --- or help teach lynx to do something with it!

PPS I wonder how this post looks in Netscape?

