oss-sec mailing list archives

Re: escaping terminal control characters (was Re: backdoor in upstream xz/liblzma leading to ssh server compromise)


From: Steffen Nurpmeso <steffen () sdaoden eu>
Date: Fri, 03 May 2024 01:03:58 +0200

Steffen Nurpmeso wrote in
 <20240502223912.08A3RYp4@steffen%sdaoden.eu>:
 |Sam James wrote in
 | <87o79nlwxl.fsf () gentoo org>:
 ||Solar Designer <solar () openwall com> writes:
 ||> On Wed, Apr 03, 2024 at 11:03:17AM +1100, Matthew Fernandez wrote:
 ||>> On 4/1/24 08:30, Solar Designer wrote:
 ||>>>On Sat, Mar 30, 2024 at 04:37:48PM -0000, Tavis Ormandy wrote:
 ...
 ||>> Is the currently accepted wisdom that any application printing to 
 ||>> stdout/stderr should take steps to avoid control characters in the 
 ||>> output?
 ||>
 ||> First, let's limit this to cases where the control characters come from
 ||> potentially untrusted input to the program.  Obviously, many programs
 ||> generate terminal escapes on their own (usually via a library), for
 ||> their intended functionality (colorized listings, TUIs, etc.)  Some
 ||> programs pass potential control characters from their trusted input.
 ||>
 ||> Second, I think no, there isn't currently an established opinion on
 ||> whether programs should perform such filtering of untrusted input.
 ||
 ||Lasse has put up an initial implementation for xz:
 ||https://github.com/tukaani-project/xz/pull/118.
 ||
 ||Comments are welcome. It was a TODO from a long time ago ;)
 ||
 ||We're not sure how much is overkill (or underkill) for this, especially
 ||given it gets harder when Unicode is involved.
 ||
 ||> [...]
 |
 |For this purpose there exists the (very very expensive)
 |
 |  https://man.netbsd.org/vis.3
 |
 |series of functions.  Or you do something like this, where "isuni"
 |gives you "this is a UTF-8 nl_langinfo(CODESET)".
 ...
 |            if((wc & ~S(wchar_t,037)) == 0)
 |               wc = isuni ? 0x2400 | wc : '?';
 |            else if(wc == 0177)
 |               wc = isuni ? 0x2421 : '?';
 |            else
 |               wc = isuni ? 0x2426 : '?';
 ...
 |This can be made better (for example the above requires "wc" to be
 |an actual ISO 10646 codepoint, which ISO C etc), but the key point
 |is that the Unicode standard gives you everything needed to
 |properly mask these sequences, from its very beginning in 1993:
 |
 |  2400..2424    ; 1.1 #  [37] SYMBOL FOR NULL..SYMBOL FOR NEWLINE
 |
 |and i have yet to see a font which does not support those.
 |(Whether and how users can make sense of them totally aside.)
 |Of course you loose the copy&paste capability.

While here please let me back the not yet gracefully supported
shell escape mechanism $''.
The current approach seems to be to be as atomic as possible:

  # touch $(printf 'a\rb\tc\a')
  # ll
->
  -rw-r----- 1 steffen steffen   0 May  3 00:46 'c'$'\a'
  -rw-r----- 1 steffen steffen   0 May  3 00:46 'a'$'\r''b'

(GNU coreutils).  Isn't that just terrible?  In (the development
version of) my mailer tab-completion leads to

  #..mbox? /tmp/<TAB>
  $'a\rb'
  $'c\a'

which i find at least a little bit better.  (Do not even think
about looking in its implementation though, look ICU or what.)

And even though currently unsupported, it should be said that with
"grapheme clusters" and in general things like ligatures and other
such language-specific constructs which need to look at
surroundings -- in general interfaces like towupper() etc are not
useful in global context, entire sentences have to be looked at as
a whole due to this! --, shell quotes should be extended to the
largest possible range possible.  Ie, all the iconv(3)s that are
currently used because of a lack of other interfaces should be
enabled to see the longest possible (sub)string, not the most
atomar, as seen above.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


Current thread: