Wireshark mailing list archives

Re: UTF8 vs. locale in error messages (bug 5715)

From: Guy Harris <guy () alum mit edu>
Date: Tue, 28 Jun 2011 10:54:04 -0700


On Jun 28, 2011, at 10:43 AM, Guy Harris wrote:

On Jun 28, 2011, at 10:27 AM, Guy Harris wrote:

     when putting them into a textual representation of the protocol tree or into columns or something else to be
     shown to humans, map them to UTF-8, with anything that can't be mapped to UTF-8 - including, if the encoding is
     putatively UTF-8, octet sequences that aren't valid UTF-8 sequences - shown as the Unicode replacement character
     U+FFFD;


...and, for "for display" conversions, we might want to convert control characters to "Control Pictures" symbols:
0x0000 through 0x001F convert to 0x2400 through 0x241F (␀, ␁, etc. through ␟), and 0x007F converts to 0x2421 (␡).
In the font in which this message is being displayed to me, those show the control character abbreviations in very
small letters, running diagonally from upper left to lower right.  Unfortunately, I see nothing for C1 control
characters.


        http://en.wikipedia.org/wiki/Template:Unicode_chart_Control_Pictures

That claims that this is "as of Unicode 6.0", so, if true, either they have a different name for control pictures for 
C1 control characters or there aren't any.  (I have no idea what those other symbols are doing in there.)
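
As a rough illustration of the control-character mapping described above, here is a minimal Python sketch. It is not Wireshark code; the names `_CONTROL_PICTURES` and `to_control_pictures` are made up for this example. It maps 0x00 through 0x1F to 0x2400 through 0x241F and 0x7F to 0x2421, and leaves everything else, including the C1 range, untouched, since no control pictures were found for C1.

```python
# Hypothetical helper, for illustration only (not part of Wireshark).
# Translation table: code point of a C0 control character -> code point
# of the corresponding symbol in the Control Pictures block.
_CONTROL_PICTURES = {cp: 0x2400 + cp for cp in range(0x20)}
_CONTROL_PICTURES[0x7F] = 0x2421  # DEL -> SYMBOL FOR DELETE


def to_control_pictures(text: str) -> str:
    """Return text with C0 controls and DEL replaced by Control Pictures.

    C1 control characters (0x80 through 0x9F) are passed through unchanged.
    """
    return text.translate(_CONTROL_PICTURES)


if __name__ == "__main__":
    print(to_control_pictures("GET /\r\n"))  # prints "GET /␍␊"
```

A lookup table keeps the mapping explicit, so adding or removing individual characters later (for example, deciding not to remap TAB) is a one-line change.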

U+FFFD is often shown as a white question mark inside a black diamond:

        http://en.wikipedia.org/wiki/Specials_(Unicode_block)
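
For what it's worth, the "anything that can't be mapped is shown as U+FFFD" behavior from the earlier message can be sketched in Python as below. This is only an illustration under the assumption that the raw octets and a putative encoding name are available; `decode_for_display` is a made-up name, not an existing Wireshark function, and it relies on the `errors="replace"` mode of Python's standard `bytes.decode`.

```python
# Hypothetical helper, for illustration only (not part of Wireshark).
def decode_for_display(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode octets for display to humans.

    Octet sequences that are not valid in the given encoding are replaced
    with U+FFFD (the Unicode replacement character) instead of raising.
    """
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    # 0xFF can never appear in valid UTF-8, so it becomes U+FFFD.
    print(decode_for_display(b"ab\xffcd"))
```

The result is a Python string, which can then be written out as UTF-8; how many U+FFFD characters are produced for a longer run of invalid octets depends on the decoder.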

Oh, and if we're going to be extremely completist, there are the EBCDIC control characters, for which there are not 
always control pictures; see table 5.1:

        ftp://kermit.columbia.edu/kermit/ucsterminal/control.txt

This was from 1998.  I don't know whether any of the proposals were accepted.