Wireshark mailing list archives

Re: UTF8 vs. locale in error messages (bug 5715)


From: Guy Harris <guy () alum mit edu>
Date: Tue, 28 Jun 2011 10:01:14 -0700


On Jun 28, 2011, at 3:22 AM, Jakub Zawadzki wrote:

Btw. I know that nowadays I'm the only one who uses non-UTF-8 locales on the console,
but when we print on the console (stdout/stderr) I think we should use strerror() from libc,
i.e. the strerror() which doesn't recode the message to UTF-8.

It's more complicated than that.

There are many sources of strings in the non-GUI output of the programs in the Wireshark suite:

        the message text itself - that's generally ASCII;

        file names - internally to those programs, those are in UTF-8;

        error strings for errno values and signal-name strings for signals - those might be in the current locale's
        encoding with strerror()/strsignal() and would be in UTF-8 with g_strerror()/g_strsignal();

        etc.

In addition, the non-GUI output of the program can be sent either to the terminal or to files.

Output to the terminal should be in whatever character set the terminal expects.  I'm not sure what would indicate the
character set the terminal expects.  On my machine, the "terminal" is Terminal.app, and can handle UTF-8 output; on
other UN*Xes, in the GUI, it's probably similar.  For consoles (which I'm using here to mean "no GUI, just the console
of a workstation/personal computer") it might be less capable.  For real terminals, it's almost certainly less capable;
I'm not sure whether there has ever been a real serial-port terminal that handles UTF-8.  I don't know what the various
terminal emulators for Windows, e.g. cmd.exe, do.
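
One conventional proxy on POSIX systems, offered here only as a sketch, is the codeset of the current locale as reported by nl_langinfo(CODESET) after setlocale(LC_ALL, "").  That tells you what the user's environment claims, not what the terminal can actually render, so it does not settle the question above.  The helper name output_codeset() is hypothetical and is not an existing Wireshark function.

```c
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical helper (not part of Wireshark): return the name of the
 * character encoding of the current locale, as a guess at what the
 * terminal expects.  The returned string is owned by the C library and
 * may be overwritten by later setlocale()/nl_langinfo() calls.
 */
const char *
output_codeset(void)
{
    /*
     * Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG).
     * If that fails, the previous locale (initially "C") stays in effect.
     */
    setlocale(LC_ALL, "");

    /* Ask libc which codeset the LC_CTYPE category of that locale uses. */
    return nl_langinfo(CODESET);
}
```

GLib offers a comparable query, g_get_charset(), which also reports whether the locale's charset is UTF-8.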

Output to files, whether it's the result of redirecting the standard output or error of a command-line program to a 
file, or of one of the "export to a text file" operations in Wireshark, or..., is another matter.  It might be that the 
character encoding should be the same as would be used on a terminal.

In any case, that means that using strerror() is probably not going to be sufficient to fix the problem.  What we might 
want to do is use UTF-8 everywhere we can, and, for non-GUI output, convert to the appropriate character encoding - 
whatever that might be - at the last minute.
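
A minimal sketch of that "convert at the last minute" step, assuming POSIX iconv() is available and that the target encoding name has been obtained somehow (for example from nl_langinfo(CODESET)).  The function name utf8_to_codeset(), the buffer-size guess, and the policy of returning NULL when a character cannot be represented are all assumptions made for illustration, not anything Wireshark does.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a NUL-terminated UTF-8 string to the
 * encoding named by to_codeset.  Returns a malloc()ed, NUL-terminated
 * string that the caller must free(), or NULL if the conversion is
 * unsupported or fails.  Assumes the target is a byte-oriented encoding
 * in which a single zero byte terminates the string.
 */
char *
utf8_to_codeset(const char *utf8, const char *to_codeset)
{
    iconv_t cd = iconv_open(to_codeset, "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;                /* no such conversion */

    size_t in_left = strlen(utf8);
    /* Assumed generous bound: 4 output bytes per input byte, plus NUL. */
    size_t out_size = in_left * 4 + 1;
    size_t out_left = out_size - 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    /* iconv() takes a non-const pointer but does not modify the input. */
    char *in_p = (char *)utf8;
    char *out_p = out;
    size_t rc = iconv(cd, &in_p, &in_left, &out_p, &out_left);
    if (rc != (size_t)-1) {
        /* Flush any pending shift sequence for stateful encodings. */
        rc = iconv(cd, NULL, NULL, &out_p, &out_left);
    }
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *out_p = '\0';
    return out;
}
```

The idea would be to keep every string in UTF-8 internally and call something like this only immediately before writing to stdout/stderr.  What to do when a character has no representation in the target encoding (fail, substitute, or escape) is a separate policy decision, and iconv implementations differ in how they report that case.  GLib's g_locale_from_utf8() performs a similar conversion to the current locale's encoding.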
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request () wireshark org?subject=unsubscribe

