Wireshark mailing list archives

Re: UTF8 vs. locale in error messages (bug 5715)


From: Guy Harris <guy () alum mit edu>
Date: Mon, 27 Jun 2011 17:58:35 -0700


On Jun 27, 2011, at 11:54 AM, Stig Bjørlykke wrote:

When looking at bug 5715 I found that we use both UTF8 (from file
names) and locale (from strerror()) in the error messages presented
from simple_dialog().  In vsimple_dialog() we convert all messages
with g_locale_to_utf8(), which will wrongly convert the file name
(like in the bug report).  When using Norwegian characters in the file
name the text in the dialog is empty.

I suspect this wouldn't be an issue on my machine: if g_locale_to_utf8() behaves differently from strcpy() here, 
there's either a misconfiguration or a bug in g_locale_to_utf8():

        $ echo $LANG
        en_US.UTF-8

I.e., this issue should, modulo bugs, only show up in locales where the character encoding isn't UTF-8, meaning:

        1) UN*Xes where LANG etc. aren't set to a locale with UTF-8 as the encoding (are you seeing the issue with 
Norwegian characters on your system?  If so, what's the setting of LANG?);

        2) Windows, where "Unicode" generally means "UTF-16", and APIs that return strings encoded as sequences of 
octets rather than hexadectets probably return strings in the local code page.

Any ideas how we should fix this?  Convert all messages from
strerror() when putting the text into the error string and remove the
conversion in vsimple_dialog()?

I would say "yes", given that GTK+ uses UTF-8 as the string encoding for all GUI functions, and I think any other 
toolkit we might use as an alternative would also use some encoding of Unicode (UTF-8 or UTF-16, most likely).

We have about 240 calls to strerror().

...and, unfortunately, a variant that converts to UTF-8 and is API-compatible is non-trivial, as any version that 
allocates a buffer for the result of the conversion would leak memory if we just globally replaced strerror() with 
ws_strerror().
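
One way to sidestep the leak, at the cost of API compatibility, would be to have the caller supply the buffer. The following is only a sketch under several assumptions: it uses POSIX iconv() and nl_langinfo() rather than g_locale_to_utf8(), it assumes setlocale(LC_ALL, "") was called at startup, and to_utf8() and strerror_utf8() are hypothetical names, not routines that exist in Wireshark:

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from
 * `from_codeset` to UTF-8, writing a NUL-terminated result into `out`.
 * Returns 0 on success, -1 on failure (unknown codeset, invalid input,
 * or output buffer too small). */
int to_utf8(const char *from_codeset, const char *in, char *out, size_t outlen)
{
    iconv_t cd;
    char *inp = (char *)in;   /* iconv() takes char **, but only reads the input */
    char *outp = out;
    size_t inleft, outleft, rc;

    assert(from_codeset != NULL && in != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    inleft = strlen(in);
    outleft = outlen - 1;     /* reserve room for the terminating NUL */

    cd = iconv_open("UTF-8", from_codeset);
    if (cd == (iconv_t)-1)
        return -1;
    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;
    *outp = '\0';
    return 0;
}

/* Hypothetical helper: the strerror() text for errnum, converted from the
 * locale's encoding to UTF-8, in a caller-supplied buffer (so nothing is
 * allocated and nothing can leak). Not thread-safe, since it calls strerror(). */
int strerror_utf8(int errnum, char *out, size_t outlen)
{
    return to_utf8(nl_langinfo(CODESET), strerror(errnum), out, outlen);
}
```

A caller would declare a local buffer, call strerror_utf8(errno, buf, sizeof buf), and fall back to some fixed message if it returns -1. Because the signature differs from strerror(), this cannot be dropped in with a global search-and-replace; each of the call sites would need touching.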

(Of course, strerror() is also not thread-safe, so there might be other reasons to avoid routines with such an API; the 
latest shiniest Single UNIX Specification has strerror_r(), which takes a buffer that it fills in, which has its own 
issues (as in "how big a buffer do you need"?), and I don't know how many platforms have it.

But if you're doing enough calls to strerror() that throwing a mutex around strerror() in your wrapper causes 
performance problems, those performance problems are probably the least of your problems....)
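
For what it's worth, the mutex-around-strerror() wrapper could look something like the sketch below. It assumes POSIX threads are available; locked_strerror() is a hypothetical name, and the caller-supplied buffer is an assumption of this sketch rather than anything settled in this thread:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One process-wide lock serializing all calls made through the wrapper. */
static pthread_mutex_t strerror_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: copy the strerror() text for errnum into buf while
 * holding the lock, so another thread going through this wrapper cannot
 * overwrite strerror()'s internal buffer mid-copy. The text is truncated
 * if buf is too small and is always NUL-terminated. Returns buf. */
char *locked_strerror(int errnum, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    pthread_mutex_lock(&strerror_lock);
    snprintf(buf, buflen, "%s", strerror(errnum));
    pthread_mutex_unlock(&strerror_lock);
    return buf;
}
```

The lock only helps if every caller goes through the wrapper; code that still calls strerror() directly, including code inside libraries, is not protected by it.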

