Wireshark mailing list archives
Re: UTF8 vs. locale in error messages (bug 5715)
From: Guy Harris <guy () alum mit edu>
Date: Mon, 27 Jun 2011 17:58:35 -0700
On Jun 27, 2011, at 11:54 AM, Stig Bjørlykke wrote:
> When looking at bug 5715 I found that we use both UTF8 (from file names) and locale (from strerror()) in the error messages presented from simple_dialog(). In vsimple_dialog() we convert all messages with g_locale_to_utf8(), which will wrongly convert the file name (like in the bug report). When using Norwegian characters in the file name the text in the dialog is empty.
I suspect this wouldn't be an issue on my machine, given that if, on my machine, g_locale_to_utf8() behaves differently from strcpy(), there's either a misconfiguration or a bug in g_locale_to_utf8():

	$ echo $LANG
	en_US.UTF-8

I.e., this issue should, modulo bugs, only show up in locales where the character encoding isn't UTF-8, meaning:

	1) UN*Xes where LANG etc. aren't set to a locale with UTF-8 as the encoding (are you seeing the issue with Norwegian characters on your system? If so, what's the setting of LANG?);

	2) Windows, where "Unicode" generally means "UTF-16", and APIs that return strings encoded as sequences of octets rather than hexadectets probably return strings in the local code page.
> Any ideas how we should fix this? Convert all messages from strerror() when putting the text into the error string and remove the conversion in vsimple_dialog()?
I would say "yes", given that GTK+ uses UTF-8 as the string encoding for all GUI functions, and I think any other toolkit we might use as an alternative would also use some encoding of Unicode (UTF-8 or UTF-16, most likely).
> We have about 240 calls to strerror().
...and, unfortunately, a variant that converts to UTF-8 and is API-compatible is non-trivial, as any version that allocates a buffer for the result of the conversion would leak memory if we just globally replaced strerror() with ws_strerror().

(Of course, strerror() is also not thread-safe, so there might be other reasons to avoid routines with such an API; the latest shiniest Single UNIX Specification has strerror_r(), which takes a buffer that it fills in, which has its own issues (as in "how big a buffer do you need"?), and I don't know how many platforms have it. But if you're doing enough calls to strerror() that throwing a mutex around strerror() in your wrapper causes performance problems, those performance problems are probably the least of your problems....)