Wireshark mailing list archives
Re: tvb_get_string_enc() doesn't always return valid UTF-8
From: Guy Harris <guy () alum mit edu>
Date: Mon, 20 Jan 2014 17:27:24 -0800
On Jan 20, 2014, at 1:49 PM, Martin Kaiser <lists () kaiser cx> wrote:
> I committed the change to tvb_get_string() in r54864.
I've changed that *not* to map bytes with the 8th bit set to REPLACEMENT CHARACTER for UTF-8 strings. For UTF-8 strings, we need to do a more complicated check and map invalid octet sequences to REPLACEMENT CHARACTER. (We also need to do some more stuff for UCS-2, UTF-16, and UCS-4.) tvb_get_string() still treats the string as ASCII.
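As a rough illustration of the two cases described above, here is a minimal sketch in plain C. The helper names `ascii_to_utf8_lossy` and `utf8_valid_len` are hypothetical; they are not Wireshark functions, and the code uses `malloc` rather than any Wireshark allocator. The first helper shows the ASCII case (any byte with the 8th bit set becomes U+FFFD); the second shows the kind of per-sequence check a UTF-8 string would need before an invalid octet sequence could be replaced.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER. */
static const unsigned char REPLACEMENT[3] = { 0xEF, 0xBF, 0xBD };

/*
 * Hypothetical helper (not Wireshark API): copy `len` bytes that are
 * supposed to be ASCII into a newly allocated, NUL-terminated UTF-8
 * string. Any byte with the 8th bit set is replaced by U+FFFD.
 * Returns NULL on allocation failure; the caller frees the result.
 */
char *ascii_to_utf8_lossy(const unsigned char *src, size_t len)
{
    /* Worst case: every input byte expands to the 3-byte replacement. */
    char *out = malloc(len * 3 + 1);
    size_t i, o = 0;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (src[i] & 0x80) {
            memcpy(out + o, REPLACEMENT, sizeof REPLACEMENT);
            o += sizeof REPLACEMENT;
        } else {
            out[o++] = (char)src[i];
        }
    }
    out[o] = '\0';
    return out;
}

/*
 * Hypothetical helper (not Wireshark API): return the length (1 to 4)
 * of a well-formed UTF-8 sequence starting at `p`, or 0 if the octets
 * there are invalid. `avail` is the number of bytes readable at `p`.
 */
size_t utf8_valid_len(const unsigned char *p, size_t avail)
{
    size_t n, i;
    unsigned long cp, min;

    if (avail == 0)
        return 0;
    if (p[0] < 0x80)
        return 1;                        /* plain ASCII */
    if ((p[0] & 0xE0) == 0xC0) {
        n = 2; cp = p[0] & 0x1F; min = 0x80;
    } else if ((p[0] & 0xF0) == 0xE0) {
        n = 3; cp = p[0] & 0x0F; min = 0x800;
    } else if ((p[0] & 0xF8) == 0xF0) {
        n = 4; cp = p[0] & 0x07; min = 0x10000;
    } else {
        return 0;                        /* stray continuation or bad lead byte */
    }
    if (avail < n)
        return 0;                        /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;                    /* not a continuation byte */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if (cp < min)
        return 0;                        /* overlong encoding */
    if (cp > 0x10FFFF)
        return 0;                        /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                        /* UTF-16 surrogate code point */
    return n;
}
```

A caller handling UTF-8 input could walk the buffer with `utf8_valid_len()`, copying each valid sequence and emitting the replacement bytes whenever it returns 0. How many invalid bytes to skip per replacement is a policy choice this sketch leaves open.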
> I'll have a look at tvb_get_stringz() tomorrow.
I've added that (with the same change *not* to do it for UTF-8 strings). tvb_get_stringz() treats the string as ASCII.