Wireshark mailing list archives

Re: tvb_get_string_enc() doesn't always return valid UTF-8

From: Jakub Zawadzki <darkjames-ws () darkjames pl>
Date: Mon, 20 Jan 2014 20:52:03 +0100

Hi,

On Mon, Jan 20, 2014 at 06:22:37PM +0100, Martin Kaiser wrote:

> If I have a tvbuff that starts with 0x86 and I call
>
> a = tvb_get_string_enc(tvb, 0, ENC_ASCII)
> proto_tree_add_string(..., a);
>
> I can trigger the DISSECTOR_ASSERT since a is not a valid Unicode string.
>
> Comments in the code suggest that tvb_get_string() should replace
> chars >= 0x80 with the Unicode replacement char (U+FFFD), which is three
> bytes long in UTF-8. This would look like
> [...]
>
> The resulting string would still contain len+1 chars but not necessarily
> len+1 bytes. Would that be a problem, i.e. is it OK to do something like
>
> b = tvb_get_string(NULL, tvb, offset, len_b);
> copy_of_b = g_malloc(len_b+1);
> memcpy(copy_of_b, b, len_b+1);


If you just want to duplicate a string, you should definitely use g_strdup() ;-)
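
The point can be sketched as follows: size the copy from the string itself rather than from the byte count that was requested from the tvbuff. GLib's g_strdup() does this for you; the sketch below uses only standard C so that it stands alone, and dup_string() is a hypothetical name used for illustration, not a Wireshark or GLib function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for g_strdup(): the size of the copy comes from
 * strlen() on the source, not from the number of bytes originally read. */
char *dup_string(const char *s)
{
    assert(s != NULL);

    size_t n = strlen(s) + 1;   /* bytes actually in the string, plus NUL */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;                /* caller frees with free() */
}
```

If the source string grew because a byte was replaced by a longer sequence, the copy still contains the whole string and its terminator.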

> If that should work, we'd need a separate function for getting the
> string and replacing 8-bit chars.


I don't think we need one: tvb_get_string_enc(..., ENC_ASCII) should return a valid UTF-8 string,
and all callers assuming it's just a 1:1 copy are buggy.
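
A minimal sketch of that behaviour, assuming the replacement-character approach: ascii_to_utf8() is a hypothetical helper written in plain C for illustration, not the actual Wireshark implementation. It shows why the output can be longer than the input, which is what breaks callers that assume a 1:1 copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not a Wireshark function: copy `len` bytes that are
 * supposed to be ASCII into a newly allocated, NUL-terminated UTF-8 string.
 * Every byte >= 0x80 is replaced with U+FFFD, encoded as EF BF BD. */
char *ascii_to_utf8(const unsigned char *in, size_t len)
{
    assert(in != NULL || len == 0);

    /* Worst case: every input byte turns into a 3-byte sequence. */
    char *out = malloc(len * 3 + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[o++] = (char)in[i];     /* plain ASCII is copied as-is */
        } else {
            out[o++] = (char)0xEF;      /* U+FFFD REPLACEMENT CHARACTER */
            out[o++] = (char)0xBF;
            out[o++] = (char)0xBD;
        }
    }
    out[o] = '\0';
    return out;                         /* caller frees with free() */
}
```

For a one-byte input of 0x86, the result is the three bytes EF BF BD followed by the terminating NUL, so the output length no longer matches the input length.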

Maybe we should add an ENC_STRING_DONT_CONVERT flag for people who just
want a NUL-terminated string?


BTW, I really wonder whether the current way of using a replacement character is a good one.
Maybe we should escape such bytes instead, e.g. as \x86.
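
A rough sketch of what such escaping could look like, assuming each byte >= 0x80 is written out as the four ASCII characters \xNN. The name ascii_escape_high() is hypothetical and nothing here is taken from the Wireshark sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copy `len` bytes into a newly allocated string,
 * writing every byte >= 0x80 as the literal text "\xNN" (lowercase hex).
 * The result is pure ASCII and therefore also valid UTF-8. */
char *ascii_escape_high(const unsigned char *in, size_t len)
{
    assert(in != NULL || len == 0);

    /* Worst case: every input byte becomes 4 characters. */
    char *out = malloc(len * 4 + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[o++] = (char)in[i];
        } else {
            /* Writes 4 characters plus a NUL that the next step overwrites. */
            snprintf(out + o, 5, "\\x%02x", (unsigned)in[i]);
            o += 4;
        }
    }
    out[o] = '\0';
    return out;                         /* caller frees with free() */
}
```

Unlike the replacement character, this keeps the original byte value visible in the output, at the cost of a longer string.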