Wireshark mailing list archives
tvb_get_string_enc() doesn't always return valid UTF-8
From: Martin Kaiser <lists () kaiser cx>
Date: Mon, 20 Jan 2014 18:22:37 +0100
Hi,

if I have a tvbuff that starts with 0x86 and I call

    a = tvb_get_string_enc(tvb, 0, ENC_ASCII);
    proto_tree_add_string(..., a);

I can trigger the DISSECTOR_ASSERT, since a is not a valid Unicode string.

Comments in the code suggest that tvb_get_string() should replace chars >= 0x80 with the Unicode replacement char, which is three bytes long in UTF-8. This would look like

    guint8 *
    tvb_get_string(wmem_allocator_t *scope, tvbuff_t *tvb, gint offset, gint length)
    {
        wmem_strbuf_t *str;

        tvb_ensure_bytes_exist(tvb, offset, length);
        str = wmem_strbuf_new(scope, "");

        while (length > 0) {
            guint8 ch = tvb_get_guint8(tvb, offset);

            if (ch < 0x80)
                wmem_strbuf_append_c(str, ch);
            else {
                wmem_strbuf_append_unichar(str, UNREPL);
            }
            offset++;
            length--;
        }
        wmem_strbuf_append_c(str, '\0');

        return (guint8 *) wmem_strbuf_get_str(str);
    }

The resulting string would still contain len+1 chars but not necessarily len+1 bytes. Would that be a problem, i.e. is it ok to do something like

    b = tvb_get_string(NULL, tvb, offset, len_b);
    copy_of_b = g_malloc(len_b+1);
    memcpy(copy_of_b, b, len_b+1);

?

If that has to keep working, we'd need a separate function that gets the string and replaces 8-bit chars.

Thoughts?

Martin
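To make the length question concrete, here is a minimal standalone sketch in plain C, without the Wireshark or GLib APIs. The helper names ascii_to_utf8_replacing() and copy_converted() are hypothetical and exist only for this illustration. It assumes the replacement is U+FFFD encoded as UTF-8 (bytes EF BF BD), so each replaced input byte grows to three output bytes, and a copy sized from the input length can cut the string short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER: three bytes. */
static const char UTF8_REPLACEMENT[] = "\xEF\xBF\xBD";

/*
 * Hypothetical helper (not a Wireshark API): copy len bytes of ASCII input,
 * substituting U+FFFD for every byte >= 0x80. Returns a malloc'ed,
 * NUL-terminated UTF-8 string, or NULL on allocation failure.
 */
static char *ascii_to_utf8_replacing(const unsigned char *src, size_t len)
{
    assert(src != NULL || len == 0);

    /* Worst case: every input byte expands to three bytes, plus the NUL. */
    char *dst = malloc(len * 3 + 1);
    if (dst == NULL)
        return NULL;

    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[out++] = (char)src[i];
        } else {
            memcpy(dst + out, UTF8_REPLACEMENT, 3);
            out += 3;
        }
    }
    dst[out] = '\0';
    return dst;
}

/*
 * Hypothetical helper: duplicate a converted string using its real byte
 * length (strlen) instead of the original input length. This assumes the
 * input held no embedded NUL bytes; otherwise strlen stops at the first one.
 */
static char *copy_converted(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

With the three input bytes 'a', 0x86, 'b', the converted string occupies five bytes plus the terminator, so a memcpy() of len_b+1 = 4 bytes would drop the tail and the NUL. Callers that copy the result would have to size the copy from the returned string, as copy_converted() does, rather than from the length they passed in.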