Wireshark mailing list archives
Re: How to print out string encoded data that contains nul characters?
From: Guy Harris <guy () alum mit edu>
Date: Wed, 9 Apr 2014 14:24:53 -0700
On Apr 9, 2014, at 2:06 PM, "John Dill" <John.Dill () greenfieldeng com> wrote:
I have several character data fields that happen to contain sections of non-ascii binary data including nul characters. I'd like to get a string display that shows all of the characters according to the length of the field, i.e.

    20 20 20 20 20 20 01 00 01 00 48 31 20 20 20 20

produces

    "      \001\000\001\000H1    "

In proto.c, I see that all of the format_text calls use strlen(bytes) as the length.

    case FT_STRING:
    case FT_STRINGZ:
    case FT_UINT_STRING:
        bytes = (guint8 *)fvalue_get(&fi->value);
        label_fill(label_str, hfinfo, format_text(bytes, strlen(bytes)));

What is the recommended way of creating a text string that uses the octal encoding '\xxx' for non-ASCII data including nul characters that uses the 'length' field of 'proto_tree_add_item'?
The right short-term way would be to use proto_tree_add_string_format_value() to add the field, and format the string's value yourself, using format_text() with a byte count rather than strlen().

The right long-term way is to modify Wireshark so that this works. The way we handle strings should probably be changed so that we:

    store the raw string octets as a counted array, along with the string encoding;

    convert the octets from the encoding to UTF-8 *with invalid octets and sequences shown as escapes* when displaying the strings;

    convert the octets from the encoding to UTF-8 with invalid octets and sequences shown as Unicode REPLACEMENT CHARACTERS when making the string available for processing by other software (e.g., "-T fields", etc.) (or somehow saying "this isn't a valid string in this encoding");

    somehow arrange that strings with invalid octets or sequences are *always* unequal to any character string in packet-matching expressions (display/read filters, color "filters", etc.), and perhaps allow strings to be compared against octet sequences (e.g. "foobar.name = 20:20:20:20:20:20:01:00:01:00:48:31:20:20:20:20" matches the raw octets of the string), and use that with "Prepare As Filter" etc.

Alternatively, if they're *not* really character strings, display them as a set of subfields, with the text part shown as strings and the binary data shown as whatever it is, e.g.

    Frobozz text 1: {blanks}
    Frobozz count 1: 1
    Frobozz count 2: 1
    Frobozz text 2: H1{and more blanks}

or whatever it is.
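To illustrate why a byte count matters here, the following is a minimal standalone sketch in plain C, not Wireshark code. The helper name escape_octets() is hypothetical; it only mimics the idea of formatting a counted octet array with octal escapes, and unlike format_text() it makes no attempt to use short escapes such as \n or \r.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not a Wireshark API):
 * copy `len` octets from `buf` into `out`, writing printable ASCII as-is and
 * every other octet, including NUL, as a three-digit octal escape.
 *
 * The length comes from the caller, never from strlen(), so embedded NULs
 * do not cut the output short. Output is truncated if `out` is too small.
 * Returns the number of characters written, not counting the final NUL.
 */
size_t escape_octets(const unsigned char *buf, size_t len,
                     char *out, size_t out_size)
{
    size_t pos = 0;

    assert(out != NULL && out_size > 0);

    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        int printable = (c >= 0x20 && c < 0x7f);
        size_t need = printable ? 1 : 4;   /* "\ooo" is four characters */

        /* Stop if this octet plus the terminating NUL would not fit. */
        if (pos + need >= out_size)
            break;

        if (printable) {
            out[pos++] = (char)c;
        } else {
            snprintf(out + pos, out_size - pos, "\\%03o", (unsigned)c);
            pos += 4;
        }
    }
    out[pos] = '\0';
    return pos;
}
```

Called on the 16 octets from the question with len set to 16, this yields the six blanks, \001\000\001\000, H1, and the four trailing blanks, whereas strlen() on the same buffer stops at the first nul. In a dissector, the corresponding step would presumably be to pass the result of format_text() with the field's length to proto_tree_add_string_format_value(); the exact format_text() signature differs between Wireshark versions, so check the headers of the version in use.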