Wireshark mailing list archives

Re: How to print out string encoded data that contains nul characters?


From: Guy Harris <guy () alum mit edu>
Date: Wed, 9 Apr 2014 14:24:53 -0700


On Apr 9, 2014, at 2:06 PM, "John Dill" <John.Dill () greenfieldeng com> wrote:

> I have several character data fields that happen to contain sections of
> non-ascii binary data including nul characters.  I'd like to get a string
> display that shows all of the characters according to the length of the
> field, i.e.
>
> 20 20 20 20 20 20 01 00 01 00 48 31 20 20 20 20
>
> produces
>
> "      \001\000\001\000H1    "
>
> In proto.c, I see that all of the format_text calls use strlen(bytes) as the length.
>
> case FT_STRING:
> case FT_STRINGZ:
> case FT_UINT_STRING:
>         bytes = (guint8 *)fvalue_get(&fi->value);
>         label_fill(label_str, hfinfo, format_text(bytes, strlen(bytes)));
>
> What is the recommended way of creating a text string that uses the octal
> encoding '\xxx' for non-ASCII data including nul characters that uses the
> 'length' field of 'proto_tree_add_item'?

The right short-term way would be to use proto_tree_add_string_format_value() to add the field, and format the string's 
value yourself, using format_text() with a byte count rather than strlen().
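
As a rough illustration of why the byte count matters, here is a minimal standalone sketch. escape_octets() is a hypothetical stand-in for format_text(), not Wireshark code; it takes an explicit length, so octets after an embedded nul are not lost the way they are with strlen(). Emitting backslash itself as an octal escape is a choice of this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for format_text(): escape exactly `len` octets of
 * `src` into `dst` (capacity `dst_size`).  Printable ASCII is copied as-is;
 * everything else, including nul and backslash, becomes a three-digit
 * octal escape.  Returns the number of characters written (not counting
 * the terminator), or -1 if `dst` is too small.
 */
static int escape_octets(const unsigned char *src, size_t len,
                         char *dst, size_t dst_size)
{
    size_t out = 0;

    assert(src != NULL || len == 0);
    assert(dst != NULL);

    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];

        if (c >= 0x20 && c <= 0x7e && c != '\\') {
            if (out + 1 >= dst_size)      /* keep room for the terminator */
                return -1;
            dst[out++] = (char)c;
        } else {
            if (out + 4 >= dst_size)      /* 4 characters plus terminator */
                return -1;
            snprintf(dst + out, 5, "\\%03o", (unsigned)c);
            out += 4;
        }
    }
    if (out >= dst_size)
        return -1;
    dst[out] = '\0';
    return (int)out;
}

/*
 * In a dissector, the escaped text would then be handed to something like
 *     proto_tree_add_string_format_value(tree, hf_frobozz_name, tvb,
 *                                        offset, length, raw, "%s", escaped);
 * where hf_frobozz_name, raw and escaped are hypothetical names.
 */
```

With the 16 octets from the question, this sketch yields the 28-character display string the original poster asked for.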

The right long-term way is to modify Wireshark so that this works.  The way we handle strings should probably be
changed so that we:

        store the raw string octets as a counted array, along with the string encoding;

        convert the octets from the encoding to UTF-8 *with invalid octets and sequences shown as escapes* when
        displaying the strings;

        convert the octets from the encoding to UTF-8 with invalid octets and sequences shown as Unicode REPLACEMENT
        CHARACTERs when making the string available for processing by other software (e.g., "-T fields", etc.) (or
        somehow saying "this isn't a valid string in this encoding");

        somehow arrange that strings with invalid octets or sequences are *always* unequal to any character string
        in packet-matching expressions (display/read filters, color "filters", etc.), and perhaps allow strings to
        be compared against octet sequences (e.g. "foobar.name == 20:20:20:20:20:20:01:00:01:00:48:31:20:20:20:20"
        matches the raw octets of the string), and use that with "Prepare As Filter" etc.
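
The comparison rules in the last item could look something like the sketch below. Everything in it is hypothetical (the counted_string type, the ENC_SKETCH_ASCII tag, the function names), none of it is existing Wireshark API, and it only treats octets above 0x7f as invalid, which is a simplification for a 7-bit ASCII field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding tag; a real design would cover many encodings. */
typedef enum { ENC_SKETCH_ASCII } sketch_encoding;

/* Raw octets kept as a counted array, together with their encoding. */
typedef struct {
    const unsigned char *octets;
    size_t               len;
    sketch_encoding      enc;
} counted_string;

/* Assumption of this sketch: an octet is invalid if it is above 0x7f. */
static bool counted_string_is_valid(const counted_string *s)
{
    for (size_t i = 0; i < s->len; i++)
        if (s->octets[i] > 0x7f)
            return false;
    return true;
}

/* A string with invalid octets is never equal to any character string. */
static bool counted_string_equals_text(const counted_string *s, const char *text)
{
    if (!counted_string_is_valid(s))
        return false;
    return strlen(text) == s->len && memcmp(s->octets, text, s->len) == 0;
}

static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Compare the raw octets against a "20:20:01" style octet sequence. */
static bool counted_string_equals_octets(const counted_string *s, const char *spec)
{
    size_t i = 0;
    const char *p = spec;

    for (;;) {
        int hi = hex_val(p[0]);
        int lo = (hi < 0) ? -1 : hex_val(p[1]);

        if (lo < 0)
            return false;                 /* malformed octet sequence */
        if (i >= s->len || s->octets[i] != (unsigned char)(hi * 16 + lo))
            return false;
        i++;
        p += 2;
        if (*p == '\0')
            break;
        if (*p != ':')
            return false;
        p++;
    }
    return i == s->len;
}
```

Because the octet comparison never goes through the text path, it still works for fields that can't be matched as character strings at all.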

Alternatively, if they're *not* really character strings, display them as a set of subfields, with the text part shown 
as strings and the binary data shown as whatever it is, e.g.

        Frobozz text 1: {blanks}
        Frobozz count 1: 1
        Frobozz count 2: 1
        Frobozz text 2: H1{and more blanks}

or whatever it is.
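
To make the subfield idea concrete, here is a standalone sketch that splits the 16 octets from the question. The layout is purely an assumption for illustration (6 octets of text, two 16-bit little-endian counts, 6 more octets of text); the real protocol may differ, and the frobozz_* names are hypothetical. In a dissector, each piece would presumably be added with its own proto_tree_add_item() call instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: text(6) + count(2, LE) + count(2, LE) + text(6). */
#define FROBOZZ_TEXT_LEN  6
#define FROBOZZ_FIELD_LEN (FROBOZZ_TEXT_LEN + 2 + 2 + FROBOZZ_TEXT_LEN)

typedef struct {
    char     text1[FROBOZZ_TEXT_LEN + 1];
    uint16_t count1;
    uint16_t count2;
    char     text2[FROBOZZ_TEXT_LEN + 1];
} frobozz_fields;

/* Read a 16-bit little-endian value (byte order is an assumption). */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Split the raw field into subfields; returns 0 on success, -1 if short. */
static int frobozz_split(const unsigned char *buf, size_t len,
                         frobozz_fields *out)
{
    if (len < FROBOZZ_FIELD_LEN)
        return -1;

    memcpy(out->text1, buf, FROBOZZ_TEXT_LEN);
    out->text1[FROBOZZ_TEXT_LEN] = '\0';

    out->count1 = read_le16(buf + FROBOZZ_TEXT_LEN);
    out->count2 = read_le16(buf + FROBOZZ_TEXT_LEN + 2);

    memcpy(out->text2, buf + FROBOZZ_TEXT_LEN + 4, FROBOZZ_TEXT_LEN);
    out->text2[FROBOZZ_TEXT_LEN] = '\0';
    return 0;
}
```

Under that assumed layout, the example octets split into a blank text field, two counts of 1, and "H1" followed by blanks, which matches the display suggested above.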