Wireshark mailing list archives

Re: No tvb_get for string-encoded numbers?


From: Evan Huus <eapache () gmail com>
Date: Fri, 4 Apr 2014 16:08:26 -0400

On Fri, Apr 4, 2014 at 4:04 PM, Guy Harris <guy () alum mit edu> wrote:

On Apr 4, 2014, at 7:30 AM, Hadriel Kaplan <hadriel.kaplan () oracle com> wrote:

I might be overlooking something, but I don't see a tvb_get_* function to get a uint8/16/32/64 that was encoded as an 
ASCII or UTF-8 string in the packet. Is there such a thing?

No.

I've occasionally also thought there should be such a routine.

Note, though, that, whilst tvb_get_guint8() and tvb_get_{n,le}tohXXX() can never fail, because every possible 
sequence of octets is a valid 2's complement integral value, routines to get a number encoded as a string *can* fail, 
e.g. 0123xyzw is not a valid number in bases 8, 10, or 16.
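
To make the failure case concrete, here is a minimal sketch of such a routine. The name parse_ascii_uint32() and its signature are hypothetical, not an existing Wireshark API, and it works on a plain byte buffer rather than a tvbuff. It assumes an ASCII-superset encoding, reports failure through its return value, and does not build a temporary string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a Wireshark API): parse `len` octets at `buf`
 * as an unsigned number in `base` (2..16), assuming the digits are encoded
 * as in ASCII. Returns false, leaving *value untouched, if the input is
 * empty, contains an octet that is not a digit in that base, or does not
 * fit in 32 bits.
 */
static bool
parse_ascii_uint32(const uint8_t *buf, size_t len, int base, uint32_t *value)
{
    uint64_t acc = 0;

    if (len == 0 || base < 2 || base > 16)
        return false;

    for (size_t i = 0; i < len; i++) {
        uint8_t c = buf[i];
        int digit;

        /* Octet values are spelled out so this does not depend on the
         * compiler's execution character set. */
        if (c >= 0x30 && c <= 0x39)          /* '0'..'9' */
            digit = c - 0x30;
        else if (c >= 0x41 && c <= 0x46)     /* 'A'..'F' */
            digit = c - 0x41 + 10;
        else if (c >= 0x61 && c <= 0x66)     /* 'a'..'f' */
            digit = c - 0x61 + 10;
        else
            return false;                    /* not a digit at all */

        if (digit >= base)
            return false;                    /* digit not valid in this base */

        acc = acc * (uint64_t)base + (uint64_t)digit;
        if (acc > UINT32_MAX)
            return false;                    /* would overflow 32 bits */
    }

    *value = (uint32_t)acc;
    return true;
}
```

With this sketch, the octets of "0123xyzw" are rejected in bases 8, 10, and 16, while "0123" parses in each of them. A caller that also builds the protocol tree could use the false return to attach an expert item.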

There are other cases where a tvb_get_ routine can return "you lose", e.g. tvb_get_string_enc() can fail if there are 
invalid octet sequences (about the only encodings I know of where *every* octet sequence is a valid string are some 
of the ISO 8859-n encodings), and at least some floating-point formats probably have invalid values (I guess an IEEE 
NaN is "valid", at least to the extent that if we try to format it it'll show up as "NaN", but if we try to do 
calculations with it we might get a floating-point exception).

Instead, it seems the dissectors that deal with string messages do a tvb_get_string_enc() or tvb_format_text(), and 
then a strtol() or atoi(). But in my way of thinking, the fact that it's in a string-encoded form in the tvb isn't 
that much different from it being encoded as little-endian vs. network-order.

Likewise, it's not clear if there's a way to define a protocol field that is encoded as a string in the packet but 
is internally a uint8/16/32/64 (e.g., for filtering purposes, val_string lookup, etc.), so that, for example, 
proto_tree_add_item() would work. Instead, it seems some dissectors use the returned strtol/atoi to then add the 
field to the tree as a true uint type, or add it as a FT_STRING field type.

One advantage of that is that, if the routine to fetch the value also adds an item to the protocol tree, it could, in 
the cases where the value is invalid, also add an expert item indicating that the value isn't valid.

And I'd like to see proto_tree_add_XXX_item() routines that add an item with a particular type *and* take a pointer 
argument and return the value for the item through that pointer; that could replace

        xxx = tvb_get_XXX();
        proto_tree_add_XXX(..., xxx);

combinations and

        xxx = tvb_get_XXX();
        proto_tree_add_item(...);       /* re-fetches the item value */

with

        proto_tree_add_XXX_item(..., &xxx);

That would be neat, though we would have to be careful with our
fast-path handling, since we should return the value regardless.
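
A toy sketch of that fast-path concern, using stand-in types rather than Wireshark's real proto_tree and tvbuff: every name below (toy_tree, toy_get_ntohs, toy_tree_add_uint_item) is hypothetical. The point is only that the value is fetched and handed back before the "no tree" early return, so callers get it whether or not a tree is being built.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a protocol tree; NOT Wireshark's proto_tree. */
typedef struct {
    int      item_count;   /* number of items added so far */
    uint32_t last_value;   /* value of the most recently added item */
} toy_tree;

/* Fetch a 16-bit big-endian (network-order) value from a raw buffer. */
static uint16_t
toy_get_ntohs(const uint8_t *buf, size_t offset)
{
    return (uint16_t)((buf[offset] << 8) | buf[offset + 1]);
}

/*
 * Hypothetical combined "add item and return its value" routine.
 * The value is fetched once and written through `retval` first;
 * only then do we check whether there is a tree to add to.
 */
static void
toy_tree_add_uint_item(toy_tree *tree, const uint8_t *buf, size_t offset,
                       uint32_t *retval)
{
    uint32_t value = toy_get_ntohs(buf, offset);

    if (retval != NULL)
        *retval = value;       /* returned regardless of the fast path */

    if (tree == NULL)
        return;                /* fast path: no tree is being built */

    tree->item_count++;
    tree->last_value = value;
}
```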

And if we had common functions to handle ASCII and UTF-8 string-encoded numbers, they could avoid creating temporary 
strings as well.

The only real encoding issues are "ASCII superset" (so that "0123456789", for example, is encoded the same as in 
ASCII) vs. "2 or more bytes per ASCII character" (e.g., UCS-2, UTF-16, and UCS-4) vs. "one of those 7-bit GSM 
character encodings" vs. "EBCDIC".
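
As a rough illustration of how those classes differ at the digit level, the sketch below decodes a single decimal digit under three of them. The enum and function names are hypothetical; UTF-16 is shown big-endian only, and the 7-bit GSM case is left out because it also involves unpacking septets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding classes for digit decoding (illustration only). */
typedef enum {
    DIGIT_ENC_ASCII_SUPERSET,  /* ASCII, UTF-8, ISO 8859-n, ... */
    DIGIT_ENC_UTF16_BE,        /* two octets per ASCII character */
    DIGIT_ENC_EBCDIC           /* digits at 0xF0..0xF9 */
} digit_enc;

/*
 * Decode one decimal digit from the start of `buf`.
 * Returns 0..9 and sets *consumed to the number of octets used,
 * or returns -1 (leaving *consumed untouched) if the octets are
 * not a decimal digit in the given encoding class.
 */
static int
decode_decimal_digit(const uint8_t *buf, size_t len, digit_enc enc,
                     size_t *consumed)
{
    switch (enc) {
    case DIGIT_ENC_ASCII_SUPERSET:
        if (len >= 1 && buf[0] >= 0x30 && buf[0] <= 0x39) {
            *consumed = 1;
            return buf[0] - 0x30;
        }
        break;
    case DIGIT_ENC_UTF16_BE:
        if (len >= 2 && buf[0] == 0x00 && buf[1] >= 0x30 && buf[1] <= 0x39) {
            *consumed = 2;
            return buf[1] - 0x30;
        }
        break;
    case DIGIT_ENC_EBCDIC:
        if (len >= 1 && buf[0] >= 0xF0 && buf[0] <= 0xF9) {
            *consumed = 1;
            return buf[0] - 0xF0;
        }
        break;
    }
    return -1;
}
```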
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request () wireshark org?subject=unsubscribe