Wireshark mailing list archives

Re: No tvb_get for string-encoded numbers?


From: Guy Harris <guy () alum mit edu>
Date: Fri, 4 Apr 2014 13:04:40 -0700


On Apr 4, 2014, at 7:30 AM, Hadriel Kaplan <hadriel.kaplan () oracle com> wrote:

I might be overlooking something, but I don’t see a tvb_get_* function to get a uint8/16/32/64 that was encoded as an 
ASCII or UTF-8 string in the packet. Is there such a thing?

No.

I've occasionally also thought there should be such a routine.

Note, though, that, whilst tvb_get_guint8() and tvb_get_{n,le}tohXXX() can never fail, because every possible sequence 
of octets is a valid 2's complement integral value, routines to get a number encoded as a string *can* fail, e.g. 
0123xyzw is not a valid number in bases 8, 10, or 16.
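
To make the failure case concrete, here is a minimal sketch in plain C of a strict string-to-number conversion built on strtoul(). The helper name parse_uint32_str() is hypothetical and is not a Wireshark API; it only illustrates that such a routine needs a way to report "not a number" separately from the value it returns.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a Wireshark API): parse a NUL-terminated string
 * as an unsigned 32-bit number in the given base (8, 10, or 16).
 * Returns true and stores the value through "out" on success; returns
 * false if the string is empty, contains anything other than a number,
 * or does not fit in 32 bits.
 */
static bool
parse_uint32_str(const char *s, int base, uint32_t *out)
{
    char *end;
    unsigned long val;

    assert(s != NULL && out != NULL);

    /* strtoul() accepts leading white space and a sign; reject both here. */
    if (!isxdigit((unsigned char)*s))
        return false;

    errno = 0;
    val = strtoul(s, &end, base);

    /* No digits consumed, or trailing characters after the digits. */
    if (end == s || *end != '\0')
        return false;

    /* Out of range for unsigned long, or for 32 bits. */
    if (errno == ERANGE || val > UINT32_MAX)
        return false;

    *out = (uint32_t)val;
    return true;
}
```

With this sketch, parse_uint32_str("0123xyzw", base, &v) returns false for bases 8, 10, and 16, while "0123" is accepted in all three bases with a different value in each.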

There are other cases where a tvb_get_ routine can return "you lose", e.g. tvb_get_string_enc() can fail if there are 
invalid octet sequences (about the only encodings I know of where *every* octet sequence is a valid string are some of 
the ISO 8859-n encodings), and at least some floating-point formats probably have invalid values (I guess an IEEE NaN 
is "valid", at least to the extent that if we try to format it it'll show up as "NaN", but if we try to do calculations 
with it we might get a floating-point exception).

Instead, it seems the dissectors that deal with string messages do a tvb_get_string_enc() or tvb_format_text(), and 
then a strtol() or atoi(). But in my way of thinking, the fact that it’s in a string-encoded form in the tvb isn’t 
that much different from it being encoded as little-endian vs. network-order.

Likewise, it’s not clear if there’s a way to define a protocol field that is encoded as a string in the packet but is 
internally a uint8/16/32/64 (e.g., for filtering purposes, val_string lookup, etc.). For example such that 
proto_tree_add_item() would work. Instead, it seems some dissectors use the returned strtol/atoi to then add the 
field to the tree as a true uint type, or add it as a FT_STRING field type.

One advantage of that is that, if the routine to fetch the value also adds an item to the protocol tree, it could, in 
the cases where the value is invalid, also add an expert item indicating that the value isn't valid.

And I'd like to see proto_tree_add_XXX_item() routines that add an item with a particular type *and* take a pointer 
argument and return the value for the item through that pointer; that could replace

        xxx = tvb_get_XXX();
        proto_tree_add_XXX(..., xxx);

combinations and

        xxx = tvb_get_XXX();
        proto_tree_add_item(...);       /* re-fetches the item value */

with

        proto_tree_add_XXX_item(..., &xxx);

And if we had common functions handle ASCII and UTF-8 string-encoded numbers, they could avoid creating temporary 
strings as well.

The only real encoding issues are "ASCII superset" (so that "0123456789", for example, is encoded the same as in ASCII) 
vs. "2 or more bytes per ASCII character" (e.g., UCS-2, UTF-16, and UCS-4) vs. "one of those 7-bit GSM character 
encodings" vs. "EBCDIC".
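
As a rough illustration of the "no temporary string" idea for the ASCII-superset case, the sketch below parses decimal digits straight out of a byte buffer. The name parse_uint32_ascii() and its signature are assumptions made for this example, not an existing Wireshark routine; a real version would read from a tvbuff rather than a raw pointer and length. The digit range is written as 0x30..0x39 so that the check means "ASCII digit in the packet" regardless of the host character set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a Wireshark API): parse buf[0..len) as an
 * unsigned decimal number encoded in ASCII or any ASCII superset such as
 * UTF-8.  No NUL terminator is required and no copy of the bytes is made.
 * Returns true and stores the value through "out" on success; returns
 * false on an empty field, a non-digit octet, or 32-bit overflow.
 */
static bool
parse_uint32_ascii(const uint8_t *buf, size_t len, uint32_t *out)
{
    uint32_t val = 0;
    size_t i;

    assert(buf != NULL && out != NULL);

    if (len == 0)
        return false;

    for (i = 0; i < len; i++) {
        uint32_t digit;

        /* 0x30..0x39 are the ASCII codes for '0'..'9'. */
        if (buf[i] < 0x30 || buf[i] > 0x39)
            return false;
        digit = (uint32_t)(buf[i] - 0x30);

        /* Refuse to overflow: val * 10 + digit must fit in 32 bits. */
        if (val > (UINT32_MAX - digit) / 10)
            return false;
        val = val * 10 + digit;
    }

    *out = val;
    return true;
}
```

The multi-byte encodings (UCS-2, UTF-16, UCS-4), the 7-bit GSM alphabets, and EBCDIC would each need their own per-character decoding step in front of the same accumulate-and-check loop.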