Wireshark mailing list archives

Re: tvb_get_string_enc() doesn't always return valid UTF-8


From: Evan Huus <eapache () gmail com>
Date: Sun, 26 Jan 2014 18:32:18 -0500

On Sun, Jan 26, 2014 at 5:43 PM, Guy Harris <guy () alum mit edu> wrote:

On Jan 26, 2014, at 2:32 PM, Evan Huus <eapache () gmail com> wrote:

OK. I just meant that since tvb_get_string() is currently ASCII, a
dumb search and replace will let us make the API change now without
any regressions. We can then audit calls that could be incorrect.

I apologize - I misparsed your question as "why would dumb search-and-replace of tvb_get_string with 
tvb_get_string_enc and ENC_ASCII be an easy way to make (part of) the API transition?", i.e. that you were saying 
that dumb search-and-replace didn't sound like a good idea to you, rather than as "so does that mean that we should 
start by doing a dumb search-and-replace of tvb_get_string with tvb_get_string_enc and ENC_ASCII, as an easy way to 
make (part of) the API transition?"

Darn, you're right, I never even thought of that interpretation. I
apologize also; the inflection in my head made it unambiguous :P

(It might've been clearer as "in which case, is dumb search and replace", so that dummies like me read "in which 
case" as meaning "therefore" rather than "to which case are you referring where...")

And note that this is what happens between two native English
speakers. I don't even want to think about the problems a non-native
speaker might have with some of what I've written. Sigh.

Admittedly, it's easier to track which calls have been audited if we
do it gradually, so that's probably a better choice anyways.

Yes.  In some cases, ENC_ASCII may well be appropriate, if the protocol spec says that the string must be ASCII 
(i.e., ASCII, and not ISO 8859-n, and not MacWhatever, and not DOS or Windows code page whatever, and not 
PickYourEUCMultiByteCodeSet, and not UTF-8...), and ENC_ASCII as the result of a dumb search-and-replace is, absent a 
"this really means ASCII" comment, indistinguishable from ENC_ASCII as the result of looking in the protocol 
specification and seeing that they really mean ASCII.
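
A minimal sketch of the mechanical step discussed above, written in Python. The helper name add_ascii_encoding is hypothetical, and the argument lists in the usage example are illustrative rather than taken from any real dissector. It assumes tvb_get_string_enc() takes the same arguments as tvb_get_string() plus a trailing encoding argument, and it does not try to handle parentheses inside string literals or comments.

```python
import re

# Match the bare name only, so tvb_get_string_enc( and tvb_get_stringz(
# are left alone.
CALL = re.compile(r"\btvb_get_string\s*\(")


def add_ascii_encoding(source: str) -> str:
    """Rewrite tvb_get_string(...) as tvb_get_string_enc(..., ENC_ASCII)."""
    out = []
    pos = 0
    while True:
        match = CALL.search(source, pos)
        if match is None:
            out.append(source[pos:])
            return "".join(out)

        # Walk forward to the parenthesis that closes this call.
        depth = 1
        end = match.end()
        while end < len(source) and depth > 0:
            if source[end] == "(":
                depth += 1
            elif source[end] == ")":
                depth -= 1
            end += 1

        if depth != 0:
            # Unbalanced call: leave the rest of the text untouched.
            out.append(source[pos:])
            return "".join(out)

        # Rewrite any nested calls inside the argument list as well.
        args = add_ascii_encoding(source[match.end():end - 1]).rstrip()
        out.append(source[pos:match.start()])
        out.append("tvb_get_string_enc(" + args + ", ENC_ASCII)")
        pos = end


if __name__ == "__main__":
    line = "s = tvb_get_string(wmem_packet_scope(), tvb, offset, len);"
    print(add_ascii_encoding(line))
```

A call rewritten this way looks exactly like one where somebody checked the specification, which is the problem described above; an audited call would still need a "this really means ASCII" comment added by hand.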

___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request () wireshark org?subject=unsubscribe

