Wireshark mailing list archives
Re: Request for RFC regarding string handling
From: Evan Huus <eapache () gmail com>
Date: Tue, 29 Oct 2013 10:46:59 -0400
On Mon, Oct 28, 2013 at 8:03 PM, Ed Beroset <beroset () mindspring com> wrote:
> Also, if we make the possibly rash assumption that Unicode is the superset, perhaps we can regularize the addition of new renderings by requiring conversions to and from Unicode and routines that can create an array of pointers (or maybe offsets) of encoding errors in the encoded version of the string.
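To illustrate Ed's suggestion (convert to Unicode and record where the encoding errors are), here is a minimal C sketch. It assumes UTF-8 as the source encoding. It decodes the bytes into Unicode code points, substitutes U+FFFD for each malformed sequence, and records the byte offset where each error starts. The function name `decode_utf8_with_errors` and its signature are hypothetical and are not an existing Wireshark API.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical helper (not a Wireshark API): decode UTF-8 bytes into
 * Unicode code points. Each malformed sequence becomes U+FFFD, and the
 * byte offset where it starts is appended to err_offsets.
 * `out` and `err_offsets` must each have room for `len` entries, because
 * every loop iteration consumes at least one input byte.
 * Returns the number of code points written; *n_errs gets the error count. */
static size_t decode_utf8_with_errors(const uint8_t *in, size_t len,
                                      uint32_t *out,
                                      size_t *err_offsets, size_t *n_errs)
{
    size_t i = 0, n_out = 0;
    *n_errs = 0;

    while (i < len) {
        uint8_t b = in[i];
        uint32_t cp, min;   /* min: smallest legal value, rejects overlongs */
        size_t need;        /* number of continuation bytes expected */

        if (b < 0x80)                { cp = b;        need = 0; min = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; need = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; need = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; need = 3; min = 0x10000; }
        else {
            /* Stray continuation byte or a byte that can never start a sequence. */
            err_offsets[(*n_errs)++] = i;
            out[n_out++] = REPLACEMENT_CHAR;
            i++;
            continue;
        }

        /* j counts the lead byte plus the continuation bytes consumed so far. */
        size_t j = 1;
        while (j <= need && i + j < len && (in[i + j] & 0xC0) == 0x80) {
            cp = (cp << 6) | (in[i + j] & 0x3F);
            j++;
        }

        if (j <= need || cp < min || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            /* Truncated, overlong, out of range, or a UTF-16 surrogate. */
            err_offsets[(*n_errs)++] = i;
            out[n_out++] = REPLACEMENT_CHAR;
            i += j;
            continue;
        }

        out[n_out++] = cp;
        i += j;
    }
    return n_out;
}
```

This sketch emits one U+FFFD for each malformed sequence as it defines one. The Unicode Standard's recommended "maximal subpart" practice differs in some edge cases. For example, it treats the overlong pair C0 80 as two errors. So this is an illustration of the idea, not a conformant decoder.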
I think we more or less have to take Unicode as the superset, because AFAIK none of the available UI toolkits will render anything else.

According to what's been gestating in my brain, the outstanding questions (in the order they probably need to be answered) are:

1. How do we handle valid but non-printable characters in strings? We currently have a mishmash of different C-style escapes, replacement characters, and "nothing" (which really means "whatever our UI toolkit does"). Should we pick one? Should we make it a user option? Should it depend on the context? On the protocol? On the field? Should our replacement character *always* be U+FFFD (the Unicode replacement character), or should we also permit - or . or any other character that might be useful?

2. How do we handle "broken" strings (e.g. strings that claim to be UTF8 but don't follow the UTF8 encoding rules)? We currently have a mix of assertions, expert info, and "nothing" (again meaning "whatever our UI toolkit does"). It would be useful to decode as much as possible and annotate the errors, but that becomes almost worthy of a program in its own right.

3. How do we represent strings internally to the dissection engine? Right now we are pretty standardized on null-terminated ASCII, but some places use UTF8, some use counted strings, etc. My vote here would be to standardize on counted UTF8 of some sort, since that is relatively simple to manipulate and can represent any string I can think of (including embedded nulls, which keep popping up as a problem).

4. Given 1-3, what APIs do we expose to dissectors?

5. Given 4, how do we get there from here?

Cheers,
Evan
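To make questions 1 and 3 concrete, here is a minimal C sketch. It pairs a counted UTF-8 string type with a display routine whose handling of non-printable characters is chosen per call. The type `counted_str`, the enum `nonprint_policy`, and the function `render_display` are hypothetical illustrations, not existing Wireshark APIs. The sketch treats only ASCII control bytes as non-printable and assumes the input is already valid UTF-8. A real implementation would also have to decide what to do with C1 controls and other non-printable code points.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counted UTF-8 string: embedded NULs are allowed
 * because the length is stored explicitly. */
typedef struct {
    const char *data;
    size_t      len;    /* length in bytes, not code points */
} counted_str;

/* Hypothetical policies for rendering non-printable characters. */
typedef enum {
    NONPRINT_C_ESCAPE,  /* "\n", "\r", "\t", "\0", otherwise "\xNN" */
    NONPRINT_REPLACE    /* substitute a caller-chosen UTF-8 string */
} nonprint_policy;

/* Render `s` into `buf`, always NUL-terminating when bufsize > 0.
 * Only ASCII control bytes (0x00-0x1F and 0x7F) count as non-printable;
 * every other byte is copied unchanged, on the assumption that `s` is valid
 * UTF-8. Truncation may split a multi-byte sequence.
 * Returns the length the full rendering needs, excluding the NUL
 * (snprintf-style). */
static size_t render_display(counted_str s, nonprint_policy policy,
                             const char *replacement,
                             char *buf, size_t bufsize)
{
    size_t out = 0;

    for (size_t i = 0; i < s.len; i++) {
        unsigned char c = (unsigned char)s.data[i];
        char tmp[5];            /* room for "\xNN" plus its NUL */
        const char *piece;

        if (c >= 0x20 && c != 0x7F) {
            tmp[0] = (char)c;
            tmp[1] = '\0';
            piece = tmp;
        } else if (policy == NONPRINT_REPLACE) {
            piece = replacement;
        } else {
            switch (c) {
            case '\n': piece = "\\n"; break;
            case '\r': piece = "\\r"; break;
            case '\t': piece = "\\t"; break;
            case '\0': piece = "\\0"; break;
            default:
                snprintf(tmp, sizeof tmp, "\\x%02x", c);
                piece = tmp;
                break;
            }
        }

        /* Copy what fits; keep counting so the caller learns the full size. */
        for (size_t k = 0; piece[k] != '\0'; k++, out++) {
            if (out + 1 < bufsize)
                buf[out] = piece[k];
        }
    }
    if (bufsize > 0)
        buf[out < bufsize ? out : bufsize - 1] = '\0';
    return out;
}
```

Passing the policy as a parameter, rather than baking it into the string type, leaves room for the per-context, per-protocol, or per-field choice raised in question 1. For example, a caller could pass `NONPRINT_REPLACE` with `"\xEF\xBF\xBD"` (U+FFFD encoded as UTF-8) or with `"."`.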