Wireshark mailing list archives
Re: utf8 support on http dissectors
From: Guy Harris <guy () alum mit edu>
Date: Mon, 19 Mar 2018 10:00:39 -0700
On Mar 19, 2018, at 4:53 AM, Roberto Ayuso <roberto.ayuso () gmail com> wrote:
Really, I mean two fields, http.file_data and http.request_uri; both can contain non-ASCII characters, but they are treated only as ASCII in the source code. Can't an option be added to manage that?
The question is whether it *should* be added, i.e. whether it would be correct to do so, not whether it *could* be added. Please read my reply in detail.

The body of an HTTP request or response is *not* necessarily text, so always treating it as text is an error, and if it *is* text, either the content type specifies the character encoding, or the encoding is the default ASCII encoding, so if we *do* treat it as text, we should use the content type, *not* a user preference, to control the encoding.

So either it should be an FT_BYTES field, which has no character encoding and thus would neither be ASCII nor UTF-8, or the dissector should determine whether the content type corresponds to text or not and:

    if it's not text, it should add it as an FT_BYTES version of http.file_data;

    if it is text, it should be extracted using whatever the character encoding specified by the content type is, and added as an FT_STRING version of http.file_data (if it's a character encoding Wireshark currently doesn't support, fall back on FT_BYTES).

As for the request URI, are there non-ASCII characters because of percent-escaping, or because the URI uses RFC 2047-style indicators, or because the sending machine just added octets with the 8th bit set in *some* encoding, not necessarily UTF-8? Those three possibilities would have to be handled in different ways.
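
To make the decision logic above concrete, here is a rough sketch in Python rather than in Wireshark's C dissector API. It is not how the HTTP dissector is written; classify_body and classify_uri are hypothetical names, treating only text/* media types as text is a simplifying assumption, and falling back to bytes when the payload fails to decode is also an assumption (the text above only mentions falling back when the encoding is unsupported).

```python
import re
from email.message import Message

# Hypothetical patterns for the URI checks; not taken from Wireshark.
PERCENT_ESCAPE = re.compile(rb"%[0-9A-Fa-f]{2}")
RFC2047_WORD = re.compile(rb"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def classify_body(content_type, body):
    """Return ("string", str) for decodable text, else ("bytes", bytes).

    "string" stands in for an FT_STRING field and "bytes" for FT_BYTES.
    """
    msg = Message()
    msg["Content-Type"] = content_type or "application/octet-stream"

    # Assumption: only text/* media types count as text here.
    if not msg.get_content_type().startswith("text/"):
        return ("bytes", body)

    # Use the charset from the content type; default to ASCII if absent.
    charset = msg.get_content_charset("ascii")
    try:
        return ("string", body.decode(charset))
    except (LookupError, UnicodeDecodeError):
        # Unsupported encoding, or (an assumption) undecodable payload.
        return ("bytes", body)


def classify_uri(raw):
    """List which of the three non-ASCII mechanisms appear in a raw URI."""
    findings = []
    # Percent-escapes whose value has the 8th bit set, e.g. %C3.
    if any(int(m.group(0)[1:], 16) >= 0x80
           for m in PERCENT_ESCAPE.finditer(raw)):
        findings.append("percent-escaped")
    # RFC 2047-style encoded words, e.g. =?utf-8?Q?...?=
    if RFC2047_WORD.search(raw):
        findings.append("rfc2047-style")
    # Raw octets with the 8th bit set, in some unknown encoding.
    if any(octet > 0x7F for octet in raw):
        findings.append("raw-8bit")
    return findings
```

Note that classify_uri only reports which mechanism is present; it deliberately does not guess an encoding for raw 8-bit octets, since nothing in the request says what that encoding is.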