Wireshark mailing list archives
Re: Wrongly escaped UTF-8 characters in JSON values ( epan/print.c )
From: Richard Sharpe <realrichardsharpe () gmail com>
Date: Fri, 6 Jul 2018 07:32:59 -0700
On Fri, Jul 6, 2018 at 4:46 AM, Andrea Lo Pumo <alopumo () movia biz> wrote:
> From: Dario Lombardo
>> What do you mean by "I do not know the Wireshark code"? What did you
>> patch? Do you mean you don't know the submission procedure instead?
>
> I mean I do not know the full implications of changing the code as I did.
> It worked for me because I am only interested in gsm_sms.sms_text; however,
> before this patch is accepted, someone with a better understanding of the
> Wireshark code should check that it is OK.
>
>> What did you patch?
>
> print_escaped_bare() of epan/print.c
>
>> 2018-07-05 16:01 GMT+02:00 Andrea Lo Pumo <alopumo () movia biz>:
>>> I am using "tshark -T json -V -r file.pcap" and specifically I am looking
>>> for the gsm_sms.sms_text field. I get this output:
>>>
>>>     "gsm_sms.sms_text": "Ok per\u00c3\u00b2 non piove"
>>>
>>> Instead, using "tshark -V -r file.pcap" I get:
>>>
>>>     SMS text: Ok però non piove
>>>
>>> (There is an accent on the "o" of "però".)
>>>
>>> The problem is that the \uXXYY syntax denotes a UTF-16 code unit (see [1]),
>>> while "ò" is encoded in UTF-8 as the two bytes c3 b2. Wireshark writes
>>> c3 b2 as if they were UTF-16.
>>>
>>> I solved the problem by changing print_escaped_bare() of epan/print.c as
>>> follows: substitute
>>>
>>>     default:
>>>         if (g_ascii_isprint(*p))
>>>             fputc(*p, fh);
>>>         else {
>>>             g_snprintf(temp_str, sizeof(temp_str), "\\u00%02x", (guint8)*p);
>>>             fputs(temp_str, fh);
>>>         }
>>>
>>> with
>>>
>>>     default:
>>>         fputc(*p, fh);
>>>
>>> I do not know the Wireshark code, so I am not submitting a patch. This,
>>> however, should work because JSON supports UTF-8 (see again [1]).
>>>
>>> [1] From the JSON page on Wikipedia: JSON exchange in an open ecosystem
>>> must be encoded in UTF-8. However, if escaped, those characters must be
>>> written using UTF-16 surrogate pairs, a detail missed by some JSON parsers.
That may mean the change needs to be wrapped in an #ifdef for Windows or
something similar, because the current encoding looks like it is meant for
Windows, but I may be wrong.

--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)(传说杜康是酒的发明者)
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives:    https://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request () wireshark org?subject=unsubscribe
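
For readers following the thread, here is a minimal sketch of the idea behind the proposed change: escape only what JSON requires (the double quote, the backslash, and control characters below 0x20) and pass every other byte through unchanged. It is not Wireshark code. The name json_escape_utf8() and the buffer-based interface are hypothetical, it uses plain C library calls rather than the GLib ones in print_escaped_bare(), and it assumes the input is already valid UTF-8. The thread does not show whether other cases of the original switch already handle quotes and control characters, so the sketch handles them itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Wireshark): JSON-escape a NUL-terminated
 * string that is ASSUMED to be valid UTF-8.
 *
 * Only '"', '\\' and control characters below 0x20 are escaped. Bytes >= 0x80
 * are copied unchanged, so a multi-byte UTF-8 sequence stays intact instead
 * of being split into one \u00XX escape per byte.
 *
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if the output buffer is too small.
 */
size_t json_escape_utf8(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);

    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        char esc[7];              /* room for "\u00XX" plus NUL */
        const char *piece = esc;  /* text to append for this input byte */
        size_t len;

        if (*p == '"') {
            piece = "\\\"";
        } else if (*p == '\\') {
            piece = "\\\\";
        } else if (*p < 0x20) {
            /* A control character is a single code point, so \u00XX is valid. */
            snprintf(esc, sizeof esc, "\\u%04x", (unsigned)*p);
        } else {
            /* Printable ASCII or part of a UTF-8 sequence: copy as-is. */
            esc[0] = (char)*p;
            esc[1] = '\0';
        }

        len = strlen(piece);
        if (n + len >= out_size)  /* keep one byte free for the NUL */
            return (size_t)-1;
        memcpy(out + n, piece, len);
        n += len;
    }

    if (n >= out_size)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

With this sketch, the bytes c3 b2 of "ò" reach the output unchanged, so a consumer that reads the file as UTF-8 sees "però". If ASCII-only output were required instead, the escape would have to name the code point (U+00F2 for "ò", written \u00f2), not the individual UTF-8 bytes.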
Current thread:
- Wrongly escaped UTF-8 characters in JSON values ( epan/print.c ) Andrea Lo Pumo (Jul 05)
- Re: Wrongly escaped UTF-8 characters in JSON values ( epan/print.c ) Dario Lombardo (Jul 05)
- Re: Wrongly escaped UTF-8 characters in JSON values ( epan/print.c ) Andrea Lo Pumo (Jul 06)
- Re: Wrongly escaped UTF-8 characters in JSON values ( epan/print.c ) Dario Lombardo (Jul 06)
- Re: Wrongly escaped UTF-8 characters in JSON values ( epan/print.c ) Richard Sharpe (Jul 06)