Nmap Development mailing list archives

Re: [NSE] http-feed.nse


From: George Chatzisofroniou <sophron () latthi com>
Date: Sat, 17 Aug 2013 21:21:37 +0300

On Sat, Aug 17, 2013 at 10:14:38AM -0700, David Fifield wrote: 
To be clear, the &amp; is not the reason the feed wasn't detected with
the previous version of the script, right? It was because the script
lacked an "application/atom+xml" pattern.
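As a rough illustration of the pattern issue, here is a minimal sketch that assumes the script matches the type attribute with Lua patterns. The helper name is_atom_type is hypothetical and is not taken from http-feed.nse. The point worth noting is that "+" is a magic character in Lua patterns, so an unescaped "application/atom+xml" pattern would not match the literal string.

```lua
-- Hypothetical helper, not the code used by http-feed.nse.
-- Returns true if a type attribute value names the Atom media type.
local function is_atom_type(attr)
  -- "+" means "one or more" in Lua patterns, so it is escaped as "%+".
  return attr:lower():find("application/atom%+xml") ~= nil
end

-- An alternative that avoids escaping: plain (non-pattern) search.
local function is_atom_type_plain(attr)
  return attr:lower():find("application/atom+xml", 1, true) ~= nil
end

print(is_atom_type("application/atom+xml"))
print(is_atom_type_plain("Application/Atom+XML; charset=utf-8"))
print(is_atom_type("application/rss+xml"))
```

The plain-search variant is probably the safer choice whenever the string being searched for is a fixed literal.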

It was both. Even with the right pattern, the URL would not have been parsed
correctly. Try commenting out "l = l:gsub("&amp;", "&")" to see for yourself.
 
The right way to get rid of the &amp; is with an HTML parser. Since we
don't have that, I think I would prefer that we not interpret the string
at all in the script. If we handle &amp;, we really should handle &lt;
and &gt; and especially &quot; and &apos; that are likely to appear in
attribute values. (Granted, &amp; is the most likely and most
problematic of all of these.) But there are also numeric character
entities and the large number of HTML named entities too.
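To make the scope of the problem concrete, here is a minimal sketch of a single-pass decoder for the five entities named above plus printable-ASCII numeric references. It is not an HTML parser and not part of any NSE library. The names NAMED and decode_entities are hypothetical, and the restriction to printable ASCII is a simplifying assumption. Anything the sketch does not recognize is left untouched.

```lua
-- Hypothetical table: only the five entities mentioned in this thread.
local NAMED = { amp = "&", lt = "<", gt = ">", quot = '"', apos = "'" }

-- Decode references in one pass. Decoded output is never rescanned,
-- so "&amp;quot;" becomes "&quot;" and is not decoded a second time.
local function decode_entities(s)
  return (s:gsub("&(#?%w+);", function(body)
    if body:sub(1, 1) == "#" then
      local hex = body:match("^#[xX](%x+)$")
      local dec = body:match("^#(%d+)$")
      local code = hex and tonumber(hex, 16) or dec and tonumber(dec)
      -- Assumption: only printable ASCII is decoded in this sketch.
      if code and code >= 32 and code < 127 then
        return string.char(code)
      end
      return nil -- nil tells gsub to keep the original text
    end
    return NAMED[body] -- nil for unknown names, which keeps the match
  end))
end

print(decode_entities("http://example.com/?a=b&amp;c=d"))
print(decode_entities("&#38; &#x26; &bogus;"))
```

Even this small version leaves out the full named-entity table, non-ASCII code points, and references written without a trailing semicolon, which is part of why a real parser is the better answer.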

Decoding just &amp; and nothing else creates ambiguities, for example
both of the input strings
      http://example.com/?a=b&quot;c=d
      http://example.com/?a=b&amp;quot;c=d
map to the same output string
      http://example.com/?a=b&quot;c=d
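The collision can be reproduced with a few lines of Lua. This is a minimal sketch using inputs modeled on the example above. The helper name decode_amp_only is hypothetical and simply wraps the same gsub call that was in the script.

```lua
-- Hypothetical helper: decodes "&amp;" and nothing else.
local function decode_amp_only(s)
  -- gsub returns (result, count); the parentheses keep only the result.
  return (s:gsub("&amp;", "&"))
end

local a = "http://example.com/?a=b&quot;c=d"
local b = "http://example.com/?a=b&amp;quot;c=d"

print(decode_amp_only(a))
print(decode_amp_only(b))
-- Two different inputs, one output: the original can no longer be recovered.
print(decode_amp_only(a) == decode_amp_only(b))
```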

I agree with you that it shouldn't be handled by just this script, in
just this place.

Yes, you are right. I don't like it either. I removed it from the script.

In HTML5 there are some rules about when an ampersand is just an
ampersand and when it is part of a character reference.
http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#syntax-ambiguous-ampersand
http://www.w3.org/International/questions/qa-escapes#bytheway

It looks like the HTML5 rules are different from HTML4. According to the HTML4
spec, encoding an ampersand is always required, even though most developers
ignore this.

-- 
George Chatzisofroniou
_______________________________________________
Sent through the dev mailing list
http://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/