Nmap Development mailing list archives
Re: [RFC] PCRE MATCHLIMIT and the use of greedy quantifiers (-sV scans)
From: Brandon Enright <bmenrigh () ucsd edu>
Date: Fri, 3 Apr 2009 20:53:26 +0000
On Fri, 3 Apr 2009 01:57:17 +0000 doug () hcsw org wrote:
Hi Brandon,
...snip...
I always try to make sure that there are no such segments between .* groups if the s modifier is in use. While avoiding the s modifier will usually allow matches to fail early in some cases, I think the s modifier is very important in creating robust match lines for some protocols like HTTP.
Yes but as you point out below, sometimes content between .* is actually matching on something useful. I looked at all content 10 chars or less between two .* and some of it was pretty decent. ...snip...
I can't really think of any cases where it's necessary to have something like .*\r\n.* in a match line with an s modifier.
I was operating under the assumption that for some reason unknown to me it was important to make sure that there were some buffer headers between the start of the response and the header like Server: that was being examined. ...snip...
Like I said, I try to avoid matches like that but I think I know some ways they can occur: 1) A match line originally has no s modifier but later was changed to have an s modifier without removing problematic sections of the match like .*\n.*
I hadn't considered this but now that you point it out I think this is the likely source of all of our .*\n.*
2) When constructing an s match line, someone had a more unique section like .*X-Unique-header: blahblah\r\n.* but found that the X-Unique-header: wasn't always there and removed the header part but left the line terminator .*\r\n.* in the final version.
Possibly. Leaving the \r\n will still require *some* header though.
In all the cases I've run into this issue I've been able to fix the match by using atomic grouping and lazy quantification.
I think we can make the following substitution on all s modifier match lines (untested): s/[.][*]([\]r[\]n|[\]n)*?[.][*]/.*/g
I agree but I'd suggest a few modifications. First) we need to make sure your first [.] isn't preceded by a \ (easy with negative look-behind). Second) in most cases the content trailing the .* is still in the header and not in the body. Lazy quantification with .*? should be generally faster because it won't consume the whole string and then slowly back off. I'd propose changing to .*? in the case that the trailing content is still in the header and .* when the trailing content is in the body.
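A rough sketch of the substitution discussed above, with the look-behind guard added. It uses Python's re module purely for illustration (it is not PCRE, and this is not the tooling used on nmap-service-probes). The function name collapse_newline_segments and the exact pattern are assumptions: the pattern is a variation on the untested expression quoted above, not a transcription of it, and it only makes sense for match lines that already carry the s modifier.

```python
import re

# Literal text ".*\n.*", ".*\r\n.*" or ".*\r?\n.*" as it appears inside a
# match line. The negative look-behind skips a "." that is escaped ("\.").
# Assumption: an escaped backslash directly before ".*" is not handled.
REDUNDANT_NEWLINE = re.compile(r'(?<!\\)\.\*(?:(?:\\r\??\\n|\\n)\.\*)+')


def collapse_newline_segments(pattern: str) -> str:
    """Collapse .*<newline>.* runs into a single .* in a match pattern."""
    return REDUNDANT_NEWLINE.sub('.*', pattern)


before = r'^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\r?\n'
after = collapse_newline_segments(before)
print(before)
print(after)
```

Choosing between .* and .*? for each replacement is left out here, since that depends on whether the trailing content sits in the header or the body.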
The resulting match lines will match strict supersets of the previous match lines' matches (meaning anything that used to match will still match plus at least 1 more, newline replaced with empty string) and I don't think these segments add any important value to the matching process.
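The superset claim can be checked on a small example. This is a minimal sketch in Python's re module (re.DOTALL standing in for the s modifier), not PCRE itself; the "Demo" server name and the sample responses are made up for illustration.

```python
import re

# Same pattern with and without the ".*\n.*" segment, both in dot-all mode.
old = re.compile(r'^HTTP/1\.0 \d\d\d .*\n.*Server: Demo/([\d.]+)', re.DOTALL)
new = re.compile(r'^HTTP/1\.0 \d\d\d .*Server: Demo/([\d.]+)', re.DOTALL)

responses = [
    'HTTP/1.0 200 OK\r\nDate: x\r\nServer: Demo/1.2\r\n',  # newline before Server
    'HTTP/1.0 200 OK Server: Demo/1.2',                    # no newline at all
    'HTTP/1.0 404 Not Found\r\n\r\n',                      # no Server header
]

old_hits = [old.search(r) is not None for r in responses]
new_hits = [new.search(r) is not None for r in responses]

# Everything the old pattern matched, the new one matches too.
assert all(n for o, n in zip(old_hits, new_hits) if o)
print(old_hits, new_hits)
```

The only input that gains a match is the one with no newline before the Server text, which is the "newline replaced with empty string" case described above.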
Agreed.
There may be unusual cases I'm not considering at the moment though, perhaps very tough to match services whose only identifying characteristics are the order and count of their newlines, so I think these should be processed on a case-by-case basis as you appear to be doing.
Yeah, I plan on doing these by hand rather than via search-and-replace.
Here is a match diff:
-match http m|^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer ...snip...
+match http m|^HTTP/1\.0 \d\d\d (?>.*?\n).*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer ...snip...
What about just replacing .*\n.* with .* ?
I was trying to keep the match *exact* but as you've pointed out above, the \n doesn't add enough to the match to be worthwhile.
*********************************************************************
But remember that in non s modifier match lines it is very important to keep these segments as is to ensure it still matches.
*********************************************************************
Yeah. And in the case where 's' isn't there, I'll evaluate whether it should be there or not. ...snip..
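To illustrate why the segment has to stay when there is no s modifier, here is a small sketch in Python's re module (not PCRE; the "Demo" server name and the response are invented for the example). Without dot-all mode, "." does not match a newline, so the explicit \n is the only thing letting the pattern reach the next line.

```python
import re

response = 'HTTP/1.0 200 OK\r\nServer: Demo/1.2\r\n'

# No DOTALL flag: "." stops at "\n".
with_newline = re.compile(r'^HTTP/1\.0 \d\d\d .*\n.*Server: Demo/([\d.]+)')
collapsed = re.compile(r'^HTTP/1\.0 \d\d\d .*Server: Demo/([\d.]+)')

kept = with_newline.search(response)    # crosses the line via the literal \n
broken = collapsed.search(response)     # cannot get past the status line

print(kept.group(1) if kept else None)
print(broken)
```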
Thanks for doing this. Artifacts like ".*.*" are especially embarrassing (though it's possible this case is optimised out by PCRE).
Probably, although I suspect it increases the time spent in the PCRE study routine.
This isn't going to fix ALL of our MATCHLIMIT problems but it should go a long way towards making the problem better.
Agreed. Please let me/the list know if you notice any other patterns in use that cause this warning to be generated. Thanks, Doug
If you're okay with me going through by hand and replacing .*\r?\n.* with .* or .*? I'll get started right away. There are about 50 matches that need work. Brandon
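The choice between .* and .*? also changes which occurrence is captured, not just how much backtracking happens. A small sketch in Python's re module (not PCRE; re.DOTALL stands in for the s modifier, and the "Demo" server name and response are invented) shows the difference when the same text appears in both the header and the body. It does not measure speed.

```python
import re

response = (
    'HTTP/1.0 200 OK\r\n'
    'Server: Demo/1.2\r\n'
    '\r\n'
    '<html>Server: Demo/9.9</html>'
)

greedy = re.compile(r'^HTTP/1\.0 \d\d\d .*Server: Demo/([\d.]+)', re.DOTALL)
lazy = re.compile(r'^HTTP/1\.0 \d\d\d .*?Server: Demo/([\d.]+)', re.DOTALL)

# Greedy .* runs to the end and backs off to the LAST "Server: Demo/".
greedy_version = greedy.search(response).group(1)
# Lazy .*? expands one character at a time and stops at the FIRST one.
lazy_version = lazy.search(response).group(1)

print(greedy_version, lazy_version)
```

This is one reason to decide case by case: .*? suits content expected in the header, while .* suits content expected near the end of the response.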
_______________________________________________ Sent through the nmap-dev mailing list http://cgi.insecure.org/mailman/listinfo/nmap-dev Archived at http://SecLists.Org