Nmap Development mailing list archives

Re: [RFC] PCRE MATCHLIMIT and the use of greedy quantifiers (-sV scans)

From: Brandon Enright <bmenrigh () ucsd edu>
Date: Fri, 3 Apr 2009 20:53:26 +0000

On Fri, 3 Apr 2009 01:57:17 +0000
doug () hcsw org wrote:

Hi Brandon,


...snip...


I always try to make sure that there are no such segments
between .* groups if the s modifier is in use. While avoiding
the s modifier will usually allow matches to fail early in some
cases, I think the s modifier is very important in creating
robust match lines for some protocols like HTTP.


Yes but as you point out below, sometimes content between .* is
actually matching on something useful.  I looked at all content 10
chars or less between two .* and some of it was pretty decent.

...snip...


I can't really think of any cases where it's necessary to have
something like .*\r\n.* in a match line with an s modifier.


I was operating under the assumption that for some reason unknown to me
it was important to make sure that there were some buffer headers
between the start of the response and the header like Server: that was
being examined.

...snip...


Like I said, I try to avoid matches like that but I think I know some
ways they can occur:

1) A match line originally has no s modifier but later was changed to
   have an s modifier without removing problematic sections of the
   match like .*\n.*


I hadn't considered this but now that you point it out I think this is
the likely source of all of our .*\n.*


2) When constructing an s match line, someone had a more unique
   section like .*X-Unique-header: blahblah\r\n.* but found that
   the X-Unique-header: wasn't always there and removed the header
   part but left the line terminator .*\r\n.* in the final version.


Possibly.  Leaving the \r\n will still require *some* header though.

In all the cases I've run into this issue I've been able to fix the
match by using atomic grouping and lazy quantification.
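
To illustrate the lazy half of that, here is a minimal sketch using Python's re module as a stand-in for PCRE (an assumption; Nmap itself uses PCRE, and the response text below is made up). With the s modifier (re.DOTALL here) a greedy .* first consumes the whole response and then backs off, so it lands on the last "Server: " in the data, while .*? stops at the first one. Atomic grouping is not shown.

```python
import re

# Hypothetical response: one Server header, plus the same text in the body.
response = (
    "HTTP/1.0 200 OK\r\n"
    "Server: front/1.0\r\n"
    "\r\n"
    "Server: body/9.9 (text inside the page)\r\n"
)

# re.DOTALL plays the role of the s modifier: '.' also matches newlines.
greedy = re.compile(r"^HTTP/1\.0 \d\d\d .*Server: (\S+)", re.DOTALL)
lazy = re.compile(r"^HTTP/1\.0 \d\d\d .*?Server: (\S+)", re.DOTALL)

# Greedy .* runs to the end of the data and backtracks to the LAST "Server: ".
greedy_version = greedy.search(response).group(1)

# Lazy .*? grows one character at a time and stops at the FIRST "Server: ".
lazy_version = lazy.search(response).group(1)

print(greedy_version, lazy_version)
```

The two patterns pick different occurrences here, which is one reason each match line needs to be looked at individually before switching quantifiers.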


I think we can make the following substitution on all s modifier
match lines (untested):

s/[.][*]([\]r[\]n|[\]n)*?[.][*]/.*/g


I agree but I'd suggest a few modifications.

First) we need to make sure your first [.] isn't preceded by a \ (easy
with negative look-behind).  Second) in most cases the content trailing
the .* is still in the header and not in the body.  Lazy quantification
with .*? should be generally faster because it won't consume the whole
string and then slowly back off.  I'd propose changing to .*? in the
case that the trailing content is still in the header and .* when the
trailing content is in the body.
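
A rough sketch of the substitution with the look-behind added, written in Python rather than as a one-liner. The names REDUNDANT_NEWLINE and collapse are made up for illustration, and the sample match text is a shortened, hypothetical one. Unlike the *? version above, this requires at least one newline token between the two .* pieces, so it does not touch a bare ".*.*". It also does not handle an escaped backslash directly before the dot (\\.*), and it always substitutes .* rather than choosing between .* and .*?.

```python
import re

# Literal ".*", then one or more literal "\n", "\r\n" or "\r?\n" tokens,
# then a literal ".*". The (?<!\\) look-behind rejects an escaped dot
# such as "\.*", which is not a wildcard.
REDUNDANT_NEWLINE = re.compile(r"(?<!\\)\.\*(?:(?:\\r\??)?\\n)+\.\*")


def collapse(pattern: str) -> str:
    """Replace each '.*<newline tokens>.*' run in a match pattern with '.*'."""
    return REDUNDANT_NEWLINE.sub(".*", pattern)


# Hypothetical s-modifier match pattern, shortened for the example.
before = r"^HTTP/1\.0 \d\d\d .*\n.*Server: demo/([\d.]+)\r?\n.*<TITLE>"
after = collapse(before)

# An escaped dot followed by '*' is left alone.
escaped = collapse(r"a\.*\n.*b")

print(after)
print(escaped)
```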


The resulting match lines will match strict supersets of the previous
match lines' matches (meaning anything that used to match will still
match plus at least 1 more, newline replaced with empty string) and I
don't think these segments add any important value to the matching
process.
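
The superset claim can be spot-checked with a small sketch, again assuming Python's re with re.DOTALL as a stand-in for PCRE with the s modifier. The sample responses are invented for the example.

```python
import re

# Old form with the newline segment, new form with it collapsed.
old = re.compile(r"^HTTP/1\.0 \d\d\d .*\n.*Server: ", re.DOTALL)
new = re.compile(r"^HTTP/1\.0 \d\d\d .*Server: ", re.DOTALL)

samples = [
    "HTTP/1.0 200 OK\r\nDate: today\r\nServer: demo\r\n",  # header before Server
    "HTTP/1.0 200 OK\r\nServer: demo\r\n",                 # Server is the first header
    "HTTP/1.0 200 OK Server: demo",                        # no newline at all
    "SSH-2.0-demo\r\n",                                    # unrelated service
]

# One (old_matches, new_matches) pair per sample.
results = [(bool(old.search(s)), bool(new.search(s))) for s in samples]

for sample, (was, now) in zip(samples, results):
    print(f"old={was!s:5} new={now!s:5} {sample!r}")
```

Everything the old pattern accepts is accepted by the new one, and the third sample (no newline before "Server: ") is the extra case that only the new pattern matches.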


Agreed.

There may be unusual cases I'm not considering at the moment
though, perhaps very tough to match services whose only identifying
characteristics are the order and count of their newlines, so I think
these should be processed on a case-by-case basis as you appear to be
doing.


Yeah, I plan on doing these by hand rather than via search-and-replace.

Here is a match diff:

* -match http m|^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer ...snip...
* +match http m|^HTTP/1\.0 \d\d\d (?>.*?\n).*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer ...snip...


What about just replacing .*\n.* with .* ?


I was trying to keep the match *exact* but as you've pointed out above, the
\n doesn't add enough to the match to be worthwhile.


*********************************************************************
But remember that in non s modifier match lines it is very important
to keep these segments as is to ensure it still matches.
*********************************************************************


Yeah.  And in the case where 's' isn't there, I'll evaluate whether it
should be there or not.
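
For anyone following along, a short sketch of why the segment matters without the s modifier, assuming Python's re behaves like PCRE here ('.' matches anything except \n unless DOTALL is set). The response and the "demo" server name are hypothetical.

```python
import re

response = "HTTP/1.0 200 OK\r\nServer: demo/1.2\r\n"

# No s modifier: the explicit \n is what lets the pattern cross the line break.
with_newline = re.compile(r"^HTTP/1\.0 \d\d\d .*\n.*Server: demo/([\d.]+)")

# Same pattern with the segment collapsed, still no s modifier: '.' cannot
# match \n, so the pattern never reaches the Server header on the second line.
collapsed = re.compile(r"^HTTP/1\.0 \d\d\d .*Server: demo/([\d.]+)")

# Collapsed form with the s modifier (re.DOTALL): matches again.
collapsed_s = re.compile(r"^HTTP/1\.0 \d\d\d .*Server: demo/([\d.]+)", re.DOTALL)

m_with_newline = with_newline.search(response)
m_collapsed = collapsed.search(response)
m_collapsed_s = collapsed_s.search(response)

print(bool(m_with_newline), bool(m_collapsed), bool(m_collapsed_s))
```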

...snip...


Thanks for doing this. Artifacts like ".*.*" are especially
embarrassing (though it's possible this case is optimised out by PCRE).


Probably, although I suspect it increases the time spent in the PCRE
study routine.

This isn't going to fix ALL of our MATCHLIMIT problems but it
should go a long way towards making the problem better.


Agreed. Please let me/the list know if you notice any other patterns
in use that cause this warning to be generated.

Thanks,

Doug


If you're okay with me going through by hand and replacing .*\r?\n.*
with .* or .*? I'll get started right away.  There are about 50 matches
that need work.

Brandon



_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org
