Nmap Development mailing list archives
Re: [RFC] PCRE MATCHLIMIT and the use of greedy quantifiers (-sV scans)
From: doug () hcsw org
Date: Fri, 3 Apr 2009 01:57:17 +0000
Hi Brandon,

On Thu, Apr 02, 2009 at 11:09:57PM +0000 or thereabouts, Brandon Enright wrote:
> In scanning the thousands of services on our network I regularly run into the following error:
>
> Warning: Hit PCRE_ERROR_MATCHLIMIT when probing for service http with the regex '^HTTP/1\.0 \d\d\d .*\r\n.*\r\n\r\n.*\t<title>Strongdc\+\+ webserver - Login Page</title>\t'
>
> There are a number of match lines that trigger this; here are a couple more examples:
>
> Warning: Hit PCRE_ERROR_MATCHLIMIT when probing for service http with the regex '^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\nContent-type: text/html\n\n<HEAD>\n<TITLE>\nServer Administration\n</TITLE>\n\n<META NAME=\"IBMproduct\" CONTENT=\"ADSM\">\n<META NAME=\"IBMproductVersion\" CONTENT=\"([\d.]+)\">.*Storage Management Server for AIX'
Ya, I've seen this warning before too, although I can't remember the service or match line that triggered it. We should definitely avoid backtracking as much as possible. I always try to make sure that there are no such segments between .* groups if the s modifier is in use.

Avoiding the s modifier will often let matches fail early, but I think the s modifier is very important in creating robust match lines for some protocols like HTTP. Here is a typical HTTP s modifier match:

    match http m|^HTTP/1\.0 200 .*\r\nServer: Allegro-Software-RomPager/([\w-_.]+)\r\n.*<TITLE>SONY NSP-100 Main Page</TITLE>|s
                 \------------/ \----------------------------------------------------/\-------------------------------------/
                       1                                   2                                             3

1: Matches the service, HTTP.
2: Gets the httpd used, no matter where the Server: line appears in the header, and regardless of the presence/ordering of other HTTP headers.
3: Some unique string that confirms the device branding.

I can't really think of any cases where it's necessary to have something like .*\r\n.* in a match line with an s modifier.
> Warning: Hit PCRE_ERROR_MATCHLIMIT when probing for service http with the regex '^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer Administration\n</TITLE>.*<META NAME=\"IBMproductVersion\" CONTENT=\"([\d.]+)\">.*<TITLE>\nAdministrator Login\n</TITLE>.*Storage Management Server for Windows'
>
> The issue is that the match overuses, or poorly uses, the greedy quantifier ".*", as in:
>
>   "HTTP/1\.0 \d\d\d .*\n.*Server:"
>
> The problem arises when matching against services that have a large number of partial matches between the .* constructs, which forces the engine to backtrack too much while trying to match.
Like I said, I try to avoid matches like that, but I think I know some ways they can occur:

1) A match line originally had no s modifier but was later changed to have one, without removing problematic sections of the match like .*\n.*

2) When constructing an s match line, someone had a more unique section like .*X-Unique-header: blahblah\r\n.* but found that the X-Unique-header: wasn't always there, and removed the header part but left the line terminator .*\r\n.* in the final version.
> In all the cases I've run into this issue I've been able to fix the match by using atomic grouping and lazy quantification.
I think we can make the following substitution on all s modifier match lines (untested):

    s/[.][*]([\\]r[\\]n|[\\]n)*?[.][*]/.*/g

The resulting match lines will match strict supersets of what the previous match lines matched: anything that used to match will still match, plus at least one more input, namely the old match with the newline replaced by the empty string. I don't think these segments add any important value to the matching process. There may be unusual cases I'm not considering at the moment though, perhaps very tough to match services whose only identifying characteristics are the order and count of their newlines, so I think these should be processed on a case-by-case basis as you appear to be doing.
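As a rough illustration of that substitution, here is a minimal Python sketch. The names `collapse_gaps` and `rewrite_match_line` are hypothetical, the parsing assumes the pattern never contains its own delimiter, and it rewrites only lines whose flags include s. It has not been run against the real nmap-service-probes file.

```python
import re

# Literal ".*", then zero or more literal "\r\n" or "\n" escape sequences
# (as they appear in the text of a match line, not real newlines),
# then another literal ".*".
GAP = re.compile(r"[.][*](?:[\\]r[\\]n|[\\]n)*?[.][*]")


def collapse_gaps(pattern: str) -> str:
    """Repeat the substitution until nothing changes, so chains such as
    .*\\n.*\\n.* collapse all the way down to a single .*"""
    previous = None
    while previous != pattern:
        previous = pattern
        pattern = GAP.sub(".*", pattern)
    return pattern


def rewrite_match_line(line: str) -> str:
    """Collapse newline-only gaps on match lines that carry the s flag."""
    parts = line.split(" ", 2)
    if len(parts) != 3:
        return line
    directive, service, rest = parts
    if directive not in ("match", "softmatch") or len(rest) < 2 or rest[0] != "m":
        return line
    delim = rest[1]
    end = rest.find(delim, 2)  # assumption: delimiter never occurs in the pattern
    if end == -1:
        return line
    pattern, tail = rest[2:end], rest[end + 1:]
    flags = tail.split(" ", 1)[0]
    if "s" not in flags:
        return line  # leave non-s match lines exactly as they are
    return f"{directive} {service} m{delim}{collapse_gaps(pattern)}{delim}{tail}"


if __name__ == "__main__":
    # Made-up match line, for illustration only.
    sample = r"match http m|^HTTP/1\.0 \d\d\d .*\r\n.*\r\n\r\n.*<title>x</title>|s p/Example/"
    print(rewrite_match_line(sample))
```

Like the Perl original, this sketch does not tell an escaped dot followed by a star apart from a real .*, so any rewritten line would still want a manual look.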
> Here is a match diff:
>
> -match http m|^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer ...snip...
> +match http m|^HTTP/1\.0 \d\d\d (?>.*?\n).*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer ...snip...
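To see that the atomic-group rewrite in this diff still captures the same version string, here is a small Python sketch. Python's `re` module is not PCRE and has no match limit, so this only compares what the patterns match, not how much backtracking they do. The banner is made up, the patterns are shortened prefixes of the diffed regex, and `[^\n]*\n` is a swapped-in portable equivalent of `(?>.*?\n)`, because `re` only accepts atomic groups from Python 3.11 onwards.

```python
import re
import sys

# Shortened prefixes of the regexes in the diff above.
ORIGINAL = r"^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)"
ATOMIC = r"^HTTP/1\.0 \d\d\d (?>.*?\n).*Server: ADSM_HTTP/([\d.]+)"
# Swapped-in stand-in: consumes exactly the first line, like the atomic group.
PORTABLE = r"^HTTP/1\.0 \d\d\d [^\n]*\n.*Server: ADSM_HTTP/([\d.]+)"

# Hypothetical response; the header values are invented for the example.
banner = (
    "HTTP/1.0 200 OK\n"
    "Date: unknown\n"
    "Server: ADSM_HTTP/0.1\n"
    "Content-type: text/html\n\n"
)

patterns = {"original": ORIGINAL, "portable": PORTABLE}
if sys.version_info >= (3, 11):
    # Atomic groups are a syntax error in older versions of the re module.
    patterns["atomic"] = ATOMIC

# re.DOTALL plays the role of the s modifier on the match line.
versions = {
    name: re.search(pattern, banner, re.DOTALL).group(1)
    for name, pattern in patterns.items()
}

for name, version in versions.items():
    print(name, version)
```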
What about just replacing .*\n.* with .* ?

*********************************************************************
But remember that in non s modifier match lines it is very important
to keep these segments as they are, so that the lines still match.
*********************************************************************
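A quick way to see why the warning matters is to try both forms with and without the s modifier. The sketch below uses Python's `re`, with `re.DOTALL` standing in for the s modifier; the banner is made up and the patterns are shortened from the ADSM match line quoted earlier.

```python
import re

# Hypothetical two-line banner; the version number is invented.
banner = "HTTP/1.0 200 OK\nServer: ADSM_HTTP/1.2\n"

WITH_GAP = r"^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)"
COLLAPSED = r"^HTTP/1\.0 \d\d\d .*Server: ADSM_HTTP/([\d.]+)"


def matches(pattern: str, text: str, s_modifier: bool) -> bool:
    """Return True if pattern matches text, optionally with dot-matches-newline."""
    flags = re.DOTALL if s_modifier else 0
    return re.search(pattern, text, flags) is not None


for s_modifier in (False, True):
    print(
        "s modifier:", s_modifier,
        "| with gap:", matches(WITH_GAP, banner, s_modifier),
        "| collapsed:", matches(COLLAPSED, banner, s_modifier),
    )
```

Without the s modifier the dot cannot cross a newline, so the collapsed pattern can no longer reach a Server: header on the second line; with the s modifier both forms match.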
> Rather than fix the handful of these that happen to come up in my scans, I got to thinking about how to recognize one of the patterns that causes these problems. Essentially, any simple string between two .* clauses that can appear in many places in the output can cause excessive backtracking. This command will find a list of candidates for this "bad" pattern:
>
>   $ cat nmap-service-probes | perl -ne 'print $1, "\n" if ($_ =~ /((?<!\\)\.\*[^.*]{0,10}\.\*)/)'
>
> In looking through that list, it seems that \r\n and variations on it are the most common problem construction we have:
>
>   $ cat nmap-service-probes | perl -ne 'print $_ if ($_ =~ m/(?<!\\)\.\*((\\r)?\\n)+\.\*/)'
>
> We do have one ".*.*":
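The same candidate search can be approximated in Python. This is a sketch only: the function names are hypothetical, the "not preceded by a backslash" check is written as a negative lookbehind, and the sample lines are the regexes quoted earlier in this thread rather than the contents of the real nmap-service-probes file.

```python
import re

# A ".*" not preceded by a backslash, up to 10 characters that are
# neither "." nor "*", then another ".*".
CANDIDATE = re.compile(r"(?<!\\)\.\*[^.*]{0,10}\.\*")

# The narrower case: only "\n" or "\r\n" escapes between the two ".*".
NEWLINE_GAP = re.compile(r"(?<!\\)\.\*(?:(?:\\r)?\\n)+\.\*")


def find_candidates(lines):
    """Yield (line number, matched fragment) for each suspicious line."""
    for lineno, line in enumerate(lines, 1):
        found = CANDIDATE.search(line)
        if found:
            yield lineno, found.group(0)


def has_newline_gap(line: str) -> bool:
    """True if the line contains .* newline-escapes .* with nothing else between."""
    return NEWLINE_GAP.search(line) is not None


samples = [
    r"^HTTP/1\.0 \d\d\d .*\r\n.*\r\n\r\n.*\t<title>Strongdc\+\+ webserver - Login Page</title>\t",
    r"match http m|^HTTP/1\.0 200 .*\r\nServer: Allegro-Software-RomPager/([\w-_.]+)\r\n.*<TITLE>SONY NSP-100 Main Page</TITLE>|s",
]

for lineno, fragment in find_candidates(samples):
    print(lineno, fragment)
```

To scan a real file, the `samples` list could be replaced by an open file handle, since `find_candidates` accepts any iterable of lines.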
Thanks for doing this. Artifacts like ".*.*" are especially embarrassing (though it's possible this case is optimised out by PCRE).
> This isn't going to fix ALL of our MATCHLIMIT problems but it should go a long way towards making the problem better.
Agreed. Please let me/the list know if you notice any other patterns in use that cause this warning to be generated.

Thanks,

Doug
_______________________________________________ Sent through the nmap-dev mailing list http://cgi.insecure.org/mailman/listinfo/nmap-dev Archived at http://SecLists.Org