Nmap Development mailing list archives
Re: RFC: patch to skip some service matches
From: Daniel Miller <bonsaiviking () gmail com>
Date: Wed, 24 Aug 2016 15:52:02 -0500
David, I considered this concept, but didn't get as far as investigating actual algorithms because of an important constraint we have: the order of match lines matters. We have several cases where we put a more-specific match earlier and a more-general match later, so that a single service can stop at the most-specific match we can generate. Some examples: * Nearly all softmatches are very general and can be found near the end of the matches in a probe. * These comments in nmap-service-probes: # Needs to go before the Apache match lines -Doug # Needs to go before BaseHTTPServer match lines. # These should hopefully match before the more general Ubicom line in GenericLines # Has to come before BIND matches. # Has to come before BIND matches. # Sometimes we can get a host name or an IP address; those with come before those without. Of course, for the vast majority of cases it wouldn't matter, and we could come up with something like runlevels or dependencies for match lines, but I wanted very much to not affect the syntax of the nmap-service-probes file. Dan P.S. Two additional notes: 1. Finding and improving slow match lines, like anchoring unanchored ones, will speed up any implementation. 2. Some reordering to group similar match lines would improve the speedup of my implementation. On Wed, Aug 24, 2016 at 11:23 AM, David Fifield <david () bamsoftware com> wrote:
On Wed, Aug 24, 2016 at 12:20:37AM -0500, Daniel Miller wrote:List, I've spent probably too much time today enhancing Nmap's service matching system to try to reduce CPU time spent in regular expression matching. Unfortunately, I can't tell whether it has improved anything yet, so I'maskingfor help testing. Since this is a CPU-time enhancement, it would only affect scans whichareCPU-bound. For this reason, I've CC'd Tudor and Brandon, as their GSoCprojectresulted in speedups of certain large scans due to algorithmicimprovements,and I hope they can test. The change involves inspecting each match line in nmap-service-probes tosee ifit is part of a contiguous group of match lines that will only evermatch astring starting with a given single byte. Already, this means we aretargetingonly the very fastest match lines, so chances are good there won't be noticeable improvement. The first such match in a group will link to thelastone (by index) so that the entire group can be skipped if the firstmatch failsbecause of an incorrect initial byte. There are groups of hundreds ofsuchcontiguous match lines in a few places: FTP matches starting with "2",SSHmatches starting with "S", and HTTP matches starting with "H" forinstance. I've thought about this problem too. I'm intrigued by the Rust "RegexSet" API: https://doc.rust-lang.org/regex/regex/struct.RegexSet.html "A regex set corresponds to the union of two or more regular expressions... it will also report *which* regular expressions in the set match." A RegexSet feels like the "right" way to do our kind of matching. But I haven't found an equivalent API in PCRE, or in any library other than Rust's. It presumably requires the RE engine to use a linear-time matching algorithm (as in https://swtch.com/~rsc/regexp/regexp1.html) so that matching the union of many REs will be faster than matching each one individually in turn--the NFA implementation would do your first-byte optimization implicitly. You could cobble together a poor man's RegexSet using binary search. First, construct an RE that is the union of all the REs in the set: re_1|re_2|re_3|...|re_n If it fails to match, then none of the component REs matches. Otherwise, split the set into two halves and see if either of them matches: re_1|re_2|...|re_{n/2} re_{n/2+1}|re_{n/2+2}|...|re_n Continue recursing, preferring leftmost matches, until you get to a single RE that matches. This way you only need log(n) match operations instead of n. But: it'll only be faster if PCRE applies some optimization such that matching a large union is not basically equivalent to trying each component RE in turn.
_______________________________________________ Sent through the dev mailing list https://nmap.org/mailman/listinfo/dev Archived at http://seclists.org/nmap-dev/
Current thread:
- RFC: patch to skip some service matches Daniel Miller (Aug 23)
- Re: RFC: patch to skip some service matches David Fifield (Aug 24)
- Re: RFC: patch to skip some service matches Daniel Miller (Aug 24)
- Re: RFC: patch to skip some service matches David Fifield (Aug 24)
- Re: RFC: patch to skip some service matches Daniel Miller (Aug 24)
- Re: RFC: patch to skip some service matches David Fifield (Aug 24)