Nmap Development mailing list archives

[NSE] Auto pipelining for http library


From: Peter O <perdo.olma () gmail com>
Date: Tue, 21 Aug 2012 01:29:33 +0200

Hi all,

during the last few weeks I've been working on adding auto-pipelining
features to the http library. This post summarizes what I've been
able to accomplish.

The library can be found in my development branch, which can be
checked out with: svn co
https://svn.nmap.org/nmap-exp/peter/nse-auto-pipeline.


Intro
------

The new design aims to make pipelining fully automatic and
transparent to script writers. It also makes use of persistent
connections by keeping sockets open for a short time after sending a
request and reading the response.

The new library should introduce a speedup for large scans: the nsock
library binding allows only 20 scripts to hold sockets at a time, so
many of them would otherwise block. With the new library, all the
scripts can use the same socket. This is accomplished by having a
thread in the http library that 'owns' a socket for a particular host
and keeps it open for scripts to use.

Unlike the old pipelining API, the new one supports caching.
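
For contrast, the old pipelining API makes a script batch its
requests by hand. A minimal sketch of that style (the helper names
pipeline_add/pipeline_go are those of recent Nmap versions and may
differ in older ones):

  local http = require "http"
  local shortport = require "shortport"

  portrule = shortport.http

  action = function(host, port)
    -- Build the batch explicitly; no caching, no sharing between
    -- scripts.
    local batch = nil
    batch = http.pipeline_add("/", nil, batch)           -- request 1
    batch = http.pipeline_add("/robots.txt", nil, batch) -- request 2
    local responses = http.pipeline_go(host, port, batch)
    if responses then
      return ("got %d pipelined responses"):format(#responses)
    end
  end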


More details
----------------

o The old API
The http.{get|head} functions in the current API work by first
creating a socket, issuing a request on it, reading the response and
finally closing the socket. This works OK with a moderate number of
scripts running against a host, since they will not block on the
limitation nsock introduces (20 sockets). However, it'd be nice to
take advantage of HTTP/1.1 persistent connections, which let us issue
several requests on the same socket without reconnecting. That would
allow many scripts to use the same socket and not block. Another
problem with the current API is that it doesn't use pipelining, which
is basically sending multiple requests at once instead of waiting for
each response before sending the next.
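
For reference, a minimal script using the current blocking API; the
http.get call below creates its own socket, issues one request, reads
the response and closes the socket:

  local http = require "http"
  local shortport = require "shortport"

  portrule = shortport.http

  action = function(host, port)
    -- Blocking call: one socket per request.
    local response = http.get(host, port, "/")
    if response and response.status then
      return ("status %d, %d body bytes"):format(response.status,
        response.body and #response.body or 0)
    end
  end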

o The new API
As mentioned before, there are threads in the http library that own
sockets connected to hosts. Scripts issue requests via a routine that
communicates with those threads. A connection is closed only when
there have been no requests for its host for ~1 sec, when the server
closes the connection, or when we hit some kind of limit - for
example, we exceeded the number of requests the server allows per
connection.

This image: http://s8.postimage.org/iyt8nmgv9/pipe_simple.png (it's
also attached to this message) is a graphical representation of how
things work. The steps listed below correspond to those in the image.

- Step 1
http.{get|head} is a blocking call: the script sends a request and
blocks waiting for the response.

- Step 2
The http library manages library worker threads, which keep
connections open and pipeline requests for a host. On the first
http.{get|head}, such a worker is created by a routine in http.lua.
A worker waits on a condition variable while its request queue is
empty. Scripts communicate with those threads through a routine in
the http library, which adds requests to the worker's queue and then
signals the worker.

- Step 3
The worker reads the request queue, pipelines all the requests, sends
them out and collects the responses.

- Steps 4, 5
The responses are saved and made available to the routine, which
reads them and returns them to the requesting scripts.

Summing up, the main difference between the current http.{get|head}
and the new one is that instead of issuing a request on a socket it
controls, a script passes the request to an http worker thread, which
does the pipelining for a host, and waits for a response. Connections
are not owned by scripts anymore; the worker threads in the http
library own them. After a request is issued and a response received,
the connection is left open. Connections are closed only when one of
the aforementioned conditions occurs.
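
To make the flow concrete, here is a rough sketch of the worker/queue
interaction. nmap.condvar and stdnse.new_thread are real NSE
primitives; the queue layout and the dispatch_pipeline helper are
hypothetical stand-ins for the library's internals (the idle timeout
and close conditions are elided):

  local nmap = require "nmap"
  local stdnse = require "stdnse"

  local request_queue = {}               -- hypothetical per-host queue
  local condvar = nmap.condvar(request_queue)

  -- Worker thread: owns the connection to one host (started once per
  -- host with stdnse.new_thread(worker, host, port)).
  local function worker(host, port)
    while true do
      while #request_queue == 0 do
        condvar "wait"                   -- sleep until a script wakes us
      end
      -- Drain the queue in place (not by rebinding the table), so the
      -- condition variable stays attached to it.
      local batch = {}
      while #request_queue > 0 do
        batch[#batch + 1] = table.remove(request_queue, 1)
      end
      -- Hypothetical helper: pipeline the batch on the open socket
      -- and store each response in its request record.
      dispatch_pipeline(host, port, batch)
      condvar "broadcast"                -- wake the waiting scripts
    end
  end

  -- Script-facing routine: queue one request, wake the worker, then
  -- block until the worker has filled in req.response. A single
  -- condvar serves both directions here, so both sides re-check
  -- their condition in a loop after waking up.
  local function queue_request(req)
    request_queue[#request_queue + 1] = req
    condvar "broadcast"
    while not req.response do
      condvar "wait"
    end
    return req.response
  end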

If we decide to enable pipelining by default, then scripts will be
allowed to specify a no_pipeline option, which makes the script issue
a regular, single request the way http.{get|head} currently does; see
the sketch below.
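
If that route is taken, the opt-out could look something like this
(sketch only; no_pipeline is the proposal above, not an existing
option):

  -- Inside a script's action(host, port): force a regular, single
  -- request instead of handing it to the pipelining worker.
  local response = http.get(host, port, "/", { no_pipeline = true })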

The mechanism described above pipelines requests that multiple
scripts make for the same host. Because http.{get|head} is a blocking
call, a script that wants to take advantage of pipelining for its own
requests has to spawn worker threads of its own. You can see how to
accomplish this here:
https://svn.nmap.org/nmap-exp/peter/pipe.nse (this script is also
attached to this message).

The worker pool code in the script linked above should eventually be
exported into a library, to make pipelining http requests easier; a
sketch of the pattern follows.
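
Until then, the pattern looks roughly like this (a sketch of what
pipe.nse does, not its exact code):

  local http = require "http"
  local nmap = require "nmap"
  local shortport = require "shortport"
  local stdnse = require "stdnse"

  portrule = shortport.http

  action = function(host, port)
    local paths = {"/", "/robots.txt", "/favicon.ico"}
    local results = {}
    local condvar = nmap.condvar(results)
    local pending = #paths

    -- Each worker thread makes one blocking http.get; with the new
    -- library, these all get pipelined on the shared per-host
    -- connection.
    local function fetch(path)
      results[path] = http.get(host, port, path)
      pending = pending - 1
      condvar "signal"              -- tell the main thread we're done
    end

    for _, path in ipairs(paths) do
      stdnse.new_thread(fetch, path)
    end

    -- Block until every worker has stored its response.
    while pending > 0 do
      condvar "wait"
    end

    local lines = {}
    for path, r in pairs(results) do
      lines[#lines + 1] = ("%s -> %s"):format(path,
        r and r.status or "no response")
    end
    return table.concat(lines, "\n")
  end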


Testing
---------

The run times listed below are usually the arithmetic mean of a few
runs. Tests have been run against a Linode server that was provided
by Patrick (thanks!).

As you'll probably notice, some tests have been run with
--max-parallelism 1 or 2. This is an attempt to model a scenario
where a large scan is running and most of the sockets are occupied.
It also keeps socket parallelism from skewing the results in favor of
the serial GET case: with 20 sockets available for serial GETs,
pipelining would be beaten in most circumstances.
The new API is not affected by --max-parallelism as much as the old
one, because we reuse the same socket and pipeline the requests. In
the old API, we create a socket for each request, so if the number of
sockets is limited, blocking occurs.

Setups for the following tests are:
  - new-pipeline == the new pipelining API, with script workers; that
is, a script creates a thread for each http.get it issues
  - parallel-request == the old API, likewise with a thread created
for each http.get the script issues

I. The following results are against an Apache server with default
settings (pipelining enabled, 100 requests allowed per connection).

1a) One script making N requests. This models a situation where a
user just wants to run a single script (like http-sql-injection, for
example) against a target.

                      120 reqs    320 reqs
new-pipeline          3s          5.5s
parallel-request      4s          8.2s

1b) Same as 1a, but with --max-parallelism 1

                      120 reqs    320 reqs
new-pipeline          3s          5.5s
parallel-request      21s         58s

2a) 100 scripts, each making 1 request. This models a situation where
a lot of small http scripts (http-title, http-robots, etc.) issue a
request to get some info from the host.

new-pipeline          3s
parallel-request      3s

2b) Same as 2a, but with --max-parallelism 1.

new-pipeline          3s
parallel-request      18s

3) 2 scripts, each making N requests. This is, for example, the case
where two spiders are running.

                      20 req/each    120 req/each    200 req/each
new-pipeline          2.15s          4.5s            6.1s
parallel-request      1.5s           6.3s            10.2s

4) 2 scripts, each making 200 requests, and --max-parallelism 1.

new-pipeline          6.2s
parallel-request      74s

5) --script "http-* and not http-slowloris and not http-form-fuzzer
and not http-enum". Note: those were single runs.

--max-parallelism     20      10      2
new-pipeline          272s    421s    1114s
parallel-request      294s    427s    1143s

II. Results against a server that doesn't support pipelining (in this
case, the http.server module shipped with Python 3). Setups are the
same as above.

1a)
                      120 reqs    320 reqs
new-pipeline          5s          9.4s
parallel-request      5s          9s

1b)

                      120 reqs    320 reqs
new-pipeline          21s         64s
parallel-request      21s         64s


2a)

new-pipeline          4s
parallel-request      4s

2b)

new-pipeline          20s
parallel-request      20s

3)
                      20 req/each    120 req/each    200 req/each
new-pipeline          2.3s           8.2s            12s
parallel-request      1.9s           7.5s            12s

4)

new-pipeline          85s
parallel-request      84s

III. Comparison of the old pipelining API with the new one, using the
http-enum script.

new-pipeline          26s
parallel-request      22s


Options for integrating the API
--------------------------------------

The next section links to some discussions of the drawbacks of
pipelining. That's why we need to decide whether we'd like pipelining
enabled by default, or whether to just let users specify a 'pipeline'
option (while still taking advantage of persistent connections).


Some info on pipelining
------------------------------

For further reading on why enabling pipelining by default might cause
problems (and on pipelining in general), I suggest following these
links:

http://www.guypo.com/technical/http-pipelining-not-so-fast-nor-slow/
http://www.guypo.com/mobile/http-pipelining-big-in-mobile/
https://bugzilla.mozilla.org/show_bug.cgi?id=264354
https://bugzilla.mozilla.org/show_bug.cgi?id=329977

An article at one of the above links says that web browsers usually
use small pipelines of 2-3 requests. The current implementation
allows specifying the pipeline size, but smaller pipelines gave me
worse results.



Future work
---------------

o Add a library that manages a pool of workers for a script. This
would make pipelining easier for scripts to use.
o Allow more than one thread to be connected to a host: if one
thread's pipeline is full, use another. This should introduce some
speedup.
o Review the API's code once again. I'm sure there are some areas
that can be written better.
o Run a lot more tests. The main focus should be on large scans of
20+ hosts, because I haven't had a chance to run many of those yet.
---------------------------------------------


I'd really appreciate any comments/tests/suggestions.


- Peter

Attachment: pipe.nse

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/
