Nmap Development mailing list archives

Re: [NSE] ideas for http library


From: David Fifield <david () bamsoftware com>
Date: Mon, 9 Jan 2012 19:08:02 -0800

On Thu, Oct 27, 2011 at 02:42:53AM -0400, Patrick Donnelly wrote:
Earlier this summer I promised I'd make a post about this, and I only
just now remembered:

We were discussing pipelining for the http library and how it doesn't
currently use the caching mechanism of http.get. I brought up how it'd
be useful if pipelining were transparent and automatic via http.get
and the rest of the current pipelining API were thrown out.

Essentially, the http library would have separate worker threads (one
thread for each host) which would concatenate requests for URIs to a
single host via pipelining and give back the response to the script
which requested it via some sort of callback (or similar design). The
advantage of this is that we have only one active connection to an
HTTP server, which reduces load, improves overall performance, and
increases parallelism (more scripts can do HTTP requests at once).
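
To make the idea concrete, here is a rough sketch in Python (not NSE Lua,
and not the http library's actual code) of grouping requests by host and
concatenating them into one pipelined byte stream. The names
group_by_host and build_pipeline are made up for illustration, and only
GET with a bare Host header is handled.

```python
from collections import defaultdict


def group_by_host(requests):
    """Group (host, path) pairs by host, preserving request order."""
    by_host = defaultdict(list)
    for host, path in requests:
        by_host[host].append(path)
    return dict(by_host)


def build_pipeline(host, paths):
    """Concatenate GET requests for one host into a single byte string.

    Sending this over one connection is what pipelining amounts to on
    the wire: several requests written before any response is read.
    """
    parts = []
    for path in paths:
        parts.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n"
        )
    return "".join(parts).encode("ascii")
```

A per-host worker would call build_pipeline on whatever requests have
accumulated for its host and then read the responses back in order.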

Some things to consider:

o The worker thread would "belong" to the first script which made the
request. I believe that, currently, the worker thread would keep
functioning normally even if that script finishes before the worker's
work is done. Perhaps there should be a separate "library thread"
function which would have separate ownership mechanisms.

o http.get should maybe block in certain circumstances e.g. when the
pipeline "queue" is full. That way a script doesn't make thousands of
requests and then wait for callbacks to return results.

o Other http functions should also work in the same pipeline, e.g. http.head.

o Obviously, the cache should function in the new system.
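
The blocking and caching points above might look roughly like this. It
is a sketch in Python under several assumptions: HostWorker, fetch, and
max_pending are hypothetical names, fetch stands in for the real
transport, and the worker handles one request at a time instead of
actually pipelining. It only shows a bounded queue that makes callers
wait when it is full, plus a cache consulted before queueing.

```python
import queue
import threading


class HostWorker:
    """One worker per host; get() blocks while the pending queue is full."""

    def __init__(self, fetch, max_pending=4):
        self._fetch = fetch  # hypothetical transport: path -> response
        self._pending = queue.Queue(maxsize=max_pending)
        self._cache = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Serve queued requests in order and wake the waiting caller.
        while True:
            path, done, box = self._pending.get()
            try:
                box["response"] = self._fetch(path)
            except Exception as exc:
                box["error"] = exc
            finally:
                done.set()

    def get(self, path):
        with self._lock:
            if path in self._cache:
                return self._cache[path]
        done, box = threading.Event(), {}
        self._pending.put((path, done, box))  # blocks if the queue is full
        done.wait()
        if "error" in box:
            raise box["error"]
        with self._lock:
            self._cache[path] = box["response"]
        return box["response"]
```

Because the worker is a daemon thread owned by the object and not by a
caller, it also sidesteps the "which script owns the thread" question,
at least in this toy form.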

I agree with what you wrote above. The most commonly used http functions
should go through a library that automatically pipelines them if
necessary. It may not be as easy as the RFCs make it sound, though. I
think I heard that browsers severely limit the amount they're willing to
pipeline, just because support is so flaky on servers and intermediate
devices like caches, and because it can be costly to restart a pipeline
when there's an error.
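
A cautious library could cap the pipeline depth and, when a connection
breaks, resend only what went unanswered. Here is a small Python sketch
of that bookkeeping; batches and unanswered are hypothetical helper
names, and the sketch relies on the rule that a server returns pipelined
responses in the same order as the requests.

```python
def batches(requests, max_depth):
    """Split a request list into pipeline batches of at most max_depth."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    return [requests[i:i + max_depth]
            for i in range(0, len(requests), max_depth)]


def unanswered(batch, responses_received):
    """Requests to resend on a fresh connection after a pipeline breaks.

    Responses arrive in request order, so the first responses_received
    requests are done and the rest of the batch is still outstanding.
    """
    return batch[responses_received:]
```

A smaller max_depth means less to redo after a failure, at the cost of
more round trips.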

We would retain some low-level functions that allow you to exactly craft
a single request.

I seem to recall that you're not supposed to pipeline POST (because you
don't know whether it might have taken effect in case there's an error).
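
One way to respect that would be to give every non-idempotent request a
batch of its own, so nothing is queued behind it until its response is
in. A Python sketch, with plan as a hypothetical name and the idempotent
set taken from the HTTP/1.1 method definitions:

```python
# Methods HTTP/1.1 defines as idempotent; POST is deliberately absent.
IDEMPOTENT = frozenset({"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"})


def plan(requests):
    """Group (method, path) pairs into pipeline batches.

    Consecutive idempotent requests share a batch. A non-idempotent
    request (such as POST) closes the current batch and is sent alone.
    """
    out, current = [], []
    for method, path in requests:
        if method.upper() in IDEMPOTENT:
            current.append((method, path))
        else:
            if current:
                out.append(current)
                current = []
            out.append([(method, path)])
    if current:
        out.append(current)
    return out
```

For example, GET /a, HEAD /b, POST /form, GET /c would be planned as
three batches: the first two together, then the POST alone, then the
final GET.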

David Fifield

