Nmap Development mailing list archives

Re: Nsock read buffer


From: ambarisha b <b.ambarisha () gmail com>
Date: Wed, 12 Jan 2011 23:40:02 +0530

Hi,

This is what I could come up with, based on the use cases that David
suggested. I still can't figure out the need to add another buffer to
the iod struct.

use-cases:

1. Return everything left on the socket (looped reading), with a
  sanity limit on bytes.
2. Return whatever is immediately available on the socket.
3. Return an exact number of bytes.
4. Return an exact number of lines, with a sanity limit on bytes.


Setting aside use case 1 (addressed at the end):


2. The MSG_PEEK flag could be set on the recvfrom call, which lets us
find out how much data is already available on the socket; then we
safe_malloc a buffer of that size, read the entire data into the newly
created buffer, and store the pointer in the nse structure. O_NONBLOCK
might also be set to make sure that we don't get blocked if the socket
has 0 bytes.

3. Do a recvfrom with nse->readinfo.num, the number of bytes requested
by the user; only that many bytes are read from the socket buffer, and
the remaining bytes stay in the socket buffer to be read by the next
read call. Appropriate action can be taken if the requested number of
bytes isn't available on the socket; this check can again be carried
out using MSG_PEEK.

4. Read from the socket byte by byte until you get the required number
of '\n' characters, hit EOF, or run out of data to read. In the latter
two cases, that could be an error or whatever you decide on. The sanity
check, of course, could be placed on the loop counter.

Advantages:

--> The select call would work without any trouble, as we have no need
to check for pending data anywhere other than the socket buffer itself.
But there has also been a proposal to migrate off select in favour of
(maybe) poll. I've heard it works better, though I haven't checked it
out myself yet. So this might be the time to think about that as well.
--> There would be no need to maintain another buffer for each socket.
It seems pointless to have another buffer just to serve the purpose of
"buffering". In fact, trying to read from the socket into the custom
buffer 'buf' seems to be the core of the problem: we are unnecessarily
replicating the buffering mechanism that is already built into the OS's
networking sockets.

Disadvantages:

--> The obvious disadvantage is the overhead incurred by the repeated
read calls to the socket in case 4. Still, that might not amount to
much, as the entire data must be copied into the proposed new buffer
anyhow and verified byte by byte while counting '\n' characters. The
difference would be "one big read call + repeated user-buffer accesses"
vs "repeated accesses to the kernel buffer".

--> Another disadvantage I can think of is again in case 4. Suppose
after reading byte by byte we find that the data doesn't contain as
many lines as the user requested; we have no way to put the read data
back. Either we could wait for more data, or we could return the
incomplete data with an indication of the exception through, perhaps,
errno.

Coming to 1:
As far as I am aware, the normal Unix sockets API doesn't provide a
read option like that. Maybe MSG_WAITALL could be made to work by
supplying it with a sanity limit as an argument; I am not sure of its
behavior when it encounters EOF before reading the requested number of
bytes. Still, adding such functionality to nsock makes sense. Another
way I can think of right now is to make blocking read calls in an
infinite loop until EOF is reached on the socket, which would be
indicated by read returning 0. Another thing to consider is the
timeout: as we are making multiple calls, the timeout cannot be handled
by read itself. We should explicitly set an alarm before entering the
loop (note that alarm() takes whole seconds, so a millisecond timeout
would need converting, or setitimer() instead), and break out of the
loop either when read returns zero or when the alarm signal is
received.

I haven't analysed the SSL section closely yet, so any comments or
ideas are most welcome.


Cheers,
Ambarisha


On Wed, Jan 12, 2011 at 10:27 AM, David Fifield <david () bamsoftware com> wrote:
On Mon, Nov 15, 2010 at 10:23:15PM -0800, David Fifield wrote:
I started implementing this in the branch

svn co --username guest --password "" svn://svn.insecure.org/nmap-exp/david/nmap-readbuf

I attached a script that shows how it works. nsock_readbytes and
nsock_readlines (and hence nmap.receive_bytes and nmap.receive_lines)
return exactly what you ask for, not more and not less. If there are not
enough bytes available, the functions return an error and the bytes stay
in the buffer for the next read. If there are more than enough bytes
available, only the first n bytes or lines are returned, and what's left
over remains in the buffer.

I think that the behavior of nsock_readbytes and nsock_readlines in my
branch is just right. My question is, how should nsock_read work?
nsock_read is used to mean both of these two things:
1. Keep reading until timeout or EOF, and return everything.
2. Return me some small chunk of data, up to one buffer's worth, but I
   don't care about the exact number.
In practice, nsock_receive does (1), except that there is a built-in limit
of 589823 bytes, after which the read always finishes. (Look for the
comment "spews and spews data" in nsock_core.c.) When people want (2),
they have been using nsock_readbytes(1), which only does one recv and
only returns up to a relatively small fixed-size buffer (like recv).

My personal feeling is that nsock_read should work like (2). But we
still want a function that works like (1) and returns a big block of
data all at once. Service scan does this, for example. But even service scan has
no use for 589823 bytes; it seems like the caller should be able to set
this limit. "As many bytes as are available, but not more than X." This
is pretty close to how nsock_readbytes works now, in that it quits
reading once enough bytes are available, except that you can still get
more than you asked for (up to one fixed-size buffer's worth more).

I've been going back and forth over the API for these read functions.
I've identified four use cases:

1. Return everything left on the socket (looped reading), with a
  sanity limit on bytes.
2. Return whatever is immediately available on the socket.
3. Return an exact number of bytes.
4. Return an exact number of lines, with a sanity limit on bytes.

In the branch, nsock_readbytes and nsock_readlines do (3) and (4).
nsock_read does (2), and there's no built-in way to do (1).

I updated some scripts that were, for example, using
socket:receive_bytes(1), to use socket:receive() instead. However I
can't guarantee that this is correct in every case.

Another question is what to do with sanity byte limits. We can't just
read forever if someone keeps writing without sending a newline. Current
nsock_readlines already has such a limit, only it is an undocumented,
unspecified internal constant. If there's no newline you get back a
partial line. An alternative is just to return an error and not violate
the contract that we will return only the requested number of lines.

Some of the migration is tricky. Here is an example from domcon-cmd.nse.
The call to receive_lines(1) is expecting to get more than one line. It
looks like it's expecting a line with "BeginData" and one with
"EndData".

 local status, line = socket:receive_lines(1)
 if ( not(status) ) then return false, "Failed to read line" end
 lines = stdnse.strsplit( "\n", line )
 for _, line in ipairs( lines ) do
   if ( not(line:match("BeginData")) and not(line:match("EndData")) ) then
     table.insert(result, line)
   end
 end

Strictly speaking, this code is incorrect. There's no reason why
receive_lines has to return everything on the socket. It just happens
that line endings fall on packet boundaries so no half-lines are
returned. Since the new receive_lines(1) will always return exactly one
line, this has to be rewritten. What's wanted here is an nsock_read that
does option (1) above. But then it becomes obvious that the code is
relying on an integer number of lines being returned, and you have to
write an explicit loop with receive_lines(1). This is what I have done
but it's more code, and frustrating for a project that was supposed to
simplify code.

David Fifield
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/


