Nmap Development mailing list archives

Re: Bug with nping sequence number


From: Éric Hoffman <ehoffman () videotron ca>
Date: Tue, 23 Aug 2016 22:41:30 -0400

Hello Daniel,

Well, as I said the patch did fix the issue with nping, however I do have to confess that I honestly don't know much about the whole nmap suite and which modules uses the modified function, and as such, what could be affected (or broken) by the fix.

Indeed, I started with the idea that this would be a trivial fix, but realized that this is really a race condition/design type of problem.

I did not like the idea either of having this possible extra loop. It may be possible to rearrange the "fix" to avoid this. I did not really spent much time around finding the most elegant fix, however, the heart of the issue really is there. It's kind of chicken and the egg problem. If you set a timer event at, let's say, 100ms, and timeout from loop after 101ms, then if you check for timeout first, you may miss the timer trigger (which should logically always trigger, i.e. we should logically never see a timeout) since you may end up running the loop at time=95ms, followed by another run at time=105ms. However, if you always run the loop one last time, you may trigger some event which should not have been (for example, if a timer was set to 102ms while timeout was set to 98ms, with the above loop timing). This is pretty much the core of the event loop, and it affect not only created timers, but any other events.

However, in the above example, with the loop called @ time=95ms, and @ time=105ms, if instead of a timer event, you had set (for example) a keyboard event (simple key press), and you wish to timeout after 100ms, then in any case there is really no way to know at which time exactly the event occurred. If you read no key press @ time=95ms, and you read a key press @ time=105ms, what do we declare? Do we say that the key pressed "may" have arrived in time, and risking false positive? Or do we say that it did not, and risk false negative? Really you can not tell. For a key press event, or socket event, or many other events for that matter, the decision you take is not important. This is not important as long as having a false positive or false negative does not matter (when talking about a few ms over a few seconds for example). However, the nping issue does demonstrate that there are cases in which the decision you take will contradict expectations. The current code behavior can create false negative (timeout occurred), while setting a timer event and a timeout in the way it's done in nping DOES NOT (from a logical perspective) tolerate false negative. Applying the "fix" I did will change the behavior of the loop in which false positive may happen (event occurred before timeout).

So, there is a logical issue with timer events, as seen above. What other events can give issues if we declare timeout first, or what are the possible issues with deciding to run the event loop one last time, this I leave to those more intimate with the nmap suite.

So, really, this is not just a plus one issue. It's a question of granularity. There are multiple factors that will affect when each run of the loop will occur, and so, what will be read by the "get time" function, let it be the system timer granularity, who many threads are running and when does the CPU give us a time slice, etc. So, let it be getDelay() or getDelay()+1, or any other values, you can not tell when a timeout will occur. For the case you mention (in ProbeMode::start), that probably make no differences, as explained above, if the timeout is much greater than the "get time" granularity.

Regards,
Eric


On 22-Aug-2016 5:11 PM, Daniel Miller wrote:
Eric,

Thanks, this is a really great analysis of a difficult problem. I'm not sure what I think about the solution, though: changing the core of the Nsock loop like this seems like it would have side effects that could cause problems. By adding an extra loop iteration after the timeout, you are essentially breaking the contract to honor the timeout for other applications. I've CC'd Henri Doreau, who has a lot of Nsock experience, to see if he can comment.

I wonder if instead we could force something like a synchronous timed loop by having the timer event handler terminate the loop with nsock_loop_quit? So we set the loop timeout to something very large, like 10 minutes beyond o.getDelay(), and rely on the timer to end the loop so that there is only one end time; the timer event can't be missed, so it can't fire twice in one loop either.

On the other hand, maybe this is as simple as an off-by-one error: There are several different cases in ProbeMode::start() where nsock loops are started with timers, and they each use different values for the different timeouts! See below:

case TCP_CONNECT:
  event handler timeout: o.getDelay()+1
  nsock_loop timeout: o.getDelay()+1
case UDP_UNPRIV:
  event handler timeout: o.getDelay()
  nsock_loop timeout: o.getDelay()
case UDP, TCP, ICMP, ARP:
  event handler timeout: o.getDelay()
  nsock_loop timeout: o.getDelay()+1

I worry a bit about timing error propagation, too; if it takes slightly longer than the intended time for the event to fire, will the next packet be delayed by that much, too? What is the intent of the Nping program: interval delay or send delay?

Sorry to answer you with so many questions, but we need to think seriously about the intent and effects of changes here. I hope you will continue to be involved in the discussion, since you have spent so much time considering the problem.

Dan

_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/


Current thread: