Nmap Development mailing list archives

Re: Bug with nping sequence number

From: Éric Hoffman <ehoffman () videotron ca>
Date: Tue, 23 Aug 2016 22:41:30 -0400

Hello Daniel,

Well, as I said, the patch did fix the issue with nping. However, I do have to confess that I honestly don't know much about the whole nmap suite or which modules use the modified function, and so I can't say what could be affected (or broken) by the fix.

Indeed, I started with the idea that this would be a trivial fix, but realized that this is really a race condition/design type of problem.

I did not like the idea of having this possible extra loop either. It may be possible to rearrange the "fix" to avoid this. I did not really spend much time finding the most elegant fix; however, the heart of the issue really is there. It's kind of a chicken-and-egg problem. If you set a timer event at, let's say, 100ms, and time out from the loop after 101ms, then if you check for the timeout first, you may miss the timer trigger (which should logically always trigger, i.e. we should logically never see a timeout), since you may end up running the loop at time=95ms, followed by another run at time=105ms. However, if you always run the loop one last time, you may trigger some event which should not have fired (for example, if a timer was set to 102ms while the timeout was set to 98ms, with the above loop timing). This is pretty much the core of the event loop, and it affects not only created timers, but any other events.
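The two orderings described above can be reproduced with a small simulation. This is only an illustrative Python sketch, not Nsock code: `run_loop` and its parameters are hypothetical names, and the loop is reduced to a list of wake-up times (95ms and 105ms, as in the example).

```python
def run_loop(wakeups_ms, timer_ms, timeout_ms, check_timeout_first):
    """Simulate one loop with a single timer and an overall timeout.

    Returns "timer" if the timer event is dispatched, "timeout" if the
    loop gives up first, or None if neither happens during the wake-ups.
    """
    for now in wakeups_ms:
        # Policy A: look at the loop timeout before dispatching events.
        if check_timeout_first and now >= timeout_ms:
            return "timeout"
        # Dispatch the timer if it is due at this wake-up.
        if now >= timer_ms:
            return "timer"
        # Policy B: events were given one last chance, now honor the timeout.
        if now >= timeout_ms:
            return "timeout"
    return None


wakeups = [95, 105]

# Timer at 100ms, timeout at 101ms: checking the timeout first misses
# a timer that logically should always fire.
missed = run_loop(wakeups, timer_ms=100, timeout_ms=101, check_timeout_first=True)

# Timer at 102ms, timeout at 98ms: running the events one last time
# fires a timer that logically should never fire.
spurious = run_loop(wakeups, timer_ms=102, timeout_ms=98, check_timeout_first=False)

print(missed, spurious)  # timeout timer
```

Neither policy is right for both cases: each one gives the expected answer exactly where the other one fails.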

However, in the above example, with the loop called @ time=95ms and @ time=105ms, if instead of a timer event you had set (for example) a keyboard event (simple key press), and you wish to time out after 100ms, then there is really no way to know at which time exactly the event occurred. If you read no key press @ time=95ms, and you read a key press @ time=105ms, what do we declare? Do we say that the key press "may" have arrived in time, and risk a false positive? Or do we say that it did not, and risk a false negative? Really, you cannot tell. For a key press event, or socket event, or many other events for that matter, the decision you take is not important, as long as having a false positive or false negative does not matter (when talking about a few ms over a few seconds, for example). However, the nping issue does demonstrate that there are cases in which the decision you take will contradict expectations. The current code behavior can create a false negative (timeout occurred), while setting a timer event and a timeout in the way it's done in nping DOES NOT (from a logical perspective) tolerate a false negative. Applying the "fix" I did changes the behavior of the loop so that a false positive may happen instead (the event is reported as having occurred before the timeout).
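The ambiguity can be stated as an interval check. In the sketch below (hypothetical names, Python used only for illustration), an event first seen at one poll is only known to have happened somewhere after the previous poll and no later than the current one.

```python
def classify(last_poll_without_event_ms, first_poll_with_event_ms, deadline_ms):
    """Decide whether an event seen between two polls beat the deadline.

    The event happened after the last poll that did not see it and no
    later than the first poll that did.
    """
    if first_poll_with_event_ms <= deadline_ms:
        return "in time"      # whole interval is before the deadline
    if last_poll_without_event_ms >= deadline_ms:
        return "late"         # whole interval is after the deadline
    return "ambiguous"        # the deadline falls inside the interval


verdict = classify(95, 105, deadline_ms=100)
print(verdict)  # ambiguous
```

Whatever the loop does in the "ambiguous" case is a policy choice between a possible false positive and a possible false negative.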

So, there is a logical issue with timer events, as seen above. Which other events can cause issues if we declare the timeout first, or what the possible issues are with deciding to run the event loop one last time, I leave to those more intimate with the nmap suite.

So, really, this is not just a plus-one issue. It's a question of granularity. Multiple factors affect when each run of the loop will occur, and so what will be read by the "get time" function: the system timer granularity, how many threads are running, when the CPU gives us a time slice, etc. So, be it getDelay() or getDelay()+1, or any other value, you cannot tell when a timeout will occur. For the case you mention (in ProbeMode::start), that probably makes no difference, as explained above, if the timeout is much greater than the "get time" granularity.
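As a rough illustration of the granularity argument, the following Python sketch assumes a loop that wakes up every 10ms (an arbitrary, hypothetical figure) with an unknown phase offset, and checks at which wake-up a 100ms timer and a 101ms timeout each become visible.

```python
def first_wakeup_at_or_after(t_ms, tick_ms, offset_ms):
    """First wake-up time >= t_ms for wake-ups at offset + k * tick."""
    # Integer ceiling of (t_ms - offset_ms) / tick_ms, clamped at zero.
    k = max(0, -((offset_ms - t_ms) // tick_ms))
    return offset_ms + k * tick_ms


TICK_MS = 10      # assumed wake-up granularity
DELAY_MS = 100    # timer, standing in for getDelay()

# Phase offsets for which the timer and a "+1" timeout are first
# noticed at the very same wake-up, so the extra 1ms separates nothing.
same_wakeup = [
    offset
    for offset in range(TICK_MS)
    if first_wakeup_at_or_after(DELAY_MS, TICK_MS, offset)
    == first_wakeup_at_or_after(DELAY_MS + 1, TICK_MS, offset)
]
print(same_wakeup)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Under these assumptions, the +1 only separates the two deadlines when a wake-up happens to land exactly on 100ms; for every other phase both are seen together, and the outcome depends on which one the loop checks first.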


Regards,
Eric


On 22-Aug-2016 5:11 PM, Daniel Miller wrote:

Eric,
Thanks, this is a really great analysis of a difficult problem. I'm not sure what I think about the solution, though: changing the core of the Nsock loop like this seems like it would have side effects that could cause problems. By adding an extra loop iteration after the timeout, you are essentially breaking the contract to honor the timeout for other applications. I've CC'd Henri Doreau, who has a lot of Nsock experience, to see if he can comment.
I wonder if instead we could force something like a synchronous timed loop by having the timer event handler terminate the loop with nsock_loop_quit? So we set the loop timeout to something very large, like 10 minutes beyond o.getDelay(), and rely on the timer to end the loop so that there is only one end time; the timer event can't be missed, so it can't fire twice in one loop either.
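The idea can be modeled with a toy loop. This is a Python sketch with hypothetical names (`TinyLoop`, `add_timer`, `quit`), not the Nsock API; it only shows that when the loop timeout is far beyond the timer, the timer handler becomes the single exit point.

```python
class TinyLoop:
    """Toy event loop: checks the loop timeout first, then due timers."""

    def __init__(self):
        self.timers = []            # list of (due_ms, handler)
        self.quit_requested = False

    def add_timer(self, due_ms, handler):
        self.timers.append((due_ms, handler))

    def quit(self):
        # Stand-in for asking the loop to stop from inside a handler.
        self.quit_requested = True

    def run(self, wakeups_ms, timeout_ms):
        for now in wakeups_ms:
            if now >= timeout_ms:
                return "timeout"
            due = [t for t in self.timers if t[0] <= now]
            self.timers = [t for t in self.timers if t[0] > now]
            for _, handler in due:
                handler(self, now)
            if self.quit_requested:
                return "quit"
        return "idle"


fired_at = []

def on_timer(loop, now_ms):
    fired_at.append(now_ms)   # the probe would be sent here
    loop.quit()               # the timer, not the timeout, ends the loop

DELAY_MS = 100
loop = TinyLoop()
loop.add_timer(DELAY_MS, on_timer)
# Loop timeout set 10 minutes beyond the delay, so it never competes.
result = loop.run([95, 105], timeout_ms=DELAY_MS + 10 * 60 * 1000)
print(result, fired_at)  # quit [105]
```

The loop still checks its timeout first, as before, but with the two deadlines this far apart the order of the checks no longer matters.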
On the other hand, maybe this is as simple as an off-by-one error: There are several different cases in ProbeMode::start() where nsock loops are started with timers, and they each use different values for the different timeouts! See below:
case TCP_CONNECT:
  event handler timeout: o.getDelay()+1
  nsock_loop timeout: o.getDelay()+1
case UDP_UNPRIV:
  event handler timeout: o.getDelay()
  nsock_loop timeout: o.getDelay()
case UDP, TCP, ICMP, ARP:
  event handler timeout: o.getDelay()
  nsock_loop timeout: o.getDelay()+1
I worry a bit about timing error propagation, too; if it takes slightly longer than the intended time for the event to fire, will the next packet be delayed by that much, too? What is the intent of the Nping program: interval delay or send delay?
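The difference between the two intents can be sketched numerically. The Python below is a hypothetical model, not Nping's scheduling code: it assumes every timer fires a fixed 5ms late and compares scheduling the next send relative to the actual fire time against scheduling it relative to the planned time.

```python
def send_times(count, delay_ms, lateness_ms, relative_to_fire_time):
    """Return the times at which `count` packets would be sent.

    relative_to_fire_time=True  : next timer = actual fire time + delay
                                  (send delay, lateness accumulates)
    relative_to_fire_time=False : next timer = planned time + delay
                                  (fixed interval, lateness stays bounded)
    """
    times = []
    scheduled = 0
    for _ in range(count):
        fired = scheduled + lateness_ms   # timer fires slightly late
        times.append(fired)
        base = fired if relative_to_fire_time else scheduled
        scheduled = base + delay_ms
    return times


drifting = send_times(4, delay_ms=1000, lateness_ms=5, relative_to_fire_time=True)
steady = send_times(4, delay_ms=1000, lateness_ms=5, relative_to_fire_time=False)
print(drifting)  # [5, 1010, 2015, 3020]
print(steady)    # [5, 1005, 2005, 3005]
```

In the first case each packet inherits the lateness of all previous ones; in the second the error does not grow with the number of packets.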
Sorry to answer you with so many questions, but we need to think seriously about the intent and effects of changes here. I hope you will continue to be involved in the discussion, since you have spent so much time considering the problem.
Dan


