Nmap Development mailing list archives

Re: Bug with nping sequence number

From: Daniel Miller <bonsaiviking () gmail com>
Date: Mon, 22 Aug 2016 16:11:31 -0500

Eric,

Thanks, this is a really great analysis of a difficult problem. I'm not
sure what I think about the solution, though: changing the core of the
Nsock loop like this seems like it would have side effects that could cause
problems. By adding an extra loop iteration after the timeout, you are
essentially breaking the contract to honor the timeout for other
applications. I've CC'd Henri Doreau, who has a lot of Nsock experience, to
see if he can comment.

I wonder if instead we could force something like a synchronous timed loop
by having the timer event handler terminate the loop with nsock_loop_quit?
So we set the loop timeout to something very large, like 10 minutes beyond
o.getDelay(), and rely on the timer to end the loop so that there is only
one end time; the timer event can't be missed, so it can't fire twice in
one loop either.

On the other hand, maybe this is as simple as an off-by-one error: There
are several different cases in ProbeMode::start() where nsock loops are
started with timers, and they each use different values for the different
timeouts! See below:

case TCP_CONNECT:
  event handler timeout: o.getDelay()+1
  nsock_loop timeout: o.getDelay()+1
case UDP_UNPRIV:
  event handler timeout: o.getDelay()
  nsock_loop timeout: o.getDelay()
case UDP, TCP, ICMP, ARP:
  event handler timeout: o.getDelay()
  nsock_loop timeout: o.getDelay()+1

I worry a bit about timing error propagation, too; if it takes slightly
longer than the intended time for the event to fire, will the next packet
be delayed by that much, too? What is the intent of the Nping program:
interval delay or send delay?

Sorry to answer you with so many questions, but we need to think seriously
about the intent and effects of changes here. I hope you will continue to
be involved in the discussion, since you have spent so much time
considering the problem.

Dan

On Wed, Jul 27, 2016 at 11:55 PM, Éric Hoffman <ehoffman () videotron ca>
wrote:

Hello

This have been brought to this mailing list before, by another person, but
no answer was given.

I actually stumbled upon the same issue with nping, that is, the sequence
number is non-linear.  Ex:

$ sudo nping --icmp 192.168.200.75 -c 10 --delay 100ms

Starting Nping 0.7.12 ( https://nmap.org/nping ) at 2016-07-27 23:36 EDT
SENT (0.0164s) ICMP [192.168.200.20 > 192.168.200.75 Echo request
(type=8/code=0) id=42883 seq=1] IP [ttl=64 id=28064 iplen=28 ]
RCVD (0.0179s) ICMP [192.168.200.75 > 192.168.200.20 Echo reply
(type=0/code=0) id=42883 seq=1] IP [ttl=64 id=34536 iplen=28 ]
SENT (0.1174s) ICMP [192.168.200.20 > 192.168.200.75 Echo request
(type=8/code=0) id=42883 seq=2] IP [ttl=64 id=28064 iplen=28 ]
RCVD (0.1190s) ICMP [192.168.200.75 > 192.168.200.20 Echo reply
(type=0/code=0) id=42883 seq=2] IP [ttl=64 id=34539 iplen=28 ]
SENT (0.2184s) ICMP [192.168.200.20 > 192.168.200.75 Echo request
(type=8/code=0) id=42883 seq=3] IP [ttl=64 id=28064 iplen=28 ]
RCVD (0.2199s) ICMP [192.168.200.75 > 192.168.200.20 Echo reply
(type=0/code=0) id=42883 seq=3] IP [ttl=64 id=34541 iplen=28 ]
SENT (0.3190s) ICMP [192.168.200.20 > 192.168.200.75 Echo request
(type=8/code=0) id=42883 seq=5] IP [ttl=64 id=28064 iplen=28 ]
RCVD (0.3203s) ICMP [192.168.200.75 > 192.168.200.20 Echo reply
(type=0/code=0) id=42883 seq=5] IP [ttl=64 id=34546 iplen=28 ]
SENT (0.4194s) ICMP [192.168.200.20 > 192.168.200.75 Echo request
(type=8/code=0) id=42883 seq=5] IP [ttl=64 id=28064 iplen=28 ]
RCVD (0.4209s) ICMP [192.168.200.75 > 192.168.200.20 Echo reply
(type=0/code=0) id=42883 seq=5] IP [ttl=64 id=34553 iplen=28 ]
SENT (0.5214s) ICMP [192.168.200.20 > 192.168.200.75 Echo request
(type=8/code=0) id=42883 seq=6] IP [ttl=64 id=28064 iplen=28 ]
RCVD (0.5227s) ICMP [192.168.200.75 > 192.168.200.20 Echo reply
(type=0/code=0) id=42883 seq=6] IP [ttl=64 id=34556 iplen=28 ]
SENT (0.6224s) ICMP [192.168.200.20 > 192.168.200.75 Echo request
(type=8/code=0) id=42883 seq=7] IP [ttl=64 id=28064 iplen=28 ]
RCVD (0.6238s) ICMP [192.168.200.75 > 192.168.200.20 Echo reply
(type=0/code=0) id=42883 seq=7] IP [ttl=64 id=34560 iplen=28 ]
SENT (0.7231s) ICMP [192.168.200.20 > 192.168.200.75 Echo request
(type=8/code=0) id=42883 seq=9] IP [ttl=64 id=28064 iplen=28 ]
RCVD (0.7246s) ICMP [192.168.200.75 > 192.168.200.20 Echo reply
(type=0/code=0) id=42883 seq=9] IP [ttl=64 id=34570 iplen=28 ]
SENT (0.8234s) ICMP [192.168.200.20 > 192.168.200.75 Echo request
(type=8/code=0) id=42883 seq=9] IP [ttl=64 id=28064 iplen=28 ]
RCVD (0.8250s) ICMP [192.168.200.75 > 192.168.200.20 Echo reply
(type=0/code=0) id=42883 seq=9] IP [ttl=64 id=34578 iplen=28 ]
SENT (0.9243s) ICMP [192.168.200.20 > 192.168.200.75 Echo request
(type=8/code=0) id=42883 seq=10] IP [ttl=64 id=28064 iplen=28 ]
RCVD (0.9254s) ICMP [192.168.200.75 > 192.168.200.20 Echo reply
(type=0/code=0) id=42883 seq=10] IP [ttl=64 id=34579 iplen=28 ]

Max rtt: 1.196ms | Min rtt: 0.928ms | Avg rtt: 1.063ms
Raw packets sent: 10 (280B) | Rcvd: 10 (460B) | Lost: 0 (0.00%)
Nping done: 1 IP address pinged in 0.94 seconds

Look at the sequence number, it jumps from 3 to 5, and from 7 to 9, with
the same sequence number repeated twice.  I have found out this occur more
likely is you set delay under 100ms, and less likely with 1-second delay.

I actually looked in the code and found the issue!  The way nping work, in
the ProbeMode::start() function (nping/ProbeMode.cc), is that it fill a
packet buffer (with the sequence number), then create a timer (which call a
handler to send the packet at expiration), and call the event dispatch loop
with a timeout 1ms later.  So in theory, during a nping loop, let's suppose
the delay is set to 100ms, then the packet is created and scheduled to be
sent 100ms later.  Then the event loop is started with a 101ms timeout (1
ms more than the timer trigger time).  The reasoning behind this is that
the timer is bound to always be triggered before the timeout. However, it
is not the case, as explained bellow.

In the nsock_loop() function (nsock/src/nsock_core.c), used to loop
through events until timeout occurs, the function does basically this:
1 - Check if quit signal is set, and return if so.
2 - Check if there are still pending events, if not, return.
3 - Compute remaining time left before timeout, and if timeout has
occurred, return.
4 - Call the engine loop, which is responsible to dispatch all events
(including our timer), with the remaining time computed at point 3.
5 - Update the current time-of-day, used to compute timeout.

The issue is that there is a race condition.  It is well possible that in
one loop, there are still a few milliseconds left for the timer to
trigger.  So, at points 3 and 4, nothing happens.  And, at point 5, is it
possible that the new time of day put us past the expiration timeout.  If
(when) this occurs, the point 3 above compute the new timeout, and return.
Out timer does not get triggered (although it is still pending in the event
pool).  This behavior occurs because either of two reasons.  It can be
because the system is busy and take more than 1ms to do a loop (which is
foreseeable on a serial tty for example, where just the fact of printing
the SEND/RECV packet traces can take a few ms), or (and this is the most
likely culprit) the system time returned by gettimeofday() is not
contiguous.  In other word, it does not increment by 1ms (nor anywhere the
precision of the timespec type). The granularity of the gettimeofday()
function is system-dependent, and depends upon such things as the interrupt
rate, if there are high precision performance counter, etc.  Typical values
seen can be 10~20ms steps).  All of this to say that when this occurs, the
nsock_loop() function return, and the main ping function generate a new
packet, and setup a new event.

The next time the nsock_loop() function is called, then the last timer
does trigger immediately.  Then (possibly before the timeout) the new timer
does trigger too.  So, the correct *number* of ICMP packets do get sent.
However, emphasis on the correct *number* of ICMP packets is made because
the ICMP packet content itself is not. In fact, in the ProbeMode::start()
function (nping/ProbeMode.cc), which generate the actual ICMP packets
content, although there are 1024 sendpkt_t entries, there is actually only
one 64KB packet buffer that get assigned and reused for all packets (u8
pkt[MAX_IP_PACKET_LEN]).  And this buffer is assigned to each output
packets.  So, when the timeout issue occurs, and a new packet is generated,
the next time nsock_loop() is called, the last packet (that was support to
be sent during the last loop) data has been overwritten.

So that is the reason why we see duplicate and non-linear sequence number.

To fix this, either we have to fix the ProbeMode::start() function, taking
into account that getting a timeout is a possibility, but this create the
burden of holding more 64KB buffers.  Clearly we don't wish to also have
1024 of those buffers (64MB).  Not really embedded-system friendly.  The
other way to fix it is to make the nsock_loop() give one last chance to
execute the event loop engine when timeout is detected.  I implemented the
second option and it does seem to fix the problem.

--- nmap-master-old/nsock/src/nsock_core.c      2016-07-27
16:35:00.000000000 -0400
+++ nmap-master-new/nsock/src/nsock_core.c      2016-07-28
00:47:52.403081022 -0400
@@ -899,6 +899,7 @@ enum nsock_loopstatus nsock_loop(nsock_p
   int msecs_left;
   unsigned long loopnum = 0;
   enum nsock_loopstatus quitstatus = NSOCK_LOOP_ERROR;
+  int last_try = 0;

   gettimeofday(&nsock_tod, NULL);

@@ -934,8 +935,12 @@ enum nsock_loopstatus nsock_loop(nsock_p
     if (msec_timeout >= 0) {
       msecs_left = MAX(0, TIMEVAL_MSEC_SUBTRACT(loop_timeout,
nsock_tod));
       if (msecs_left == 0 && loopnum > 0) {
-        quitstatus = NSOCK_LOOP_TIMEOUT;
-        break;
+        if (!last_try) {
+          last_try = 1;
+        } else {
+          quitstatus = NSOCK_LOOP_TIMEOUT;
+          break;
+        }
       }
     }

Regards,
Eric

_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/

_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/

Current thread:

Bug with nping sequence number Éric Hoffman (Jul 27)
- Re: Bug with nping sequence number Daniel Miller (Aug 22)
- Message not available
  - Re: Bug with nping sequence number Éric Hoffman (Aug 24)