Wireshark mailing list archives

Re: [semi-OT] request second opinion on possible bugs in OS TCP window and SACK implementation


From: Alan Tu <8libra () gmail com>
Date: Mon, 17 Jan 2011 05:27:01 +0000

Sake, I have additional evidence, drawn from the TCP timestamp option
data, supporting the theory that no follow-on ACKs were received by the
server. There is no clean display filter for TCP timestamps, so I output
in PDML format, as in

tshark -r sack_fail.pcap -T pdml

and search on the string tcp.options.time_stamp.
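
For anyone who would rather not search the PDML by hand, here is a
minimal sketch of pulling the timestamps out with Python. It assumes the
dissector renders the option as text of the form "TSval N, TSecr M" in a
field's show or showname attribute; that layout is an assumption and may
differ between Wireshark versions. The function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the timestamp option appears as "TSval N, TSecr M" in some
# field's "show" or "showname" attribute of the PDML output.
TS_PATTERN = re.compile(r"TSval (\d+), TSecr (\d+)")


def extract_timestamps(pdml_text):
    """Return (frame_number, tsval, tsecr) for each packet with timestamps."""
    results = []
    root = ET.fromstring(pdml_text)
    for packet in root.iter("packet"):
        frame_number = None
        timestamps = None
        for field in packet.iter("field"):
            if field.get("name") == "frame.number":
                frame_number = int(field.get("show"))
            for attr in ("show", "showname"):
                match = TS_PATTERN.search(field.get(attr) or "")
                if match and timestamps is None:
                    timestamps = (int(match.group(1)), int(match.group(2)))
        if timestamps is not None:
            results.append((frame_number, *timestamps))
    return results
```

Feeding it the output of the tshark command above (for example via a
file read into a string) gives one tuple per packet that carries the
option.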

TCP timestamps are defined in RFC1323. Conceptually, each packet
contains a timestamp (TSval), and the timestamp of the packet being
acknowledged (TSecr). The following is from RFC1323:

"(1)  The connection state is augmented with two 32-bit slots:
TS.Recent holds a timestamp to be echoed in TSecr whenever a segment
is sent, and Last.ACK.sent holds the ACK field from the last segment
sent.  Last.ACK.sent will equal RCV.NXT except when ACKs have been
delayed.

(2)  If Last.ACK.sent falls within the range of sequence numbers of an
incoming segment:
SEG.SEQ <= Last.ACK.sent < SEG.SEQ + SEG.LEN
then the TSval from the segment is copied to TS.Recent; otherwise, the
TSval is ignored.

(3)  When a TSopt is sent, its TSecr field is set to the current
TS.Recent value."
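
The three quoted rules can be sketched as a small state machine. This is
only an illustration of the text above, not code from any real stack;
the class and method names are hypothetical, and sequence-number
wraparound is ignored for simplicity.

```python
class TimestampEchoState:
    """Receiver-side state for the three RFC 1323 rules quoted above."""

    def __init__(self, last_ack_sent):
        # Rule (1): two slots, TS.Recent and Last.ACK.sent.
        self.ts_recent = 0
        self.last_ack_sent = last_ack_sent

    def on_segment(self, seg_seq, seg_len, tsval):
        # Rule (2): copy TSval only if Last.ACK.sent falls inside the
        # segment's sequence range; otherwise the TSval is ignored.
        # Taken literally, a segment with seg_len == 0 never passes this test.
        if seg_seq <= self.last_ack_sent < seg_seq + seg_len:
            self.ts_recent = tsval

    def on_ack_sent(self, ack_number):
        # Remember the ACK field of the last segment we sent.
        self.last_ack_sent = ack_number

    def tsecr_to_send(self):
        # Rule (3): TSecr is set to the current TS.Recent value.
        return self.ts_recent
```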

For the difficult connection, the client correctly increments its
TSval value. If the ACKs from the client were being received by the
server, the TSecr sent by the server would advance per rules (2) and (3)
above. However, the server's TSecr consistently refers to the TSval of
the packet containing the HTTP GET request. Therefore it stands to
reason that the server is not receiving any additional ACKs.
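
The check described here can be sketched in a few lines of Python,
assuming the TSval and TSecr values have already been collected into
lists per direction. The function name and the sample numbers are
hypothetical, and timestamp wraparound is ignored.

```python
def unechoed_client_tsvals(client_tsvals, server_tsecrs):
    """Return the client TSvals that are newer than anything the server echoed."""
    latest_echo = max(server_tsecrs)
    return [tsval for tsval in client_tsvals if tsval > latest_echo]


# Hypothetical values: the client's clock advances, but the server keeps
# echoing the TSval of one early packet.
client_tsvals = [100, 105, 130, 170]
server_tsecrs = [105, 105, 105, 105]
stuck = len(set(server_tsecrs)) == 1
missing = unechoed_client_tsvals(client_tsvals, server_tsecrs)
```

If stuck is true and missing is non-empty, the server's echo never moved
past an early client packet, which is the pattern seen in this capture.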

So the remaining mystery is why ACKs from the client are not making
it to the server, especially when the initial three-way handshake and
HTTP GET request are successfully received, and other hosts behind the
same public IP have no connection issue. For what it's worth, the other
host on the same LAN is not using the TCP timestamp option at all;
otherwise I cannot see any obvious difference.

I hope the information and techniques described in this thread are
enlightening to some.

Alan


On 1/17/11, Alan Tu <8libra () gmail com> wrote:
Sake, Thanks for your analysis. It helps a lot. I knew I needed a
second opinion. I'll start liking my Nokia phone again. Well, maybe
just a little.

For everyone else, the original PCAP (Nokia phone) is on Cloudshark at
http://www.cloudshark.org/captures/3cc0916bb5be

I agree figuring out why this site (not a fly by night site) always
times out and retransmits (I have other samples) is interesting. I
just didn't focus on that because (1) dropped packets are part of the
TCP model, and (2) I overlooked the idea that my packets may be
selectively but systematically filtered. Clearly, the initial SYN and
HTTP request packets are getting to the server. Also, Windows and this
site communicate fine, using the same shared public IP (PCAP at
http://www.cloudshark.org/captures/f06cb43fec83). What criteria might
an intermediate host be using to distinguish my Nokia-originated
packets from my PC-originated packets?

Back to the stack issue. Also from RFC 2018: "the bytes just below
the block, (Left Edge of Block - 1), and just above the block, (Right
Edge of Block), have not been received". So SACK'ing 1-1448 while
ACK'ing 2896 in the same packet (frame 10) seems unreasonable.
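
The consistency check implied here can be sketched as follows. This
assumes plain RFC 2018 semantics with no later extensions, ignores
sequence-number wraparound, and uses the relative numbers quoted in this
thread; the function name is hypothetical.

```python
def sack_blocks_already_acked(ack_number, sack_blocks):
    """Return the SACK blocks that lie entirely at or below the cumulative ACK.

    Under plain RFC 2018 a SACK block describes data received above a
    hole, so a block the cumulative ACK already covers is unexpected.
    """
    return [
        (left_edge, right_edge)
        for left_edge, right_edge in sack_blocks
        if right_edge <= ack_number
    ]


# Frame 10 as described in this thread: ACK 2896 with a SACK block 1-1448.
suspicious = sack_blocks_already_acked(2896, [(1, 1448)])
```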

I do concur that it appears somehow none of the later ACKs are making it.
Weird.

Thanks for your help using your wisdom to slightly untangle this.

Alan


On 1/16/11, Sake Blok <sake () euronet nl> wrote:
On 16 jan 2011, at 17:00, Alan Tu wrote:

If after reading this you're interested in
helping, please e-mail me individually and I'll reply with the PCAP.

Thanks for sending the PCAP.

[...]
This is my assessment of what is going on:
Frame 1-3: three way handshake, normal
Frame 4: client sends HTTP GET request, normal
Frame 5: server ACK frame 4, normal
Frame 6: server sends payload segment 1 (PS1), normal
Frame 7: server sends PS2, normal
Frame 8: client ACK PS2, normal

For some reason, frame 8 is not received or processed by the server
(this is a mystery, but not discussed here.)

IMHO, this mystery is what needs to be investigated, as this is the cause
of the problem. Here follows my analysis which backs up that statement :-)

Frame 9: server resends PS1, normal
***Frame 10: client receives frame 9, a duplicate of frame 6. Client
ACK frame 7, but sends a SACK with the segment from frame 6.

This is clearly incorrect behavior; see the SACK RFC, RFC 2018. The
client is treating frame 9 as an out-of-order packet and jumping into
SACK mode, but frame 9 is merely a duplicate or retransmit. Frame 9
falls outside the client's receive window (updated after frame 7), and
the client should discard it, but doesn't. My theory is that the client
(Symbian OS TCP stack) is not doing a bounds check on its TCP receive
window.

According to the RFC:

"If the data receiver generates SACK
   options under any circumstance, it SHOULD generate them under all
   permitted circumstances."

So it is obligated to use the SACK option when ACKing the retransmission.

Also from the RFC:

"The first SACK block (i.e., the one immediately following the
      kind and length fields in the option) MUST specify the contiguous
      block of data containing the segment which triggered this ACK,
      unless that segment advanced the Acknowledgment Number field in
      the header.  This assures that the ACK with the SACK option
      reflects the most recent change in the data receiver's buffer
      queue."

This means it has to SACK the block that has just been received and it
does.

Frame 11: The server TCP stack has received an invalid SACK and is now
confused. It retransmits PS1. This is semantically incorrect because
the client actually indicates it has received PS1.

*If* the server TCP received the ACK with SACK. But I don't think it did.
If you use the filter "tcp.srcport==80", you can see clearly that it keeps
retransmitting the same segment with an increasing retransmission timeout.
This is the behavior of a system that does *not* receive any ACKs.
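
The "increasing retransmission timeout" pattern can be checked
mechanically once the send times of the repeated segment are known. A
minimal sketch, with hypothetical function names and made-up sample
times (not taken from the capture):

```python
def retransmission_gaps(send_times):
    """Return the delay between each consecutive pair of transmissions."""
    return [later - earlier for earlier, later in zip(send_times, send_times[1:])]


def looks_like_backoff(send_times):
    """True if every gap between retransmissions is longer than the previous one."""
    gaps = retransmission_gaps(send_times)
    return len(gaps) >= 2 and all(b > a for a, b in zip(gaps, gaps[1:]))


# Made-up send times (seconds) for one segment sent four times.
example_times = [0.0, 1.0, 3.0, 7.0]
backing_off = looks_like_backoff(example_times)
```

A sender that is receiving ACKs would normally stop retransmitting the
segment rather than keep stretching the interval.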

Frame 12: client retransmits frame 10
Frame 13: server retransmits PS1
Frame 14: client retransmits frame 10
Frame 15: server retransmits PS1
Frame 16: client retransmits frame 10

Actually, the client keeps ACKing the received frame, hoping to reach the
server and make it send new data.

Frame 17: server sends PS3, normal

Somehow, this "breaks the spell", for the moment.

The somehow could be explained by a Keep-Alive timer on the server. As can
be seen in the HTTP data, both the client and the server want to use
Keep-Alive, so neither of the two should close the connection until a
timeout has been reached or the maximum configured objects have been served
over the same TCP connection.

Since the server waited on an ACK after sending two full frames, it would
not normally send all the data at once without waiting on ACKs. So the fact
that it sends all the data at once and the fact that it closes the
connection with a FIN tell me it is flushing its send buffer after the http
daemon has told it to close the connection due to a 15 sec idle timeout.

Frame 18: server sends PS4, normal
Frame 19: server sends PS5, normal
Frame 20: server sends PS6, normal
Frame 21: server sends PS7, normal
Frame 22: server sends PS8, normal
Frame 23: server sends PS9, normal
Frame 24: server sends PS10, normal
Frame 25: server sends PS12 with FIN, received out of order
Frame 26: server sends PS11, received out of order
Frame 27: client ACK PS4, normal
Frame 28: client ACK PS6, normal
Frame 29: client ACK PS8, normal
Frame 30: client ACK PS10, normal
Frame 31: client ACK PS10, but sends a SACK saying it has received
PS12, normal
Frame 32: client ACK PS12, normal
Frame 33: client sends FIN/ACK, acknowledging server's FIN from frame
25, normal

At this point, the client is expecting an ACK to its own FIN.

Frame 34: for some reason, the client does not receive an ACK to its
FIN, so two seconds later it retransmits a FIN/ACK, normal

Well, since the server does not seem to receive the packets of the
client, it will never respond to these FINs.

***Frame 35: server resends PS1, 2.7 seconds after the client sends
the first FIN in frame 33

Why oh why does the server (unknown OS) do this? The SACK storm from
earlier seemed to have broken, the client has acknowledged all the
later payload segments, the server has sent its FIN, and the client
has sent its FIN (twice.) PS1 should be out of the server's TCP send
window anyway.

It should... *IF* it ever received an ACK from the client.

So the main question is... why do the packets from the client never reach
the server? Or do they reach the server in a transformed state and get
discarded by the server?

Hope this helps,
Cheers,


Sake

___________________________________________________________________________
Sent via:    Wireshark-users mailing list <wireshark-users () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-users
Unsubscribe: https://wireshark.org/mailman/options/wireshark-users

mailto:wireshark-users-request () wireshark org?subject=unsubscribe



