tcpdump mailing list archives
[libpcap] tcpdump compiles complex expression to incorrect BPF code
From: Vadim Goncharov <vadim_nuclight () mail ru>
Date: Tue, 24 Aug 2010 10:16:53 +0000 (UTC)
Hi!

This is a bug in libpcap 0.9.8 (also confirmed for 1.0.0), initially reported at http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/144325

I tried to gather statistics on some packets based on a signature in the data payload. For plain traffic that was simple, "tcpdump 'udp[20:4]=0x7fffffff'" (this works), but for PPTP things get complicated and I was forced to write a very complex expression. I used cpp(1) for this. The same approach worked for me earlier, in the 0.9.4 days, for other packets; I have just redone it for other bytes, and the file for cpp was roughly the same. But this time it didn't match anything. After cutting the full expression down to only the part most relevant for my generated test IP packets (most parts below are /* commented out */), I was able to look at the BPF assembly output of tcpdump -d and identify that the generated code was incorrect.
How-To-Repeat:
Here is the packet which I try to match, and tcpdump debug output. Note that if you uncomment all final expression's parts and do s/INNER_IS_UTP/INNER_IS_UDP/g, this will work for up to all UDP packets inside GRE (but without signatures, of course). So bugs start to happen when IS_TORRENT_UTP(INNER_UDP_OFFSET(ppp_hdr_len)) is included.

$ uuencode bug100225.pcap < bug100225.pcap # test packet to catch
begin 644 bug100225.pcap
MU,.RH0(`!````````````/__````````W=B&2P98"P!:````6@````(```!%
M``!6S'4``/PO;#!.C``,V1U>'3"!B`L`,L`````+IP``"^#_`P`A10``+CU*
H```X$<?3;7N^'DZ,`WS-9P*:`!IH\@`!`@,$!08'"`D*"W____^K````
`
end

$ cat tcpdump-gre-utp-cpp
#define IPHDRLEN(firstbyte) ((ip[firstbyte]&0xf)<<2)
#define GRESTART IPHDRLEN(0)
/* Check that is GREv1 with seq num and proto set per RFC 2637 */
#define VALID_PPTP_GRE ((ip[GRESTART:4] & 0xff7fffff) = 0x3001880b)
/* ACK is optional 4 bytes to previous 12 */
#define GRE_DATA_START (GRESTART + ((ip[GRESTART+1] & 0x80) >> 5) + 12)
/* Actual IP byte values to find in the UDP payload of inner IP datagram */
#define IS_TORRENT_UTP(udp_hdr_start) (ip[(udp_hdr_start+20):4]=0x7fffffff)
/* Check inner IP has UDP payload (proto 17) then calculate offset and pass it to UTP macro */
#define INNER_IS_UDP(ppp_hdr_len) (ip[GRE_DATA_START+ppp_hdr_len+9]=17)
#define INNER_UDP_OFFSET(ppp_hdr_len) (GRE_DATA_START+ppp_hdr_len+IPHDRLEN(GRE_DATA_START+ppp_hdr_len))
#define INNER_IS_UTP(ppp_hdr_len) (INNER_IS_UDP(ppp_hdr_len) and IS_TORRENT_UTP(INNER_UDP_OFFSET(ppp_hdr_len)))
/*
 * Finally, expression: sort by most frequent pattern first.
 * We check four possible PPP headers corresponding to IP, then
 * pass length of matched PPP header to checking macros.
 */
proto gre /*and VALID_PPTP_GRE*/ and (
/* ( (ip[GRE_DATA_START]=0x21) and INNER_IS_UTP(1) ) or
 ( (ip[GRE_DATA_START:2]=0xff03) and (ip[GRE_DATA_START+2]=0x21) and INNER_IS_UTP(3) ) or
 (*/ (ip[GRE_DATA_START:4]=0xff030021) and INNER_IS_UTP(4) /* ) or
 ( (ip[GRE_DATA_START:2]=0x0021) and INNER_IS_UTP(2) )*/
)

$ tcpdump -dni ng0 `cpp -P tcpdump-gre-utp-cpp`
(000) ld       [0]
(001) jeq      #0x2000000       jt 2    jf 73
(002) ldb      [13]
(003) jeq      #0x2f            jt 4    jf 73
(004) ldb      [4]
(005) and      #0xf
(006) lsh      #2
(007) st       M[3]
(008) ldb      [4]
(009) and      #0xf
(010) lsh      #2
(011) add      #1
(012) tax
(013) ldb      [x + 4]
(014) and      #0x80
(015) rsh      #5
(016) tax
(017) ld       M[3]
(018) add      x
(019) add      #12
(020) tax
(021) ld       [x + 4]
(022) jeq      #0xff030021      jt 23   jf 73
(023) ldb      [4]
(024) and      #0xf
(025) lsh      #2
(026) st       M[1]
(027) ldb      [4]
(028) and      #0xf
(029) lsh      #2
(030) add      #1
(031) tax
(032) ldb      [x + 4]
(033) and      #0x80
(034) rsh      #5
(035) tax
(036) ld       M[1]
(037) add      x
(038) add      #12
(039) add      #4
(040) add      #9
(041) tax
(042) ldb      [x + 4]
(043) jeq      #0x11            jt 44   jf 73
(044) ldb      [4]
(045) and      #0xf
(046) lsh      #2
(047) add      #1
(048) tax
(049) ldb      [4]
(050) and      #0xf
(051) lsh      #2
(052) st       M[15]
(053) ldb      [x + 4]
(054) and      #0x80
(055) rsh      #5
(056) tax
(057) ld       M[15]
(058) add      x
(059) add      #12
(060) add      #4
(061) tax
(062) ldb      [x + 4]
(063) and      #0xf
(064) lsh      #2
(065) tax                       ; here is the BUG - if this and next line cut out, then
(066) ld       M[11]            ; it will be correct... and M[11] is never stored above
(067) add      x
(068) add      #20
(069) tax
(070) ld       [x + 4]
(071) jeq      #0x7fffffff      jt 72   jf 73
(072) ret      #96
(073) ret      #0
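
The claim about instructions (065) and (066) can be checked without a kernel. Below is a minimal sketch of an interpreter for only the opcodes that appear in the dump above; it is not libpcap or kernel code, and the names run, PROGRAM, INSNS, PKT and the skip parameter are made up for illustration. Two assumptions are built in: the packet bytes are hand-decoded from the uuencoded capture above (both IP header checksums verify, but compare against the original file before relying on them), and the scratch memory M[] starts out zeroed, which a real BPF implementation may not guarantee.

```python
# Instructions copied from the tcpdump -d dump, ';'-separated, in order.
# The list index of each instruction equals its (NNN) label in the dump.
PROGRAM = (
    "ld [0]; jeq #0x2000000 jt 2 jf 73; ldb [13]; jeq #0x2f jt 4 jf 73;"  # 000-003
    "ldb [4]; and #0xf; lsh #2; st M[3];"                                 # 004-007
    "ldb [4]; and #0xf; lsh #2; add #1; tax;"                             # 008-012
    "ldb [x + 4]; and #0x80; rsh #5; tax;"                                # 013-016
    "ld M[3]; add x; add #12; tax;"                                       # 017-020
    "ld [x + 4]; jeq #0xff030021 jt 23 jf 73;"                            # 021-022
    "ldb [4]; and #0xf; lsh #2; st M[1];"                                 # 023-026
    "ldb [4]; and #0xf; lsh #2; add #1; tax;"                             # 027-031
    "ldb [x + 4]; and #0x80; rsh #5; tax;"                                # 032-035
    "ld M[1]; add x; add #12; add #4; add #9; tax;"                       # 036-041
    "ldb [x + 4]; jeq #0x11 jt 44 jf 73;"                                 # 042-043
    "ldb [4]; and #0xf; lsh #2; add #1; tax;"                             # 044-048
    "ldb [4]; and #0xf; lsh #2; st M[15];"                                # 049-052
    "ldb [x + 4]; and #0x80; rsh #5; tax;"                                # 053-056
    "ld M[15]; add x; add #12; add #4; tax;"                              # 057-061
    "ldb [x + 4]; and #0xf; lsh #2;"                                      # 062-064
    "tax; ld M[11];"                                                      # 065-066 (suspect)
    "add x; add #20; tax;"                                                # 067-069
    "ld [x + 4]; jeq #0x7fffffff jt 72 jf 73; ret #96; ret #0;"           # 070-073
)
INSNS = [s.strip() for s in PROGRAM.split(";") if s.strip()]

# Test packet, hand-decoded from the uuencoded pcap (assumption, see above).
PKT = bytes.fromhex(
    "02000000"                                  # DLT_NULL header (AF_INET)
    "45000056cc750000fc2f6c304e8c000cd91d5e1d"  # outer IP, proto 47 (GRE)
    "3081880b0032c00000000ba700000be0"          # GRE v1 with seq and ack
    "ff030021"                                  # PPP header, protocol IP
    "4500002e3d4a00003811c7d36d7bbe1e4e8c037c"  # inner IP, proto 17 (UDP)
    "cd67029a001a68f2"                          # UDP header
    "000102030405060708090a0b7fffffffab00"      # payload with the signature
)


def run(insns, pkt, skip=()):
    """Interpret the BPF subset used in the dump and return the filter result.

    Instructions whose index is in `skip` are treated as no-ops.
    """
    A = X = pc = 0
    M = [0] * 16  # assumption: scratch memory starts zeroed
    while True:
        op, _, arg = insns[pc].partition(" ")
        here, pc = pc, pc + 1
        if here in skip:
            continue
        if op in ("ld", "ldb"):
            if arg.startswith("M["):
                A = M[int(arg[2:-1])]
                continue
            off = X + int(arg[5:-1]) if arg.startswith("[x + ") else int(arg[1:-1])
            size = 4 if op == "ld" else 1
            if off + size > len(pkt):
                return 0  # out-of-bounds load rejects the packet
            A = int.from_bytes(pkt[off:off + size], "big")
        elif op == "st":
            M[int(arg[2:-1])] = A
        elif op == "tax":
            X = A
        elif op in ("add", "and", "lsh", "rsh"):
            k = X if arg == "x" else int(arg[1:], 0)
            A = {"add": A + k, "and": A & k, "lsh": A << k, "rsh": A >> k}[op] & 0xFFFFFFFF
        elif op == "jeq":
            k, _, t, _, f = arg.split()
            pc = int(t) if A == int(k[1:], 0) else int(f)
        elif op == "ret":
            return int(arg[1:], 0)
        else:
            raise ValueError(f"unsupported instruction: {insns[here]}")


print(run(INSNS, PKT))                 # 0: the program as generated rejects the packet
print(run(INSNS, PKT, skip={65, 66}))  # 96: accepted once (065) and (066) are no-ops
```

Under these assumptions the generated program loads the word at packet offset 44 (the start of the inner IP header) instead of offset 84 (the signature), because M[11] contributes 0 where GRE_DATA_START+4 was expected. Treating the two instructions as no-ops leaves GRE_DATA_START+4 in X and the inner header length in A, so "add x" yields the intended UDP offset.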
Fix:
None known. In some cases the BPF code could be edited manually and installed into the kernel, but not all programs support that (I need tcpdump). Also note that the expression is this complex only because one has to compute the IP header lengths manually; using a preprocessor makes that slightly easier for me. If tcpdump's syntax supported things like ipdata[] or tcpdata[] (utilising BPF_MSH), the expression would be shorter and perhaps compiled correctly. I suspect that libpcap's code optimizer is buggy on long expressions.
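
For reference, the header-length arithmetic that the filter has to spell out by hand can be sketched in Python. The function names below are made up and only mirror the cpp macros from the report; the packet bytes are hand-decoded from the uuencoded capture (an assumption to verify against the original file), with the 4-byte DLT_NULL header stripped so that offsets match the ip[] accessor. BPF_MSH does the "4*(byte & 0xf)" step in a single instruction, which is why an accessor built on it would make the expression shorter.

```python
# Test packet starting at the outer IP header (hand-decoded, see lead-in).
PKT_IP = bytes.fromhex(
    "45000056cc750000fc2f6c304e8c000cd91d5e1d"  # outer IP, proto 47 (GRE)
    "3081880b0032c00000000ba700000be0"          # GRE v1 with seq and ack
    "ff030021"                                  # PPP header, protocol IP
    "4500002e3d4a00003811c7d36d7bbe1e4e8c037c"  # inner IP, proto 17 (UDP)
    "cd67029a001a68f2"                          # UDP header
    "000102030405060708090a0b7fffffffab00"      # payload with the signature
)


def ip(pkt, off, size=1):
    """Counterpart of the filter accessor ip[off:size] (big-endian load)."""
    return int.from_bytes(pkt[off:off + size], "big")


def ip_hdr_len(pkt, first_byte):
    """IPHDRLEN: low nibble of the first header byte, times 4."""
    return (ip(pkt, first_byte) & 0x0F) << 2


def gre_data_start(pkt):
    """GRE_DATA_START: 12 fixed GRE bytes plus 4 more if the ACK bit is set."""
    gre_start = ip_hdr_len(pkt, 0)  # GRESTART
    return gre_start + ((ip(pkt, gre_start + 1) & 0x80) >> 5) + 12


def inner_udp_offset(pkt, ppp_hdr_len):
    """INNER_UDP_OFFSET: skip the PPP header and the inner IP header."""
    inner_ip = gre_data_start(pkt) + ppp_hdr_len
    return inner_ip + ip_hdr_len(pkt, inner_ip)


def inner_is_utp(pkt, ppp_hdr_len):
    """INNER_IS_UTP: inner protocol is UDP and the signature is in place."""
    inner_ip = gre_data_start(pkt) + ppp_hdr_len
    is_udp = ip(pkt, inner_ip + 9) == 17
    udp_start = inner_udp_offset(pkt, ppp_hdr_len)
    return is_udp and ip(pkt, udp_start + 20, 4) == 0x7FFFFFFF


print(gre_data_start(PKT_IP))       # 36
print(inner_udp_offset(PKT_IP, 4))  # 60
print(inner_is_utp(PKT_IP, 4))      # True
```

So for the test packet a correct filter has to compare the four bytes at ip[80:4], that is 60 + 20, with 0x7fffffff.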
Addition:
This arose from another, earlier task: Windows clients were sometimes sending packets with their real addresses, not tunnel ones, inside of PPTP GRE. I made a complex expression to match these packets back in the libpcap 0.9.4 days and it worked fine:

$ cat tcpdump-gre-addr-cpp
#define IPHDRLEN(firstbyte) ((ip[firstbyte]&0xf)<<2)
#define GRESTART IPHDRLEN(0)
/* Check that is GREv1 with seq num and proto set per RFC 2637 */
#define VALID_PPTP_GRE ((ip[GRESTART:4] & 0xff7fffff) = 0x3001880b)
/* ACK is optional 4 bytes to previous 12 */
#define GRE_DATA_START (GRESTART + ((ip[GRESTART+1] & 0x80) >> 5) + 12)
/* Actual IP subnet/Mask to find in the src IP of inner IP datagram */
#define SUBNET 0x52754000 /* 82.117.64.0 */
#define MASK 0xffffff00 /* 255.255.255.0 */
#define INNER_SRC_EQ_SUBNET(ppp_hdr_len) (ip[(GRE_DATA_START+ppp_hdr_len+12):4] & MASK = SUBNET)
/* Torrent DHT UDP payload begins with "d1:?d2:id20:", we'll skip 4 bytes and check other 8 */
#define IS_TORRENT_DHT(udp_hdr_start) ((ip[(udp_hdr_start+12):4]=0x64323a69)/*and (ip[(udp_hdr_start+16):4]=0x6432303a)*/)
/* Check inner IP has UDP payload (proto 17) then calculate offset and pass it to DHT macro */
#define INNER_IS_UDP(ppp_hdr_len) (ip[GRE_DATA_START+ppp_hdr_len+9]=17)
#define INNER_UDP_OFFSET(ppp_hdr_len) (GRE_DATA_START+ppp_hdr_len+IPHDRLEN(GRE_DATA_START+ppp_hdr_len))
#define INNER_IS_DHT(ppp_hdr_len) (INNER_IS_UDP(ppp_hdr_len) and IS_TORRENT_DHT(INNER_UDP_OFFSET(ppp_hdr_len)))
/*
 * Finally, expression: sort by most frequent pattern first.
 * We check four possible PPP headers corresponding to IP, then
 * pass length of matched PPP header to checking macros.
 */
/* proto gre and VALID_PPTP_GRE and*/ (
 ( (ip[GRE_DATA_START]=0x21) and (INNER_SRC_EQ_SUBNET(1) or INNER_IS_DHT(1)) ) or
 ( (ip[GRE_DATA_START:2]=0xff03) and (ip[GRE_DATA_START+2]=0x21) and (INNER_SRC_EQ_SUBNET(3) or INNER_IS_DHT(3)) ) or
 ( (ip[GRE_DATA_START:4]=0xff030021) and (INNER_SRC_EQ_SUBNET(4) or INNER_IS_DHT(4)) ) or
 ( (ip[GRE_DATA_START:2]=0x0021) and (INNER_SRC_EQ_SUBNET(2) or INNER_IS_DHT(2)) )
)

Here it was invoked as tcpdump `cpp -P tcpdump-gre-utp-cpp` and worked. Note that the line with the VALID_PPTP_GRE macro is commented out: the "or INNER_IS_DHT()" part was added later for other statistics, and the entire expression then no longer fit into the limit of 512 BPF opcodes, so I commented a little out. Both variants produced correct BPF code: the variant quoted above, and the one with proto gre and without the "or INNER_IS_DHT()" parts. Only when I tried to rework all this into a third variant, matching uTorrent uTP inside PPTP GRE, did I discover the libpcap bug this message is about.

--
WBR, Vadim Goncharov. ICQ#166852181
mailto:vadim_nuclight () mail ru
[Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight]
-
This is the tcpdump-workers list.
Visit https://cod.sandelman.ca/ to unsubscribe.