Wireshark mailing list archives

Re: Get all the duplicate packets


From: Jim Young <jyoung () gsu edu>
Date: Thu, 20 Sep 2012 03:38:02 +0000

Hello Boaz,

On 9/19/12 4:19 PM, "Boaz Galil" <boaz20 () gmail com> wrote:
> Editcap -d will remove all the duplicates! I actually want
> to find all the duplicate packets....
<snip>

Assuming that you are looking for frame-level duplicates,
there are a couple of ways to determine which frames may
be duplicates.  Both involve displaying the MD5 hash
of each frame.

NOTE: The MD5 hash technique described below will NOT work
for L3 level duplicates one might see on a one-armed router
interface where each packet might be seen twice, ingressing
on one vlan tag and egressing on another.

NOTE2: Using the MD5 hash technique, it's possible (though
very unlikely) to have a false positive where two unrelated
packets generate the exact same MD5 hash value.

Assuming that the Wireshark preference frame.generate_md5_hash
is TRUE (see note towards bottom of this message on how to
check and set) then the following tshark command line can be
used to generate a potentially large display filter for
duplicate packets. 

echo "+++The filter ..."
echo $(tshark -r MYFILE.PCAP -Tfields -e frame.md5_hash \
  | sort \
  | uniq -c \
  | sort -n -r \
  | grep -v ' 1 ' \
  | awk 'BEGIN {printf "frame.number==0"} \
    {printf "||frame.md5_hash=="$2} END {print ""}')

The constructed display filter starts with the do-nothing
clause "frame.number==0" so that the first "||" has something
to its left.
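
As a rough sketch only, the filter-building step can be wrapped in a
small shell function.  The name build_dup_filter is something I made up
for illustration; it is not part of tshark.  It assumes one MD5 hash per
line on stdin, and it tests the count field in awk instead of relying on
grep -v ' 1 ' matching the column padding that uniq -c emits.

```shell
# build_dup_filter (hypothetical helper): read one MD5 hash per line on
# stdin and print a display filter matching every hash seen more than once.
build_dup_filter() {
  sort | uniq -c | awk '
    BEGIN              { printf "frame.number==0" }           # do-nothing first clause
    NF == 2 && $1 > 1  { printf "||frame.md5_hash==%s", $2 }  # count above 1, hash present
    END                { print "" }'
}

# Usage (assumes frame.generate_md5_hash is enabled):
#   tshark -r MYFILE.PCAP -Tfields -e frame.md5_hash | build_dup_filter
```

Blank input lines (frames for which no hash was printed) are skipped by
the NF == 2 test, so they cannot produce an empty "frame.md5_hash=="
clause.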

The command line above can be augmented to actually display the
duplicate frames by invoking tshark twice.

echo "+++The command to display the duplicates..."
tshark -r MYFILE.PCAP \
  $(tshark -r MYFILE.PCAP -Tfields -e frame.md5_hash \
  | sort \
  | uniq -c \
  | sort -n -r \
  | grep -v ' 1 ' \
  | awk 'BEGIN {printf "frame.number==0"} \
    {printf "||frame.md5_hash=="$2} END {print ""}')

Or the command line can be modified to save the duplicates
to a new pcap file (MYDUPLICATES.PCAP).  Again this involves
invoking tshark twice.

echo "+++The command to save the duplicates..."
tshark -w MYDUPLICATES.PCAP -r MYFILE.PCAP \
  $(tshark -r MYFILE.PCAP -Tfields -e frame.md5_hash \
  | sort \
  | uniq -c \
  | sort -n -r \
  | grep -v ' 1 ' \
  | awk 'BEGIN {printf "frame.number==0"} \
    {printf "||frame.md5_hash=="$2} END {print ""}')

FWIW:  In addition to using tshark, you can also use editcap to
display the MD5 hash of each frame.

Here's an MD5 hash example using editcap:

$ editcap -v -D 0 MYFILE.PCAP /dev/null
File MYFILE.pcap is a Wireshark - pcapng capture file.
Packet: 1, Len: 42, MD5 Hash: dc69bb2da069731e40367bed2cb44d56
Packet: 2, Len: 42, MD5 Hash: dc69bb2da069731e40367bed2cb44d56
Packet: 3, Len: 42, MD5 Hash: dc69bb2da069731e40367bed2cb44d56
<snip>

And here's the MD5 example using tshark:

$ tshark -o frame.generate_md5_hash:TRUE -r MYFILE.PCAP -Tfields \
  -e frame.md5_hash
dc69bb2da069731e40367bed2cb44d56
dc69bb2da069731e40367bed2cb44d56
dc69bb2da069731e40367bed2cb44d56
<snip>

While the tshark report is cleaner (1 column versus 7), both
the editcap and tshark output can then be post processed to
extract the same counts of any duplicate MD5 hashes.

I believe using editcap to generate the MD5 hashes is faster
than using tshark if processing large trace files.

The examples below illustrate how both editcap and
tshark can be used to generate virtually identical lists
of any duplicate MD5 hashes.

#1 - Using editcap:

$ editcap -v -D 0 MYFILE.PCAP /dev/null \
  | grep Hash: \
  | awk '{ print $7 }' \
  | sort \
  | uniq -c \
  | grep -v ' 1 '
File MYFILE.pcap is a Wireshark - pcapng capture file.
  18 198e273fe9792cbf54919701db49b9cf
  12 1e848f674c60a07d23f7104b8a205a1c
   4 28c92df42bbf9c94a93560a5fb3decf0
   2 3aabbf2969b96da88ee9b5937345eb75
   6 636c43db7e87aa86c0afaf479ded30cf
   4 67a1a4f23bf565d2ab946955a0dc4b70
   3 6e30d01d335343eed4dca273d95d6347
  24 8d7780d026fb1d883717a6957abf2476
  12 92063b2f67c0246413959046bf455c26
   3 dc69bb2da069731e40367bed2cb44d56
   2 e7177c946c4638b72fc62fe05bc5e30a
   9 fdaf0bcb2fe45420232fdd990c4fa655
$

#2 - Using tshark:

$ tshark -r MYFILE.PCAP -Tfields -e frame.md5_hash \
  | sort \
  | uniq -c \
  | sort -n -r \
  | grep -v ' 1 '
  18 198e273fe9792cbf54919701db49b9cf
  12 1e848f674c60a07d23f7104b8a205a1c
   4 28c92df42bbf9c94a93560a5fb3decf0
   2 3aabbf2969b96da88ee9b5937345eb75
   6 636c43db7e87aa86c0afaf479ded30cf
   4 67a1a4f23bf565d2ab946955a0dc4b70
   3 6e30d01d335343eed4dca273d95d6347
  24 8d7780d026fb1d883717a6957abf2476
  12 92063b2f67c0246413959046bf455c26
   3 dc69bb2da069731e40367bed2cb44d56
   2 e7177c946c4638b72fc62fe05bc5e30a
   9 fdaf0bcb2fe45420232fdd990c4fa655
$
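
The counts above say how often each hash occurs, but not which packets
carry it.  A minimal sketch of one way to recover the packet numbers
from the editcap output follows.  The function name dup_packets is
hypothetical, and it assumes the "Packet: N, Len: N, MD5 Hash: H" line
layout shown earlier in this message.

```shell
# dup_packets (hypothetical helper): read `editcap -v -D 0` style lines on
# stdin and print each duplicated MD5 hash followed by its packet numbers.
dup_packets() {
  awk '
    $1 == "Packet:" && $6 == "Hash:" {
      num = $2; sub(/,$/, "", num)        # "2," -> "2"
      h = $7
      if (count[h]++ == 0) {              # first time this hash is seen
        order[++n] = h
        list[h] = num
      } else {
        list[h] = list[h] " " num
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        h = order[i]
        if (count[h] > 1) print h ": " list[h]
      }
    }'
}

# Usage (append 2>&1 to the editcap command if your editcap version
# writes its verbose output to stderr):
#   editcap -v -D 0 MYFILE.PCAP /dev/null | dup_packets
```

Hashes are reported in order of first appearance, so the output follows
the order of the trace rather than the sorted order of the hashes.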

NOTE:  For the tshark MD5 hash pipelines to work the
Wireshark preference "frame.generate_md5_hash" must be
enabled.  You can easily determine if the frame.generate_md5_hash
preference is enabled using the following tshark pipeline:

$ tshark -G currentprefs | grep frame.generate_md5_hash
frame.generate_md5_hash: TRUE
$

If MD5 hashes are disabled (which I believe is the default)
then they can be manually enabled on the tshark command line
using tshark's -o option: -o frame.generate_md5_hash:TRUE

That would make the tshark command line that saved the
packets to a new file look like:

tshark -o frame.generate_md5_hash:TRUE \
  -w MYDUPLICATES.PCAP -r MYFILE.PCAP \
  $(tshark -o frame.generate_md5_hash:TRUE \
    -r MYFILE.PCAP -Tfields -e frame.md5_hash \
<snip>

But it's probably easier to just permanently enable MD5 hashes
within Wireshark's preference file so that you don't have to
remember to use the tshark -o frame.generate_md5_hash:TRUE
option.
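
For what it's worth, here is a hedged sketch of setting the preference
from the shell instead of through the Wireshark GUI.  The function name
enable_md5_pref is made up, and the location of the preferences file
varies by platform and Wireshark version, so the path is left as an
argument.  It assumes the file uses the same "name: value" line format
that tshark -G currentprefs prints.

```shell
# enable_md5_pref (hypothetical helper): make sure the given preferences
# file contains exactly one active line enabling frame MD5 hashes.
enable_md5_pref() {
  prefs_file=$1
  tmp=$(mktemp) || return 1
  # Copy everything except an existing active setting for this preference.
  # Commented-out defaults (lines starting with "#") are left untouched.
  if [ -f "$prefs_file" ]; then
    grep -v '^frame\.generate_md5_hash:' "$prefs_file" > "$tmp" || true
  fi
  # Append the enabled setting, then write the result back in place.
  echo 'frame.generate_md5_hash: TRUE' >> "$tmp"
  cat "$tmp" > "$prefs_file"
  rm -f "$tmp"
}

# Usage (PREFS_FILE is a placeholder for your own preferences file):
#   enable_md5_pref "$PREFS_FILE"
#   tshark -G currentprefs | grep frame.generate_md5_hash
```

Back up the preferences file first, and run this while Wireshark is
closed, since Wireshark rewrites the file when preferences are saved
from the GUI.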

Hope this helps,

Jim Y. 


___________________________________________________________________________
Sent via:    Wireshark-users mailing list <wireshark-users () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-users
Unsubscribe: https://wireshark.org/mailman/options/wireshark-users
             mailto:wireshark-users-request () wireshark org?subject=unsubscribe
