Wireshark mailing list archives

Re: filter for ONLY initial get request


From: "Thierry Emmanuel" <Emmanuel.Thierry () technicolor com>
Date: Wed, 11 Aug 2010 15:35:18 +0200



-----Original Message-----
From: wireshark-users-bounces () wireshark org [mailto:wireshark-users-bounces () wireshark org] On Behalf Of Jeffs
Sent: Wednesday, 11 August 2010 15:07
To: Community support list for Wireshark
Subject: Re: [Wireshark-users] filter for ONLY initial get request


This formula, however, only returns results minus the links and images 
embedded in the web page:

tshark -r test.cap -T fields -e http.host | sed 's/?.*$//' | sed -n '/www./p' | sort | uniq -c | sort -rn | head -n 100

     15 www.propertyshark.com
      8 www.nytimes.com
      2 www.google-analytics.com
      1 www.facebook.com


However, I am new to regex, so I may be missing something or
losing some links.



It is a common mistake to assume that every website has its main
address on a "www" subdomain. If you want a generic filter, you cannot
rely on it. If you want relevant results, you'll have to build a
non-restrictive regexp and manually filter out inappropriate results,
possibly adding some rules to exclude well-known advertising sites.
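
As a rough sketch of that idea (not a tested recipe), the "www" test can be
dropped and replaced with an exclusion list. The function name rank_hosts and
the EXCLUDE patterns are assumptions made up for this example; only
google-analytics.com comes from the output quoted above, and doubleclick.net
is just a placeholder for "a well-known advertising site".

```shell
# Hypothetical exclusion list: extended-regex patterns for hosts to drop.
# These entries are examples only; extend them to suit your own capture.
EXCLUDE='(^|\.)google-analytics\.com$|(^|\.)doubleclick\.net$'

# Read one host name per line on stdin and print a ranked count,
# skipping empty lines and any host matching the exclusion list.
rank_hosts() {
    grep -v '^$' | grep -Ev "$EXCLUDE" | sort | uniq -c | sort -rn | head -n 100
}

# Possible usage with the capture from this thread:
#   tshark -r test.cap -T fields -e http.host | rank_hosts
```

This keeps every host, with or without a "www" prefix, so the remaining
manual review is limited to whatever the exclusion list does not yet cover.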

A fully automatic solution would be to parse the data and check whether it is
a well-formed HTML (or XML or plain-text) document. This would purge
videos and images from your results.
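
A lighter-weight approximation, shown here only as a sketch, is to look at the
Content-Type header of each response instead of parsing the body. This is a
substitute for the well-formedness check described above, not an
implementation of it, and it trusts whatever the server declares. The
function name keep_documents and the list of accepted types are assumptions
for this example; the input is expected to be tab-separated lines of frame
number and content type.

```shell
# Read "<frame number><TAB><content type>" lines on stdin and print only
# those whose declared type looks like a document rather than media.
keep_documents() {
    awk -F '\t' '
        {
            type = tolower($2)
            sub(/;.*/, "", type)        # drop parameters such as "; charset=UTF-8"
            gsub(/^ +| +$/, "", type)   # trim surrounding spaces
        }
        type == "text/html" || type == "application/xhtml+xml" ||
        type == "text/xml"  || type == "application/xml" ||
        type == "text/plain"
    '
}

# Possible usage with the capture from this thread:
#   tshark -r test.cap -T fields -e frame.number -e http.content_type | keep_documents
```

Frames without a Content-Type header produce an empty second field and are
dropped. Mapping the surviving responses back to the requested host is a
separate step that this sketch does not attempt.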

___________________________________________________________________________
Sent via:    Wireshark-users mailing list <wireshark-users () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-users
Unsubscribe: https://wireshark.org/mailman/options/wireshark-users
             mailto:wireshark-users-request () wireshark org?subject=unsubscribe

