tcpdump mailing list archives

Regarding tcpdump pull request #614


From: alice-cyberreboot <alice () cyberreboot org>
Date: Tue, 18 Jul 2017 19:44:29 +0000

Hi everyone!

I’m writing regarding a pull request I submitted (#614).

My workgroup is working on a project that uses machine learning and software-defined networking to detect and 
respond to malicious network activity. We are currently focused on internal Ethernet traffic, and one of our biggest 
challenges is capturing enough network data to train our models sufficiently. We are working with a number of 
organizations that want to share data but require a basic level of sanitization. The lack of modern, internal, benign 
traffic is a challenge for data science teams.

To make data sharing between collaborating organizations easier, we tried to address some common privacy and 
sensitivity issues by extending tcpdump with the following options:

- Strip out the packet payload after the TCP/UDP headers; and
- Mask external IP addresses (i.e., those not included in the RFC 5735 reserved netblocks).
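As a rough illustration of the IP-masking option (a minimal sketch, not the code in #614), the Python below checks the 
source and destination addresses of an IPv4 header against the special-use blocks listed in RFC 5735, Section 4, 
replaces any external address with a user-supplied mask address, and recomputes the IPv4 header checksum. The function 
names are hypothetical. It assumes the input begins at the IPv4 header, and it does not update TCP/UDP checksums, which 
also cover the addresses through the pseudo-header.

```python
import ipaddress
import struct

# Special-use IPv4 blocks from RFC 5735, Section 4; any other address counts as "external".
RFC5735_BLOCKS = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24", "192.88.99.0/24",
    "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24",
    "203.0.113.0/24", "224.0.0.0/4", "240.0.0.0/4", "255.255.255.255/32",
)]


def is_external(addr: ipaddress.IPv4Address) -> bool:
    """True if addr lies outside every RFC 5735 special-use block."""
    return not any(addr in net for net in RFC5735_BLOCKS)


def ipv4_checksum(header: bytes) -> int:
    """One's-complement Internet checksum (RFC 1071) over 16-bit words."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mask_ipv4_packet(packet: bytes, mask_ip: str) -> bytes:
    """Hypothetical helper: replace external src/dst addresses with mask_ip."""
    pkt = bytearray(packet)
    ihl = (pkt[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or len(pkt) < ihl:
        raise ValueError("truncated or malformed IPv4 header")
    mask = ipaddress.IPv4Address(mask_ip).packed
    for off in (12, 16):  # source and destination address fields
        if is_external(ipaddress.IPv4Address(bytes(pkt[off:off + 4]))):
            pkt[off:off + 4] = mask
    pkt[10:12] = b"\x00\x00"  # checksum field must be zero while summing
    pkt[10:12] = struct.pack("!H", ipv4_checksum(bytes(pkt[:ihl])))
    return bytes(pkt)
```

For example, with mask_ip set to 10.0.0.1, a packet from 192.168.0.1 to 8.8.8.8 keeps its private source address and has 
its destination rewritten to 10.0.0.1. The explicit block list is used instead of the ipaddress module's is_private or 
is_global properties because those follow the current IANA special-purpose registry rather than RFC 5735 specifically.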

We have been using our modifications internally, and they appear to be stable. Our initial machine-learning tests using 
this approach were fairly successful, and we would like to open up our research to collaboration with other entities. 
Tcpdump is so common in our circles that when we suggested enhancing it, everyone we work with agreed it was a great 
option. Our proposed modification performs the above operations when writing to a savefile. The two flags I've added 
(with -0 also accepting a doubled form, -00) are:

- -0 to zero out packet data after the TCP/UDP headers
- -00 to remove the packet data entirely instead of zeroing it (this saves space for large packet captures)
- -* [mask_ip] to mask external IP addresses with a user-specified IP
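The difference between the -0 and -00 behaviours can be sketched in the same way. The Python below is a hypothetical 
illustration, not the patch itself: it assumes an untagged Ethernet II frame carrying IPv4, finds the end of the TCP or 
UDP header (using the TCP data-offset field), and then either zero-fills or drops everything after it. Non-first IP 
fragments carry no transport header, so everything after the IP header is treated as payload. Frames outside that scope 
(VLAN tags, IPv6, other protocols) are returned unchanged here, matching the Ethernet-only, IPv4-only scope described in 
this post.

```python
import struct
from typing import Optional

ETH_HDR_LEN = 14  # untagged Ethernet II header
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6
IPPROTO_UDP = 17


def payload_offset(frame: bytes) -> Optional[int]:
    """Offset just past the TCP/UDP header, or None if out of scope."""
    if len(frame) < ETH_HDR_LEN + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None
    ip = ETH_HDR_LEN
    ihl = (frame[ip] & 0x0F) * 4
    proto = frame[ip + 9]
    if ihl < 20 or proto not in (IPPROTO_TCP, IPPROTO_UDP):
        return None
    l4 = ip + ihl
    (flags_frag,) = struct.unpack_from("!H", frame, ip + 6)
    if flags_frag & 0x1FFF:
        end = l4  # non-first fragment: no transport header, all payload
    elif proto == IPPROTO_UDP:
        end = l4 + 8  # fixed-size UDP header
    elif len(frame) >= l4 + 13:
        end = l4 + (frame[l4 + 12] >> 4) * 4  # TCP data offset, in 32-bit words
    else:
        return None
    return end if end <= len(frame) else None


def sanitize_frame(frame: bytes, truncate: bool) -> bytes:
    """truncate=False mimics -0 (zero-fill); truncate=True mimics -00 (drop)."""
    end = payload_offset(frame)
    if end is None:
        return frame
    if truncate:
        return frame[:end]
    return frame[:end] + bytes(len(frame) - end)
```

When truncating, a savefile writer would also need to set each record's captured length to the new size while leaving 
the original packet length unchanged, the same way a short snapshot length does.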

In our enhancements, these flags are available both when reading from an existing pcap file and when performing a live 
capture. The caveats are that this currently works only for the Ethernet link layer (the scope of our project), IPv6 is 
not yet supported, and it does not work when printing to the screen (although the user is warned at the outset). 
However, my workgroup would love to open this up to the rest of the open source community to facilitate broader 
information sharing and make network collections more accessible to data scientists.

If there are other enhancements that might be helpful toward this topic, please let me know!


Thanks,
Alice
(@lilchurro on github)

P.S. If folks are curious, we have published some of our work, including:
https://blog.cyberreboot.org/deep-session-learning-for-cyber-security-e7c0f6804b81


--
🙋 Alice Chang
👾 Cyber Reboot Software Engineer @ In-Q-Tel

_______________________________________________
tcpdump-workers mailing list
tcpdump-workers () lists tcpdump org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers
