Wireshark mailing list archives

Re: [Wireshark-core] Hash Tables and Algorithmic Complexity Attacks


From: Anders Broman <anders.broman () ericsson com>
Date: Tue, 22 Apr 2014 12:14:28 +0000



-----Original Message-----
From: wireshark-dev-bounces () wireshark org [mailto:wireshark-dev-bounces () wireshark org] On Behalf Of Bálint Réczey
Sent: den 22 april 2014 13:53
To: Developer support list for Wireshark
Subject: Re: [Wireshark-dev] [Wireshark-core] Hash Tables and Algorithmic Complexity Attacks

[Bringing the discussion to -dev with Evan's permission]

2014-04-22 10:15 GMT+02:00 Anders Broman <anders.broman () ericsson com>:


-----Original Message-----
From: wireshark-core-bounces () wireshark org 
[mailto:wireshark-core-bounces () wireshark org] On Behalf Of Evan Huus
Sent: den 22 april 2014 05:36
To: Wireshark core developers
Subject: Re: [Wireshark-core] Hash Tables and Algorithmic Complexity 
Attacks

On Mon, Apr 21, 2014 at 6:28 PM, Balint Reczey <rbalint () gmail com> wrote:

On Apr 22, 2014 12:11 AM, Evan Huus <eapache () gmail com> wrote:

On Mon, Apr 21, 2014 at 3:59 PM, Bálint Réczey <balint () balintreczey hu> wrote:
Hi Evan,

2014-04-21 18:21 GMT+02:00 Evan Huus <eapache () gmail com>:
(sending to -core because of potential security implications)

I was recently investigating implementing a wmem-based hash 
table, since many of the current users of the wmem-tree do not 
actually need the predecessor-lookup feature which is the only 
advantage it provides over a hash table (whereas a hash table is otherwise much faster).

I ended up wandering down the rabbit-hole of hash collisions, 
algorithmic complexity attacks, and universal hashing ([1], [2], 
[3] and more).

Then I read the Glib hash table documentation: [4]. A quick grep 
through the Wireshark source reveals numerous locations where we 
use packet data in hash tables with insecure hash functions. As 
such, I believe that Wireshark is vulnerable to a whole class of 
denial-of-service attacks in this area.

Has this come up before? Am I overlooking something clever we're doing?
It is true that Wireshark is vulnerable to hash collision attacks, 
and I think it did not come up before because we had enough 
vulnerabilities of other simpler classes. ;-)

Makes sense :)

I think we could create wrappers to provide a random seed to the 
standard glib hash functions, which is the standard way of handling 
such vulnerabilities.

Unfortunately (based on my reading) that will not be sufficient.
There are two vectors for an attack via collisions: the mapping from 
key to guint, and the mapping from guint to bucket index. The first
(key->guint) is user-settable so we can provide a random seed. The 
second (guint->bucket) is not user-controllable. From the glib
documentation:

"The modulus is taken with the hash table size (a prime number) to 
find the 'bucket' to place each key into."
and then a few paragraphs down
"Even cryptographic hashes are very easy to find collisions for when 
the remainder is taken modulo a somewhat predictable prime number."

So basically, it doesn't matter how strong we make the initial 
mapping because the secondary bucket mapping is predictable anyways.
Fortunately there are easy and efficient ways to make the secondary 
mapping unpredictable (it's actually simpler than the initial
mapping) so I guess that a good secure wmem map implementation is 
actually fairly important to have.
Luckily GHashTable resizes itself, increasing the number of buckets when the collision rate is high, so this attack 
can't be performed effectively.
The described problem is valid only for hash tables with a fixed bucket count.

Regarding the wmem hash tables, I think wrapping C++ STL hash tables in C with wmem-based custom allocators would do 
the job.

Unfortunately this doesn't work; the free-all operation which wmem provides doesn't play nice with C++ destructors 
(besides which we are still pure-C for libwireshark for the time being).
We lost being C-only with the Qt UI. :-) It can be implemented in several ways that work: one is registering each hash 
table with the proper wmem pool and calling its destructor when freeing the pool; this way we don't have to play with 
a custom allocator.
Another technique would be not calling the destructors and just freeing all the connected allocated objects. This is 
not nice, but would work as well IMO, since we would not hold refs to the free()-d objects.


I have whipped up a very basic wmem-map implementation at [1]. It uses universal multiply-shift hashing [2] for 
bucket placement, and provides a wmem_secure_hash function for replacing g_str_hash and friends, based on work done 
by the Perl community [3] in hardening their hash tables. To the best of my knowledge the resulting hash map is 
secure against complexity attacks.

It still needs cleanup (comments, tests, etc). I would also like to replace one tree somewhere and run benchmarks to 
make sure it's actually faster. Suggestions and concerns are welcome.
It would be nice to see how it compares to other implementations:
http://incise.org/hash-table-benchmarks.html

I prefer using existing building blocks instead of inventing our own.


One advantage would be the use of wmem, which would release the memory automagically. I would also like to see 
benchmarks; if the speed gain is good, then that's another plus for our own functions.

Cheers,
Balint


[1] https://code.wireshark.org/review/1272
[2] https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic
[3] http://blog.booking.com/hardening-perls-hash-function.html


Would it be good to have the initial hash table size in the API? Then a table could be pre-allocated to a custom size 
for the cases where we know the size will be larger than the standard default.
I think it would not be good. It would save a constant number of resizes
(an O(1) gain) but we would reserve more memory for each hash table initially.

Yes, but that would be an informed design decision: if we "know" that we will get at least x resizes for a "normal" 
trace file, why not pre-allocate for x if the memory "cost" is low for the
unlikely small-trace-file cases where overall memory usage is low anyway? In some cases we might even know the final 
size, say for an hf hash table.

Cheers,
Balint

Regards
Anders


Cheers,
Balint


AFAIK GHashTable does not employ randomization of the hash function seed:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655044

Cheers,
Balint


Evan

P.S. I believe it's still perfectly OK to use hash tables with 
non-packet data such as static value_strings or as in [5].

[1] http://lwn.net/Articles/474912/
[2] http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-Attacks
[3] https://en.wikipedia.org/wiki/Universal_hashing
[4] https://developer.gnome.org/glib/unstable/glib-Hash-Tables.html#GHashFunc
[5] https://code.wireshark.org/review/gitweb?p=wireshark.git;a=commit;h=5983cda769fa6615dda5fc7b8f87819d40f0a8d5

___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request () wireshark org?subject=unsubscribe
___________________________________________________________________________
