oss-sec mailing list archives

Re: Randomness Attacks Against PHP Applications


From: Vladimir Vorontsov <vladimir.vorontsov () onsec ru>
Date: Sun, 23 Sep 2012 13:22:43 +0400

Hi all,

George,

First, thx for your work again.
I have a question to
http://crypto.di.uoa.gr/CRYPTO.SEC/Randomness_Attacks_files/paper.pdf.
Why you are using Request Tweens expect of Triple Requests?

Please, pay your attention to slide 13 from:
http://www.slideshare.net/d0znpp/dcg7812-cryptographyinwebapps-14052863

Your Adversarial Time Synchronization (ATS) methodical is great.
But it is not applicable to real web-projects whose architecture
includes an application server and front-end.

During the real audits on Web applications we are using many tricks to
know application server timestamp and pid:

1. phpinfo() output with mod_uniqueid (very common). To decode uniqueid
string use web service https://dev.onsec.ru/rands/

2. /server-status output discovered by at this post:
http://habrahabr.ru/company/pt/blog/149746/

3. others application outputs

23.09.12, 9:14, Solar Designer пишет:
George,

I watched a video of your USENIX Security talk the other day (after I
was done with my experiments with mt_rand() seed cracking, though):

http://www.youtube.com/watch?v=yE0qTTi-_iQ

Thanks.  Shortly before that, I released a faster version of my PHP
mt_rand() seed cracker, now down to 1 minute worst case on CPU:

http://www.openwall.com/lists/john-users/2012/09/20/2
http://download.openwall.net/pub/projects/php_mt_seed/

$ time ./php_mt_seed 1328851649
Found 0, trying 637534208 - 671088639, speed 63310249 seeds per second
seed = 658126103
Found 1, trying 1207959552 - 1241513983, speed 63343447 seeds per second
seed = 1234567890
Found 2, trying 4261412864 - 4294967295, speed 63347894 seeds per second
Found 2

real    1m7.798s
user    9m1.942s
sys     0m0.008s

In one minute of real time (on an FX-8120 CPU, using vectorized fused
multiply-add instructions), it found the original seed, another seed
that also produces the same mt_rand() output, and it searched the rest
of the 32-bit seed space (not finding other matches in this case).

Gifts implemented the same thing in OpenCL:

https://github.com/Gifts/pyphp_rand_ocl
https://github.com/Gifts/pyphp_rand_ocl/tree/SpeedyOCL

His SpeedyOCL branch achieves similar performance (and even 10% better
than mine in his test of both on a Core i5-2500) on CPUs (with Intel's
OpenCL SDK) and several times better performance on GPUs.

On Thu, Sep 20, 2012 at 02:31:19PM -0400, George Argyros wrote:
FYI, we have also released some code to exploit such vulnerabilities
about a month ago in github
(https://github.com/GeorgeArgyros/Snowflake).  We hope that this
framework will allow the easier development of exploits for such
vulnerabilities.
The main difference with the code mentioned in the previous posts, is
that it may not always be possible to obtain the first output from
mt_rand() after seeding. For example, many applications only leak
truncated mt_rand() outputs in which case we should really compare
bits of different mt_rand outputs to test whether we found the correct
seed.

Yes.  Your implementation is far more generic.

The way I'd approach this is by limiting the set of possible seeds based
on the first mt_rand() output (perhaps with a min-max range, so _many_
seeds will be potentially valid - could be millions, yet significantly
fewer than the full 4 billion set).  This can be accomplished almost as
quickly as cracking the seed based on an exact mt_rand() output (full
31 bits of it, no min-max range) - that is, in one minute on CPU.  Then
slower, but more generic code may be used to filter out the impossible
seeds based on further mt_rand() outputs, until there's just one seed
value left.  The slowness of that second cracker would not matter much
because it'd only need to search a much smaller seed space.

A limitation, though, is that the very first mt_rand() output after
seeding must be among those available, even if in truncated form.  If it
is not, then more of the state has to be maintained in the initial
cracking pass, thereby making it slightly slower.  When the first output
is (at least partially) available, we only need 3 state elements, so
they're kept in registers nicely.

For this reason, in our tool the user defines a "hash" function
that expresses any mt_rand dependent output and then we search for a
preimage for this hash function.

That's a generic and curious approach, but I think it's slower than what
I described above while not being _more_ generic.

We also include a python API for this functionalities as well as a
sample exploit for mediawiki 1.18.1. Some POCs might indeed help...

We will also release the gaussian solver based tool we developed in
order to recover the internal state of mt_rand from its (truncated)
outputs in the following days.

Cool!

I agree too that education is important. This is something that we
came to an agreement with the PHP team (for example that additional
information is needed on the mt_rand manual). However, as pointed out
nothing has changed yet (the conversations between us and the PHP team
took place in March/April).

Did PHP 5.4's change of session IDs (vs. 5.3's) occur before or after
your conversations with them?

However I think that the specific issue goes beyond education. First
of all, We believe that adding simple exploit mitigations such as
secure seeding in these functions is something that should definitely
happen. Secure seeding is easy to add using randomness from the
operating system and furthermore it will incur a negligible
performance overhead since it would happen only once in the process
lifetime.

Agreed.

For a more concrete solution I think that the existence of secure
PRNGs is not enough. Even if mt_rand() was producing secure random
numbers we would still be having a lot of vulnerable applications just
from the way this function is used.

Right.

Just as PHP devs use
session_start() to start a new PHP session there is a need for a
generate_token() function that will return a random token and take
care of providing the necessary level of security for this token.

Perhaps, and this is in line with Anthony Ferrara's work on a new
password hashing API for PHP 5.5 (which will eventually obsolete phpass,
I guess).  Simpler interfaces that are easier to use correctly than not.

However, a reason why many PHP app devs use mt_rand() and the like is
because they want to generate tokens of a certain format, such as to
meet a standard.  One such use is crypt() salts, which we're helping
eliminate with phpass and then with Anthony Ferrara's new API.  But I
guess other uses will remain.  So perhaps that generate_token() function
would need to accept arguments and return tokens of different target
formats accordingly.  Maybe it should have a format argument similar to
that of pack().  Its advantage over separate calls to a PRNG and then to
pack() on the PRNG's output would be that it'd keep the distribution
uniform.

It is curious how it is often overlooked that simple modulo division may
turn a uniform distribution into non-uniform.  Here's an illustration
(in case this is news to someone on oss-security):

$ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand(); $n += ($x < 0x40000000); } echo "$n\n";'
499567
$ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand(); $n += ($x < 0x40000000); } echo "$n\n";'
500558

$ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand() % 500000000; $n += ($x < 250000000); } echo "$n\n";'
535117
$ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand() % 500000000; $n += ($x < 250000000); } echo "$n\n";'
534124

First two runs show uniform distribution of mt_rand() outputs themselves
(assuming the 0 to 0x7fffffff range).  The other two runs show
non-uniform distribution over the 0 to 500 million minus 1 range,
after the modulo division by 500 million.  (Values lower than half the
maximum suddenly became more common than higher values.  It was 50/50
before the modulo division, but became 53.5/46.5 afterwards.  The reason
is obvious: the original space is not divisible into a whole number of
500 million sub-spaces.)

This is why having separate functions that return a random number and
those that process it to the desired format might not be good enough.
A single function that does both jobs at once (avoiding the above
problem by using a smarter approach) could work better.

Alexander




Attachment: signature.asc
Description: OpenPGP digital signature


Current thread: