WebApp Sec mailing list archives

RE: Sequence Identification Routines?


From: "Dawes, Rogan (ZA - Johannesburg)" <rdawes () deloitte co za>
Date: Tue, 10 Dec 2002 18:23:54 +0200

Here is a fairly simple Perl script that takes a sequence of "random"
tokens, determines the charset for each character position (it assumes that
all tokens are the same length), converts each character into a digit (its
index in the sorted charset for that specific position), and calculates the
"integer value" of each token as a mixed-radix number.

It then prints each token in sequence, along with the difference between its
own "value" and that of the preceding token.

You could then graph the results, to assist in determining the level of
randomness.

The fun part about this script is that it looks explicitly at the input
that YOU give it, so the more input you supply, the more accurate its
calculations will be.

Also, tokens such as

AAAAAAAAAA
AAAAAAAAAB
AAAAAAAAAC

should expand to 0, 1, 2, because each leading A is the only character seen
at its position: it is digit 0 in base 1, and so contributes nothing to the
total.
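
The conversion described above can be sketched as follows. This is a minimal
illustration in Python rather than the Perl of the script, and the function
names are mine, not taken from charset.pl. It assumes the tokens are already
in memory as a list of strings.

```python
def positional_charsets(tokens):
    """Collect the sorted set of characters seen at each position."""
    width = max(len(t) for t in tokens)
    seen = [set() for _ in range(width)]
    for token in tokens:
        for pos, ch in enumerate(token):
            seen[pos].add(ch)
    return ["".join(sorted(s)) for s in seen]


def token_value(token, charsets):
    """Mixed-radix value: each position is a digit in its own base."""
    total = 0
    for pos, ch in enumerate(token):
        charset = charsets[pos]
        # Scale by the base of the current position, then add its digit.
        total = total * len(charset) + charset.index(ch)
    return total


tokens = ["AAAAAAAAAA", "AAAAAAAAAB", "AAAAAAAAAC"]
charsets = positional_charsets(tokens)
values = [token_value(t, charsets) for t in tokens]
diffs = [b - a for a, b in zip(values, values[1:])]
print(values)  # [0, 1, 2]
print(diffs)   # [1, 1]
```

The first nine positions each have a one-character charset, so they are base 1
and always contribute digit 0. Only the last position (charset "ABC", base 3)
affects the result.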

I would be very interested in seeing the results of this plugged into
something like Michal Zalewski's strange attractors graphs. I have seen some
references to a similar approach, using a package called OpenQVis, but have
not had time to play with it yet.

Obvious problems:

This generates VERY large numbers, depending on the character set and the
length of the token. Differences can therefore also be quite large. Plotting
them on a graph that makes any kind of sense is non-trivial, I think, though
I am no statistician!
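
One possible way around the size problem, offered as an assumption rather
than something the script does, is to scale each value by the size of the
token space (the product of the per-position charset sizes) so that it lands
in [0, 1). A small Python sketch, with hypothetical names:

```python
from math import prod


def normalise(value, bases):
    """Scale a mixed-radix token value into the range [0, 1).

    `bases` holds the charset size for each character position, so
    prod(bases) is the number of distinct tokens that can be represented.
    Python divides arbitrarily large integers without overflowing here,
    because the result is always below 1.
    """
    return value / prod(bases)


# Hypothetical example: 32-character tokens over a 62-character alphabet.
bases = [62] * 32
midpoint = 62 ** 32 // 2
print(normalise(midpoint, bases))  # 0.5
```

Differences between two values can be scaled the same way, and then lie
between -1 and 1, which any plotting tool can handle.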

Ways of visualising the results:

Sort the token values, and plot them on a graph. For a uniform generator, one
should ideally see a "straight line" graph, most likely sparsely populated.
Sort the differences and plot them. Differences between uniform values
cluster around zero, so one should see a smooth, symmetric curve that is
flattest in the middle, again most likely sparsely populated.

Any deviations from those shapes could indicate somewhat non-random
behaviour. This is not to say that it can help you predict what is coming
next, but it can show flaws in the generator.
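
As a rough numeric companion to eyeballing the plot of sorted token values,
the Python sketch below measures how far the sorted values stray from the
straight line joining the smallest and largest. The function name and the
idea of reporting a single worst-case gap are my assumptions, not part of
the script, and this is no substitute for a proper statistical test.

```python
def max_deviation_from_line(values):
    """Largest gap between the sorted values and an ideal straight line.

    Both axes are scaled to [0, 1], so the result is a fraction of the
    observed value range: 0.0 means the points are perfectly evenly spaced.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo, hi = ordered[0], ordered[-1]
    if n < 3 or hi == lo:
        return 0.0
    worst = 0.0
    for rank, v in enumerate(ordered):
        expected = rank / (n - 1)        # where the straight line puts it
        actual = (v - lo) / (hi - lo)    # where the value really is
        worst = max(worst, abs(actual - expected))
    return worst


print(max_deviation_from_line([0, 10, 20, 30, 40]))  # 0.0
```

Even a good generator will give a small non-zero result, since random samples
are never perfectly evenly spaced. Values clustered at one end of the range
give a result close to 1.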

Alternatively, as someone mentioned diehard, take the integer values, break
them back into bytes, write them out as a byte stream, and use that as input
to diehard for extensive analysis. 
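
A minimal Python sketch of that byte-stream step, assuming the token values
and per-position charset sizes have already been computed. The function name,
the fixed-width big-endian layout, and the output filename are my choices,
not anything diehard requires.

```python
from math import prod


def values_to_bytes(values, bases):
    """Serialise token values as fixed-width big-endian byte strings.

    `bases` holds the charset size per position, so every value is below
    prod(bases). The width is the smallest whole number of bytes that can
    hold the largest possible value.
    """
    span = prod(bases)
    width = max(1, ((span - 1).bit_length() + 7) // 8)
    return b"".join(v.to_bytes(width, "big") for v in values)


stream = values_to_bytes([0, 1, 255, 256], [16, 16, 16])
print(stream.hex())  # 0000000100ff0100

# To save the stream for later analysis (hypothetical filename):
# with open("tokens.bin", "wb") as fh:
#     fh.write(stream)
```

One caveat: unless the token space is an exact power of 256, the top bits of
each encoded value are not uniformly distributed even for a perfect
generator, so a bit-level test suite may report a bias that comes from the
encoding rather than from the tokens.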

I must say, when I used diehard, I was pretty much unable to evaluate what
it was telling me, as I have no idea what the tests that it is running mean!
:-)

Have fun.

Rogan

P.S. If you have any suggestions for improvements, especially to performance
and analysis, please send them my way, and I'll see what I can do.
P.P.S.

FWIW, I typically do something like:

for i in `seq 1 1000` ; do
(echo HEAD /cookiegenerator HTTP/1.0; echo Otherheader: whatever; echo ) |
nc target 80 | grep Set-Cookie >> cookies
done

Post process cookies to get just the "crumbs" :-), then run them through the
analysis below.
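
The post-processing step might look something like this Python sketch. It
assumes each line of the cookies file is a raw "Set-Cookie: name=value;
attributes" header as captured by grep. The function name and the SESSIONID
cookie name are hypothetical.

```python
def extract_crumbs(lines, cookie_name=None):
    """Pull the bare cookie values out of raw Set-Cookie header lines."""
    crumbs = []
    for line in lines:
        header, sep, rest = line.partition(":")
        if not sep or header.strip().lower() != "set-cookie":
            continue
        # Keep only the first "name=value" pair; drop Path, Expires, etc.
        pair = rest.split(";", 1)[0].strip()
        name, eq, value = pair.partition("=")
        if not eq:
            continue
        if cookie_name is None or name == cookie_name:
            crumbs.append(value)
    return crumbs


sample = [
    "Set-Cookie: SESSIONID=abc123; path=/\r\n",
    "Set-Cookie: other=zzz\r\n",
    "Server: example\r\n",
]
print(extract_crumbs(sample, "SESSIONID"))  # ['abc123']
```

Writing the returned values one per line gives a file that can be fed
straight to the script.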

Does anyone know of a tool that would automatically use Keep-Alives, if
available, to speed something like this up, but would fall back to a new
connection per request when not?
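
Not a ready-made tool, but as one possible approach, Python's standard
http.client module behaves roughly this way: it speaks HTTP/1.1, reuses the
connection while the server allows it, and reopens the socket on the next
request if the server closed it. The sketch below is an assumption-laden
illustration (plain HTTP only, hypothetical function name, one retry if an
idle connection was dropped).

```python
import http.client


def collect_cookie_headers(host, path, count, port=80):
    """Send `count` HEAD requests and return every Set-Cookie header value.

    One HTTPConnection is reused for all requests. If the server answers
    with "Connection: close" (or HTTP/1.0), http.client closes the socket
    and transparently reconnects on the next request.
    """
    headers = []
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        for _ in range(count):
            for attempt in (1, 2):
                try:
                    conn.request("HEAD", path)
                    resp = conn.getresponse()
                    resp.read()  # drain so the connection can be reused
                    break
                except (http.client.HTTPException, OSError):
                    # The server may have dropped an idle keep-alive
                    # connection: reset and retry this request once.
                    conn.close()
                    if attempt == 2:
                        raise
            headers.extend(resp.headers.get_all("Set-Cookie") or [])
    finally:
        conn.close()
    return headers
```

A call such as collect_cookie_headers("target", "/cookiegenerator", 1000)
would stand in for the shell loop above, with "target" and the path being the
same placeholders used there.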

0 $ cat charset.pl 
#!/usr/bin/perl -w

use strict;
use Math::BigInt ':constant';

my $verbose=0;

my %chars=();

my @charpos=();

my @cookies=();

while (my $line=<>) {
  chomp $line;
  push @cookies,$line;
  my @line=split('',$line);
  for (my $i=0; $i<length($line); $i++) {
    my $char=$line[$i];
    if (! exists $chars{$char} ) {
      $chars{$char}=1;
    } else {
      $chars{$char}++;
    }
    if (!exists $charpos[$i]->{$char}) {
      $charpos[$i]->{$char}=1;
    } else {
      $charpos[$i]->{$char}++;
    }
  }
}

if ($verbose) {
  my @chars=sort keys %chars;
  print "Overall Charset count is : ",$#chars+1,"\n";
  print "Overall Charset is : \n";
  print join('',sort keys %chars),"\n";
  
  print "\nOverall Distribution is :\n";

  foreach my $char (sort keys %chars) {
    print "$char : ",$chars{$char},"\n";
  }
}


my @charset=();

if ($verbose) { print "\n\nPositional distribution is as follows:\n\n\n"; }

for (my $i=0; $i<=$#charpos; $i++) {
  my $chars=$charpos[$i];
  my @chars=sort keys %$chars;
  $charset[$i]=join('',sort keys %$chars);

  if ($verbose) {
    print "Position $i Charset count is : ",$#chars+1,"\n";
    print "Charset is : \n";
    print $charset[$i],"\n";
  
    print "\nDistribution is :\n";
  
    foreach my $char (sort keys %$chars) {
      print "$char : ",$chars->{$char},"\n";
    }
    print "\n\n\n\n";
  }

}

my $prev=Math::BigInt->new("0");

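# Test with defined() so that a token of "0" does not end the loop early.
while (defined(my $cookie=shift @cookies)) {
  my $total=Math::BigInt->new("0");

  for ( my $p=0; $p < length($cookie); $p++) {
    my ($value,$base)=charval(substr($cookie,$p,1),$charset[$p]);
    # Mixed-radix accumulation: scale by the base of the CURRENT position,
    # then add that position's digit. Scaling by the previous position's
    # base can map different tokens to the same value when charset sizes
    # differ between positions.
    $total*=$base;
    $total+=$value;
  }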
  print $cookie," : ",$total," : ",($total-$prev),"\n";
  $prev=$total;
}

exit;


sub charval {
  my $char=shift;
  my $charset=shift;

  return (index($charset,$char),length($charset));
}

-----Original Message-----
From: Nick Jacobsen [mailto:nick () ethicsdesign com] 
Sent: 09 December 2002 10:52 AM
To: webappsec () securityfocus com
Subject: Sequence Identification Routines?


I was hoping one of you might have some input here...  I am black box
testing a web app that generates a 5 character (letter and number only,
lowercase) verification string, that it then emails to the email address on
file, and then the receiver has to type it in to continue with his
registration...  now, I am looking for some sort of programming routines,
snippets, or programs, that will look at a set of say, a 1000, numbers, and
tell me if there is any sensible pattern, off which to predict the next 5
character string in the sequence.  Any suggestions welcome!

Thanks,
Nick Jacobsen
Ethics Design
nick () ethicsdesign com

