Nmap Development mailing list archives

[Proof of Concept] Efficient, ASCII-safe port compression


From: doug () hcsw org
Date: Fri, 6 Jul 2007 03:06:30 -0700

Hi nmap-dev!

I was thinking this evening about the problem of encoding long
lists of port numbers efficiently and reliably. When I have done
large-scale scans in the past, I have run up against all sorts of
scalability problems, many of them stemming from the lack of an
efficient, transportable encoding for sets of ports.

Also, when you want to keep complete information on all the ports
in large scans, you often end up listing massive ASCII lists in
XML/greppable output. Consider a host with 20 open ports but
30000-some closed and 30000-some filtered: Nmap can only collapse
one of those lists without throwing out information. Insignificant,
you say? Perhaps, but remember that for a large-scale distributed
scanning effort you want to be able to use tiny 1 MB shell
accounts as well as your comfy terabyte servers.

Finally, I was considering the problem that the XML output of a
given scan doesn't tell you which ports were scanned, because we
don't encode the contents of the nmap-services file in the scan
itself.

Let me introduce you to portcompress:

http://hcsw.org/downloads/portcompress.c

This simple, portable C file is a program that runs in 2 modes:

   Usage: portcompress [-e|-d]

   In encode mode (-e) it reads whitespace-separated decimal
   port numbers until EOF and prints a compressed port list.

   In decode mode (-d) it reads a compressed port list and
   prints the corresponding ports, separated by newlines.
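The post doesn't spell out how encode mode turns the port list into the bit stream described further down. A plausible reading, and the assumption behind this sketch, is a 65536-bit presence bitmap in which bit p is set when port p appears in the input. The names `read_ports`, `set_port` and `has_port`, and the byte/bit layout, are hypothetical and not taken from portcompress.c:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define NPORTS 65536 /* ports 0..65535, one bit each */

/* Hypothetical layout: port p lives in byte p/8, bit p%8 (LSB first). */
static void set_port(unsigned char *bitmap, unsigned long port)
{
    assert(port < NPORTS);
    bitmap[port >> 3] |= (unsigned char)(1u << (port & 7));
}

static int has_port(const unsigned char *bitmap, unsigned long port)
{
    return (bitmap[port >> 3] >> (port & 7)) & 1;
}

/* Reads whitespace-separated decimal port numbers until EOF into an
   NPORTS-bit bitmap (NPORTS / 8 bytes). Returns how many numbers were
   read (duplicates included), or -1 on a non-numeric token or a port
   outside 0..65535. */
static long read_ports(FILE *in, unsigned char *bitmap)
{
    long count = 0;
    int c = getc(in);

    memset(bitmap, 0, NPORTS / 8);
    for (;;) {
        unsigned long port = 0;

        while (c != EOF && isspace(c))
            c = getc(in);
        if (c == EOF)
            return count;
        if (!isdigit(c))
            return -1;                 /* not a decimal number */
        while (c != EOF && isdigit(c)) {
            port = port * 10 + (unsigned long)(c - '0');
            if (port >= NPORTS)
                return -1;             /* out of range; also prevents overflow */
            c = getc(in);
        }
        if (c != EOF && !isspace(c))
            return -1;                 /* trailing junk such as "80x" */
        set_port(bitmap, port);
        count++;
    }
}
```

Calling `read_ports(stdin, bitmap)` with a static `unsigned char bitmap[NPORTS / 8]` would mirror the `-e` input step. A flat bitmap always costs 8 KB no matter how many ports are listed, but long open/closed stretches become long runs of identical bits, which is exactly the kind of input a run-length coder handles well.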


Given a list of integers, it encodes them in an efficient,
run-length-encoded, ASCII-armoured format:

$ echo "1 2 3 4 5 6 7 8 9 10" | ./portcompress -e
JZ**xA

$ echo "1 2 3 4 5 6 7 8 9 10" | ./portcompress -e | ./portcompress -d
1
2
3
4
5
6
7
8
9
10

$ echo "1 2 3 4 5 6 7 8 9 10 65533 65534 65535" | ./portcompress -e
JZ**u*A

$ echo "1 2 3 4 5 6 7 8 9 10 9876 65533 65534 65535" | ./portcompress -e
JZyaF32WT8A

$ cat ~/nmap/svn/nmap/nmap-services |perl -ne 'print "$1\n" if m/^[\w-]*\s*(\d+)/;'|sort|uniq|./portcompress -e|wc -c
696

That's right: all the TCP and UDP port numbers in the services file
can be enumerated in 696 bytes, ASCII-safe! It would be something like
4 times longer (and not ASCII-safe) if we just listed the 2-byte integers
back-to-back.

The secret is an efficient run-length encoding algorithm I developed.
It encodes runs of length 4 or more as a simple count of the length of
the run. For maximum efficiency, the run length is itself stored in a
variable-width field. (This is very similar to an algorithm invented by
a professor of mine, Dr. Paeth. Dr. Paeth's algorithms are also used in
PNG, JPEG, etc.)
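To see why 4 is a sensible cut-off, count bits using the protocol quoted below: each literal bit costs 2 bits, and a run of n equal bits costs a 2-bit tag, a 2-bit width selector and a w-bit length field. Assuming the encoder always picks the smallest width that fits (the post doesn't say), the costs work out as:

```latex
\[
\begin{aligned}
C_{\mathrm{lit}}(n) &= 2n \\
C_{\mathrm{run}}(n) &= 2 + 2 + w(n), \qquad 4 \le n \le 4 + (2^{16} - 1) \\
w(n) &= \min\{\, b \in \{2, 4, 8, 16\} : n - 4 < 2^{b} \,\}
\end{aligned}
\]
```

So a run of 4 to 7 equal bits costs 6 bits instead of 8 to 14, a run of 10 costs 4 + 4 = 8 bits instead of 20, and 65536 identical bits (under a bitmap reading, an entirely empty port list) cost only 4 + 16 = 20 bits, since 65532 < 2^16. For n = 3 the literals cost 6 bits, no more than the cheapest possible run code, so nothing is lost by emitting short runs as literals.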

Here is the bit-stream protocol, from the source:

Protocol:

Either

00 = 0
11 = 1

or

01 = RLE string of 0s
10 = RLE string of 1s

  followed by one of

  00 = 2 bits
  01 = 4 bits
  10 = 8 bits
  11 = 16 bits

    followed by (run length - 4), written as an unsigned binary
    number using the number of bits selected above
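A minimal encoder for the bit-level protocol above, working on a string of '0'/'1' characters so the output can be compared directly with the examples. Assumptions not stated in the post: runs are found greedily as maximal stretches of equal bits, the smallest width that fits is chosen, the length field is written most-significant bit first (the "11111" example is consistent with this), and runs longer than 4 + 65535 bits are split. `rle_encode` and `put_bits` are hypothetical names, and the ASCII-armour step is left out because the post doesn't give its alphabet:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends the low `width` bits of `value`, most significant bit first,
   as '0'/'1' characters starting at out[pos]; returns the new position. */
static size_t put_bits(char *out, size_t pos, unsigned long value, int width)
{
    for (int i = width - 1; i >= 0; i--)
        out[pos++] = ((value >> i) & 1) ? '1' : '0';
    return pos;
}

/* Encodes a '0'/'1' string; returns the number of output characters.
   out needs room for the result plus '\0'; 2 * strlen(in) + 1 always
   suffices, because literals (2 bits per input bit) are the worst case. */
static size_t rle_encode(const char *in, char *out)
{
    static const int widths[4] = { 2, 4, 8, 16 }; /* selector 00,01,10,11 */
    size_t len = strlen(in), i = 0, pos = 0;

    while (i < len) {
        char bit = in[i];
        size_t run = 1;

        assert(bit == '0' || bit == '1');
        /* Extend the run, capped at the largest length a 16-bit field holds. */
        while (i + run < len && in[i + run] == bit && run < 4 + 65535UL)
            run++;
        if (run >= 4) {
            unsigned long extra = (unsigned long)(run - 4);
            int code = 0;

            while (extra >> widths[code])             /* smallest width that fits */
                code++;
            pos = put_bits(out, pos, bit == '0' ? 1 : 2, 2); /* 01 or 10 */
            pos = put_bits(out, pos, (unsigned long)code, 2);
            pos = put_bits(out, pos, extra, widths[code]);
        } else {
            for (size_t k = 0; k < run; k++)                  /* 00 or 11 */
                pos = put_bits(out, pos, bit == '0' ? 0 : 3, 2);
        }
        i += run;
    }
    out[pos] = '\0';
    return pos;
}
```

With this sketch, ten 0 bits encode as "01" (run of 0s), "01" (4-bit field) and "0110" (10 - 4 = 6), i.e. 8 bits.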


Examples:

"101"   => "110011"
"111"   => "111111"
"1111"  => "100000"
"11111" => "100001"
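A matching decoder sketch, under the same assumptions (MSB-first length field, '0'/'1' character strings, hypothetical function names). It rejects truncated input and output that would not fit rather than guessing:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reads `width` '0'/'1' characters MSB first; returns -1 if input runs out. */
static long get_bits(const char *in, size_t len, size_t *pos, int width)
{
    long v = 0;

    if (*pos + (size_t)width > len)
        return -1;
    for (int i = 0; i < width; i++)
        v = (v << 1) | (in[(*pos)++] == '1');
    return v;
}

/* Decodes the protocol into '0'/'1' characters in out (capacity cap,
   including the '\0'). Returns the number of decoded bits, or -1 on
   truncated input or insufficient space. */
static long rle_decode(const char *in, char *out, size_t cap)
{
    static const int widths[4] = { 2, 4, 8, 16 };
    size_t len = strlen(in), pos = 0, n = 0;

    assert(cap > 0);
    while (pos < len) {
        long tag = get_bits(in, len, &pos, 2);
        long count;
        char bit;

        if (tag < 0)
            return -1;
        if (tag == 0 || tag == 3) {          /* 00 = single 0, 11 = single 1 */
            bit = (tag == 3) ? '1' : '0';
            count = 1;
        } else {                             /* 01 = run of 0s, 10 = run of 1s */
            long code = get_bits(in, len, &pos, 2);
            long extra = (code < 0) ? -1 : get_bits(in, len, &pos, widths[code]);

            if (extra < 0)
                return -1;
            bit = (tag == 2) ? '1' : '0';
            count = extra + 4;
        }
        if (n + (size_t)count >= cap)        /* keep room for the '\0' */
            return -1;
        memset(out + n, bit, (size_t)count);
        n += (size_t)count;
    }
    out[n] = '\0';
    return (long)n;
}
```

Decoding "100001" with this sketch yields "11111", the last example above, and "1011" is rejected because its 16-bit length field is missing.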


Anyways, this was a very quick hack job, so please let me know if you
find any bugs or have other suggestions! I took code from at least 2
other bits of Hardcore Software: ASCII armour and nuff. :)

Enjoy,

Doug



_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org
