Dailydave mailing list archives

Re: A small fun Python puzzle


From: Adrien Krunch Kunysz <adrien () kunysz be>
Date: Wed, 10 Feb 2010 21:45:19 +0000

On Tue, Feb 09, 2010 at 07:13:47PM -0800, travis+ml-dailydave () subspacefield org wrote:
On Tue, Apr 01, 2008 at 04:08:41PM +0200, ergosum wrote:
Hi all, 
for l in [100000, 1000000, 5000000, 10000000]:
...   print '%10d %f' % (l, test(l))
...
    100000 0.006711
   1000000 0.764886
   5000000 28.554786
  10000000 111.738498

(wow - so not linear ...)

So if this is true, slicing data[1024:] should be O(M) where M = 100000, 
1000000, 5000000, 10000000 respectively, while slicing data[i:i+1024] should 
always be O(N) where N = 1024. There is a huge gain here that accounts for 
the more or less homogeneous times of the second algorithm. Nevertheless it 
puzzles me why the slicing operation isn't linear. Anyone with a better 
python internal knowledge can throw some light?

I think the slicing operation /is/ linear (O(M)), assuming malloc/free
as O(1) and no GC overhead.

However, in the first version of the program, it is called in a
loop, which is called ceil(M/1024) times.  So as a result, the whole
loop runs in O(M^2) time.

The second, iterative version of the program has a loop called ceil(M/1024)
times, and it uses the array slice operator, which runs in O(1024) time.
So the run time is O(M), which is borne out by the timings.

I can't find the link now, but I saw a similar issue on ha.ckers.org
recently where he was doing something similar with strings in javascript.

There's nothing surprising when you realize that creating a new
string or buffer object takes time O(M) to copy the data, where M is
the size of the new object.  You could possibly do zero-copy
operations, where one object points into the buffer used for another,
but then garbage collection and object deallocation becomes trickier.

For those who are more familiar with C than with Python and algorithm-speak,
I think it means the program is doing this:

        for (len = strlen(data); len > 0; len -= 1024) {
                for (i = 1024; i < len; i++)
                        data[i-1024] = data[i];
        }

Which was this in Python:

~        while data!="":
~            data=data[1024:]

Fixing the buffer underrun in the C version is left as an exercise for
the reader.

Attachment: signature.asc
Description: Digital signature

_______________________________________________
Dailydave mailing list
Dailydave () lists immunitysec com
http://lists.immunitysec.com/mailman/listinfo/dailydave

Current thread: