Dailydave mailing list archives

Re: Intersecting graphs...


From: Thomas Fischbacher <Thomas.Fischbacher () Physik Uni-Muenchen DE>
Date: Wed, 22 Sep 2004 22:28:26 +0200 (CEST)


On Wed, 22 Sep 2004, Halvar Flake wrote:

Hey all,

it might sound odd that I am asking a graph-theory question here,
but with DD being a list full of smart people with odd hobbies I
thought I might find an answer here:

Does anyone know of a reasonably fast algorithm which calculates
the intersection of two acyclic digraphs with a common "root"-node ?

Like, say, a tree containing filesystem stat information?

(I'm far from being an expert in graph theory, but let me just give a few 
thoughts how I would approach such a problem.. Always assuming that the 
data in both trees are "mostiy unique", i.e. a given entry does not 
re-occur too often within a tree.)

I haven't had a look at the tripwire source, but one way how to do 
something like this is to recursively walk through the first tree, bottom 
up, starting with the individual leafs, and mincing all the data for 
every single leaf tree through a cryptographic hash function. When you 
climb up, you take the data within each node, plus the cryptographically 
hashed fingerprints from the subtrees, and generate a new cryptographical 
hash for this. This way, you will get one fingerprint for every node, in 
something like O(nodes) time. All these fingerprints you store in a hash.

Next, you traverse the second subtree, again, bottom up, forming 
cryptographic hash values. When you encounter one which you've seen in the 
other tree, you might want to assume with reasonable confidence that the 
corresponding subtrees are identical. (Otherwise, check. One can employ 
clever tricks here as well - memoizing comparisons would be a strategy.)

Two issues:

* Once you identified two subtrees as identical, you still have to find 
their relative places w.r.t. the common intersection. This is facilitated 
by remembering for every subtree the node of which it is a child. This 
way, such identifications provide you with a set of possibilities for 
possible parents to be identified. There will be two kinds: such where 
comparison of parent nodes immediately gives you a failure, and such where 
this is not the case. Thus, you can reduce a few cases immediately, while 
backtracking should be used for the others. call-with-current-continuation 
would be of great value here.

* If subtrees are not identical, well, it may still be that the subtree 
sets of the corresponding node have a nontrivial intersection. I did not 
put too much thought into this yet, but to me it seems as if the proper 
strategy would be to re-calculate the hash values of the path up to the 
root node. But as we may have more options where to identify a subtree of 
graph #2 in graph #1, it seems as if we again would have to do some 
backtracking. Then, we perhaps should not use a hash mapping cryptographic 
hash keys to trees (and their parents), but instead binary trees, and work 
with binary tree insert/delete functions that do not destructively modify 
trees, but map given trees to new ones with the appropriate modifications 
in a functional fashion - this should greatly simplify the backtracking.

There are many more details that depend on the actual problem, like:

* When to consider subtrees as equal ("same object" vs. "equal").

* Are there data in the nodes, or only in the leaves?

* Is the number of childs a node may have fixed?

* Does the order of the childs of a node matter?

* Is it advantageous to be able to recognize transplantations of subtrees?

* What is the problem structure of a typical application?

So, if you could be a little bit more specific, I may be able to give a 
more useful answer.

-- 
regards,               tf () cip physik uni-muenchen de              (o_
 Thomas Fischbacher -  http://www.cip.physik.uni-muenchen.de/~tf  //\
(lambda (n) ((lambda (p q r) (p p q r)) (lambda (g x y)           V_/_
(if (= x 0) y (g g (- x 1) (* x y)))) n 1))                  (Debian GNU)
_______________________________________________
Dailydave mailing list
Dailydave () lists immunitysec com
http://www.immunitysec.com/mailman/listinfo/dailydave


Current thread: