tcpdump mailing list archives

Optimizer uses stale control flow graph data structures


From: "Archit P. Shah via tcpdump-workers" <tcpdump-workers () lists tcpdump org>
Date: Tue, 03 Nov 2020 22:22:06 -0800

--- Begin Message --- From: "Archit P. Shah" <archit.shah () alum mit edu>
Date: Tue, 03 Nov 2020 22:22:06 -0800
I submitted two pull requests fixing bugs in the libpcap optimizer (#972 and #976).  Both proposed patches are 
relatively narrow hacks to address specific bugs (links [1] and [2] below).  I'm writing here because it appears both 
bugs are specific instances of a broader issue that can be resolved with a larger restructuring of the optimzier.  I 
have mocked up a patch with such a larger restructuring of the optimizer's looping structure.  See 
https://github.com/the-tcpdump-group/libpcap/compare/master...tenarchits:2020-recalculate-cfg

Currently opt_loop builds a series of data structures related to the control flow graph (block levels, dominators, 
use/def information, edge dominators, etc.) and then calls opt_blks which walks through the control graph calling 
opt_blk on each block to optimize.  The issue is that individual optimizations to one block or statement may make 
alterations that render the computed data structures stale.  The next optimization may then be relying on stale data.  
In the bug in link [1] below, or_pullup rearranges control flow to invalidate the calculations made by find_dom.  The 
next call to or_pullup or and_pullup may rely on stale data.  In the bug in link [2]  below, opt_deadstore deletes 
instructions impacting the structures computed by find_ud, which may impact optimization of the of the next block.

First, my patch uses setjmp/longjmp to jump back to the calculation of the control flow graph-related data structures 
in opt_loop any time an optimization is made that invalidates any of the computed structures.  Obviously, a more 
finegrained recalculation would be preferable, but I believe this approach prioritizes correctness at a reasonable 
performance penalty.

Second, my patch restructures the control flow in opt_loop.  Currently, opt_loop runs opt_blks repeatedly until a fixed 
point is reached.  It is first run with do_stmts=false to find a fixed point and then with do_stmts=true to find a 
fixed point.  With the patch, opt_loop alternates between running opt_blks with do_stmts=false and running opt_blks 
with do_stmts=true until a fixed point is reached.  Thus first there is a recalculation of the control flow graph data 
structures, then a pass of opt_blks with do_stmts=false and then a pass of opt_blks with do_stmts=true.

I hope these observations and this proposal is useful.  I welcome any feedback for improvements to this patch or 
suggestions for alternate approaches.

I have also mocked a patch adding tests to the tcpdump project reflecting these two bugs.  Please let me know if it 
would helpful for me to create a pull request for these tests. See [3].

Links

[1] https://github.com/the-tcpdump-group/libpcap/issues/945
[2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=144325
[3] https://github.com/the-tcpdump-group/tcpdump/compare/master...tenarchits:2020-optimizer-tests

Regards,

 -- Archit

--- End Message ---
_______________________________________________
tcpdump-workers mailing list
tcpdump-workers () lists tcpdump org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers

Current thread: