Dailydave mailing list archives

Re: Today's thought


From: Matt Hargett <matt () use net>
Date: Thu, 27 May 2004 13:25:28 -0700

Dave Aitel wrote:

Halvar Flake wrote:
|> Matt Hargett wrote:
|> There are a lot of companies getting funding right now that do
|> source code analysis, varying from fancy regexp matching on gcc's
|> preprocessor output to real AST generation and inspection. No
|> interfunction value tracking (similar to code coverage in that
|> people underestimate its usefulness in these scenarios) yet, as
|> far as I know, though.
|
| IIRC Coverity has interfunction value tracking -- if you hook at
| the AST layer in GCC, it should not be _that_ hard to pull off, and
| I am quite surprised that @stake's product doesn't seem to do it
| (as far as I can infer from the examples they showed). Ahwell,
| there's going to be v2 soon I assume.

Did they mention that in one of their papers? I only read a few that Newsham pointed me at a year or two ago and didn't see it mentioned. I seem to remember you telling me a year or so ago that you didn't see the point in interfunction value tracking; nice to know you and I are in agreement now :)
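
For anyone following along, here is a rough sketch of what interfunction value tracking can look like. It is only an illustration under loud assumptions: it walks Python's `ast` module rather than GCC's trees, it handles straight-line function bodies only, and the names `TAINT_SOURCES`, `SINKS`, `SAMPLE` and every helper below are hypothetical. It is not how Coverity or any product mentioned in this thread works. Each function gets a summary (does it always return tainted data, and which parameters flow to its return value), and call sites consult those summaries.

```python
import ast

TAINT_SOURCES = {"input"}  # hypothetical: calls whose result is untrusted
SINKS = {"run"}            # hypothetical: calls that must not receive tainted data

SAMPLE = '''
def read_input():
    return input()

def wrap(x):
    return x

def main():
    data = wrap(read_input())
    safe = wrap("constant")
    run(data)
    run(safe)
'''

def expr_tainted(node, tainted, summaries):
    """Return True if the expression may carry tainted data."""
    if isinstance(node, ast.Name):
        return node.id in tainted
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        name = node.func.id
        if name in TAINT_SOURCES:
            return True
        if name in summaries:
            always, passthrough = summaries[name]
            # Taint crosses the call only through parameters that reach the return.
            return always or any(
                i in passthrough and expr_tainted(arg, tainted, summaries)
                for i, arg in enumerate(node.args)
            )
    # Anything else (including unknown calls): tainted if any sub-expression is.
    return any(expr_tainted(c, tainted, summaries) for c in ast.iter_child_nodes(node))

def analyze_function(func, initial_taint, summaries, findings=None):
    """Walk top-level statements in order; report whether the return may be tainted."""
    tainted = set(initial_taint)
    returns_taint = False
    for stmt in func.body:  # straight-line code only, no branches or loops
        if isinstance(stmt, ast.Assign):
            is_tainted = expr_tainted(stmt.value, tainted, summaries)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    (tainted.add if is_tainted else tainted.discard)(target.id)
        elif isinstance(stmt, ast.Return) and stmt.value is not None:
            returns_taint = returns_taint or expr_tainted(stmt.value, tainted, summaries)
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            if (findings is not None and isinstance(call.func, ast.Name)
                    and call.func.id in SINKS
                    and any(expr_tainted(a, tainted, summaries) for a in call.args)):
                findings.append((func.name, call.lineno))
    return returns_taint

def build_summaries(funcs):
    """Iterate to a fixed point so callers see up-to-date callee summaries."""
    summaries = {f.name: (False, frozenset()) for f in funcs}
    changed = True
    while changed:
        changed = False
        for f in funcs:
            always = analyze_function(f, set(), summaries)
            params = [a.arg for a in f.args.args]
            passthrough = frozenset(
                i for i, p in enumerate(params) if analyze_function(f, {p}, summaries)
            )
            if (always, passthrough) != summaries[f.name]:
                summaries[f.name] = (always, passthrough)
                changed = True
    return summaries

def find_tainted_sink_calls(source):
    funcs = [n for n in ast.parse(source).body if isinstance(n, ast.FunctionDef)]
    summaries = build_summaries(funcs)
    findings = []
    for f in funcs:
        analyze_function(f, set(), summaries, findings)
    return findings

findings = find_tainted_sink_calls(SAMPLE)
print(findings)
```

On the sample, only the `run(data)` call in `main` should be reported, because `data` comes from `read_input()` by way of `wrap()`, while `safe` comes from a constant. A per-function analysis with no summaries would have no way to tell those two calls apart.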

BTW, have any of you guys checked out the tree-ssa changes which finally got merged? (I think I remember hearing about it from a friend at Cygnus originally, a long time ago!) The gimplification might ease some of this analysis.


| It is very true that pure static analysis will not solve the
| problem, but the problem which I see is that many people "soften
| up" the requirements for the static part because it is "easier
| dynamically". Then again, many people would consider me a religious
| zealot for static analysis (complete with detachedness from the
| real world and weird delusions that are normally associated with
| religious zealots :-P)

Well, the more I analyze the problem the more I crawl in your
direction through the gravel. I'm not sure that doing this analysis is
"easier dynamically" on large problems. Even basic protocol reverse
engineering is easier done via decompilation than interative solutions.

I think static analysis can go quite a bit further, and it sounds like Halvar agrees. That being said, I believe a hybrid approach is necessary for any real improvement in productivity or accessibility to people who aren't experts in this rather large domain; something I learned doing QA for so many years. I think that static dataflow analysis (a la PC-Lint) combined with runtime data sampling (a la Insure++) can really help in understanding codecs, protocols or otherwise. I never thought all the crap I went through with these QA tools at NAI 6+ years ago would be so useful to me now.
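
To make the hybrid idea concrete, here is a toy sketch of pairing a static pass with runtime data sampling. Everything in it is an assumption for illustration: it uses Python's `ast` and `sys.settrace` rather than anything PC-Lint or Insure++ actually do, and `SAMPLE`, `static_signatures` and `sample_calls` are made-up names. The static half lists each function's parameters; the runtime half records the argument values actually seen on each call; the two are then collated per function.

```python
import ast
import sys
from collections import defaultdict

# Hypothetical two-stage parser standing in for a protocol or codec.
SAMPLE = '''
def parse_header(buf):
    length = buf[0]
    return parse_body(buf[1:], length)

def parse_body(payload, length):
    return payload[:length]
'''

def static_signatures(source):
    """Static half: map each top-level function name to its parameter names."""
    tree = ast.parse(source)
    return {
        node.name: [a.arg for a in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }

def sample_calls(func, args, known):
    """Runtime half: record argument values on each call to a known function."""
    observed = defaultdict(list)

    def tracer(frame, event, arg):
        name = frame.f_code.co_name
        if event == "call" and name in known:
            # At a "call" event the arguments are already bound in the frame.
            observed[name].append({p: frame.f_locals[p] for p in known[name]})
        return None  # no per-line tracing needed

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was installed
    return dict(observed)

signatures = static_signatures(SAMPLE)
namespace = {}
exec(SAMPLE, namespace)  # load the sample functions so they can be run
report = sample_calls(namespace["parse_header"], (b"\x03abcdef",), signatures)

# Collate: static signature next to the values sampled at runtime.
for name, params in signatures.items():
    print(name, params, report.get(name, []))
```

Even at this size, the collated output shows that `length` in `parse_body` is taken from the first byte of the buffer, which is the kind of state one otherwise ends up keeping in a notebook.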

The real problem I see is in collating all that data in a meaningful way so that people don't have to keep so much state in their heads or in massive notebooks. This becomes a data organization issue, and all the disassembler and debugger UIs that are overextended versions of 80x25 UIs from ~20 years ago are not well suited to it IMO. Time to dig out the Tufte books, people! ;>


Also, I think hooking the AST layer in GCC is harder than generating
your own AST layer. I.E. (optional step 1.) Doing a decompilation, and
then (step 2) compiling it and then (step 3) clicky-clicky tainting
variables or automatically testing and running scripts. If you build
your own tree, you can generate nicer meta-data.

Given the analysis and collation GCC now has to do for VLIW bundling, I think the metadata internal to the compiler is already getting "nicer". It's a matter of providing a nice API to get at it for these static analysis purposes; maybe the tree-ssa merge will help with this. Then again, you guys seem to have dug a lot farther into GCC recently than I have, so maybe your opinion is closer to objective reality :)

Are any of these startups getting funding integrating with GCC in this way? If I were to go the source code analysis route, that's how I would do it.
_______________________________________________
Dailydave mailing list
Dailydave () lists immunitysec com
http://www.immunitysec.com/mailman/listinfo/dailydave

