Secure Coding mailing list archives

Retrying exceptions - was 'Coding with errors in mind'


From: leichter_jerrold at emc.com (Leichter, Jerry)
Date: Wed, 6 Sep 2006 10:07:17 -0400 (EDT)

| Oh, you mean like the calling conventions on the IBM Mainframe where a dump
| produces a trace back up the call chain to the calling program(s)?  Not to
| mention the trace stack kept within the OS itself for problem solving
| (including system calls or SVC's as we call them on the mainframe).   And
| when all else fails, there is the stand alone dump program to dump the whole
| system?
| 
| Mainframes have been around for years.  It's interesting to see "open
| systems" take on mainframe characteristics after all this time....

All these obsolete ideas.  Stack tracebacks.  Feh!

Years back at Smarts, a company since acquired by the "EMC" you see in
my email address, one of the things I added to the system was a set of
signal handlers which would print a stack trace.  The way to do this was
very non-uniform:  On Solaris, you had to spawn a standalone program
(but you got a stack trace of all threads).  On HPUX, there was a
function you could call in a system library.  On AIX (you'd think IBM,
of all vendors, would do better!) and Windows, we had to write this
ourselves, with varying degrees of OS support.  We also dump - shock!
horror! - the values in all the registers.  And we (try to) produce a
core dump.
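The handlers described above were native code written separately for each Unix variant and for Windows, and the post does not show them. As a rough analogue only, here is a sketch for a Python process. It uses the standard library's faulthandler module, which plays the role of the HPUX-style "function in a system library": on SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL it writes the stacks of all threads to a file. After writing, it lets the signal's normal action proceed, so the process still dies and can still leave a core file. The function name install_crash_tracing and the log path are hypothetical, and the on-demand SIGUSR1 dump assumes a POSIX system. There is no register dump here, because Python code has no access to the registers.

```python
import faulthandler
import signal


def install_crash_tracing(log_path):
    """Hypothetical helper: log all threads' stacks on fatal signals.

    Returns the open file object. Keep it open for the life of the
    process, because faulthandler writes to its file descriptor from
    inside the signal handler.
    """
    log = open(log_path, "a")
    # SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL: dump every thread's stack,
    # then let the signal's default action (termination, maybe a core) run.
    faulthandler.enable(file=log, all_threads=True)
    # POSIX only (faulthandler.register does not exist on Windows):
    # SIGUSR1 dumps all stacks on demand and the program keeps running.
    if hasattr(faulthandler, "register"):
        faulthandler.register(signal.SIGUSR1, file=log, all_threads=True)
    return log
```

A caller would keep the returned file object around, for example `log = install_crash_tracing("app-crash.log")`. Sending `kill -USR1 <pid>` to the process then appends a snapshot of every thread's stack to that file without stopping the program.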

My experience has been that, of crashes in the field, 90% can be fully
analyzed based on what we've written to the log file.  Of the rest, some
percentage (this is harder to estimate because the numbers are lower)
can be fully analyzed using the core dump.  The rest basically can't be
analyzed without luck and repetition.  (I used to say 80-90% for "can be
analyzed from core file", but the number is way down now because (a)
we've gotten better at getting information into and out of the log files,
e.g., we now keep a circular buffer of messages, including those at
too low a severity level to be written to the log, and dump that as
part of the failure output; and (b) the remaining problems are exactly the
ones that the current techniques fail to handle, since we've fixed the
others!)
                                                        -- Jerry
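The circular buffer of log messages mentioned above can be sketched with Python's logging module, assuming a program that already logs through it. RingBufferHandler and make_logger are hypothetical names, not taken from the original system. The file handler writes only WARNING and above. The ring handler keeps the most recent records at every level, so the low-severity context survives in memory and can be dumped as part of the failure output.

```python
import collections
import logging

FORMAT = "%(levelname)s %(message)s"


class RingBufferHandler(logging.Handler):
    """Hypothetical handler: keeps the last `capacity` records at any level."""

    def __init__(self, capacity=1000):
        super().__init__(level=logging.DEBUG)
        # A deque with maxlen silently drops the oldest entry when full.
        self.records = collections.deque(maxlen=capacity)

    def emit(self, record):
        self.records.append(self.format(record))

    def dump(self, stream):
        """Write buffered messages, oldest first; call from failure handling."""
        for line in self.records:
            stream.write(line + "\n")
        stream.flush()


def make_logger(name, log_stream, capacity=1000):
    """Hypothetical setup: WARNING+ goes to log_stream, everything to the ring."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let every record reach the handlers
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers if called twice

    file_handler = logging.StreamHandler(log_stream)
    file_handler.setLevel(logging.WARNING)  # the "too low a severity" cutoff
    file_handler.setFormatter(logging.Formatter(FORMAT))

    ring = RingBufferHandler(capacity)
    ring.setFormatter(logging.Formatter(FORMAT))

    logger.addHandler(file_handler)
    logger.addHandler(ring)
    return logger, ring
```

With `capacity=3`, after five DEBUG messages and one WARNING, the log file holds only the warning, while `ring.dump()` writes the last two DEBUG lines followed by the warning. A fixed-size ring keeps the memory cost bounded no matter how chatty the low-severity logging is.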


