Full Disclosure mailing list archives

Re: "IO wait chains" in Linux??


From: "Cal Leeming [Simplicity Media Ltd]" <cal.leeming () simplicitymedialtd co uk>
Date: Mon, 7 Feb 2011 18:29:04 +0000

Thank you for your detailed reply!

Here's the kinda thing I was looking for (this is just a mockup):

21000 - /usr/local/sbin/nginx - [D]
 - /tmp/.somefile
    - other PIDs waiting on this file (not just children of the parent)
        - 51283 - /usr/local/sbin/apache (4.6 seconds)
        - 31028 - /usr/local/sbin/python2.6 (1.9 seconds)

Sadly, I don't know much about how the kernel and the IO schedulers handle
these things behind the scenes, so what I'm asking for may be impossible
(apart from your other suggestion using watchdog+dmesg).

On Mon, Feb 7, 2011 at 4:28 PM, <Valdis.Kletnieks () vt edu> wrote:

On Mon, 07 Feb 2011 06:41:53 GMT, "Cal Leeming [Simplicity Media Ltd]" said:

Is anyone aware of a Linux based CLI equivalent, which will show the
processes stuck in IO wait, in a tree format?

ps ax | grep ' [D] '   gives a pretty good approximation of "currently in
I/O wait".  But remember that each process (or actually, each thread within
a process) can individually be stuck in I/O wait, so it's unclear what the
"tree format" would consist of, exactly.  If you have a process that has
parent, siblings, and children, what else would show up in the tree if it's
in an I/O wait?
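
As a rough illustration of the per-thread point above, here is a minimal
Python sketch that walks /proc and lists every thread whose state field is
D. It is not part of the original message. The function names and the
proc_root parameter are made up for this example, and it assumes the usual
Linux /proc/<pid>/task/<tid>/stat layout. Unlike the grep pattern, which
only matches a bare "D" in the STAT column, it also catches threads that ps
would show as "Ds" or "D+".

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' rather than on whitespace.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_threads(proc_root="/proc"):
    """Return [(pid, tid, comm)] for every thread currently in state D."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        task_dir = os.path.join(proc_root, entry, "task")
        try:
            tids = [t for t in os.listdir(task_dir) if t.isdigit()]
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as fh:
                    _, comm, state = parse_stat(fh.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited or stat line was malformed
            if state == "D":
                found.append((int(entry), int(tid), comm))
    return found


if __name__ == "__main__":
    for pid, tid, comm in d_state_threads():
        print(pid, tid, comm)
```

This is a single snapshot, so a thread can enter or leave state D between
the directory listing and the read of its stat file.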

There's the slightly more difficult issue that if you're trying to do
system-level analysis, you're looking at really bad race conditions.
Processes often go into and leave I/O wait status in literally milliseconds.
At best, you can run through the process list several times and get a
statistical view of "these 4 processes are in I/O wait most of the time".
'pstree' mostly avoids that issue because if the system is small enough that
the pstree output is still useful, the fork/exec rate is low enough that
pstree can mostly ignore it.  That's not true for I/O.
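
The "run through the process list several times" idea could be sketched
roughly as follows. This is an illustration only, not something from the
original message. The function names, the default of 200 rounds, and the
10 ms interval are arbitrary assumptions. For brevity it reads only
/proc/<pid>/stat, which reports the state of each process's main thread.

```python
import os
import time
from collections import Counter


def snapshot_d_state(proc_root="/proc"):
    """Return the set of (pid, comm) whose main thread is in state D now."""
    blocked = set()
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return blocked
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                line = fh.read()
            # comm may contain spaces or ')', so cut at the last ')'.
            comm = line[line.index("(") + 1:line.rindex(")")]
            state = line[line.rindex(")") + 1:].split()[0]
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or stat was malformed
        if state == "D":
            blocked.add((int(entry), comm))
    return blocked


def d_state_fraction(rounds=200, interval=0.01, snapshot=snapshot_d_state):
    """Sample repeatedly; return {(pid, comm): fraction of samples in D}."""
    hits = Counter()
    for _ in range(rounds):
        hits.update(snapshot())
        time.sleep(interval)
    return {key: count / rounds for key, count in hits.items()}


if __name__ == "__main__":
    ranked = sorted(d_state_fraction().items(), key=lambda kv: -kv[1])
    for (pid, comm), fraction in ranked:
        print(f"{pid:>7} {comm:<20} {fraction:.0%} of samples in D")
```

The result is only a statistical estimate: anything that enters and leaves
I/O wait between two samples is missed entirely, which is the race
described above.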

If you're trying to identify processes that are truly and literally *stuck*
in I/O wait due to a hardware or kernel error, you're probably better off
enabling the watchdog timer in the kernel and watching dmesg for it
triggering.
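
A minimal sketch of the "watch dmesg" half of that suggestion follows. It
is not part of the original message, and it rests on an assumption: that
the watchdog in question is the kernel's hung-task detector, which logs
lines of the form "INFO: task NAME:PID blocked for more than N seconds."
The exact wording may differ between kernel versions. The function name
hung_tasks is made up for this example.

```python
import re

# Assumed message format of the kernel hung-task detector.
HUNG_TASK_RE = re.compile(
    r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds"
)


def hung_tasks(log_text):
    """Return [(comm, pid, seconds)] for each hung-task report in log_text."""
    found = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            comm, pid, seconds = match.groups()
            found.append((comm, int(pid), int(seconds)))
    return found
```

The function takes the log as a string, so it can be fed the captured
output of dmesg or the contents of a saved kernel log file. Reading dmesg
may require root on some systems.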


