nanog mailing list archives

Re: Quick question.


From: "Alexei Roudnev" <alex () relcom net>
Date: Wed, 4 Aug 2004 00:23:01 -0700


I said - it WORKS. One process spins - we get a warning - someone logs in to
the system and kills the runaway process... I never saw two spins at once
(because the first one was killed before the second one started). Btw, such
systems (2 CPU) are also more stable in the case of runaway device drivers.

I saw:
- runaway tomcat server
- runaway CA agent (!@#$)
- runaway ssh daemon
- runaway sendmail
All of these happened regularly at one time or another, and all were handled
without any system degradation because the machines had more than one CPU.
The same runaways on 1 CPU systems caused visible degradation.

It is all a matter of trade-offs - if I must choose between a 1-threaded and
a 2-threaded P-IV, I'll choose the 2-threaded one; if I must choose between a
$900 1 CPU server and a $1100 2 CPU server, I'll choose the 2 CPU one.


----- Original Message ----- 
From: "Paul Jakma" <paul () clubi ie>
To: "Alexei Roudnev" <alex () relcom net>
Cc: "Michel Py" <michel () arneill-py sacramento ca us>; "Nanog"
<nanog () nanog org>
Sent: Tuesday, August 03, 2004 11:39 PM
Subject: Re: Quick question.


On Tue, 3 Aug 2004, Alexei Roudnev wrote:

It is not a mad idea - 2 CPU servers are not significantly more
expensive than 1 CPU ones (and notice, we count a P-IV MultiThread as 2 CPUs)

Well, you have to compare like for like, so a system with multiple CPUs
versus the exact same system without. No difference in cost, other than
for the CPUs.

And if you want reliability, you're not going to be buying your
machines from the nearest Lidl (unless your application is engineered
to take advantage of dozens of cheap throwaway PCs).

but it increases the system's redundancy against runaway processes. Of
course, it is not hardware redundancy, but it REALLY works.

Not really.. this is a resource exhaustion problem, and you can not
cure this, given buggy apps, by throwing more CPUs at it.

Let's say you have some multi-process or multi-threaded application
which regularly spawns/forks new processes/threads, but it is buggy
and prone to having individual processes/threads spin.

So one spins, but you still have plenty of CPU time left cause you
have two CPUs. Another spins, and the machine starts to crawl. So you
solve this problem by upgrading to a quad-SMP machine. And guess what
happens? :)

Sure, there are some application bugs you can mask a wee bit with
SMP, but it's not much cop, it's not a solution, and you would need
an infinite-SMP machine to guarantee that a bad application
can never hog all CPU time.

What you really want is a good OS with:

- a good scheduler (to prevent spinning tasks from starving other
tasks)

- ability to set resource limits, ie per-task and/or per-user (if
your apps run under dedicated user accounts) limits on cpu time,
resident memory, etc..

Both of these will allow you to constrain the impact bad tasks can
have on the system, whether your machine is 1, 2, ... or n CPUs.
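
A minimal sketch of the per-task CPU limit idea, assuming a POSIX system
and Python 3. The helper name run_with_cpu_limit and the one-second budget
are made up for illustration; the only real interfaces used are the
standard resource and subprocess modules. The kernel, not the
application, enforces the limit, so it holds however many CPUs the box has.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv as a child process with a cap on its total CPU time.

    Hypothetical helper for illustration. POSIX only.
    """
    def limit_child():
        # Runs in the child between fork and exec.
        # Soft limit: the kernel sends SIGXCPU once this much CPU time is used.
        # Hard limit: one second later the kernel sends SIGKILL as a backstop.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))

    return subprocess.run(argv, preexec_fn=limit_child)


def was_killed_by_cpu_limit(result):
    """True if the child died from SIGXCPU (or the SIGKILL backstop)."""
    return result.returncode in (-signal.SIGXCPU, -signal.SIGKILL)
```

For example, calling run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1)
starts a deliberately spinning child, and the kernel terminates it after
roughly one second of CPU time. The same limit can be set per user with the
shell's ulimit -t or the system's limits configuration.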

The real solution though is to fix the buggy application.

regards,
-- 
Paul Jakma paul () clubi ie paul () jakma org Key ID: 64A2FF6A
Fortune:
The life which is unexamined is not worth living.
  -- Plato

