nanog mailing list archives
Re: Solar Flux (was: Re: China prefix hijack)
From: Scott Howard <scott () doc net au>
Date: Sun, 11 Apr 2010 12:58:44 -0700
On Sun, Apr 11, 2010 at 7:07 AM, Robert E. Seastrom <rs () seastrom com> wrote:
We've seen great increases in CPU and memory speeds as well as disk densities since the last maximum (March 2000). Speccing ECC memory is a reasonable start, but this sort of thing has been a problem in the past (anyone remember the Sun UltraSPARC CPUs that had problems last time around?) and will no doubt bite us again.
Sun's problem had an easy solution - and it's exactly the one you've mentioned - ECC. The issue with the UltraSPARC IIs was that they had enough redundancy to detect a problem (Parity), but not enough to correct it (ECC). They also (initially) handled such errors very abruptly - they would basically panic and restart.
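To make the detect-versus-correct distinction concrete, here is a minimal sketch in Python. It is illustrative only: the function names are made up for this example, and the Hamming(7,4) code is a small stand-in for the wider ECC codes used in real memory and caches, not a description of what any UltraSPARC actually implemented. A single parity bit shows that something flipped but not which bit; the Hamming syndrome also gives the position, so the bit can be flipped back.

```python
def parity_bit(bits):
    """Even-parity bit: 1 when the number of set bits is odd."""
    return sum(bits) % 2


def parity_ok(bits, stored_parity):
    """True if the data still matches the parity bit stored with it."""
    return parity_bit(bits) == stored_parity


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data_bits, syndrome); a non-zero syndrome is the 1-based
    position of the single flipped bit, which is repaired before decoding."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]

# Parity: a single flipped bit is noticed, but its position is unknown.
stored = parity_bit(data)
damaged = list(data)
damaged[2] ^= 1
parity_detected = not parity_ok(damaged, stored)

# Hamming(7,4): the same kind of single-bit flip is located and repaired.
word = hamming74_encode(data)
word[4] ^= 1
recovered, position = hamming74_correct(word)
```

With parity alone the only safe options are to discard the data and re-fetch it from somewhere else, or to give up; with the Hamming code the original four bits come back intact. Note that this small code only handles one flipped bit per codeword; a double-bit error would be miscorrected, which is why real ECC schemes usually add an extra bit to detect that case.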
Starting with the UltraSPARC III they fixed this problem by sticking with Parity in the L1 cache (write-through, so if you get a parity error you can just dump the cache and re-read from memory or a higher cache), but using ECC on the L2 and higher (write-back) caches. The memory and all datapaths were already protected with ECC in everything but the low-end systems.

It does raise a very interesting question though - how many systems are you running that don't use ECC _everywhere_? (CPU, memory and datapath)

Unlike many years ago, today Parity memory is basically non-existent, which means if you're not using ECC then you're probably suffering relatively regular single-bit errors without knowing it. In network devices that's less of an issue as you can normally rely on higher-level protocols to detect/correct the errors, but if you're not using ECC in your servers then you're asking for (silent) trouble...

Scott.