nanog mailing list archives

Re: OT: Traffic Light Control (was Re: First real-world SCADA attack in US)


From: Thomas Maufer <tmaufer () gmail com>
Date: Wed, 23 Nov 2011 18:41:58 -0800

<unlurks>

I have to jump in on this thread. Traffic light controllers are a fun category of technical artifacts. The weatherproof 
boxes that the relays used to live in have stayed the same size for decades, but now the controllers just take a teeny 
tiny circuit board rattling around in this comparatively huge box. And it's full of software, dontcha know? So why not 
have lots of newfangled features? Curiously, the people who make the insides of the box have a WHOLE DIFFERENT way of 
thinking about "what a traffic light controller should do?" - the "insider" people are in the 21st century, while the 
"outsider" people are in the early 20th century. Lemme splain.

A particular traffic light controller that I tested in 2007 had an FTP server inside it. I have no idea why. So I tried 
fuzzing it. Five minutes in, the test aborted because the DUT wouldn't restart anymore. Upon investigation, we
discovered that a particular FTP sequence had triggered a bug with a rather unfortunate side effect: the flash
file system of the traffic light controller had been formatted or erased. As a bonus, the device had also crashed
and was awaiting a ZMODEM file download, since it no longer had a boot image. We couldn't test anything else because we
didn't have the special serial cable to (re-)install the OS. Fail-safe? Not hardly: Not when it has no software! It's a 
lump of highly refined sand, in a plastic case.
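
If you've never seen what "fuzzing an FTP server" actually looks like, a toy Python sketch is below. This is
emphatically not the fuzzer that was used in that test - the target address and the test cases are made-up
placeholders - but it shows the shape of the exercise: send legal-but-odd command sequences, then check whether the
box still answers.

# A toy illustration of negative testing an FTP service; not the fuzzer
# that was actually used. The target address is a placeholder (TEST-NET-1)
# for a lab device. Never point this at equipment you don't own.
import socket

TARGET = ("192.0.2.10", 21)   # placeholder lab address

# A handful of legal-but-unusual command sequences. A real fuzzer
# generates thousands of mutations of lengths, encodings, and orderings.
CASES = [
    b"USER " + b"A" * 1024 + b"\r\n",                      # absurdly long username
    b"USER anonymous\r\nPASS x\r\nCWD " + b"../" * 200 + b"\r\n",
    b"USER anonymous\r\nPASS x\r\nRETR\r\n",               # command with no argument
    b"PASV\r\nPASV\r\nPASV\r\nQUIT\r\n",                   # out-of-order commands
]

def still_alive():
    """Health check: does the device still answer with its FTP banner?"""
    try:
        with socket.create_connection(TARGET, timeout=5) as s:
            return s.recv(128).startswith(b"220")
    except OSError:
        return False

for i, case in enumerate(CASES):
    try:
        with socket.create_connection(TARGET, timeout=5) as s:
            s.recv(128)        # read the 220 banner
            s.sendall(case)
            s.recv(512)        # whatever the server says back
    except OSError:
        pass                   # a dropped connection is data, too
    if not still_alive():
        print(f"device stopped responding after case {i}")
        break

The point of the health check after every case is exactly the failure mode above: the interesting result isn't the
reply to the weird command, it's whether the box ever comes back.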

There are many lessons here, not least of which is: Ship the device with the smallest possible attack surface! Why the 
heck was FTP enabled? Clearly this device had never been subjected to any negative testing. And these devices are meant 
to be networked, so that FTP bug will be tickled someday, I just don't know when. Yes, it was reported to the vendor, 
and no, I have no idea if they ever fixed it.

Also, in this thread I have seen several references to "fail-safe" or "redundancy" features. In my experience, those 
are often among the weakest aspects of a system. In one case, my testing rendered a multi-million-dollar, highly
redundant VoIP soft switch useless by constantly causing the primary to fail - and while the secondary was being
activated, there was a quiet period of 2-3 seconds during which no calls went through. Shortly after the secondary
had become the primary, it failed again, continuing the cycle. Literally one carefully crafted SIP INVITE (about 100
bytes, IIRC) per second could make this switch completely useless. The bug I found
involved SIP INVITE messages that could not be filtered…unless you didn't want to accept VoIP phone calls at all, which 
calls into question your purchase of the multi-million-dollar highly redundant soft switch. That bug was fixed.
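
For scale, the kind of traffic we're talking about is nothing more exotic than the Python loop below: one SIP INVITE
per second over UDP. This is not the harness that was actually used, all addresses are placeholders for lab gear, and
the particular "careful crafting" that tripped the bug is deliberately left out - a plain, well-formed INVITE is shown
instead.

# A skeleton of a lab harness that sends one SIP INVITE per second over UDP.
# The INVITE below is an ordinary, well-formed one; the specific malformation
# that triggered the failover bug is not reproduced. Placeholder addresses only.
import socket
import time
import uuid

SWITCH = ("192.0.2.20", 5060)                # placeholder soft switch under test
LOCAL_IP, LOCAL_PORT = "192.0.2.99", 5070    # placeholder for the test host

def invite(call_id, branch):
    """Build a minimal, well-formed SIP INVITE with no body."""
    return (
        f"INVITE sip:test@{SWITCH[0]} SIP/2.0\r\n"
        f"Via: SIP/2.0/UDP {LOCAL_IP}:{LOCAL_PORT};branch=z9hG4bK{branch}\r\n"
        "Max-Forwards: 70\r\n"
        f"From: <sip:lab@{LOCAL_IP}>;tag=lab1\r\n"
        f"To: <sip:test@{SWITCH[0]}>\r\n"
        f"Call-ID: {call_id}\r\n"
        "CSeq: 1 INVITE\r\n"
        f"Contact: <sip:lab@{LOCAL_IP}:{LOCAL_PORT}>\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode()

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", LOCAL_PORT))      # listen locally for any responses

while True:                      # one INVITE per second, indefinitely
    sock.sendto(invite(uuid.uuid4().hex, uuid.uuid4().hex[:8]), SWITCH)
    time.sleep(1.0)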

Software is tricky stuff. The number of ways it can fail is practically infinite, but there is generally only a small 
number of ways for it to work correctly. Networked software is particularly challenging to write because the software 
engineers don't get to control their inputs. The intervening network can (does) fold, spindle, mutilate, truncate, 
drop, reorder or duplicate packets and your code on the receiving end has to try to understand what was intended by the 
sender. Oh, and the sender might be following an older version of the standard (if one even exists) or simply have 
included some bugs of their own. Because the coders are so focused on making their code do what the MRD/PRD required - 
on a tight schedule! - they have little time to imagine all the possible ways their code might fail. Their 
error-handling routines are simply never imaginative enough to handle real-world brokenness. It *is* possible to test 
this stuff, but time pressures in release schedules don't leave a lot of breathing room for developers to take on whole 
new classes of tasks that are outside their expertise (security testing). So you end up with a traffic light controller 
that erases its own flash file system when it receives a slightly strange but completely legal FTP command, or a highly 
redundant VoIP soft switch that is only good at ping-ponging from primary to secondary CPUs. Don't even get me started 
on problems I have found in carrier-class routers.
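
To make that concrete, here is a small Python sketch of the defensive posture I'm talking about: reading a single
length-prefixed message off a TCP socket while assuming the peer, or the network in between, will eventually hand you
garbage. The framing (4-byte length plus payload) and the size limit are invented for the example.

# Defensive message framing: never trust a length field, never assume a
# read completes, and treat a mid-message disconnect as an explicit error.
import struct

MAX_MSG = 64 * 1024      # refuse implausible lengths instead of trusting them

class FramingError(Exception):
    """Raised when the input stream violates the (assumed) framing rules."""

def read_message(sock):
    """Read one [4-byte big-endian length][payload] message, defensively."""
    header = _read_exact(sock, 4)
    (length,) = struct.unpack("!I", header)
    if length == 0 or length > MAX_MSG:
        # A syntactically "legal" header carrying a hostile value:
        # reject it rather than blindly allocating or looping.
        raise FramingError(f"implausible message length {length}")
    return _read_exact(sock, length)

def _read_exact(sock, n):
    """Handle short reads and mid-message disconnects explicitly."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise FramingError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return buf

Even something this dull has to choose, up front, what to do when the input is wrong - and "fail loudly and close the
connection" is a far better default than whatever an unhandled exception happens to do.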

I don't need to name names: All software has bugs (except possibly the code in the main computers on the Space 
Shuttle). Every engineer I have ever known has tried to write their code well, but automated negative testing has only 
recently caught up to the point where engineers and QA staff can focus on what they do best (writing and testing
code that implements features someone will buy) and let purpose-built tools do the negative testing for them, so their
error-handling routines can be robust, too. Fixing bugs is generally straightforward. Finding them has always been the 
challenge.

~tom

</unlurks>


On 23 Nov 2011, at 17:59, Brett Frankenberger wrote:

On Wed, Nov 23, 2011 at 05:45:08PM -0500, Jay Ashworth wrote:

Yeah.  But at least that's stuff you have a hope of managing.  "Firmware
underwent bit rot" is simply not visible -- unless there's, say, signature 
tracing through the main controller.

I can't speak to traffic light controllers directly, but at least some
vital logical controllers do check signatures of their firmware and
programming and will fail into a safe configuration if the
signatures don't validate.

    -- Brett
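
The check described above is conceptually no more than the Python sketch below: verify the firmware image before
trusting it, and drop into a safe state if the check fails. No particular vendor's scheme is implied; the paths, the
key handling, and the notion of a "safe state" are all invented for illustration.

# A hedged sketch of a boot-time firmware signature check that fails closed.
import hashlib
import hmac
import sys

FIRMWARE_PATH = "/flash/firmware.bin"   # placeholder paths
SIGNATURE_PATH = "/flash/firmware.sig"
KEY = b"device-unique-key"              # in real hardware: not a constant in code

def firmware_ok():
    """Return True only if the stored signature matches the image."""
    try:
        with open(FIRMWARE_PATH, "rb") as img, open(SIGNATURE_PATH, "rb") as sig:
            expected = sig.read().strip()
            actual = hmac.new(KEY, img.read(), hashlib.sha256).hexdigest().encode()
        return hmac.compare_digest(expected, actual)
    except OSError:
        return False                    # missing or unreadable image fails closed

def enter_safe_state():
    # For a traffic controller, "safe" means something like all-red flash,
    # driven by logic that does not depend on the image we just refused to run.
    print("firmware signature check failed: entering fail-safe mode")
    sys.exit(1)

if not firmware_ok():
    enter_safe_state()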



