nanog mailing list archives

Re: Disaster Recovery Process


From: Jamie Dahl <jamied () meatball net>
Date: Tue, 5 Oct 2021 10:14:45 -0700

The NIMS/ICS system works very well for issues like this.   I utilize ICS regularly in my Search and Rescue world, and 
the last two companies I worked for utilize(d) it extensively during outages.  It lets folks from different 
disciplines, roles and backgrounds come in and divide and conquer an incident, and it can be scaled up/scaled out as 
necessary.  Terms like "Incident Commander" have been around for a few decades and are used regularly by FEMA, CalFire 
and others for natural-disaster-style incidents.  But those of you who are EMComm folks probably already knew 
that ;-). 



this was pounded out on my iPhone and i have fat fingers plus  two left thumbs :)

We have to remember that what we observe is not nature herself, but nature exposed to our method of questioning.


On Oct 5, 2021, at 10:11, jim deleskie <deleskie () gmail com> wrote:


World broke.  Crazy $$ per hour of downtime.  Doors open with a fire axe.  Glass breaks super easy too, and it's much 
less expensive than adding 15 min to the failure.

-jim

On Tue., Oct. 5, 2021, 7:05 p.m. Jeff Shultz, <jeffshultz () sctcweb com> wrote:
7. Make sure any access-controlled rooms have physical keys that are available when needed, and that those keys aren't 
secured by the same access control they are meant to circumvent.
8. Don't make your access control dependent on internet access; always have something on the local network it can 
fall back to.

That last point, that their access control apparently failed and locked people out when their outward-facing DNS 
and/or BGP routes went away, is perhaps the most astounding thing to me: they made their access control into an IoT 
device without (apparently) a quick workaround for a failure in the "I" part.
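A minimal sketch of the local fallback described in point 8, assuming a hypothetical `remote_check` call to a cloud 
authorization service and a locally cached allow-list of badge IDs. The names are illustrative only and do not 
correspond to any vendor's access-control API.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


class RemoteUnavailable(Exception):
    """Raised by the (hypothetical) remote check when the cloud service can't be reached."""


@dataclass
class DoorController:
    # Hypothetical remote authorization call; returns True/False or raises RemoteUnavailable.
    remote_check: Callable[[str], bool]
    # Local copy of allowed badge IDs, kept in sync while the remote service is reachable.
    local_allow: Set[str] = field(default_factory=set)

    def authorize(self, badge_id: str) -> bool:
        try:
            allowed = self.remote_check(badge_id)
        except RemoteUnavailable:
            # DNS/BGP/Internet failure: decide from the local copy instead of locking everyone out.
            return badge_id in self.local_allow
        # Remote answer succeeded, so update the local copy to match it.
        if allowed:
            self.local_allow.add(badge_id)
        else:
            self.local_allow.discard(badge_id)
        return allowed


if __name__ == "__main__":
    def cloud_down(badge_id: str) -> bool:
        raise RemoteUnavailable("DNS lookup failed")

    door = DoorController(remote_check=cloud_down, local_allow={"badge-42"})
    print(door.authorize("badge-42"))  # True: found in the local allow-list
    print(door.authorize("badge-99"))  # False: not in the local allow-list
```

This sketch falls back to the last-known local list rather than failing fully open or fully closed; point 7's 
physical keys still cover the case where the controller itself is dead.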

On Tue, Oct 5, 2021 at 6:01 AM Jared Mauch <jared () puck nether net> wrote:


On Oct 4, 2021, at 4:53 PM, Jorge Amodio <jmamodio () gmail com> wrote:

How come such a large operation does not have out-of-band access in case of emergencies???



I mentioned to someone yesterday that most OOB systems _are_ the internet.  It doesn’t always seem like you need 
things like modems, dial backup, or access to these services, but when you do, it’s critical/essential.

A few reminders for people:

1) Program your co-workers into your cell phone
2) Print out an emergency contact sheet
3) Have a backup conference bridge/system that you test
  - if zoom/webex/ms are down, where do you go?  Slack?  Google meet? Audio bridge?
  - No judgement, but do test the system!
4) Know how to access the office and who is closest.  
  - What happens if they are in the hospital, sick or on vacation?
5) Complacency is dangerous
  - When the tools “just work” you never imagine the tools won’t work.  I’m sure the internal list of lessons learned 
    will be long.
  - I hope they share them externally so others can learn.
6) No really, test the backup process.



* interlude *

Back in my time at 2914, one reason we all had T1s at home was so we could get in to the network should something 
bad happen.  My home IP space was in the router ACLs.  Much has changed since those early days as the network became 
more reliable.  We’ve seen large outages of platforms, carriers, etc. in the past 2 years (the Aug 30th 2020 issue is 
still firmly in my memory).  

Plan for the outages and make sure you understand your playbook.  It may range from a snow day to all hands on deck.  
Test it at least once, ideally with someone who will challenge a few assumptions (e.g., that the cell network will 
be up).

- Jared


-- 
Jeff Shultz
