nanog mailing list archives

Re: Responsible Network Management Guidelines


From: "Jay R. Ashworth" <jra () scfn thpl lib fl us>
Date: Wed, 24 Sep 1997 23:27:17 -0400

On Wed, Sep 24, 1997 at 07:29:38PM -0500, Sean Donelan wrote:
In addition to any substantive comments, now is the time to correct
the grammar and spelling nits.  I plan on throwing this into the
Informational RFC process before the next IETF meeting.

Here goes.  Didn't realize it was that small...

(Warning: I got about halfway through, and realized I was editing, rather
than just copyediting -- feel free to ignore those parts if you see fit.)

Operational Requirements Area                                 S. Donelan
INTERNET DRAFT                                                       DRA
<draft-donelan-rnmg-01.txt>                               September 1997


               Responsible Network Management Guidelines

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet- Drafts as reference
                                                ^
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet- Drafts
                                                             ^
   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

Rational and Scope
  Rationale
 
   This document provides Responsible Network Management personnel of

All three of those words likely should not be capitalized; you're using
the term generically, not as a job title.

   Internet Service Providers (ISPs) and Internet Service Customers

I know you had to make _something_ up to call them there... but I always
have a vague, unallocated unease about new initialisms.  Might you just say
"their customers"?

   (ISCs) with guidelines for network management when the following
   conditions arise:

       - Routine Maintenance Activity
       - Problem Reporting and Referral
       - Escalation
       - End-to-End Testing
       - Customer Notification
       - Emergency Communications
       - Network Service Interruption Measurement

   Specific procedures will require negotiations between the
   organizations involved.  These guidelines do not replace or supersede
                                               ^^^^^^
"are not intended to"?

   agreements or any other legally binding documents.

Responsible Internet Service Provider

   A more familiar term in Internet Standards is an Autonomous System.
   Since this document has additional requirements than an entity
   represented by an Autonomous System or Systems, this document creates a
   new entity.

"has...than" is a clumsy construct at best.  Are you trying to say

Since this document defines requirements additional to those customarily
expected of the operators of an Autonomous System, it must define a new
entity, encompassing AS's and also other organizations.

?

   The Responsible Internet Service Provider (RISP) has overall
   responsibility for Internet service between its Internet Service
   Customers and other Internet Service Providers making up the
   Internet.

Ok, so, basically, a RISP is a repository for a contact?

   An Internet Network, Autonomous System or group of Autonomous Systems
   may designate another entity to act on its behalf as its Responsible
   Internet Service Provider.  In this document, Internet Service
   Customer (ISC) shall refer to the collective network, Autonomous
   System or Systems which designated the Responsible Internet Service
   Provider as their agent.

Roughly.  An agent, in legal terms.

   The Responsible Internet Service Provider is responsible for:

   -- Providing a contact that is readily accessible 24 hours a day, 7
   days a week.

   -- Providing trained personnel.

   -- Acting as the Internet Service Customer's (ISC) primary contact in
   all matters involving Internet Service between Internet Providers.

   -- Accept problem reports from Internet Service Customers and casual

        Accepting

   end users or other parties receiving Internet Service problem
   reports.  The RISP may prioritize problem reports from its own ISCs,
   or refer casual end users to their primary RISP, if known.

This graf sounds like it's making an assumption that _I_, at least,
apparently am not equipped to make, as I fell off a couple turns back.

The first sentence could use to be recast.

   -- Advising the ISC when there is an ISP failure affecting the ISC

   -- Isolating problems to determine if the reported trouble is in the
   ISP's facilities or in other providers' service.

   -- Testing cooperatively, when necessary, with other providers to
   further identify a problem when it has been isolated to another
   provider's service.

Suggest moving the parenthetical after "providers".

   -- Keeping its ISC advised of the status of the trouble repair.

   -- Maintaining complete and accurate records of its own customers and

So, basically, a RISP is an administrative and technical Point of Contact
designee?

Routine Maintenance Activity

   Responsible Internet Service Providers should perform routine
   maintenance work during hours of minimum traffic to impact the least
   number of customers.  In most areas, the period of lowest Internet
   traffic is between 1am and 6am local time.  Trans-continental and
   inter-continental connections should consider the local time on each
   end of the connection.

It's worthy of note (it was in one of the last 4 RISKS Digests) that, for
some things -- backbone gear, NAP's, webfarms, etc -- there _is_ _no_
good time to do maintenance.  The audience is world wide and,
statistically, you simply can't find a good hour to do it.  It might be
suggested that each category of operators ought to keep their own
traffic logs, to roughly hourly granularity, maybe, to facilitate the
determination of "the best time to down the router".
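
As a rough sketch of what those hourly logs buy you: given 24 hourly
traffic totals, the quietest contiguous window can be picked out
mechanically.  The function name and the input format (one number per
hour of the day) are assumptions for illustration, not anything the
draft specifies.

```python
def quietest_window(hourly_traffic, window_hours):
    """Return (start_hour, total) for the lowest-traffic contiguous window.

    hourly_traffic: 24 totals indexed by hour of day (0..23), all in the
    same time zone.  The window is allowed to wrap past midnight.
    """
    if len(hourly_traffic) != 24:
        raise ValueError("expected exactly 24 hourly totals")
    if not 1 <= window_hours <= 24:
        raise ValueError("window_hours must be between 1 and 24")

    best_start, best_total = 0, None
    for start in range(24):
        # Sum the window beginning at this hour, wrapping around midnight.
        total = sum(hourly_traffic[(start + i) % 24]
                    for i in range(window_hours))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total
```

For a site with a world-wide audience the totals may be nearly flat, in
which case the "best" window is only marginally better than any other,
which is the point made above.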

   Activities which may affect other Internet Service Providers should
   be coordinated with the affected providers.

Channels should be designed in advance for this sort of communication
(email, voice, pager, etc.), and tested regularly?

Problem Reporting and Referral

   The Responsible Internet Service Provider is responsible for
   performing all the necessary tests to determine the nature of the
   problem detected, or reported by its customers or by referral from
   other ISPs.  If the trouble is isolated to an ISC or another ISP, the
   RISP will report the trouble to the appropriate ISC or ISP point of
   contact.

   An example of the information exchanged in the problem referral
   report:

   -- Description of the problem, including source address/name,
   destination address/name, application or protocol involved, when it
   last worked, when it stopped working, and any diagnostic messages or
   test data (e.g. ping, traceroute).

   -- Customer reported problem severity

   -- RISP determination of problem severity

   -- The name and contact information of the person referring the
   problem

   -- The referrer's trouble ticket number, and origination date/time

   -- The name of the person accepting the report

   -- The acceptor's trouble ticket number, and acceptance date/time

Oh, _ghod_ if we could design a standardized trouble ticket interchange
format.  Excuse me, I feel an RFC coming on.  :-)
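
To make that concrete, here is a minimal sketch of what such an
interchange record might look like, using only the fields the draft
lists above.  The class name, the field names, and the choice of JSON
as the wire format are all assumptions for illustration; no such
standard is defined by the draft.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ProblemReferral:
    # Fields follow the draft's example list; names are hypothetical.
    description: str
    source: str               # source address/name
    destination: str          # destination address/name
    protocol: str             # application or protocol involved
    last_worked: str          # date/time as text, e.g. ISO 8601
    stopped_working: str
    diagnostics: list         # ping/traceroute output, messages, etc.
    customer_severity: str
    risp_severity: str
    referrer_contact: str
    referrer_ticket: str
    referred_at: str
    # Filled in by the accepting organization.
    acceptor_name: str = ""
    acceptor_ticket: str = ""
    accepted_at: str = ""


def to_wire(referral):
    """Serialize a referral to a JSON string with stable key order."""
    return json.dumps(asdict(referral), sort_keys=True)


def from_wire(text):
    """Rebuild a referral from the JSON produced by to_wire()."""
    return ProblemReferral(**json.loads(text))
```

The acceptor would fill in its three fields and send the same record
back, so both sides end up holding each other's ticket numbers.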

   Periodic status reports shall occur when the problem has been
   isolated, when there is a significant change in the status of the
   problem, and when negotiated time intervals expire.  Escalation will
   be according to negotiated procedures.

And prior negotiation should probably take place to decide on
equivalencies of severity levels and escalation justifications, etc.

Sorry; I'm a systems designer by trade; the stuff just runs out of my
fingertips.  :-)
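
A minimal sketch of what an agreed severity equivalence could look
like: each party maps its own levels onto a small common scale, and
translation goes through that scale.  The level names for both
providers and the three-step common scale are invented for
illustration.

```python
# Hypothetical common scale agreed on in advance by both parties.
COMMON_LEVELS = ("critical", "major", "minor")

# Each provider's own severity names mapped onto the common scale.
PROVIDER_A = {"P1": "critical", "P2": "major", "P3": "minor", "P4": "minor"}
PROVIDER_B = {"red": "critical", "amber": "major", "green": "minor"}


def translate(level, from_map, to_map):
    """Translate one provider's severity level into the other's terms."""
    common = from_map[level]  # KeyError if the level was never negotiated
    for their_level, their_common in to_map.items():
        if their_common == common:
            # First match wins when several levels share a common value.
            return their_level
    raise LookupError("no equivalent for " + common)
```

Going through a common scale means each new peer needs one table
rather than one table per pair of providers.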

   Problem isolation may require cooperative testing between the ISC and
   ISP(s), which shall be provided when requested.  The provider making
   the test is responsible for coordination.

   When the problem has been cleared, the ISP/ISP or ISP/ISC shall
   advise the other the problem has been cleared.  When closing a problem
   report between ISP/ISP or ISP/ISC, the disposition should be
   furnished by the organization closing the ticket.

Are those slashed abbreviations _correct_?  I guess I missed something; I
don't have an expansion ready to hand that fits.

   An example of the information exchanged in the problem disposition:

   -- Trouble ticket number

   -- Referral datetime

   -- Returned datetime

   -- Trouble identified as

   -- Resolution details

   -- Service charges, if the ticket resulted in a service charge

   If there is a disagreement about the disposition of a problem ticket,
   the parties involved should document their respective positions and
   the names of the individuals involved.  Escalation will be made
   according to each organization's escalation procedures.

Glad this is in here... :-)

Escalation

   Each ISP and ISC shall establish procedures for timely escalation of
   problems to successive levels of management.  The procedures should
   include the provision of status reports to the other provider or
   customer regarding the ticket status.  Both technical and management
   contacts should be included in the escalation procedures.

I suspect that's not enough... but we'll see...

End-to-End Testing

   Networks may experience problems which cannot be isolated by each
   provider individually testing and maintaining its own services.  Each
   provider's service may appear to perform correctly, but trouble
   appears on an end-to-end service.  The ISC's RISP should coordinate
   end-to-end testing with each sectional provider by problem referral
   through their Responsible Internet Service Provider.  Each Internet
             ^^^^^
Pronoun without a referent.  Whose?  The ISC?  The RISP?  The sectional
provider?  (There's another new piece of terminology.)

   Service Provider should accept the referral request for end-to-end
   testing coordination, and provide the contact information for the
   next sectional provider to the original requestor.

This assumes to some extent that the customers -- even though they're
paying for the lines -- can actually _get_ the information from the
vendors... something which isn't always true.  Perhaps a statement
encouraging that?

Customer Notification

   During a major outage a potential concern is customer goodwill and
                          ,
   network congestion caused by repeated customer attempts to access the
   down network.  An informed customer can reduce customer frustration,
   and network congestion.

   Pre-planning for quick notification can be most beneficial in
   alerting customers.

   Some example methods to notify customers include:

   -- If operational, network access equipment can display an alert when
   customers connect.  The alert should be displayed before the customer
   logs into the network.  If the network fails during or after
   attempting to validate the access information, the alert should not
   compromise any authentication information.

Particularly consumer software _really_ ought to have provision for a
messaging system, like the motd and/or wall.  The lack of this on, say,
Win95 drives me up a tree...

   -- Customer service calls increase dramatically during network
   failures.  An informed customer representative can advise the
   customer on the best course of action.  A method to quickly instruct
   customer service representatives on the available options should be
   implemented.

Putting known outages on the automated attendant, like the cable
companies do, would be nice.  I know good engineering will _never_ win
out over paranoid management, but if I'm paying for a service, I don't
wanna _guess_ when it's broken.  I don't _care_ if the announcements
make life harder for the sales team.  Maybe they won't have so many
outages...

   -- The media, radio or television, can be used to inform the public.
   Pre-arrangements, and planning are needed to ensure only designated
   contacts are made with the media.

Is there _any_ part of the net that's this globally critical?

   -- Other automated announcements, such as World Wide Web pages or e-
   mail distribution lists with backup through other providers, recorded
   telephone status lines, or broadcast FAX/Pager notifications.

   Public notifications, when utilized, should not make reference by
   name to the organization believed causing the problem unless the
                                      ^ to be
   organization causing the problem has been confirmed.  Internet
   network problems can be difficult to isolate, and can give misleading
   indications to their true origin.

Confirmed is a sticky concept.  I wouldn't _ever_ announce it, myself.

Unless that party did, and "who's allowed to say you can announce it" is
something you need to track.

Emergency Communications

   Recognizing that all Responsible Internet Service Providers have a
   responsibility to provide an adequate level of support for their
   service and/or products, it is recommended they participate in a
   backup emergency communications system.

Like having valid whois(1) info?  :-)

   The backup emergency communications system should not depend on the
   operation of the primary network for obtaining contact,
   authentication, or other communications information during a network
   problem.  Each RISP is responsible for providing an Emergency Point Of
   Contact.  It is recommended each Emergency POC have at least one
   out-of-band contact method, such as an internationally dialable (non
   1-800) voice and/or fax telephone number.  Each RISP should pre-
   arrange a method for verifying the identity of the Emergency Point of
   Contacts using alternative communications methods, such as a

     Contact

   challange/response code-word or call-back to a known telephone

     challenge

   number.

Note that this isn't always good enough, if the problem is an attack.
Call-forwarding and butt-sets, doncha know.
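
One way a challenge/response could avoid depending on the phone number
at all is to derive the response from a secret exchanged in advance.
The sketch below uses a keyed hash (HMAC) for that, which is my
substitution and not something the draft proposes; the function names
and the 8-character response length are likewise assumptions.

```python
import hashlib
import hmac
import secrets


def new_challenge():
    """Generate a short random challenge to read to the other party."""
    return secrets.token_hex(8)


def response_for(shared_secret, challenge):
    """Compute the response for a challenge from the pre-arranged secret.

    shared_secret is bytes.  The result is truncated to 8 hex characters
    so it can be read over a voice line; that trades strength for
    convenience.
    """
    digest = hmac.new(shared_secret, challenge.encode("ascii"),
                      hashlib.sha256).hexdigest()
    return digest[:8]


def verify(shared_secret, challenge, response):
    """Check a spoken response, ignoring case and surrounding spaces."""
    expected = response_for(shared_secret, challenge).encode("ascii")
    given = response.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, given)
```

The caller who cannot produce the right response fails the check no
matter which telephone number the call appears to come from, provided
the secret itself was exchanged and stored off-line.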

   Each RISP should maintain a current off-line copy of the emergency
   contact procedures for each gateway inter-connection.  Each RISP
   should establish procedures for keeping the off-line emergency
   contact procedures updated.  Each RISP shall test and verify its own
   emergency POC procedures are accurate and functioning on a regular
   basis, no less than once a year.

On the net?  Monthly...

Network Service Interruption Measurement

   Each ISP/ISC should maintain accurate records about service
   interruptions to measure and develop trend analysis of their network
   availability.
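
As a sketch of what such records make possible, availability over a
reporting period can be computed directly from a list of outage start
and end times.  The function name and record layout are assumptions;
overlapping outage records are merged so the same downtime is not
counted twice.

```python
from datetime import datetime, timedelta


def availability(outages, period_start, period_end):
    """Return the fraction of the period (0.0 to 1.0) with no outage.

    outages: iterable of (start, end) datetime pairs.  Outages are
    clipped to the period and overlapping ones are merged.
    """
    period = (period_end - period_start).total_seconds()
    if period <= 0:
        raise ValueError("period_end must be after period_start")

    clipped = sorted((max(s, period_start), min(e, period_end))
                     for s, e in outages)
    down = timedelta(0)
    cur_start = cur_end = None
    for s, e in clipped:
        if s >= e:
            continue  # entirely outside the period, or zero length
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                down += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlaps the current outage
    if cur_end is not None:
        down += cur_end - cur_start
    return 1.0 - down.total_seconds() / period
```

Running this per month over the same set of records gives the trend
line the draft asks for.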

Security Considerations

You may wish to choose a different section title.  "Security
Considerations" is customarily used to mean "...of implementation of the
procedures in this RFC", which is, I think, not what you mean here...

   -- Maintain a complete and accurate record of a RISP's own customers
   and inter-provider gateways.

   -- Public notifications, when utilized, should not make reference by
   name to the organization believed causing the problem.

   -- If the network fails during or after attempting to validate the
   access information, the alert should not compromise any
   authentication information.

   -- Each RISP should pre-arrange a method for verifying the identity
   of Point of Contacts using alternative communications methods, such
   as a challange/response code-word or call-back to a known telephone

          challenge
   number.


Author's Address

   Sean Donelan
   Data Research Associates, Inc.
   1276 North Warson Road
   Saint Louis, MO 63132

   Phone: +1-314-432-1100
   EMail: sean () DRA COM

Not bad.  But, from down here in the trenches, I think it could use
another round of flogging.  How much commentary have you gotten on it?

Cheers,
-- jr 'will stick fingers in others' RFCs for food' a
-- 
Jay R. Ashworth                                                jra () baylink com
Member of the Technical Staff             Unsolicited Commercial Emailers Sued
The Suncoast Freenet      "People propose, science studies, technology
Tampa Bay, Florida          conforms."  -- Dr. Don Norman      +1 813 790 7592

