nanog mailing list archives
Re: Service provider story about tracking down TCP RSTs
From: Lee <ler762 () gmail com>
Date: Sun, 2 Sep 2018 02:38:57 -0400
On 9/1/18, William Herrin <bill () herrin us> wrote:
On Sat, Sep 1, 2018 at 6:11 PM, Lee <ler762 () gmail com> wrote:
On 9/1/18, William Herrin <bill () herrin us> wrote:
On Sat, Sep 1, 2018 at 4:00 PM, William Herrin <bill () herrin us> wrote:
Better yet, do the job right and build an anycast TCP stack as described here: https://bill.herrin.us/network/anycasttcp.html

An explosion in state management would be the least of my worries :) I got as far as your Third hook: and thought of this https://www.jwz.org/doc/worse-is-better.html

Hi Lee,

On a brief tangent: Geographic routing would drastically simplify the Internet core, reducing both cost and complexity. You'd need to carry only nearby specific routes and a few broad aggregates for destinations far away. It will never be implemented, never, because no cross-ocean carriers are willing to have their bandwidth stolen when the algorithm decides it likes their path better than a paid one. Even though the algorithm gets the packets where they're going, and does so simply, it does so in a way that's too often incorrect.

Then again, I don't really understand the MIT/New Jersey argument in Richard's worse-is-better story.
The "New Jersey" description is more of a caricature than a valid description: "I have intentionally caricatured the worse-is-better philosophy to convince you that it is obviously a bad philosophy and that the New Jersey approach is a bad approach." I mentally did a 's/New Jersey/Microsoft/' and it made a lot more sense.
The MIT guy says that a routine should handle a common non-fatal exception. The Jersey guy says that it's ok for the routine to return a try-again error and expect the caller to handle it. Since it's trivial to build another layer that calls the routine in a loop until it returns success or a fatal error, it's more a philosophical argument than a practical one. As long as a correct result is consistently achieved in both cases, what's the difference?
That it's not always a trivial matter to build another layer. That your retry layer needs at least a counter or timeout value so it doesn't retry forever & those values need to be user configurable, so the re-try layer isn't quite as trivial as it appears at first blush.
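To make the point about the retry layer concrete, here is a minimal Python sketch of such a wrapper. Everything in it is an assumption for illustration: the names `call_with_retry`, `RetryableError` and `RetriesExhausted`, and the default values for `max_attempts`, `timeout_s` and `delay_s`, do not come from the thread. It shows that even the simple version needs both a counter and a deadline, and both have to be exposed as parameters.

```python
import time


class RetryableError(Exception):
    """Non-fatal 'try again' failure returned by the wrapped routine (hypothetical)."""


class RetriesExhausted(Exception):
    """Raised when the retry layer gives up (hypothetical)."""


def call_with_retry(routine, max_attempts=5, timeout_s=10.0, delay_s=0.1):
    """Call routine() until it succeeds, fails fatally, or a limit is reached.

    max_attempts and timeout_s are the user-configurable limits that keep
    the loop from retrying forever. Any exception other than RetryableError
    is treated as fatal and propagates to the caller unchanged.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return routine()
        except RetryableError as exc:
            # Stop when either limit is hit; otherwise pause and try again.
            if attempts >= max_attempts or time.monotonic() >= deadline:
                raise RetriesExhausted(
                    f"gave up after {attempts} attempts"
                ) from exc
            time.sleep(delay_s)
```

A caller would wrap the routine as `call_with_retry(fetch, max_attempts=3, timeout_s=2.0)`, where `fetch` is whatever function returns the try-again error. Choosing those two numbers for each caller is the part that stops being trivial.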
Richard characterized the Jersey argument as, "It is slightly better to be simple than correct." I just don't see that in the Jersey argument. Every component must be correct. The system of components as a whole must be complete. It's slightly better for a component to be simple than complete. That's the argument I read and it makes sense to me.
Yes, I did a lot of interpreting also. Then I hit on s/New Jersey/Microsoft/ and it made a lot more sense to me.
Honestly, the idea that software is good enough even with known corner cases that do something incorrect... I don't know how that survives in a world where security-conscious programming is not optional.
Agreed. I substituted "soft-fail or fail-closed: user has to retry" for doing something incorrect.
I had it much easier with anycast in an enterprise setting. With anycast servers in data centers A & B, just make sure no site has an equal cost path to A and B. Any link/ router/ whatever failure & the user can just re-try.

You've delicately balanced your network to achieve the principle that even when routing around failures the anycast sites are not equidistant from any other site. That isn't simplicity. It's complexity hidden in the expert selection of magic numbers.
^shrug^ it seemed simple to me. And it was real easy to explain, which is why I thought of that "worse is better" paper. I took the New Jersey approach & did what was basically a hack. You took the MIT approach and created a general solution .. which is not so easy to explain :)
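The "no site has an equal cost path to A and B" rule can be checked mechanically. The Python sketch below is only an illustration: the function name `equal_cost_sites`, the site names and the cost values are all made up, and it looks at a single routing state. Covering the "even when routing around failures" objection would mean recomputing the cost table for each failure scenario and running the same check again.

```python
def equal_cost_sites(costs):
    """Return the sites whose best path cost is shared by two or more anycast DCs.

    costs maps site name -> {data center name: path cost}. The structure and
    all values are hypothetical inputs, e.g. exported from an IGP metric table.
    """
    tied = []
    for site, by_dc in costs.items():
        best = min(by_dc.values())
        # A tie for the lowest cost means traffic from this site could be
        # split across data centers, which the rule is meant to prevent.
        if sum(1 for cost in by_dc.values() if cost == best) > 1:
            tied.append(site)
    return sorted(tied)


# Made-up example: branch2 is equidistant from A and B and gets flagged.
example_costs = {
    "branch1": {"A": 10, "B": 20},
    "branch2": {"A": 15, "B": 15},
    "branch3": {"A": 30, "B": 25},
}
print(equal_cost_sites(example_costs))
```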
Even were that achievable in a network as chaotic as the Internet, is it simpler than four trivial tweaks to the TCP stack plus a modestly complex but fully automatic user-space program that correctly reroutes the small percentage of packets that went astray?
Your four trivial tweaks to the TCP stack are kernel patches - right? Which seems not at all trivial to me, but if you've got a group of people that can support & maintain that - good for you!

Regards
Lee