nanog mailing list archives

Re: Bandwidth distribution per ip


From: Karsten Elfenbein <karsten.elfenbein () gmail com>
Date: Thu, 21 Dec 2017 11:35:08 +0100

Hi,

It sounds like you are hosting the origin for the CDN, which is what
causes the issues. Does the CDN care where it pulls the data from?
Could you place a cheaper origin somewhere else, like AWS, Italy,
Qatar or Amsterdam? For $150k/month you can get a lot of
bandwidth/storage/rack space somewhere else.
Another option could be something like origin storage, where the
content is already stored on a CDN provider's server.
Other than that, you could check the hashing with your upstream
provider and ask them to use layer 4 info as well. If they refuse, you
might be able to free up some IPs by reducing point-to-point links to
/31, or with some ugly NAT tricks where ports point to different
services (mail ports go to the mail server and HTTP to a CDN unit).
For ~$37.5k you can also buy some more prefixes to announce.
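To make the layer 3 vs layer 4 hashing point concrete, here is a
minimal Python sketch. It assumes a 4-member bundle, three CDN ingress
IPs (hypothetical documentation addresses) and a generic hash; real
carrier hardware uses its own vendor-specific hash, so the exact link
choice will differ. With dst-IP-only hashing the traffic can land on at
most three of the four links; adding ports to the hash key spreads it
over all four.

```python
import hashlib
import random
from collections import Counter

NUM_LINKS = 4  # members in the carrier's bundle (from the thread)
# Hypothetical CDN ingress addresses (RFC 5737 documentation range).
CDN_IPS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def pick_link(key: tuple, num_links: int = NUM_LINKS) -> int:
    """Map a header-field tuple to a member link with a stable hash.

    Generic stand-in for a vendor hash; not any real device's algorithm.
    """
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_links


def link_load(use_l4: bool, flows: int = 10_000, seed: int = 1) -> Counter:
    """Count how many simulated inbound flows each member link receives."""
    rng = random.Random(seed)
    load = Counter()
    for _ in range(flows):
        dst = rng.choice(CDN_IPS)                     # CDN unit receiving the flow
        src = f"198.51.100.{rng.randrange(1, 255)}"   # hypothetical remote server
        dport = rng.randrange(1024, 65536)            # ephemeral port on the CDN side
        # Layer 3 only: destination IP. Layer 4: full 5-tuple (TCP = 6).
        key = (src, dst, 6, 443, dport) if use_l4 else (dst,)
        load[pick_link(key)] += 1
    return load


if __name__ == "__main__":
    print("dst IP only:", sorted(link_load(use_l4=False).items()))
    print("5-tuple:    ", sorted(link_load(use_l4=True).items()))
```

The only knob the upstream has to turn is what goes into the hash key;
the number of member links and the traffic stay the same.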


Karsten

2017-12-20 18:04 GMT+01:00 Denys Fedoryshchenko <denys () visp net lb>:
On 2017-12-20 17:52, Saku Ytti wrote:

On 20 December 2017 at 16:55, Denys Fedoryshchenko <denys () visp net lb> wrote:

And to me, it sounds like a faulty aggregation + shaping setup. For
example, I once heard that if I do policing on some models of Cisco
switch, on an aggregated interface with 4 member interfaces, it will
install a 25% policer on each interface, and if hashing is done by dst
IP only, I will face exactly this issue. But that is an old and cheap
model, as I recall.


One such old and cheap model is the ASR9k, with Trident, Typhoon and
Tomahawk linecards.

It's actually a pretty demanding problem: technically, two linecards,
or even just two ports sitting on two different NPUs, might as well be
different routers; they have no good way to communicate with each
other about bandwidth use. So a policer of rate N being installed as
N/member_count on each member link is very typical.
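A minimal sketch of why the per-member split hurts, assuming a static
split of an aggregate policer rate N into N/member_count per link, with
each member enforcing its share independently. The offered loads below
are illustrative, not measurements from any real device.

```python
from typing import Sequence


def delivered_rate(aggregate_policer: float, offered_per_member: Sequence[float]) -> float:
    """Traffic that survives when an aggregate policer is split evenly per member.

    Each member link enforces aggregate_policer / member_count on its own,
    with no knowledge of how busy the other members are.
    """
    share = aggregate_policer / len(offered_per_member)
    return sum(min(offered, share) for offered in offered_per_member)


# 10 Gbps purchased, 4 members, so each member polices to 2.5 Gbps.
print(delivered_rate(10, [2.5, 2.5, 2.5, 2.5]))  # perfectly balanced -> 10.0
print(delivered_rate(10, [4, 4, 2, 0]))          # 3 dst IPs on 3 members -> 7.0
print(delivered_rate(10, [10, 0, 0, 0]))         # one big tunnel flow -> 2.5
```

The total offered load is 10 Gbps in every case; only its distribution
across members changes what gets through.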

ECMP is a fact of life, and even though few if any providers document
that they have per-flow limitations lower than the nominal rate of the
connection you purchase, these exist almost universally. The people
most likely to see these limits are those who tunnel everything, so
that everything on their, say, 10Gbps link is a single flow from the
network's point of view.
In the IPv6 world, at least, the tunnel encap end could write a hash
into the IPv6 flow label, potentially allowing the core to balance
tunneled traffic, unless the tunnel itself guarantees ordering.
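A minimal sketch of the encap-side idea (RFC 6438 describes using the
flow label this way for tunnels): hash the inner flow's 5-tuple into
the 20-bit IPv6 flow label of the outer header, so core routers that
hash on the flow label can spread different inner flows across paths.
The CRC32 hash and the function names are assumptions for
illustration, not any specific implementation.

```python
import zlib

FLOW_LABEL_BITS = 20
FLOW_LABEL_MASK = (1 << FLOW_LABEL_BITS) - 1  # 0xFFFFF


def flow_label_for_inner(src: str, dst: str, proto: int, sport: int, dport: int) -> int:
    """Derive a stable, non-zero 20-bit flow label from the inner packet's 5-tuple."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    label = zlib.crc32(key) & FLOW_LABEL_MASK
    return label or 1  # a zero label means "unlabeled", so avoid emitting it


def outer_first_word(traffic_class: int, label: int) -> int:
    """First 32 bits of the outer IPv6 header: version (4) | traffic class (8) | flow label (20)."""
    return (6 << 28) | ((traffic_class & 0xFF) << 20) | (label & FLOW_LABEL_MASK)


# Same inner flow -> same label (keeps that flow in order on one path);
# different inner flows -> usually different labels (lets the core spread them).
a = flow_label_for_inner("10.0.0.1", "10.0.0.2", 6, 40000, 443)
b = flow_label_for_inner("10.0.0.1", "10.0.0.2", 6, 40001, 443)
print(hex(a), hex(b), hex(outer_first_word(0, a)))
```

If the tunnel itself must deliver one ordered stream, the encap end
would instead keep a single label for the whole tunnel, which brings
back the single-flow limitation.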

I don't think it's fair for an operator to demand equal bandwidth per
IP, but you will expose yourself to more problems if you do not have
sufficient entropy. We are slowly getting solutions to this: Juniper
Trio and Broadcom (BRCM) Tomahawk3 can detect elephant flows and
dynamically map hash results to physical ports unequally to alleviate
the problem.
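The general idea behind this kind of dynamic load balancing can be
sketched as follows. This is a simplified, hypothetical model, not
Juniper's or Broadcom's actual algorithm: flows hash into buckets,
buckets normally map statically to ports, and buckets whose measured
bytes exceed a threshold are reassigned, largest first, to the
least-loaded port.

```python
from typing import Dict, List


def port_loads(bucket_to_port: Dict[int, int], bucket_bytes: Dict[int, int],
               num_ports: int) -> List[int]:
    """Bytes carried by each port under a given bucket -> port mapping."""
    loads = [0] * num_ports
    for bucket, nbytes in bucket_bytes.items():
        loads[bucket_to_port[bucket]] += nbytes
    return loads


def rebalance(bucket_to_port: Dict[int, int], bucket_bytes: Dict[int, int],
              num_ports: int, elephant_threshold: int) -> Dict[int, int]:
    """Return a new mapping with elephant buckets moved to the least-loaded ports."""
    new_map = dict(bucket_to_port)
    elephants = [b for b, n in bucket_bytes.items() if n >= elephant_threshold]
    mice = {b: n for b, n in bucket_bytes.items() if n < elephant_threshold}
    # Mice stay where the static hash put them; count their load first.
    loads = port_loads(new_map, mice, num_ports)
    # Place the biggest elephants first, each on the currently least-loaded port.
    for bucket in sorted(elephants, key=lambda b: bucket_bytes[b], reverse=True):
        port = min(range(num_ports), key=lambda p: loads[p])
        new_map[bucket] = port
        loads[port] += bucket_bytes[bucket]
    return new_map


# 8 hash buckets statically mapped onto 4 ports; buckets 0 and 4 both
# land on port 0 and carry elephant flows.
static = {b: b % 4 for b in range(8)}
traffic = {0: 100, 4: 100, 1: 1, 2: 1, 3: 1, 5: 1, 6: 1, 7: 1}
print(port_loads(static, traffic, 4))                              # [200, 2, 2, 2]
print(port_loads(rebalance(static, traffic, 4, 50), traffic, 4))   # [100, 102, 2, 2]
```

Moving a bucket while its flows are active can reorder packets, so
real implementations typically only move traffic during an idle gap in
the flow. Note that a single elephant still cannot be split across
ports; rebalancing only stops two of them from sharing one.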

As a person who loves embedded systems development, today I watched a
beautiful 199x machine, tens of meters long, where multi-kW VFDs drive
huge motors (not steppers), pulling thin paper synchronously and
printing on it at crazy speed, and all they have is a long ~9600 link
between a bunch of encoders and the PLC dinosaur managing all this
beauty. If any of them applies slightly wrong torque, the stretched
paper will rip apart.
In fact, there is nothing complex there, and the technology is ancient
by now. Engineers who, with modern technology in hand, cannot
synchronize a few virtual "subinstances" and update their policing
ratio based on feedback, inside one tiny, expensive box, at a
reasonable update rate: maybe they are incompetent?
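What is proposed here could look roughly like this: instead of a fixed
N/member_count split, a controller periodically measures each member's
demand and re-splits the aggregate rate in proportion to it, keeping a
small floor so an idle member can ramp up before the next update. This
is a hypothetical sketch of the proposal, not how any vendor implements
distributed policers; the floor_fraction parameter is an assumption,
and measurement lag and update interval are the hard parts it glosses
over.

```python
from typing import List, Sequence


def redistribute(aggregate_rate: float, demand: Sequence[float],
                 floor_fraction: float = 0.05) -> List[float]:
    """Split aggregate_rate across member policers in proportion to measured demand.

    floor_fraction of the aggregate is reserved and shared equally, so every
    member keeps a small guaranteed rate. The shares always sum to aggregate_rate.
    """
    n = len(demand)
    total = sum(demand)
    if total == 0:
        return [aggregate_rate / n] * n  # no signal yet: fall back to an even split
    floor = aggregate_rate * floor_fraction / n
    spare = aggregate_rate - floor * n
    return [floor + spare * d / total for d in demand]


# One feedback round: members measured at 4, 4, 2 and 0 Gbps of offered load.
offered = [4.0, 4.0, 2.0, 0.0]
shares = redistribute(10.0, offered)
delivered = sum(min(o, s) for o, s in zip(offered, shares))
print([round(s, 3) for s in shares], round(delivered, 3))  # [3.925, 3.925, 2.025, 0.125] 9.85
```

With the same 4/4/2/0 Gbps distribution, a static 2.5 Gbps per-member
split would pass only 7 Gbps; after one feedback round this passes
about 9.85 Gbps.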

The national operator doesn't provide IPv6; that's one of the problems.
In most cases there are no tunnels, but the imperfection still exists.
When an ISP pays ~$150k/month (bandwidth is very expensive here), the
CDN has 3 units with 3 ingress IPs, and the carrier bonds somewhere
over 4 links, it means that by rough estimate ~$37.5k is simply lost
(one of the four links stays idle). No sane person will accept that.
Sometimes one CDN unit is in maintenance and the remaining 2 could
easily serve the demand, but because of this "balancing" issue the
service just goes down, as half of the capacity is missing.
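The rough estimate works out as follows, assuming the four bundle
members are equal-sized, each CDN IP's traffic is pinned to one member,
and the carrier hashes on destination IP only. Even in the best case,
three IPs fill at most three of four links, so a quarter of the bill
(~$37.5k of ~$150k) buys idle capacity; with one unit in maintenance,
two IPs can use at most half. If the hash behaves like a random
assignment, K IPs on M links hit M(1 - (1 - 1/M)^K) distinct links on
average, which is lower still.

```python
from fractions import Fraction


def usable_fraction_best_case(num_ips: int, num_links: int) -> Fraction:
    """Best case with dst-IP-only hashing: every IP lands on a different link.

    Each IP's traffic is pinned to one link, so at most min(num_ips, num_links)
    of the equal-sized links can carry traffic.
    """
    return Fraction(min(num_ips, num_links), num_links)


def expected_links_used(num_ips: int, num_links: int) -> Fraction:
    """Expected number of distinct links hit if each IP hashes uniformly at random."""
    p_link_unused = Fraction(num_links - 1, num_links) ** num_ips
    return num_links * (1 - p_link_unused)


if __name__ == "__main__":
    monthly_cost = 150_000  # USD per month, from the thread
    for cdn_units in (3, 2):
        usable = usable_fraction_best_case(cdn_units, 4)
        print(cdn_units, float(usable), monthly_cost * (1 - usable))
    # 3 0.75 37500
    # 2 0.5 75000
    print(expected_links_used(3, 4), expected_links_used(2, 4))  # 37/16 7/4
```

So with three units the expected number of busy links is about 2.3 of
4, and with two units about 1.75, before even considering that the
per-IP traffic is not equal.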

Tunnels are used in rare cases too, but what can we do if they lack
the reasonable DDoS protection tools the rest of the world has (not
even blackholing)? Many DDoS-protection operators charge extra for
more tunnel endpoints with balancing, and even that balancing is not
very equal (same src+dst IP at best).
And when I did round-robin in my own solution, I noticed that besides
this "bandwidth distribution" issue, latency to each IP is unequal, so
round-robin created "out of order" issues for me.
Another problem: the most popular services in the region (in terms of
bandwidth) are Facebook, WhatsApp and YouTube. Most of them are big
fat flows running over a few IPs. I doubt I can convince them to
balance over more.
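The round-robin reordering is easy to reproduce in a toy model:
packets of one flow are sprayed round-robin over tunnel endpoints whose
one-way latencies differ, and any packet that arrives after a
higher-numbered one counts as out of order. The latencies and send
interval below are made up for illustration.

```python
from typing import Sequence


def count_out_of_order(num_packets: int, path_latency_ms: Sequence[float],
                       send_interval_ms: float) -> int:
    """Send one flow round-robin over paths and count packets that arrive late.

    A packet is "late" if some higher-numbered packet of the same flow arrived
    before it, i.e. the receiver sees it out of order.
    """
    arrivals = []
    for seq in range(num_packets):
        path = seq % len(path_latency_ms)                  # round-robin path choice
        arrive_at = seq * send_interval_ms + path_latency_ms[path]
        arrivals.append((arrive_at, seq))
    arrivals.sort()  # receiver's view: order of arrival (ties broken by seq)

    late = 0
    highest_seq_seen = -1
    for _, seq in arrivals:
        if seq < highest_seq_seen:
            late += 1
        else:
            highest_seq_seen = seq
    return late


# Hypothetical one-way latencies to three tunnel endpoints, one packet per ms.
print(count_out_of_order(30, [20.0, 35.0, 50.0], 1.0))  # unequal paths: reordering
print(count_out_of_order(30, [20.0, 20.0, 20.0], 1.0))  # equal paths: 0
print(count_out_of_order(30, [35.0], 1.0))              # one path per flow: 0
```

Pinning each flow to one path avoids the reordering, which is why
per-flow hashing is the default, and also why a few big fat flows
cannot be spread evenly.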

