nanog mailing list archives

Re: Anycast but for egress


From: Vimal <j.vimal () gmail com>
Date: Thu, 29 Jul 2021 09:37:16 -0700

Great point.  We don't need geo-diversity for the websites with the IP
address requirement, so we could design for that case specially on a
one-off basis.

Throughput-wise, it shouldn't matter where we're located, but we often
find websites serving different content based on the source IP of the
traffic, so having a presence closer to the user is useful.  But then
again, this is a separate concern, orthogonal to the original question,
since geo-IP doesn't make much sense with an anycast IP.  For those
websites that need a stable IP for NACLs *and* serve different content
based on source IP, we'd have to use your suggestion of a predictable
3-5 IPs per site.



On Wed, Jul 28, 2021 at 11:27 AM Glenn McGurrin via NANOG <nanog () nanog org>
wrote:

I'd had a similar thought/question, though keeping the geo diversity.
You manage the crawlers and are contacting these sites individually,
from what you have stated, so you don't need a one-size-fits-all list
for public posting.  Why not have a restricted subset of the crawlers
handle the sites with these issues?  The subset can be unique per site,
which keeps even load balancing from being overly complex or limiting,
especially as you are using NAT anyway: multiple servers can sit behind
each IP, and that number can vary.  That lets you have geo diversity
(or even multi-cloud diversity) for every site, while each site that
needs this IP whitelisting only has to allow 3-5 IPs, yet you can still
distribute load over a much larger overall set of machines and NAT
gateways.
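A rough sketch of that restricted-subset idea, in Python (the site name,
address pool, and subset size are all hypothetical; rendezvous hashing
is just one way to make each site's subset stable and easy to even out):

```python
# Hypothetical sketch: each site that requires allowlisting gets a small,
# stable subset (here 4) of a larger pool of NAT gateway IPs, so you can
# publish 3-5 IPs per site while spreading load across the whole pool.
import hashlib

NAT_POOL = [f"198.51.100.{i}" for i in range(1, 33)]  # 32 example NAT IPs

def egress_ips_for_site(site: str, k: int = 4) -> list[str]:
    """Pick a stable k-IP subset for a site via rendezvous hashing."""
    def score(ip: str) -> int:
        # Hash (site, ip) together; the k lowest scores win for this site.
        return int(hashlib.sha256(f"{site}|{ip}".encode()).hexdigest(), 16)
    return sorted(NAT_POOL, key=score)[:k]

print(egress_ips_for_site("example.com"))  # same 4 IPs on every call
```

Because the subset is derived from the site name rather than stored,
adding a new allowlisted site needs no coordination, and any number of
crawler machines can sit behind each of the chosen NAT IPs.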

As I understand it, even CDNs that anycast TCP (externally or
internally, load balancing via routers and multipath) do something
similar by spreading load over multiple IPs at the DNS layer first.

As the transition to IPv6 happens, this may get easier: on the v6 side
it is much easier to obtain an allocation large enough to split into
multiple subnets advertised from different locations without providers
dropping the route as too long a prefix.  You could give out one /36,
/40, or even /44 for whitelisting, while advertising /48s from each
location.  For sites with IPv6 support that may help now, but it won't
help all sites for quite some time, though the number that support v6
is slowly growing.  For the foreseeable future, you still need to
handle the v4 side one way or another.
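The prefix arithmetic behind that suggestion can be illustrated with
Python's standard `ipaddress` module (2001:db8::/32 is the IPv6
documentation prefix, used here purely as a stand-in for a real
allocation):

```python
# Sketch of the v6 layout: hand site owners one short prefix to
# allowlist, while advertising longer per-location prefixes from it.
import ipaddress

allowlisted = ipaddress.ip_network("2001:db8::/40")   # given to site owners
per_location = list(allowlisted.subnets(new_prefix=48))

print(len(per_location))   # 256 /48s, one advertisable per location
print(per_location[0])     # 2001:db8::/48
```

One /40 on a site's allowlist therefore covers up to 256 locations, each
announcing its own /48, which providers generally accept on the v6 side.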

On 7/28/2021 10:21 AM, William Herrin wrote:
On Wed, Jul 28, 2021 at 6:04 AM Vimal <j.vimal () gmail com> wrote:
My intention is to run a web-crawling service on a public cloud. This
service is geographically distributed, and therefore will run in
multiple regions around the world inside AWS... this means there will
be multiple AWS VPCs, each with their own NAT gateway, and traffic
destined to websites that we crawl will appear to come from this NAT
gateway's IP address.

Hello,

AWS does not provide the ability to attach anycasted IP addresses to a
NAT gateway, regardless of whether it would work, so that's the end of
your quest.

The reason I want a predictable IP is to communicate this IP to website
owners so they can allow access from these IPs into their networks.
I chose IP as an example; it can also be a subnet, but what I don't
want to provide is a list of 100 different IP addresses without any
predictability.

If you bring your own IP addresses, you can attach a separate /24 of
them to your VPC in each region, providing you with a single
predictable range of source addresses. You will find it difficult and
expensive to acquire that many IP addresses from the regional
registries for the purpose you describe.
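The resulting layout might look like the following sketch (an
illustration of the addressing plan, not AWS API calls; the 198.18.0.0/22
block and the region list are hypothetical stand-ins for a real BYOIP
allocation):

```python
# Hypothetical BYOIP layout: one /24 from your own block per region, so
# website owners can allowlist the single covering /22.
import ipaddress

byoip = ipaddress.ip_network("198.18.0.0/22")  # made-up owned block
regions = ["us-east-1", "eu-west-1", "ap-southeast-1", "sa-east-1"]
per_region = dict(zip(regions, byoip.subnets(new_prefix=24)))

for region, subnet in per_region.items():
    print(region, subnet)   # e.g. us-east-1 198.18.0.0/24
```

Site owners then allowlist just 198.18.0.0/22, which covers every
region's NAT egress range at once.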


Silly question, but: for a web crawler, why do you care whether it has
the limited geographic distribution that a cloud service provides?
It's a parallel batch task. It doesn't exactly matter whether you have
minimum latency.

Regards,
Bill Herrin






-- 
Vimal
