nanog mailing list archives

Re: An appeal for more bandwidth to the Internet Archive


From: Denys Fedoryshchenko <nuclearcat () nuclearcat com>
Date: Wed, 13 May 2020 11:43:20 +0300

On 2020-05-13 11:00, Mark Delany wrote:
On 13May20, Denys Fedoryshchenko allegedly wrote:
What about introducing some cache offloading, like CDN doing? (Google,
Facebook, Netflix, Akamai, etc)

Maybe some opensource communities can help as well

Surely someone has already thought thru the idea of a community CDN?
Perhaps along the lines of pool.ntp.org? What became of that
discussion?

Maybe a TOR network could be repurposed to cover the same ground.


Mark.
I believe Tor is not efficient at all for this purpose. Privacy has a very high overhead.

Several schemes exist:
1) The ISP announces in some way the subnets it wants served from its cache.
1.A) The Apple cache way - a plain HTTP(S) request is enough to point a specific IP at the ISP cache. Not secure at all.
1.B) BGP + DNS, the most common way. The ISP peers with the CDN, and the CDN returns the IPs of the ISP's cache nodes in response to DNS requests. For example, content.archive.org would have the local node's A/AAAA records (btw, where is IPv6 for archive?) for customers of the ISP hosting this node, or for anybody peering with it.
Huge drawback - archive.org would need to provide TLS certificates for web.archive.org to each local node; this is bad and probably a no-go. Yes, I know schemes exist where the certificate is not present on the local node and some "precalculated" result is used instead, but that is too complex.
1.C) BGP + HTTP redirect. If the ISP peers with archive.org, users in all announced subnets get a 302 or some other HTTP redirect.
The next one is almost the same and much better, but requires small modifications to the content engine or the frontend balancers.
1.D) BGP + HTTP rewrite. If the ISP <*same as before*>, the URL is rewritten within the content, e.g. http://web.archive.org/web/20200511193226/https://git.kernel.org/torvalds/t/linux-5.7-rc5.tar.gz will appear as
http://emu.st.node.archive.org/web/20200511193226/https://git.kernel.org/torvalds/t/linux-5.7-rc5.tar.gz
or
http://archive-org.proxy.emu.st/web/20200511193226/https://git.kernel.org/torvalds/t/linux-5.7-rc5.tar.gz
In the second option the ISP can handle the SSL certificate itself.
2) BGP announcement of archive.org subnets locally. Prone to leaks, requires TLS certificates, etc.; a no-go.
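
To make 1.C and 1.D a bit more concrete, here is a minimal Python sketch of the lookup both of them need: match the client IP against the prefixes an ISP announced, then either redirect or rewrite the URL. The prefix table, the function names and the documentation-range subnets are all made up for illustration; only the emu.st hostname and the example URL come from the text above, and nothing here is how archive.org actually works.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table: prefixes learned from an ISP over its BGP peering
# session, mapped to the hostname of that ISP's cache node.
# The subnets are documentation ranges, not real announcements.
ANNOUNCED_PREFIXES = {
    ipaddress.ip_network("192.0.2.0/24"): "archive-org.proxy.emu.st",
    ipaddress.ip_network("2001:db8::/32"): "archive-org.proxy.emu.st",
}


def find_local_node(client_ip):
    """Return the cache hostname for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(client_ip)
    matches = [
        (net, host)
        for net, host in ANNOUNCED_PREFIXES.items()
        if addr.version == net.version and addr in net
    ]
    if not matches:
        return None
    # Longest prefix wins, as it would in a routing table.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rewrite_url(url, client_ip):
    """1.D: swap the hostname for the local node; leave other clients alone."""
    node = find_local_node(client_ip)
    if node is None:
        return url
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, node, parts.path, parts.query, parts.fragment))


def redirect_for(url, client_ip):
    """1.C: return (status, headers) for a redirect, or None to serve directly."""
    target = rewrite_url(url, client_ip)
    if target == url:
        return None
    return 302, {"Location": target}
```

The only difference between the two variants in this sketch is where the rewritten URL ends up: in a Location header (1.C) or directly in the links of the generated page (1.D), which saves the client one round trip.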

You can still modify some of these schemes and come up with options that no one has implemented yet. For example, do everything through JavaScript (CDNs cannot afford that because of the way they work): the website generates content links dynamically, and for that the client requests a /config.json file (dynamically generated and cached for a while). IPs that have a local node get the URL of that node in it; the rest get the default URL.
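
A rough Python sketch of the server side of that /config.json idea, assuming a hypothetical table of local nodes and a made-up JSON layout (the "base_url" and "ttl" keys, the default URL and the subnet are illustrative, not anything archive.org serves):

```python
import ipaddress
import json

# Hypothetical default origin and table of ISP-hosted local nodes.
DEFAULT_BASE_URL = "https://web.archive.org"
LOCAL_NODES = {
    ipaddress.ip_network("192.0.2.0/24"): "https://archive-org.proxy.emu.st",
}


def build_config(client_ip, ttl_seconds=300):
    """Return the body of /config.json for the requesting client IP."""
    addr = ipaddress.ip_address(client_ip)
    base_url = DEFAULT_BASE_URL
    for net, node_url in LOCAL_NODES.items():
        if addr.version == net.version and addr in net:
            base_url = node_url
            break
    # "ttl" tells the page script how long it may cache this answer.
    return json.dumps({"base_url": base_url, "ttl": ttl_seconds})
```

The script on the page would then prepend base_url to every content path it generates, so clients without a local node keep working against the default URL.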


