nanog mailing list archives

RE: AWS S3 DNS load balancer


From: Deepak Jain <deepak () ai net>
Date: Tue, 15 Jun 2021 16:38:03 +0000




I've just taken a squiz at an S3-based website we have, and via the S3 URL it is a CNAME with a 60-second TTL pointing
at a set of A records with 5-second TTLs.

Any one dig returns the CNAME and a single IP address:

dig our-domain.s3-website-ap-southeast-2.amazonaws.com.
our-domain.s3-website-ap-southeast-2.amazonaws.com. 14 IN CNAME s3-website-ap-southeast-2.amazonaws.com.
s3-website-ap-southeast-2.amazonaws.com. 5 IN A 52.95.134.145

If the query is repeated, the returned IP address changes roughly every five seconds.
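
The rotation described above can be observed with a short script. This is only a sketch under stated assumptions: the helper names (resolve_ipv4, sample_addresses, distinct_addresses) are made up for illustration, it goes through the local stub resolver via socket.getaddrinfo, which does not expose TTLs, and any caching resolver in the path may hide or slow the rotation.

```python
import socket
import time
from typing import Callable, List, Set


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def sample_addresses(
    hostname: str,
    samples: int = 6,
    interval: float = 5.0,
    resolver: Callable[[str], List[str]] = resolve_ipv4,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Resolve hostname `samples` times, pausing `interval` seconds between queries."""
    results: List[List[str]] = []
    for i in range(samples):
        results.append(resolver(hostname))
        if i < samples - 1:  # no pause needed after the final query
            sleep(interval)
    return results


def distinct_addresses(results: List[List[str]]) -> Set[str]:
    """Collapse all sampled answers into the set of addresses seen."""
    return {addr for answer in results for addr in answer}
```

Calling sample_addresses("our-domain.s3-website-ap-southeast-2.amazonaws.com") with the interval matched to the 5-second A-record TTL should, if the behaviour is as described, return a single address per answer with more than one distinct address across the samples.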

What's interesting is the name attached to the A records, which does not include "our-domain". It seems to be a record 
pointing to ALL S3 websites in the region. And all of the addresses I saw reverse-resolve to that one name. So there is 
definitely some under-the-bonnet magic discrimination going on.
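
The reverse-resolution observation can be checked the same way. Again a minimal sketch: reverse_names is a hypothetical helper, it relies on socket.gethostbyaddr and whatever PTR records the local resolver returns, and an address with no PTR record is simply mapped to an empty string.

```python
import socket
from typing import Callable, Dict, Iterable


def reverse_names(
    addresses: Iterable[str],
    lookup: Callable = socket.gethostbyaddr,
) -> Dict[str, str]:
    """Map each IP address to the primary name its reverse lookup returns."""
    names: Dict[str, str] = {}
    for addr in addresses:
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
            names[addr] = lookup(addr)[0]
        except OSError:
            # socket.herror and socket.gaierror are both OSError subclasses.
            names[addr] = ""
    return names


def all_share_one_name(names: Dict[str, str]) -> bool:
    """True when every address that resolved maps to the same single name."""
    resolved = {name for name in names.values() if name}
    return len(resolved) == 1
```

Feeding it the addresses collected from repeated dig runs would show whether they all map back to the one regional name, as reported above.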

In Route53 the picture is very different, with the published website host name (think "our-domain.com.au") resolving to 
four IP addresses that are all returned in the response to a single dig query. There is an A-ALIAS (a non-standard AWS 
record type) that points to a CloudFront distribution that has the relevant S3 bucket as its origin.

Using the CNAME bypasses the CloudFront distribution unless steps are taken to forbid direct access to the bucket. It 
would be usual to use (and enforce) access via CloudFront, if for no other reason than to provide for HTTPS access. 

---

So, depending on what query you make, you get very different answers. For example, if you try s3.amazon.com you get a
CNAME to a rewrite.amazon.com, which seems reasonable for any subdomain request that they would have a better response
for.

I don't remember the details, but they may be moving to deterministic subdomains as you've shown above, with only
"legacy" uses going to s3.amazonaws.com. I remember hearing a big uproar about it. Perhaps an AWS person will chime in
with some color on this.

So deterministic subdomain to a group of relatively deterministic endpoints, even round-robin, makes sense to me as 
in... "usual in the practice of the art." Even if those systems end up being load balancers for other systems behind 
them.

The s3.amazonaws.com case is different from that. I'm guessing that no one (else) uses this sort of single-IP-from-a-pool
trick and therefore it's not standard. Further, given that AWS appears to be moving *back* to the traditional way of
doing things, there must be undesirable limitations to this model.

[just spitballing here]

Deepak
