nanog mailing list archives
Re: Anyone have contacts at the Amazon or OpenAI web spiders?
From: Patrick Clochesy <patrick () mach net>
Date: Tue, 13 Feb 2024 21:38:43 -0600
Both robots respect robots.txt; of course they’re not going to answer.

On Feb 13, 2024, at 8:35 PM, John Levine <johnl () iecc com> wrote:
One day I set up the world's lamest content farm. You can see it here:

https://www.web.sp.am/

While humans tend not to find its six billion pages very interesting, some web spiders are entranced. In the past week or so, Amazon's amazonbot has visited it 6 million times, and OpenAI's gptbot 2.6 million. (If you were wondering what they use to train ChatGPT, now you know.)

I don't care that googlebot comes by every 5 or 10 minutes, but gptbot is every few seconds and amazon as fast as the server will respond. They both come from predictable IPs so I can set packet filters but they're still hammering pretty hard.

Each has a URL in the user agent string, Amazon's page has an address to write to but OpenAI's doesn't. I wrote to the Amazon address, no response. If anyone has contacts at either I would appreciate it.

A few years ago the bingbot got trapped but fortunately I knew someone at Microsoft who could pass the word. He reported back that while he could not go into detail, there was a great deal of animated conversation at the other end of the hall, and shortly after that it stopped.

R's,
John
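
For readers who want to try the robots.txt route suggested in the reply, here is a minimal sketch in Python that checks what a set of rules would permit before deploying them. The rules are an illustration, not the actual robots.txt of the site in question, and the user-agent tokens `GPTBot` and `Amazonbot` are assumed to be the ones the two crawlers match on. The check only shows how a compliant parser reads the rules; whether a given crawler honors them is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out the two aggressive crawlers, allow everyone else.
# The tokens "GPTBot" and "Amazonbot" are assumptions, not taken from the thread.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Pass the bare product token, not a full User-Agent header string.
for agent in ("GPTBot", "Amazonbot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://www.web.sp.am/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Parsing from a string rather than fetching a live URL keeps the check local, so the rules can be verified before they are published.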
Current thread:
- Anyone have contacts at the Amazon or OpenAI web spiders? John Levine (Feb 13)
- Re: Anyone have contacts at the Amazon or OpenAI web spiders? Patrick Clochesy (Feb 13)
- Re: Anyone have contacts at the Amazon or OpenAI web spiders? John Levine (Feb 14)
- Re: Anyone have contacts at the Amazon or OpenAI web spiders? Lincoln Dale (Feb 13)
- Re: Anyone have contacts at the Amazon or OpenAI web spiders? John R. Levine (Feb 14)
- Re: Anyone have contacts at the Amazon or OpenAI web spiders? Patrick Clochesy (Feb 13)