Interesting People mailing list archives

Re Lauren's Blog: "A Terrible Decision by the Internet Archive May Lead to Widespread Blocking"


From: "Dave Farber" <dave () farber net>
Date: Sat, 22 Apr 2017 18:46:44 +0000

---------- Forwarded message ---------
From: Jonathan S. Shapiro <jonathan.s.shapiro () gmail com>
Date: Sat, Apr 22, 2017 at 2:03 PM
Subject: Re: [IP] Lauren's Blog: "A Terrible Decision by the Internet
Archive May Lead to Widespread Blocking"
To: <dave () farber net>, ip <ip () listbox com>


It's not clear to me that RES has no legal force. Perhaps somebody with
greater knowledge could say here whether it has actually been tested.

It seems to me that a spider acting in contravention of a site's RES
specification is engaged in unauthorized access to a computer system, which
is a federal crime.

Has this been tested?


Jonathan


On Sat, Apr 22, 2017 at 10:26 AM Dave Farber <farber () gmail com> wrote:




Begin forwarded message:

*From:* Lauren Weinstein <lauren () vortex com>
*Date:* April 22, 2017 at 1:01:33 PM EDT
*To:* nnsquad () nnsquad org
*Subject:* *[ NNSquad ] Lauren's Blog: "A Terrible Decision by the
Internet Archive May Lead to Widespread Blocking"*


 A Terrible Decision by the Internet Archive May Lead to Widespread
Blocking


https://lauren.vortex.com/2017/04/22/a-terrible-decision-by-the-internet-archive-may-lead-to-widespread-blocking


We can stipulate at the outset that the venerable Internet Archive and
its associated systems like Wayback Machine have done a lot of good
for many years -- for example by providing chronological archives of
websites that have chosen to participate in their efforts. But now, it
appears that the Internet Archive has joined the dark side of the
Internet, by announcing that they will no longer honor the access
control requests of any websites.

For any given site, the decision to participate or not with the web
scanning systems at the Internet Archive (or associated with any other
"spidering" system) is indicated by use of the well established and
very broadly affirmed "Robots Exclusion Standard" (RES) -- a
methodology that uses files named "robots.txt" to inform visiting
scanning systems which parts of a given website should or should not
be subject to spidering and/or archiving by automated scanners.

RES operates on the honor system. It requests that spidering systems
follow its directives, which may be simple or detailed, depending on
the situation -- with those detailed directives defined
comprehensively in the standard itself.
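
As a rough illustration of how a cooperating spider might consult
these directives, here is a minimal sketch using Python's standard
urllib.robotparser module. The robots.txt contents, the agent name
"ExampleBot", and the example.com URLs are assumptions invented for
this example, and "ia_archiver" appears only as a sample agent token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, made up for illustration only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ia_archiver", "https://example.com/index.html"),
        ("ExampleBot", "https://example.com/index.html"),
        ("ExampleBot", "https://example.com/private/data"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

Nothing in this check is enforced by the site. A spider that never
calls it, or ignores the answer, is not stopped by robots.txt itself.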

While RES generally has no force of law, it has enormous legal
implications. The existence of RES -- that is, a recognized means for
public sites to indicate access preferences -- has been important for
many years to help hold off efforts in various quarters to charge
search engines and/or other classes of users for access that is free
to everyone else. The straightforward argument that sites already have
a way -- via the RES -- to indicate their access preferences has held
a lot of rabid lawyers at bay.

And there are lots of completely legitimate reasons for sites to use
RES to control spidering access, especially for (but by no means
restricted to) sites with limited resources. These include technical
issues (such as load considerations relating to resource-intensive
databases and a range of other related situations), legal issues such
as court orders, and a long list of other technical and policy
concerns that most of us rarely think about, but that can be of
existential importance to many sites.

Since adherence to the RES has usually been considered to be
voluntary, an argument can be made (and we can pretty safely assume
that the Archive's reasoning falls into this category one way or
another) that since "bad" players might choose to ignore the standard,
this puts "good" players who abide by the standard at a disadvantage.

But this is a traditional, bogus argument that we hear whenever
previously ethical entities feel the urge to start behaving
unethically: "Hell, if the bad guys are breaking the law with
impunity, why can't we as well? After all, our motives are much better
than theirs!"

Therein are the storied paths of "good intentions" that lead to hell,
when the floodgates of such twisted illogic open wide, as a flood of
other players decide that they must emulate the Internet Archive's
dismal reasoning to remain competitive.

There's much more.

While RES is typically viewed as not having legal force today, that
could be changed, perhaps with relative ease in many circumstances.
There are no obvious First Amendment considerations in play, so it
would seem quite feasible to roll "Adherence to properly published RES
directives" into existing cybercrime-related site access authorization
definitions.

Nor are individual sites entirely helpless against the Internet
Archive's apparent embrace of the dark side in this regard.

Unless the Archive intends to try to go completely into a "ghost" mode,
their spidering agents will still be detectable at the http/https
protocol levels, and could be blocked (most easily in their entirety)
with relatively simple web server configuration directives. If the
Archive attempted to cloak their agent names, individual sites could
block the Archive by referencing the Archive's known source IP
addresses instead.
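
The blocking logic described above would normally live in web server
configuration, but the decision itself can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
agent substring "ia_archiver" is a sample token, and 192.0.2.0/24 is a
reserved documentation range standing in for whatever source addresses
a site might actually choose to block.

```python
import ipaddress

# Assumed values for illustration; a real site would maintain its own lists.
BLOCKED_AGENT_SUBSTRINGS = ("ia_archiver",)
BLOCKED_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)  # placeholder range


def should_block(user_agent: str, remote_ip: str) -> bool:
    """Return True if a request matches a blocked agent name or source network."""
    agent = user_agent.lower()
    # First line of defense: the agent name the spider announces.
    if any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS):
        return True
    # Fallback if the agent name is cloaked: match on the source address.
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(should_block("Mozilla/5.0 (compatible; ia_archiver)", "203.0.113.5"))
    print(should_block("CloakedBot/1.0", "192.0.2.17"))
    print(should_block("Mozilla/5.0", "203.0.113.5"))
```

Blocking by network is the coarser of the two checks, since any
unrelated client sharing a blocked range is refused as well.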

It doesn't take a lot of imagination to see how all of this could
quickly turn into an escalating nightmare of "Whac-A-Mole" and
expanding blocks, many of which would likely negatively impact
unrelated sites as collateral damage.

Even before the Internet Archive's decision, this class of access and
archiving issues had been smoldering for quite some time. Perhaps the
Internet Archive's pouring of rocket fuel onto those embers may
ultimately lead to a legally enforced Robots Exclusion Standard --
with both the positive and negative ramifications that would then be
involved. There are likely to be other associated legal battles as
well.

But in the shorter term at least, the Internet Archive's decision is
likely to leave a lot of innocent sites and innocent users quite badly
burned.

--Lauren--





