Interesting People mailing list archives

New whitehouse.gov robots.txt file


From: David Farber <dave () farber net>
Date: Wed, 21 Jan 2009 09:23:02 -0500



Begin forwarded message:

From: Joseph Lorenzo Hall <joehall () gmail com>
Date: January 21, 2009 8:03:57 AM EST
To: Dave Farber <dave () farber net>
Subject: New whitehouse.gov robots.txt file

(see here: http://www.kottke.org/09/01/the-countrys-new-robotstxt-file
via Aaron Burstein)

Hi Dave,

Here's another fascinating sign of increased transparency in the new
administration:

The whitehouse.gov robots.txt file -- a file that specifies which areas
of a web site web spiders may crawl[1] -- has gone from 2400
lines to just two lines:

  User-agent: *
  Disallow: /includes/

This means that most of whitehouse.gov will now be available to search
engines and other web resources that use automated crawlers to
retrieve, index, and otherwise process content.
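
As a rough illustration of how a well-behaved crawler might interpret
those two lines, here is a minimal Python sketch using the standard
library's urllib.robotparser module. The bot name "ExampleBot" and the
sample paths are assumptions made up for this example; they are not
taken from whitehouse.gov, and real crawlers may apply their own
matching rules.

```python
from urllib.robotparser import RobotFileParser

# The two-line file quoted above, parsed from a string rather than
# fetched over the network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes/
"""


def build_parser(text: str) -> RobotFileParser:
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only for illustration.
for path in ("/", "/blog/", "/includes/header.html"):
    allowed = parser.can_fetch("ExampleBot", path)
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")
```

Under these assumptions, only the path beneath /includes/ is reported
as disallowed; the other two are allowed, because the single Disallow
rule is a simple path-prefix match that applies to every user agent.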

best, Joe

[1]: http://en.wikipedia.org/wiki/Robots.txt

--
Joseph Lorenzo Hall
ACCURATE Postdoctoral Research Associate
UC Berkeley School of Information
Princeton Center for Information Technology Policy
http://josephhall.org/





