Interesting People mailing list archives

Re: New whitehouse.gov robots.txt file


From: David Farber <dave () farber net>
Date: Wed, 21 Jan 2009 10:37:57 -0500



Begin forwarded message:

From: "David P. Reed" <dpreed () reed com>
Date: January 21, 2009 10:20:53 AM EST
To: dave () farber net
Cc: ip <ip () v2 listbox com>
Subject: Re: [IP] New whitehouse.gov robots.txt file

Who decided that "whitehouse.gov", which is by definition a PR site, not a news site or an information source, has "the country's" robots.txt file? Obama is not "the country". He's just another elected official, presenting his side of the story.

That he wants the entire site to be indexed by spidering is not "amazing". If you are pushing propaganda, that's exactly what you would do.

What's more interesting is what the executive branch chooses to reveal about, say, its actual ongoing surveillance activities, the names of the prisoners at Guantanamo and in CIA rendition sites. In other words, what The Sunlight Foundation would index, if only it could get to it, or what the National Security Archive would archive, if only ...

Or what lobbyists are meeting in the West Wing each day, and the subject matter of those meetings.

There will be nothing on whitehouse.gov that is likely to change these matters. High symbolism, but empty symbolism of that sort sets an expectation that is unlikely to be met unless *we* as Americans look beyond the cheap symbolism and demand transparency and sunlight.

David Farber wrote:


Begin forwarded message:

From: Joseph Lorenzo Hall <joehall () gmail com>
Date: January 21, 2009 8:03:57 AM EST
To: Dave Farber <dave () farber net>
Subject: New whitehouse.gov robots.txt file

(see here: http://www.kottke.org/09/01/the-countrys-new-robotstxt-file
via Aaron Burstein)

Hi Dave,

Here's another fascinating sign of increased transparency in the new
administration:

The whitehouse.gov robots.txt file -- a file that specifies which areas
of a web site web spiders may crawl[1] -- has gone from 2400
lines to just two lines:

 User-agent: *
 Disallow: /includes/

This means that most of whitehouse.gov will now be available to search
engines and other web resources that use automated crawlers to
retrieve, index, and otherwise process content.
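
As a rough illustration of how a crawler might read those two lines,
here is a minimal sketch using Python's standard urllib.robotparser
module. The agent name "ExampleBot" and the paths "/blog/" and
"/includes/header.html" are made up for the example and do not come
from the site itself.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes/
"""

def build_parser(text):
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only to show one blocked and two allowed cases.
for path in ("/", "/blog/", "/includes/header.html"):
    url = "http://www.whitehouse.gov" + path
    allowed = parser.can_fetch("ExampleBot", url)
    print(path, "allowed" if allowed else "disallowed")
```

Under these rules only URLs whose path begins with /includes/ are
refused; everything else is open to any crawler that honors the file.
Note that robots.txt is advisory, so it only describes what
well-behaved crawlers will do.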

best, Joe

[1]: http://en.wikipedia.org/wiki/Robots.txt

--
Joseph Lorenzo Hall
ACCURATE Postdoctoral Research Associate
UC Berkeley School of Information
Princeton Center for Information Technology Policy
http://josephhall.org/




-------------------------------------------
Archives: https://www.listbox.com/member/archive/247/=now
RSS Feed: https://www.listbox.com/member/archive/rss/247/
Powered by Listbox: http://www.listbox.com






