Interesting People mailing list archives

Subpoena for 1 million random web searches

From: David Farber <dave () farber net>
Date: Fri, 20 Jan 2006 15:53:41 -0500

So how long before personal info is requested? djf


Begin forwarded message:

From: Seth Finkelstein <sethf () sethf com>
Date: January 20, 2006 3:43:45 PM EST
To: ip () v2 listbox com

Cc: David Farber <dave () farber net>, stevenstevensteven () yahoo com,sean () DONELAN COM

Subject: Re: Subpoena for 1 million random web searches

[There's no personal information involved. Here's the best summary
I've seen, from Gary Price writing for SearchEngineWatch.com]

http://blog.searchenginewatch.com/blog/060119-161802
http://blog.searchenginewatch.com/blog/060119-060352

Court Documents & Summary Of United States Versus Google Over Search Data


   Earlier we reported in [79]Bush Administration Demands Search Data;
   Google Says No, Yahoo & MSN Said Yes that the US Government seeks
   to force Google to hand over search data. That story explains more
   about the situation, and there have been a number of postscripts
   since it was first written.  Along with that, we've been able
   to obtain copies of the three court documents filed in the
   case. Below you'll find links to each document, along with a
   summary of what's in each of them.

   Alberto Gonzales, as Attorney General of the United States
   vs. Google [80]Notice of Motion to Compel Compliance (PDF File)

   Two quick points. First, remember that this brief was filed by the
   Government and does not include Google's response to its claims. I'm
   sure that will be coming. Second, I'm not an attorney and haven't
   played one on TV. My purpose was to summarize what was presented in
   the document.

     * The motion requests that Google comply with a subpoena filed
       by the Attorney General and "produce" for inspection and
       copying the materials the Government is asking for.

     * After the lead government attorney conferred with Google,
       Google has chosen not to comply with the subpoena.

     * The Government is asking the court to make Google comply.

     * The filing then goes into a background explanation about the
       Child Online Protection Act (COPA) and how the government
       is developing its defense of the constitutionality of
       COPA. They believe that COPA is, "more effective than filtering
       software in protecting from harmful exposure to harmful
       material on the Internet."

     * In preparation for the case, subpoenas were issued to Google and
       "other entities" that operate search engines to produce two
       sets of materials.

     * First, the subpoena asks Google to produce an electronic file
       containing "[a]ll URL's that are available to be located on your
       company's search engine as of July 31, 2005."

     * However, after "lengthy negotiation" the government changed and
       "narrowed" its request and asked for a "multi-stage random
       sample of one million URL's from Google's database, i.e., a random
       selection of the various databases in which those URL's are
       stored, and a random sample of the URL's held in those selected
       databases."

     * Second, Google was asked to "produce an electronic file
       containing [a]ll queries entered into the Google engine between
       July 1 and July 31 inclusive."

     * Again, after lengthy negotiations the government
       changed its request and asked for an electronic file
       "containing the text of any search string entered into Google's
       search engine for a one week period (absent any personal
       information identifying the person who entered the query)."

     * Google has still refused to comply with these requests in any
       way.

     * The Government says that access to this information would be of
       "significant significance" in the preparation of its
       case.  Specifically why?

     * "The production set of queries entered into Google's search
       engine would assist the Government in its efforts to understand
       the behavior of current web users, to estimate how often web
       users encounter harmful-to-minors material in the course of
       their searches, and to measure the effectiveness of filtering
       in screening that material."

     * This information would also help the Government understand
       what "web sites people find through the use of search engines,
       to determine the character of those sites, to estimate the
       prevalence of harmful-to-minors material on those sites, and to
       measure the effectiveness of filtering software on that
       harmful-to-minors material."

     * The document continues into a discussion with plenty of
       legalese and citations, again points out that Google has
       failed to comply, and lists some of the reasons Google objects
       to the subpoena.

     * Google first objects to this on the grounds of relevancy.

     * Google also objects on the grounds that if it provided
       what the government asks for, it would be required to produce
       information identifying the users of its search engine.

     * The Government claims that this concern is "illusory" since it
       has specifically asked for a random sample with no personally
       identifying information attached to any search string.

     * The Government says that other search entities have complied
       by producing files containing no personally identifying
       information.

     * Google also contends that the information they're being asked
       to produce is "redundant" since the Government has asked other
       engines to produce similar files. The Government argues that
       this "misunderstands" what's being requested. "The production
       set of queries from Google's database, in combination with
       similar productions from other search engine operators will
       assist the Government in developing a sample of the overall
       universe of search engines queries, while accounting for the
       potential of any variations in the type of queries that are
       entered into different search engines."

     * The Government says that since Google is the market leader, its
       response "would be of value" in developing the Government's
       overall sample of queries.

     * Google says that complying would also force it to share
       trade secrets because the total number of queries it receives in
       a day is a trade secret.  The Government adds that if this were
       the case, a district court has said that these numbers would
       not be disclosed.

     * Finally, according to the filing, Google says that it will be
       subject to an "undue burden" in complying. The Government
       claims that this is not the case whatsoever. The Government
       adds that they would be "willing to work" with Google to
       specify a multistage sample. They are also willing to
       compensate Google for its work in complying with the subpoena.

     * The filing ends with the Government saying that, "This court
       should require Google to comply with the subpoena on the same
       terms its competitors have."
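
   The multi-stage random sample described in the narrowed request
   above (first a random selection of databases, then a random sample
   of the URLs within each selected database) can be illustrated with
   a minimal Python sketch. Everything in it is an assumption made
   for illustration: the function name multistage_sample, the toy
   databases, and the sample sizes are hypothetical, and nothing here
   reflects how Google stores URLs or the procedure Professor Stark
   actually proposed.

```python
import random

def multistage_sample(databases, n_databases, n_per_database, seed=None):
    """Two-stage sample: pick databases at random, then URLs within each.

    `databases` is a hypothetical mapping of database name -> list of URLs.
    """
    rng = random.Random(seed)
    # Stage 1: random selection of databases, without replacement.
    chosen = rng.sample(sorted(databases), n_databases)
    sample = []
    for name in chosen:
        urls = databases[name]
        # Stage 2: random sample of URLs inside each selected database.
        k = min(n_per_database, len(urls))
        sample.extend(rng.sample(urls, k))
    return chosen, sample

# Hypothetical toy data: 5 databases holding 100 URLs each.
databases = {
    f"db{i}": [f"http://example.com/db{i}/page{j}" for j in range(100)]
    for i in range(5)
}
chosen, sample = multistage_sample(
    databases, n_databases=2, n_per_database=10, seed=0
)
print(len(chosen), len(sample))
```

   Note that drawing the same number of URLs from every selected
   database gives each URL an equal chance of selection only when the
   databases are of similar size; otherwise the stages would need to
   be weighted by database size.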

   [81]Declaration Of Joel McElvain (PDF File)

   The second filing is a declaration by Government attorney Joel
   McElvain, who I believe is the lead attorney for the U.S. Department
   of Justice in this matter. It also helps produce a timeline of
   events to this point. It includes:

     * A copy of the original subpoena, originally signed on August
       25, 2005

     * Detailed info and definitions about what Google was to submit to
       the Government.

     * A several-page letter, dated October 25, 2005, that Ashok
       Ramani, Commercial Litigation Counsel at Google, sent to Joel
       McElvain with his objections to the subpoena. THIS IS A MUST
       READ!!!

   Key Quotes and Passages from the Letter

     * "It is against Google's competitive interest to be viewed as
       reflecting the whole world wide web."

     * Worth noting that Google says that the government tried to use
       Archive.org/Wayback Machine and found the results
       unsatisfactory.  From the letter, "...given the
       [82]www.archive.org's stated purpose, one would expect them --
       with an appropriate consulting relationship to create the
       results the DEFENDANT wanted."

     * The Government's request is seen as redundant because it
       already has URLs from at least one other engine.

     * From the letter, "Though the search engines doubtlessly have
       some differences in the URLs they store, what distinguishes
       Google from its competitors is the sophistication of Google's
       search engine in locating and ordering relevant results."

     * On the burden to Google.  "Google would have to spend a
       disproportionate amount of engineering time and resources to
       (i) number (even in rough terms) in real time the URLs
       contained in its search database and (ii) extract based on that
       initial numbering the URLs selected by Professor Stark."

     * Google also objects because it could "endanger" its
       "crown-jewel trade secrets."  Specifically, it would have to
       disclose the approximate number of URLs in its database and
       "some" details on how it crawls URLs, "such as the number of
       servers, server distribution, and how often Google crawls the
       World Wide Web."

     * More objections.  "Google objects to the Defendant's view of
       Google's highly proprietary queries database as a free resource
       that Defendant can use, some levels removed, to formulate its
       own defense."

     * "Moreover, Google's acceding to the Request would suggest that
       it is willing to reveal information about those who use its
       services.  This is not a perception Google is willing to
       accept. And one can envision scenarios where queries alone
       could reveal identifying information about a specific Google
       user, which is another outcome we cannot accept."

   Next, we find another letter. This time it's from DOJ's McElvain to
   Google's Ramani. This letter is dated December 23, 2005.

     * The letter discusses how the Government is willing to narrow
       what's asked for in the subpoena. This is summarized in the
       Alberto Gonzales, as Attorney General of the United States
       vs. Google section of this post.

     * McElvain discusses how Google asked for and was granted two
       extensions to serve its objections to the subpoena until
       October 10, 2005. He then writes, "In our several discussions
       prior to the service of those objections we had offered to
       limit the scope of the requests for production, and you had
       indicated Google's willingness to consider compliance with the
       subpoena along with the narrowed terms that we had
       suggested. Your written objection also reiterated your hope to
       reach a resolution regarding Google's compliance with the
       subpoena. However, shortly after the service of your
       objections, you telephoned me to inform me that Google would
       decline to comply with the subpoena."

     * More conversations between the Government and Google take place
       on December 12th and December 21st to discuss the technical
       aspects of the request. Finally, on December 21st, McElvain
       was informed that Google would not comply with the subpoena.

     * The final document is a protective order in the ACLU v. U.S. case.


   [83]Declaration Of Philip B Stark (PDF File)

   This document is a declaration by [84]Philip Stark, Ph.D., who is
   the person working on the project. Dr. Stark is a Professor of
   Statistics at the University of California, Berkeley.

     * Stark explains how he has had conversations with the USDOJ,
       Google and other search providers, "to develop practical
       approaches to sampling their databases of URLs and search
       queries."

     * He adds that he has started to analyze the samples produced by
       search providers other than Google.

     * He writes, "Reviewing user queries to search engines will help
       us understand the search behavior of current web users, to
       estimate how often web users encounter HTM materials through
       searches, and to measure the effectiveness of filters in
       screening those materials."

   Stark goes on to say more about his approach and why including
   Google's results is directly relevant.

   Posted by [85]Gary Price on Jan. 19, 2006 | [86]Permalink

   See related stories in these categories! (available to [87]SEW
   members)
   [88]Legal: Privacy
   _________________________________________________________________

References

   Visible links
  79. http://blog.searchenginewatch.com/blog/060119-060352
  80. http://blog.searchenginewatch.com/blog/pdf/Google_motiontocompel.pdf
  81. http://blog.searchenginewatch.com/blog/pdf/Google_McElvainDeclaration.pdf
  82. http://www.archive.org/
  83. http://blog.searchenginewatch.com/blog/pdf/Google_NoticeofStarkDeclaration.pdf
  84. http://www.stat.berkeley.edu/~stark/
  85. http://searchenginewatch.com/about/article.php/3411711.
  86. http://blog.searchenginewatch.com/blog/060119-161802
  87. http://searchenginewatch.com/benefits/article.php?source=searchtopics
  88. http://blog.searchenginewatch.com/blog/topics/legal_privacy


--
Seth Finkelstein  Consulting Programmer  http://sethf.com
Infothought blog - http://sethf.com/infothought/blog/
Interview: http://sethf.com/essays/major/greplaw-interview.php


