Interesting People mailing list archives
Google and Data Retention - Policies and Possibilities
From: Dave Farber <dave () farber net>
Date: Tue, 31 Jan 2006 13:23:29 -0500
-------- Original Message --------
Subject: Google and Data Retention - Policies and Possibilities
Date: Tue, 31 Jan 2006 09:08:22 -0800
From: Lauren Weinstein <lauren () vortex com>
To: dave () farber net
CC: lauren () vortex com

Dave,

That Google can track user searches is hardly an "alert the media" revelation. This status was effectively obvious, since we know that Google responds affirmatively to various law enforcement-related data-retrieval orders (and quite possibly to others that we don't know about, such as national security letters) that would be largely useless without such data -- and Google has never claimed to operate anonymously in this respect.

A more interesting question in terms of data retention is *how long* various aspects of the data are retained. That is, does this fine grain of data "expire" over time, or is retrospective data mining of the detailed data possible back into the indefinite past?

This issue is rapidly moving into the spotlight, as Congress appears poised to discuss laws that would *mandate* data retention rules for search engines and perhaps other Internet services -- and we all know that when Congress gets involved in technical matters, the results are often -- shall we say -- less "optimal" than if industry had addressed these concerns itself voluntarily.

Obviously there are certain enhanced Google services (mostly related to logged-in users in the search and Gmail spaces, including but not limited to users availing themselves of Google's search history features) that require long-term detailed data to function.

But viewed from the outside, there are steps that Google could take to minimize privacy-related risks while not significantly interfering with the value of that data for ongoing R&D and innovation. This is only a thumbnail conceptual description of course, based on external observations alone.

1) Minimize the length of time that full log records are maintained for users not using enhanced services. For instance, full records might be maintained for 30 days (an arbitrary figure for this example). These would be available to law enforcement queries and the like for ongoing investigations. However, after the expiration period, records would be anonymized (stripped of IP, cookie, and other connection-related data identifying the user). Logged search query strings (though they also can contain personal information, as we know) would not be affected at this stage and would continue to be available for R&D and other purposes, but now with a significantly lower outside abuse potential.

2) After some longer period of time (a year? -- again, an arbitrary period for the sake of this example) the remaining portion of the records for non-enhanced service users would be purged (deleted). I of course cannot address the non-trivial issues of system and related data backups in this regard, since I have no idea how Google has structured backup activities across their enterprise, but this aspect in particular might make for an interesting technical challenge.

3) Users of Google's enhanced search-history-based services, etc. represent another interesting problem, since detailed data must be maintained for these users in some form for the services to function. However, it seems likely that the outside abuse potential of this detailed data could be greatly reduced through various cryptographic techniques, while still permitting the required functionalities.
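
As a rough illustration of steps (1) and (2) only, the two-stage lifecycle might be sketched as follows. This is a minimal sketch under stated assumptions: the `LogRecord` fields, the function names, and the idea of a periodic sweep are all hypothetical, the 30-day and one-year periods are the arbitrary example figures from the text, and the backup problem mentioned in (2) is not addressed.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Optional

# Arbitrary example periods taken from the text, not any known policy.
ANONYMIZE_AFTER = timedelta(days=30)
PURGE_AFTER = timedelta(days=365)


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical search-log record; field names are illustrative only."""
    timestamp: datetime
    query: str
    ip: Optional[str]
    cookie: Optional[str]


def apply_retention(record: LogRecord, now: datetime) -> Optional[LogRecord]:
    """Return the record as it should be stored at `now`, or None if purged."""
    age = now - record.timestamp
    if age >= PURGE_AFTER:
        # Step 2: the remaining portion of the record is deleted.
        return None
    if age >= ANONYMIZE_AFTER:
        # Step 1: strip connection-related identifiers, keep the query string.
        return replace(record, ip=None, cookie=None)
    # Still inside the full-retention window: keep the record unchanged.
    return record


def sweep(log: List[LogRecord], now: datetime) -> List[LogRecord]:
    """Apply the retention rules to every record, dropping purged ones."""
    kept = []
    for record in log:
        updated = apply_retention(record, now)
        if updated is not None:
            kept.append(updated)
    return kept
```

Note that the query string survives the first stage untouched, which matches the caveat in (1) that queries themselves can still contain personal information.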

It should be noted that cryptographic methods may also be applicable in various ways to alternative solutions for the issues described in sections (1) and (2) above.
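
The message does not name a specific cryptographic technique, so the following is only one possibility, chosen here for illustration: replacing stored identifiers with keyed pseudonyms (HMAC-SHA256 under a secret key). Records from the same user can still be grouped while the key exists, and once the key is destroyed the stored pseudonyms can no longer be recomputed from a known IP address or cookie value. The `Pseudonymizer` class and its method names are hypothetical, and real key management (secure storage, rotation, erasure from backups) is well beyond this sketch.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class Pseudonymizer:
    """Maps identifiers to keyed pseudonyms. Hypothetical, for illustration."""

    def __init__(self) -> None:
        # Random 256-bit secret key; in this sketch it lives only in memory.
        self._key: Optional[bytes] = secrets.token_bytes(32)

    def pseudonym(self, identifier: str) -> str:
        """Return a stable pseudonym for `identifier` under the current key."""
        if self._key is None:
            raise RuntimeError("key destroyed; pseudonyms can no longer be derived")
        digest = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    def destroy_key(self) -> None:
        """Discard the key so identifiers cannot be re-linked to stored pseudonyms."""
        self._key = None
```

A keyed hash is used rather than a plain hash because the space of IP addresses is small enough to enumerate; without the key, an outsider cannot simply hash every address and compare.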

Since I am not privy to Google's internal topology, the above ideas can quite reasonably be categorized as speculative. However, the point is that there do exist a range of technological approaches to dealing with this data that could be harnessed to strike a reasonable balance between data usefulness and privacy-related concerns -- permitting R&D and innovation to proceed while minimizing the inherent abuse potential in sensitive data of this sort.

--Lauren--
Lauren Weinstein
lauren () vortex com or lauren () pfir org
Tel: +1 (818) 225-2800
http://www.pfir.org/lauren
Co-Founder, PFIR - People For Internet Responsibility - http://www.pfir.org
Co-Founder, IOIC - International Open Internet Coalition - http://www.ioic.net
Moderator, PRIVACY Forum - http://www.vortex.com
Member, ACM Committee on Computers and Public Policy
Lauren's Blog: http://lauren.vortex.com
DayThink: http://daythink.vortex.com

- - - - - - -

Begin forwarded message:

From: Adam Fields <ip20398470293845 () aquick org>
Date: January 30, 2006 10:05:48 PM EST
To: dave () farber net
Subject: More detailed queries of what Google stores

I asked two very specific questions in a conversation with John Battelle, and he's received unequivocal answers from Google:

1) "Given a list of search terms, can Google produce a list of people who searched for that term, identified by IP address and/or Google cookie value?"

2) "Given an IP address or Google cookie value, can Google produce a list of the terms searched by the user of that IP address or cookie value?"

The answer to both of them is "yes".

http://battellemedia.com/archives/002283.php

--
- Adam
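
Nothing here is known about how Google actually stores or indexes its logs. The sketch below only illustrates why both answers follow naturally once each query is logged alongside an identifier: a pair of mappings, one per direction, supports both lookups. The `SearchLogIndex` class, its method names, and the use of a single `user_id` string standing in for "IP address or cookie value" are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class SearchLogIndex:
    """Toy two-way index over (identifier, search term) pairs. Hypothetical."""

    def __init__(self) -> None:
        self._users_by_term: Dict[str, Set[str]] = defaultdict(set)
        self._terms_by_user: Dict[str, Set[str]] = defaultdict(set)

    def record(self, user_id: str, term: str) -> None:
        """Log one search; user_id stands in for an IP or cookie value."""
        self._users_by_term[term].add(user_id)
        self._terms_by_user[user_id].add(term)

    def users_for_terms(self, terms: Iterable[str]) -> Set[str]:
        """Question 1: identifiers that searched for any of the given terms."""
        found: Set[str] = set()
        for term in terms:
            # .get avoids creating empty entries for terms never seen.
            found |= self._users_by_term.get(term, set())
        return found

    def terms_for_user(self, user_id: str) -> Set[str]:
        """Question 2: all terms searched under the given identifier."""
        return set(self._terms_by_user.get(user_id, set()))
```

Stripping the identifier from a record, as in the anonymization step proposed earlier in this thread, is exactly what removes a record from both mappings at once.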