Interesting People mailing list archives

Google Buys reCAPTCHA, Creating a Potential Privacy Issue


From: Dave Farber <dave () farber net>
Date: Wed, 16 Sep 2009 19:06:43 -0400





Begin forwarded message:

From: Lauren Weinstein <lauren () vortex com>
Date: September 16, 2009 17:00:38 EDT
To: dave () farber net
Subject: Google Buys reCAPTCHA, Creating a Potential Privacy Issue





         Google Buys reCAPTCHA, Creating a Potential Privacy Issue

                http://lauren.vortex.com/archive/000612.html


Greetings.  Google has announced ( http://bit.ly/2r4BOL ) their
acquisition of Carnegie Mellon University's "reCAPTCHA" system.
You've no doubt seen reCAPTCHA in action -- it is very widely used by
a vast array of sites.  CMU's reCAPTCHA is a specific implementation
of the more generalized CAPTCHA concept, which attempts to validate
user input as coming from a human, not a (typically spam-related)
robot.

The reCAPTCHA system presents pairs of words optically scanned from
books, and asks the user to identify them.  In the process, it also
uses the resulting data to help "decode" those scanned words into
their correct machine-readable textual representations as part of
larger book scanning efforts.

This obviously makes reCAPTCHA a perfect match for Google, which is
faced with the challenge of processing vast numbers of books in their
Google Books project, some of which have fairly high OCR (Optical
Character Recognition) error rates due to the difficulty of machine
recognition of odd fonts, faded printing, and so on.

However, there is a potential privacy problem with reCAPTCHA (or any
centralized CAPTCHA system, for that matter) that Google will need to
face.

Early this year, while in the process of setting up an Internet-based
forum, I considered using reCAPTCHA as part of the validation
procedures.  Since centralized CAPTCHA servers will typically collect
IP address and potentially other data from users at the time of page
display, and again when users interact with the CAPTCHA systems (for
registration, message sending, etc.), these servers receive a running
log of information regarding the users of the sites that
incorporate those CAPTCHAs into their pages.
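
As a rough illustration of the kind of running log described above,
here is a minimal Python sketch.  It is not reCAPTCHA's actual code or
API: the LogRecord fields, the "display" and "verify" event labels,
the function names, and the example addresses and URLs are all
hypothetical, and it simply assumes the browser sends ordinary Referer
and User-Agent headers to the central server.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogRecord:
    """One entry in a hypothetical central CAPTCHA server log."""
    event: str           # "display" or "verify" (illustrative labels only)
    client_ip: str       # address the connection came from
    embedding_site: str  # from the Referer header, if the browser sent one
    user_agent: str      # from the User-Agent header, if present


def record_request(event: str, client_ip: str,
                   headers: Dict[str, str]) -> LogRecord:
    """Build the record a third-party server could keep for one request."""
    return LogRecord(
        event=event,
        client_ip=client_ip,
        embedding_site=headers.get("Referer", "(none)"),
        user_agent=headers.get("User-Agent", "(none)"),
    )


def sites_seen_per_ip(log: List[LogRecord]) -> Dict[str, List[str]]:
    """Group embedding pages by client IP, keeping first-seen order."""
    seen: Dict[str, List[str]] = {}
    for rec in log:
        sites = seen.setdefault(rec.client_ip, [])
        if rec.embedding_site not in sites:
            sites.append(rec.embedding_site)
    return seen


# Hypothetical traffic: one user loads and answers a CAPTCHA on a forum,
# then merely views a CAPTCHA on an unrelated blog.
browser = "ExampleBrowser/1.0"
log = [
    record_request("display", "203.0.113.7",
                   {"Referer": "http://forum.example/register",
                    "User-Agent": browser}),
    record_request("verify", "203.0.113.7",
                   {"Referer": "http://forum.example/register",
                    "User-Agent": browser}),
    record_request("display", "203.0.113.7",
                   {"Referer": "http://blog.example/comment",
                    "User-Agent": browser}),
]

print(sites_seen_per_ip(log))
```

In this sketch, a single IP address ends up associated with pages on
two unrelated sites, one of them merely viewed, which is the sort of
cross-site record that a visible privacy policy would need to address.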

So I was very surprised to discover that I could not find any
reCAPTCHA privacy policy explaining to ordinary Web users displaying
those pages, or interacting with the reCAPTCHA system, how that
collected data would be handled from a privacy and data protection
standpoint.

I queried CMU about this, and the reCAPTCHA support team replied that
they did have an extensive privacy policy, but that it only appeared
when reCAPTCHA API keys were created -- that is, when a Web site
administrator wanting to incorporate reCAPTCHA into a site applied for
reCAPTCHA access.  There was nothing to tell conventional users how
their IP address or other data would be handled by reCAPTCHA as a
result of their viewing or interacting with a Web site page
incorporating reCAPTCHA functionalities -- that is, no privacy policy
to be found at all for those users at that time.  Partly for this
reason, I chose not to use reCAPTCHA for my forum.

With reCAPTCHA moving under the Google umbrella, it will be crucial
that Google clearly explain, in a visible and specific privacy policy,
how they will collect, correlate, and otherwise use IP address and
other data associated with reCAPTCHA display and use.

Fundamentally, this situation is similar to that with ad display
systems, where the very act of viewing a page that includes external
ads may pass IP address info (and sometimes other data) to third
parties.  However, while Web users can usually choose to block
external ads in various ways if they wish (something I do not
recommend or promote -- see "Blocking Web Ads -- And Paying the
Piper" - http://lauren.vortex.com/archive/000281.html ), blocking
CAPTCHAs would usually mean losing access to the associated sites in
significant ways.

As an enthusiastic supporter of Google Books ("The Joy of Libraries, a
Fireman's Flame, and the Google Books Settlement" -
http://lauren.vortex.com/archive/000611.html ), I fully appreciate the
value that reCAPTCHA will bring to Google, and ultimately to all users
of Google Books.

But I also believe that it's very important for the privacy issues
associated with reCAPTCHA to be properly handled by Google, hopefully
in a manner significantly better than Carnegie Mellon's own approach
earlier this year.

--Lauren--
Lauren Weinstein
lauren () vortex com
Tel: +1 (818) 225-2800
http://www.pfir.org/lauren
Co-Founder, PFIR
  - People For Internet Responsibility - http://www.pfir.org
Co-Founder, NNSquad
  - Network Neutrality Squad - http://www.nnsquad.org
Founder, GCTIP - Global Coalition
  for Transparent Internet Performance - http://www.gctip.org
Founder, PRIVACY Forum - http://www.vortex.com
Member, ACM Committee on Computers and Public Policy
Lauren's Blog: http://lauren.vortex.com
Twitter: https://twitter.com/laurenweinstein




