WebApp Sec mailing list archives

Re: Defeating CAPTCHA

From: Mark Burnett <mb () xato net>
Date: Thu, 25 Aug 2005 11:32:29 -0600

The problem with coming up with effective CAPTCHAs is that the dataset should not rely on obscurity or secrecy to
work. Anyone can come up with hard questions that can consistently trip up a computer, but how effective would those
questions be if the adversary had access to your question/answer database? Many ideas I hear for CAPTCHAs rely solely
upon the secrecy of the data set. And that is security that relies upon obscurity.

Try coming up with a CAPTCHA where the code is public, the dataset is public, and the only secret is the randomness
generated for each individual test, and you will find that it is quite difficult. The problem, in part, is that we want
a machine to generate a test that another machine with the same data cannot solve, but that a human can. We can bypass
that by having humans come up with the questions, but that means they also need to store the answers for verification,
again bringing us back to the problem that we are relying upon the secrecy of the data.
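
To make the point concrete, here is a minimal sketch, assuming a hypothetical human-written question bank that has been published along with the code. The answers and the names PUBLIC_BANK, issue_challenge, verify and bot_solve are invented for illustration. The only secret left is which question is drawn, and that does not slow down a bot holding the same data:

```python
import secrets

# Hypothetical public question/answer bank. The answers are made up for
# illustration; the point is only that anyone can read this table.
PUBLIC_BANK = {
    "Which one is a pachyderm?": "elephant",
    "Which texture most resembles stipple?": "dots",
    "Which person is taller?": "left",
}


def issue_challenge() -> str:
    """Server side: the per-test randomness is the only secret."""
    return secrets.choice(list(PUBLIC_BANK))


def verify(question: str, answer: str) -> bool:
    """Server side: check the answer against the stored one."""
    return PUBLIC_BANK.get(question) == answer


def bot_solve(question: str) -> str:
    """Attacker side: with the same public data, solving is a lookup."""
    return PUBLIC_BANK[question]


if __name__ == "__main__":
    question = issue_challenge()
    print(question, "->", verify(question, bot_solve(question)))
```

Whatever question is drawn, the bot passes, which is why a stored answer table only helps while it stays secret.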

Another mistake that people often make with CAPTCHAs is using questions with multiple-choice answers. If you asked a
question like "Which of these strawberries is most rotten?" you would have to provide enough pictures to reduce the
significance of a lucky guess. Even if you had 10 possible answers to select from, that might not be effective in
stopping a spammer from setting up massive numbers of free e-mail accounts. It might statistically take them 10 times
as long but they can still do it. However, if you provide too many answers, the chance that several of them are good
answers increases, making the test less effective. How many times have you taken a multiple-choice test and there are
two answers that, in your opinion, would work? Especially in the case of a subjective question such as which
strawberry is most rotten.
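
The arithmetic behind the guessing argument can be sketched as follows, assuming a bot that picks uniformly at random among the choices and that exactly one choice is accepted. The function names are made up for illustration:

```python
def guess_success_probability(num_choices: int) -> float:
    """Chance that one uniformly random guess is the accepted answer."""
    if num_choices < 1:
        raise ValueError("need at least one choice")
    return 1.0 / num_choices


def expected_attempts(num_choices: int) -> float:
    """Mean number of tries until the first success.

    Independent guesses follow a geometric distribution, whose mean is
    1/p, which is simply the number of choices here.
    """
    if num_choices < 1:
        raise ValueError("need at least one choice")
    return float(num_choices)


def expected_accounts(num_attempts: int, num_choices: int) -> float:
    """Expected number of accounts a bot gets from num_attempts tries."""
    if num_choices < 1:
        raise ValueError("need at least one choice")
    return num_attempts / num_choices


if __name__ == "__main__":
    print(expected_attempts(10))             # 10.0 tries per account
    print(expected_accounts(1_000_000, 10))  # 100000.0 accounts
```

With 10 choices the bot needs about ten tries per account on average, so the cost to the spammer only grows linearly with the number of choices shown.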

It is definitely a good challenge and it will be cool seeing someone someday solve this problem.

Mark Burnett

On Thu, 25 Aug 2005 08:40:40 -0700, Jayson Anderson wrote:

 That was an interesting article, I definitely got caught up clicking
 through for a while. One has to wonder, why hasn't a more effective system
 been placed into production, let alone conceptualized and largely
 accepted as a solid approach for the future? More specifically, the
 claim that CAPTCHA as it stands now is not a Turing machine. I'm not
 sure if that's entirely true as symbols pre-date their interpretation by
 machine.
 Regardless, like one gentleman mentioned in an article, a much more
 clear method to differentiate man vs. machine would be to ask abstract
 questions. Barring the cultural, linguistic and socioeconomic
 implications, why not ask things like "which one is a pachyderm?". Or
 "which texture most resembles stipple?". Or "Which of these strawberries
 is most rotten?". Or "Which person is taller?" with same-sized figures,
 but one the same sized as the car she stands next to, the other only
 half. etc. etc. Ya know ? Sure it would take a significant multi-faceted
 approach utilizing an amazingly heterogeneous set of contributors, but
 that's where open source comes in. Pool a huge bank of acceptable
 abstracts based on image size, obscurity and all the other standards
 (which do NOT need to be complex at all), then refine that, seed the
 array and answer presentations with some decent entropy, use yet more
 entropy to randomize the units by which answers are delineated,
 "a,b,c,d", "circle[~],eye[=],carrot[%],money[E]" each different each
 time, and all the hundreds of other variables I've not thought of. It
 seems like it is workable to me. Keep the project always living so that
 submissions and refined objects are always being added to an update-able
 system.....  SOMETHING is going to have to be done that is superior to
 "crazytext", as ultimately it will be rendered nothing more than a
 speedbump. I think CAPTCHA still qualifies as Turing, just not an
 effective one in its environment. Seems that machine-proofing should
 use anything BUT that which is found in almost every machine that would
 be used to circumvent it :)
 
 Sorry for the chatter but I've ALWAYS felt that crazytext(tm) was an
 amazingly poor way to differentiate machine from man, and these articles
 just prove what I and so many others I'm sure had always felt.....
 
 Jayson
 
 -
 On Wed, 2005-08-24 at 14:29 -0400, robert () webappsec org wrote:

 This was linked off of slashdot (http://it.slashdot.org/article.pl?sid=05/08/24/1629213&tid=172&tid=95)
 and explains some of the ways people are breaking CAPTCHA (http://en.wikipedia.org/wiki/Captcha) based systems.
 
 http://sam.zoy.org/pwntcha/
 
 - Robert
 robert_at_webappsec.org
 http://www.cgisecurity.com
