New antibot measure, need reviews.

Teensweb · Feb 6, 2011

Hi guys, I've been trying to develop a new anti-bot measure based on simple image recognition.
Basically, the page throws up an image and asks you to identify what it is.
Here is a live demo.
The currently included objects are:
chair, clouds, trees, ship, river, clock, books icon, car, tv, plane and mountain.
What do you think about the efficiency of this system? The images are fetched randomly from the web, so accuracy is not 100%...
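The flow described above (pick an object, show a matching image, check the typed answer) could look roughly like the following Python sketch. It is not the author's PHP code, which wasn't posted. The in-memory `_pending` store, the token scheme and the function names are all hypothetical, and fetching the image from the web is left out.

```python
import secrets

# Object list from the first post.
OBJECTS = ["chair", "clouds", "trees", "ship", "river", "clock",
           "books icon", "car", "tv", "plane", "mountain"]

# Hypothetical server-side store: challenge token -> expected answer.
# A real site would keep this in the session or a database.
_pending: dict[str, str] = {}


def issue_challenge() -> tuple[str, str]:
    """Pick a random object and return (token, search_term).

    The token goes to the client with the image; the search term stays on
    the server and would be used to fetch an image from the web (omitted).
    """
    answer = secrets.choice(OBJECTS)
    token = secrets.token_urlsafe(16)
    _pending[token] = answer
    return token, answer


def verify(token: str, user_answer: str) -> bool:
    """Check the typed answer; tokens are single-use, so a bot can't retry one."""
    expected = _pending.pop(token, None)
    if expected is None:
        return False
    # Case-insensitive, whitespace-trimmed exact match.
    return user_answer.strip().lower() == expected
```

The single-use token matters: without it, a bot could keep resubmitting guesses against the same image.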

cybrax · Feb 6, 2011

Works well. As for accuracy, it's hard to say after just playing with it for a minute.
I suppose it depends on where the images and descriptions come from: pulling results from a random Google search page may not give descriptions as accurate as Flickr or morgueFile would.

Are you going to share the source code with the community here?

lemon-tree · Feb 6, 2011

The problem with this sort of captcha is that it is very much dependent upon what a user perceives an image to be of. For example, this image:

It has more than one possible answer: Boat, Ship, Ocean, Sea, Tree and any number of other derivations. I entered boat and got an invalid captcha; what one person calls a boat may be called a ship by another. Essentially, this sort of captcha, where there is any ambiguity about what the answer is, will only frustrate your users.

What you are trying to do here is reinvent the wheel when there are already viable systems that have been shown to be effective. Additionally, image-recognition captchas have been shown to be insecure in the past. For example, I could index the first thousand or more images returned by a search for 'Ship', and then every time you show an image of a ship I could break the system first time. If I don't have the image, I may still be able to make a good guess based on its content, e.g. if the image is lots of blue with a white shape, I may guess 'Ship' or 'Cloud'. If this process is repeated for a whole range of phrases, your system becomes effectively useless, as eventually you will show an image that I have data for.
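To make the indexing attack concrete, here is a minimal Python sketch. It assumes each scraped image has already been decoded and shrunk to an 8x8 grid of grayscale values (that step would need an imaging library and is omitted), and it uses a simple "average hash" so that re-encoded or slightly brightened copies still match. The function names and the 5-bit distance threshold are illustrative choices, not anything from the thread.

```python
from typing import Optional

# Assumption: every image is already an 8x8 grid of grayscale values (0-255).
Grid = list[list[int]]


def average_hash(grid: Grid) -> int:
    """64-bit 'average hash': one bit per pixel, set if brighter than the mean."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def build_index(scraped: list[tuple[str, Grid]]) -> list[tuple[int, str]]:
    """Hash each scraped image once, remembering the search term that found it."""
    return [(average_hash(grid), term) for term, grid in scraped]


def guess(index: list[tuple[int, str]], grid: Grid,
          max_distance: int = 5) -> Optional[str]:
    """Answer with the term of the closest indexed image, or None if none is close."""
    h = average_hash(grid)
    best = min(index, key=lambda entry: hamming(entry[0], h), default=None)
    if best is None or hamming(best[0], h) > max_distance:
        return None
    return best[1]
```

Brightening or recompressing an image barely changes such a hash, which is why a fixed pool of images pulled from the web is risky compared with images generated fresh for every challenge.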

This is why randomly generated images are so popular for captchas: there is no way to predict what will show up, and the only way to break them is to decode them. Relying on the chance that I don't have that image is not a good system for a captcha.

hazar90 · Feb 6, 2011

You're right about that.

Teensweb · Feb 7, 2011

cybrax said:
Are you going to share the source code with the community here?

Sure I will, but first I need to know whether it's worth it; that's why I set up the poll. I expected more responses. Is it because it's a public poll?

warlordste · Feb 7, 2011

As someone else posted, it relies on what the image means to you. Another issue is wording: one of them was a pic of a ship, and I would call it a "photo", but the answer might have been "photograph". Personally, without some writing at the bottom, like "what's in the middle of the picture?" or "what color is the ship?", I don't think it's going to work. Unless you do something like this, you will probably lose visitors.

Livewire · Feb 7, 2011

The problem is the ease of defeating it, which makes its accuracy low, and also the number of possible answers for each image. For instance, if I put a picture of a Boeing 747 up in that captcha, what should it accept as a valid answer?

The list I can see:
Airplane
Jet
Jumbo Jet
Jumbo-Jet
Queen of the Skies
Wide-body Commercial Airliner
Boing 747
Boeing 747

Which one should it accept, ignoring that there are more than I've listed here? The worse news is that if you accept -every- answer, it's easier for bots to guess, which defeats the purpose. Randomly generated captchas fare better here, as there's one solution and only one.

Plus, we run into the issue of image count. If we take a standard randomly generated captcha of 7 characters from the 26-letter US alphabet, we have exactly 8,031,810,176 possible combinations (I actually checked it on a calculator). Add to that the fact that each image is randomly generated on the fly, skewing and distorting the letters so a human can read them but a bot can't, and we've got a letter combination that is virtually unguessable by a bot. We can't exactly store that many pictures for the captcha, along with all their possible answers. Even compared with a 3-character captcha, we'd still need 17,576 different pictures.

tl;dr?

Drawbacks: Not nearly enough combinations when compared to a standard US Alphabet captcha, and too many possible solutions for each image.
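The arithmetic above is easy to check. This short Python sketch computes the letter keyspace and, for comparison, the odds of a hypothetical bot that simply guesses from the 11-object list in the first post (assuming it knows that list).

```python
def letter_keyspace(alphabet_size: int, length: int) -> int:
    """Distinct strings of `length` letters, repeats allowed: alphabet_size ** length."""
    return alphabet_size ** length


def blind_guess_odds(choices: int) -> float:
    """Chance that one uniformly random guess is right."""
    return 1 / choices


for length in (3, 7):
    n = letter_keyspace(26, length)
    print(f"{length} letters: {n:,} combinations")

# Hypothetical bot that knows the 11-object list from the first post.
print(f"11 objects: {blind_guess_odds(11):.1%} chance per blind guess")
```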

Teensweb · Feb 7, 2011

@warlordste:
As for photo vs. photograph, I had already solved that problem: if a tree comes up and you type "trees" or "tree", or "plane" instead of "airplane", it'll still work.
P.S.: I had to read your post twice to understand it correctly; it would be very nice of you to use full stops where required...
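Accepting "tree"/"trees" or "plane"/"airplane" amounts to normalising the answer and checking it against a per-object set of accepted words. The Python sketch below shows one way to do that; the `ACCEPTED` table and the function names are hypothetical, since the original code wasn't posted.

```python
# Hypothetical table: canonical object -> answers accepted for it.
ACCEPTED = {
    "trees": {"tree", "trees"},
    "plane": {"plane", "planes", "airplane", "airplanes"},
    "clouds": {"cloud", "clouds"},
}


def normalize(answer: str) -> str:
    """Lowercase and collapse runs of whitespace to single spaces."""
    return " ".join(answer.lower().split())


def is_correct(expected: str, answer: str) -> bool:
    """Accept any listed variant; objects without an entry need an exact match."""
    accepted = ACCEPTED.get(expected, {expected})
    return normalize(answer) in accepted
```

Each word added to a set helps humans but, as Livewire points out, also widens what a guessing bot can get away with.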

@Livewire:
That's an important issue you pointed out. I'm more of a mathematician than a programmer, and my very reason for building this system is that captchas are becoming outdated. Even I have a plugin in Firefox that scans the captcha on certain sites I visit and fills it in for me, so what do you expect of spammers? I guess that ruins your probability calculation (you could've just written 26^7).

But what sort of scanning program could you write to recognize objects in an image? (Forget objects; can someone at least show me how they would make a computer recognize just a book among a bunch of random images?)
Of course, indexing all the images from a search is an obvious breach, but how many such images will people index, and how would they know which search engine is used? I'm not sure about this part, but if people would actually do it, I'll probably give up on this thing. :frown:
As for storing that many images, that's a real issue, but who's talking about "us" storing all the images? I'm just using images stored on the web by others!
But no system is perfect, and since this is my first venture into actually building something for the web, this one's far from perfect.
The disadvantages I have come across so far are:
1. The accuracy of the images fetched from the web (I want statistics from others; that's one of the main reasons I posted it here):
- Have you tested that yourself? (I'm not claiming anything, I just want to make sure...)
I'm still working on improving that.
2. As you pointed out, the various possible answers for the same image:
lemon-tree was also right about which part of the image should be considered.
But still, I just want to ask: which part would you consider? (That'll decide whether I should keep working on this; again, that's one of the reasons I'm asking for reviews...)
Well, people who are more like computers will be confused, I guess! :smile:

But still, won't the problem be solved (to some extent) if a list of possible objects is provided? (I don't plan on having more than 20, for now.)

If anyone else finds more drawbacks, you are welcome...

Salvatos · Feb 7, 2011

I'm not sure if you were actually asking the question, but yes, there are programs that will identify objects in a picture using general shapes and colors. One of my friends worked on one last summer; I believe it was to count how many cars were in a picture. As far as I know, that's also how Google Maps censors faces and license plates. It's not perfect, but it can definitely help beat your antibot quickly.

Teensweb · Feb 7, 2011

Yep, I've heard about neural networks, but it's definitely easier to write programs that identify word captchas, don't you think?
What I'm saying is, they're getting outdated. I just thought of this idea and went ahead and wrote it because it took only 10 lines of code in PHP. Well, I guess I can happily drop the idea due to its lack of accuracy, as most people here say. Maybe someone else can improve on this or even come up with a new idea...
As for reinventing the wheel, I'm trying to "improve" the wheel, which can be useful, you know...
BTW, most sci-fi writers even think wheels have become outdated and we should switch to hovercraft or something! :smile:

lemon-tree · Feb 8, 2011

Just because something is old does not make it redundant; as far as I've heard, the only way to currently break the better, more common word-based captchas en masse is to use real people to sit there and solve them for a few pennies a pop.

essellar · Feb 8, 2011

CAPTCHA is hard, and there's a very fine line between too many false positives (bots pass) and too many false negatives (humans fail). When you throw a Mechanical Turk (it's people!) into the mix, everything is crackable, of course, but even if we just restrict ourselves to purely electronic attacks, they're just getting too good. Google has recently been experimenting with its reCAPTCHA, and I'll confess to being stumped by most of the words with the stroked letters (I can get the filled letters no matter how badly they're distorted).

The problem with a "name the thing" scheme is that if you don't do multiple-choice, the user has too many synonyms even for a single isolated object (if they recognize it at all). Humans will fail. On the other hand, if you supply a word list to choose from, even a naïve bot has a 1/n probability of passing, and that's not nearly good enough.
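The 1/n figure, and how quickly it grows when a bot can simply retry, can be sketched in Python. The 20-item list size comes from Teensweb's plan above; treating each attempt as an independent uniform guess is an assumption.

```python
def pass_probability(n_choices: int, attempts: int = 1) -> float:
    """Chance that a bot picking uniformly at random from an n-item word list
    passes at least once in `attempts` independent tries."""
    if n_choices < 1 or attempts < 0:
        raise ValueError("n_choices must be >= 1 and attempts >= 0")
    return 1 - (1 - 1 / n_choices) ** attempts


for attempts in (1, 10, 100):
    print(attempts, f"{pass_probability(20, attempts):.1%}")
```

With 20 options a single blind guess passes 5% of the time, and a bot that is allowed many attempts gets through almost surely, so rate limiting would matter as much as the list size.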

If you don't want to use an established CAPTCHA API (like reCAPTCHA), then sometimes going to a domain-specific scheme can work very well. In real life, and using my real name, I've been a part of the IBM Lotus software development community, and a common CAPTCHA-type scheme there is to have the user evaluate the result of a simple Lotus Notes Formula Language statement: the sort of thing anybody who's been working with the product for more than a week could answer without resorting to the doco. Sure, it's bot-defeatable, but that bot either has to implement the Notes formula language OR use Notes or Domino as part of its back end, and either way we'd find that very clever and salute it. (It even keeps out a lot of the "kindly do the needful" types who would rather exploit the community than join it. We like n00bs well enough, but we like n00bs who try a whole lot more.) Just don't ask about the airspeed velocity of an unladen swallow without specifying African or European.

Poll: How much accuracy do you think the system has?
more than 90%
70-90%
50-70%
40-50%
very poor (<40%)
