Human Interaction Proof

December 4th, 2011 1 comment

I lied when I said CAPTCHA in my last post; what I really should have said was “Human Interaction Proof” (HIP). Since then I’ve done quite a bit of work and the idea has evolved quite a lot. Getting the “human” to circle objects within a natural image seemed to work quite well after some tweaking. The problem is that it isn’t possible to introduce enough entropy: either you distort the image, which only goes so far, or you get hold of enough indexed images, which is quite impractical. This has led me to create images containing a number of objects on the fly. The prototype as it is now needs a lot of work. The important bit to get right (as with most HIPs) is the balance between distortion and usability, and I feel I’m quite far off the mark right now.

The demo is here: http://jonnyfairfull.co.uk/tic/objRand.php. As it uses HTML 5’s canvas for user input, it probably won’t work in IE 8 or earlier, but it’s supported by most other browsers.

This demo outputs 4 objects at random points on the screen and asks the user to draw a rough outline around one of them. The objects are overlaid by a random scene (only 2 at present) in an attempt to disguise them from a bot while keeping them recognizable to a human. This is the part that requires the most work. At the moment the edges of the objects stand out quite a lot, making them easier for both humans and bots to see, yet it is also not uncommon for a human to miss the object they are being asked to indicate. Let me know how you found it; was it easy to find the object? Are you human?
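One way the rough outline could be checked, sketched here as an assumption rather than taken from the demo's source: treat the user's stroke as a closed polygon and use ray casting to test whether the object's centre falls inside it. The names `pointInPolygon`, `stroke` and `objectCentre` are hypothetical.

```javascript
// Ray-casting point-in-polygon test (a sketch; not the demo's actual code).
// polygon: array of {x, y} points sampled from the user's stroke, treated as closed.
function pointInPolygon(point, polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const a = polygon[i];
    const b = polygon[j];
    // Does the edge a-b straddle the horizontal line through the point?
    const straddles = (a.y > point.y) !== (b.y > point.y);
    if (straddles) {
      // x coordinate where the edge crosses that horizontal line
      const crossX = a.x + ((point.y - a.y) * (b.x - a.x)) / (b.y - a.y);
      // Each crossing to the right of the point flips inside/outside.
      if (point.x < crossX) inside = !inside;
    }
  }
  return inside;
}

// Hypothetical stroke: a rough square drawn around (50, 50).
const stroke = [
  { x: 10, y: 10 }, { x: 90, y: 12 }, { x: 88, y: 90 }, { x: 12, y: 88 },
];
const objectCentre = { x: 50, y: 50 };
console.log(pointInPolygon(objectCentre, stroke));       // object is circled
console.log(pointInPolygon({ x: 150, y: 50 }, stroke));  // object is outside the stroke
```

A real check would probably also want to confirm that the stroke does not enclose the other objects, otherwise circling the whole canvas would always pass.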

Right now I’m working on ways of distorting the image and introducing noise, in an attempt to thwart a vision attack. Work also needs to be done to increase the number of possible solutions to the test.

CAPTCHAs

December 1st, 2011 No comments

Note: this is very much a work in progress

So, after a lecture on system security and CAPTCHAs, I’ve been doing some thinking. Firstly, it was interesting to learn that characters usually contain the same number of pixels after distortion, and also that Microsoft only ever rotated their CAPTCHA phrase by 30 degrees. The notion of image CAPTCHAs was also brought up, as humans are generally better at recognizing images than computers (although there are image hashes, but we’ll get to those). If you ask a computer to find the cats among a number of cat and dog pictures, it’s going to struggle, and this is exactly what Microsoft’s ASIRRA does. I quite like the idea; it seems to be rather effective.

ASIRRA consists of over 3 million pictures of dogs and cats. It asks the user to click all the cat photos out of 12 cat and dog photos and scores them on how many cats they get right. The Wikipedia article for CAPTCHAs (I know, not a great source) suggests that if all the pictures of cats and dogs could be downloaded, they could all be indexed (hashed, still getting there). When presented with the 12 images, an attacker could then just hash them, compare those hashes with the pre-indexed ones, and return all the ones she has as cats to Microsoft. This is unlikely, but the reliance on the obscurity of the images does seem to be a weakness.

I have been working on a system similar to ASIRRA (only since Tuesday, so it’s still pretty flimsy), which relies on a human identifying objects within a picture. It uses an HTML 5 canvas overlaid on top of an image and asks the user to circle an object from the image. This relies on the same basic principle as ASIRRA: computers suck at identifying things in pictures.

I have a simple demo here, are you human? (Chrome, Opera, Firefox, Safari and Maybe IE9 only) http://jonnyfairfull.co.uk/imgcaptcha/rand.php (there are only 2 demo images)

This is only an early prototype, and as such, there are a number of plausible attack vectors I’d like to address. The two most prominent problems are:

1. The object that the user has to circle is output in plain text, and is always the same. This is very easy for a computer to recognize time after time. There are a number of solutions to this. The most ‘secure’ and least practical is to have the instruction output as audio with lots of random background noise, but I don’t know how realistic that is. I’m currently thinking that the instruction should be printed over the top of the image (i.e., as part of the image, using GD5 or similar), in a random place. This is still quite easy for a computer to find and scan, but not too straightforward. I have a few other ideas for this, but they need some work.

2. The images themselves do not change, i.e., there is no distortion. This would make it easy for an attacker to hash these images, which would allow a bot to identify them easily. Image hashes, or perceptual hashes, are different from standard cryptographic hashes: they attempt to create similar signatures for similar media. What this means is, if we hash an image, then rotate the image slightly and hash it again, the hashes will be similar. This allows a computer to deduce that the two hashes are probably of the same image. This causes a problem for image-based CAPTCHAs, as they rely on computers not being good at identifying images. My solution to this so far is to overlay a random image on top of the target image. I haven’t been able to test this yet, but it will hopefully change the hash of an image enough to fool computers while still allowing humans to understand what’s going on.
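To make the perceptual hash idea concrete, here is a minimal sketch of one well-known simple scheme, the average hash, compared using Hamming distance. It is an illustration only: nothing in the post says which hash an attacker would use, the pixel arrays are made-up toy data, and the distortion shown is a brightness change rather than a rotation.

```javascript
// Average hash ("aHash"): one bit per pixel, set when the pixel is brighter
// than the image's mean. Input is a flat array of grayscale values (0-255),
// assumed to be already downscaled (commonly to 8x8); here a toy 8-pixel image.
function averageHash(pixels) {
  const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  return pixels.map((p) => (p > mean ? 1 : 0));
}

// Hamming distance: number of bit positions where two hashes differ.
function hammingDistance(a, b) {
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) diff++;
  }
  return diff;
}

const original = [200, 210, 30, 20, 190, 220, 40, 10];
const brightened = original.map((p) => Math.min(255, p + 15)); // mild distortion
const unrelated = [10, 240, 15, 230, 250, 5, 245, 20];

// A small distance suggests "probably the same image"; a large one suggests not.
console.log(hammingDistance(averageHash(original), averageHash(brightened)));
console.log(hammingDistance(averageHash(original), averageHash(unrelated)));
```

The overlay idea above is effectively an attempt to push the first distance up towards the second without confusing a human.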

I’ve created a prototype for that idea here: http://jonnyfairfull.co.uk/imgcaptcha/img.php. All it does is take the 2 demo images and randomly place one over the other. Refresh a few times to see how it works. However, I don’t think this introduces enough entropy. I think the position of the object we are asking the user to circle should move randomly. To achieve this, buffers of a similar colour and texture to the target image could be placed on the left and top of the image. This should stop an attacker recording the pattern once and just resending it. I have thought that the final system will reject identical patterns received within a certain time span, but it’s not too difficult for an attacker to slightly change the coordinates of a pattern. Maybe the humanness of how patterns are drawn is something to be examined.
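The "reject identical patterns within a certain time span" idea could look something like the sketch below. `ReplayGuard`, `windowMs` and the one-minute window are all hypothetical choices, and as noted above, an attacker who nudges the coordinates slightly would get straight past it.

```javascript
// Rejects a drawn pattern if an identical one was seen within windowMs.
// A sketch only: old entries are never evicted, and near-identical
// patterns are treated as different.
class ReplayGuard {
  constructor(windowMs) {
    this.windowMs = windowMs;
    this.seen = new Map(); // serialised pattern -> time last seen (ms)
  }

  // pattern: array of [x, y] pairs; now: current time in ms (injectable for testing)
  accept(pattern, now = Date.now()) {
    const key = JSON.stringify(pattern);
    const last = this.seen.get(key);
    this.seen.set(key, now);
    return last === undefined || now - last > this.windowMs;
  }
}

const guard = new ReplayGuard(60000); // one-minute window, an arbitrary choice
const pattern = [[10, 10], [90, 12], [88, 90], [12, 88]];
console.log(guard.accept(pattern, 0));      // first sighting
console.log(guard.accept(pattern, 30000));  // replayed inside the window
console.log(guard.accept(pattern, 120000)); // same pattern, well after the window
```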

While examining CAPTCHAs I had the thought that “Anything a computer can automate on the back end, an attacker can probably automate on the front end”. Therefore I think that any sort of effective CAPTCHA will take lots of human preparation on the back end, something which may not be practical.

This is just a quick post/dump of my thoughts on the matter; I’ll write another post when I’ve done some updating.

RESTandra/ – Update

November 8th, 2011 No comments

Quick update:

RESTandra/ has received a *much* needed overhaul and now has a better implemented back end, with a new URL design. More to come soon.

Node.js, reddit, Twitter & Google

May 20th, 2011 1 comment

For a while I’ve wanted to play with Node.js. I like JavaScript, and I like making things for the web, so Node.js seemed quite appealing.

Well, the other day there was a story floating round the intertubes about a guy who had apparently told his 81-year-old grandpa that Twitter is a search engine. So this is the account in question. I’m not convinced that the 81-year-old is just simply tweeting his searches, but you could write a little program in Node.js to tweet the search and then redirect the user to the search page. This is what I came up with:

It only took around 2 hours to install Node and create the script. I found Node.js to be really powerful yet simple. I got exactly what I wanted done pretty quickly, without hassle. All it does is accept a GET request, parse the header for the GET variable, send this to Twitter, then respond to the client with an HTTP 302 redirecting them to the Google results page for their search (which loses the POST request, but it seems to work).
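The original script isn't shown here, so the following is only a sketch of the flow just described, written for a current Node.js. The `q` parameter name, the Google URL shape and the `postTweet` callback are assumptions for illustration; the actual Twitter call is left out entirely.

```javascript
// Builds the 302 response for a request like "/?q=some+search".
// The "q" parameter name and the Google URL shape are assumptions.
function buildRedirect(requestUrl) {
  // The base is only needed so URL can parse a relative request path.
  const parsed = new URL(requestUrl, "http://localhost");
  const query = parsed.searchParams.get("q") || "";
  const location = "https://www.google.com/search?q=" + encodeURIComponent(query);
  return { query, status: 302, headers: { Location: location } };
}

// Request handler in the shape expected by http.createServer.
// postTweet is a hypothetical callback standing in for the Twitter call.
function makeHandler(postTweet) {
  return function handler(req, res) {
    const { query, status, headers } = buildRedirect(req.url);
    if (query) postTweet(query);
    res.writeHead(status, headers);
    res.end();
  };
}

const example = buildRedirect("/?q=node.js+tutorial");
console.log(example.status, example.headers.Location);
```

To serve it, something like `require("http").createServer(makeHandler(myTweetFunction)).listen(8080)` should do, where `myTweetFunction` and the port are again placeholders.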

I did however make one mistake… I posted a link to a live version of the script as a comment in the reddit thread about the Grandpa search. This wasn’t the best idea, as the Twitter feed needed quite a bit of moderation (I had almost forgotten how mental people on the internet are)… I didn’t host that version on my personal server; I just spun up a new Rackspace instance and had it running on there. But I’ve moved it across here for now. You can try it out here:

Search: http://jonnyfairfull.co.uk/searchTweet/ I made it obvious that the searches are going to be tweeted just to be safe.

The Twitter account is here. Note that it is public facing, and anyone can send it a tweet. I think I got rid of most of the bad stuff, but be warned: https://twitter.com/SearchSpy

Categories: Node.JS

RESTandra/ – A RESTful HTTP API for the distributed structured data store Cassandra

May 20th, 2011 1 comment

RESTandra/ is (was) my final year honours project. The point of it was to provide RESTful access to Apache Cassandra resources. Having just written a 10,000 word report on it, I’m going to keep this brief.

The project started out in Python, using Thrift and Pycassa to provide access to Cassandra over HTTP. However, this wasn’t an ideal implementation, so I moved on to building the API into Cassandra’s source. You can have a look at an early build of it here:

https://github.com/jonnyboris/RESTandra

RESTandra/ is all about applying HTTP verbs to Cassandra nouns. The interface to Cassandra was provided through a slightly tricky URL:

http://domain:18220/keyspace/columnfamily/row/columnStart/columnEnd/consistencylevel.filetype

It was quite hard fitting Cassandra’s data model into a URL, so the one above has been slightly altered. To interact with the data in Cassandra, HTTP methods are used on variations of the above URL, so a

GET /Twissandra/User/fairfull///QUORUM.json HTTP/1.1[CRLF]
Host: domain:18220[CRLF]

would read all the columns of the fairfull row from the User column family within the Twissandra keyspace and return them in JSON. An HTTP POST to that URL would be used to create the fairfull row, a PUT would be used to edit it, and a DELETE would, well… delete it.
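As an illustration of how that URL layout breaks down, here is a small sketch that splits the path into its parts. It is based only on the layout described above, not on RESTandra's actual parser (which lives in Cassandra's source); `parseRestandraPath` is a hypothetical name, and treating empty column segments as "no bound" is an assumption drawn from the example request.

```javascript
// Splits a RESTandra-style path into its parts. A sketch based on the URL
// layout described in the post, not the project's actual parser.
function parseRestandraPath(path) {
  const parts = path.split("/").slice(1); // drop the empty string before the leading "/"
  if (parts.length !== 6) {
    throw new Error("expected 6 path segments, got " + parts.length);
  }
  const [keyspace, columnFamily, row, columnStart, columnEnd, last] = parts;
  const dot = last.lastIndexOf(".");
  if (dot === -1) {
    throw new Error("last segment must look like consistencylevel.filetype");
  }
  return {
    keyspace,
    columnFamily,
    row,
    // Assumption: empty segments mean "no bound", i.e. all columns in the row.
    columnStart: columnStart || null,
    columnEnd: columnEnd || null,
    consistencyLevel: last.slice(0, dot),
    fileType: last.slice(dot + 1),
  };
}

const parsed = parseRestandraPath("/Twissandra/User/fairfull///QUORUM.json");
console.log(parsed);
```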

A lot of my research was based around how well HTTP methods and status codes map to the functions and errors of a database connector. For example, everyone is familiar with an HTTP 404; this maps well to a row being looked up that can’t be found.

Anyway, like I say, I’ve done enough writing on this subject. If you want, you can find my report here:

http://jonnyfairfull.co.uk/RESTandra.docx

Oh, one thing worth mentioning is how I tested RESTandra. As Cassandra is designed for big data applications serving many requests, I was unable to put it under any sort of stress by myself. So I came up with a distributed method of testing: I wrote a test application in JavaScript and pretty much spammed the School of Computing with the link. This was quite successful, as RESTandra ended up handling just over 4,000,000 requests over 2 days without leaking memory all over the place. I wrote a much better description of the test here:

http://restandra.blogspot.com/2011/04/javascript-testing.html

Mini Projects/Hacks

May 20th, 2011 No comments

Every now and then, I feel the urge to program something I find interesting. The success of these programs is quite varied, but I have fun making them and that’s what’s important to me. Most of these used to have their own posts; however, as I’ve just updated my blog, they’re all in a list below, with links to demos and the source. As I switched servers, some of these projects have suffered, due to API keys needing updating and slight file structure changes.

This post is going to take a while to finish; I’m going to come back to it and split it up.

 

Reddit Post Tagger

A while ago there was a post on reddit.com/r/iama about making it easier to search for posts and about the ability to tag posts. This post was made in the summer, and I didn’t have too many productive things to do at that time, so I gave it a shot.

Here it is. Warning: this is publicly accessible, and people post some weird things on reddit, so there is some ‘interesting’ content on here:

http://jonnyfairfull.co.uk/reddit/

This wasn’t too hard to make. ketralnis, a reddit admin at the time, created a JSON dump of all the past iama posts (27MB of text) and sent it my way. I then set up a small PHP script to organise the data into a MySQL database so it could be indexed and searched. This is just a really simple 2-table database: a table for posts and a table for tags. The interface is dead straightforward as well. There is the home page, which allows you to do a full text search of post titles, find a random post and browse by recent tag through the tag cloud (tag searching never got implemented, more on that later). Then there is the view post page, which displays the reddit.com/r/iama post in an iframe, with a bar at the top allowing you to add tags and navigate posts.

Tags are added through a simple ajax call back to the server, so you can add a bunch of tags real quick without reloading the page: just type one in, hit enter and it’s added. Unfortunately this project broke down; the other reddit user who posted asking if this could be done just disappeared, and I moved on. Still, not a bad little project.

Some simple code (not really worth putting on GitHub):

The HTML doesn’t seem to get any formatting on Pastebin, sorry about that.

 

uHack Submission – What You Talking about, WYTA?

This really deserves a post of its own; however, moving it across servers has broken it quite nicely. This is largely due to API keys expiring, but there’s also an error in one of the JS files which appeared from somewhere.

This was quite a complex project, but its homepage has quite a good explanation:

Remember, it’s still quite broken: http://jonnyfairfull.co.uk/uhack/

In short (as possible), WYTA is a translation proxy. You send your request to WYTA and it forwards that request to the site you want. It pulls down the whole page and parses it to make all the links absolute (so the page renders properly). Then all the text from <p>’s is parsed and sent off to the Yahoo! Term Extraction API, which returns all the big/scary words that are to be pre-emptively translated. These are then sent off to Google to get translated into the desired language. The text of the page is then altered so that the big/scary words are surrounded by a <span> which, when clicked, changes the word to its translated version. I hope that made some sense…
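The "make all the links absolute" step can be sketched as below. This is not WYTA's code: it is a regex-based illustration that only handles double-quoted `href` attributes (a page also needs `src` and friends rewritten to render properly), and `absolutizeLinks` and the example URLs are made up.

```javascript
// Rewrites href="..." attributes so relative links resolve against the
// original page. A regex-based sketch; a real proxy would want an HTML parser.
function absolutizeLinks(html, pageUrl) {
  return html.replace(/href="([^"]*)"/g, (match, href) => {
    try {
      // new URL(relative, base) resolves the link the way a browser would.
      return 'href="' + new URL(href, pageUrl).href + '"';
    } catch (err) {
      return match; // leave anything URL cannot parse untouched
    }
  });
}

const page = '<a href="/about">About</a> <a href="img/cat.png">Cat</a>';
console.log(absolutizeLinks(page, "http://example.com/blog/post.html"));
```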

There is also quite a bit going on client side. WYTA allows users to translate entire paragraphs: you double click the paragraph, which is then sent off to Google with an ajax request to be translated. When it’s returned, the translated paragraph is shown beside the original in an overlay, allowing the user to compare the 2 and hopefully learn something about the language (Google started doing something similar a few months after we finished; we had no influence over them, it was just cool to know we were along the right lines).

Broken bits: single word translation has died; I’m not sure why, and I’m going to look into it soon. The definition of single words has also died. Originally, when you clicked a pre-emptively translated word it would appear in the nav bar at the top. If you clicked the word at the top, WYTA used to send an ajax request to a YQL script to go grab the first paragraph from the associated Wikipedia article in the right language. However, the API key has expired, so I will have to sort that out too.

This project was just a hack, so it’s not particularly solid, as it was completed on limited time. However, I was pretty pleased with the outcome.

Try it out here: http://jonnyfairfull.co.uk/uhack/. If you want to go to a page of your choosing, just click the wire globe icon on the nav bar and type in the address.

 

<canvas> Experimentation

Just like a lot of folk who know how to use JavaScript, I’ve had fun messing about with the canvas tag, making silly games. First off I made a really simple side scrolling shooter; nothing spectacular, but I had fun making it.

You have 5 lives. Click the box to start, refresh to play again: http://jonnyfairfull.co.uk/phys/

This was just an experiment to see how simple hit detection could be implemented in JavaScript and I really like side scrollers…
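For a feel of how simple hit detection can be, here is a sketch of an axis-aligned bounding box overlap test, a common approach for this kind of game. Whether the shooter uses exactly this is an assumption; the box shapes and names are made up.

```javascript
// Axis-aligned bounding box overlap test; boxes are {x, y, width, height}.
// Two boxes overlap only if they overlap on both the x and the y axis.
function boxesOverlap(a, b) {
  return (
    a.x < b.x + b.width &&
    a.x + a.width > b.x &&
    a.y < b.y + b.height &&
    a.y + a.height > b.y
  );
}

const bullet = { x: 100, y: 50, width: 4, height: 4 };
const enemy = { x: 102, y: 48, width: 20, height: 20 };
const farEnemy = { x: 300, y: 48, width: 20, height: 20 };
console.log(boxesOverlap(bullet, enemy));    // bullet has reached the enemy
console.log(boxesOverlap(bullet, farEnemy)); // still a long way off
```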

I also made a little car driving thing to play with acceleration and deceleration. It uses some of the original GTA graphics, which I probably shouldn’t use. Also, it could really do with some optimising; the JavaScript in there is pretty sloppy (JSLint does make you cry…)

http://jonnyfairfull.co.uk/sts/
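The acceleration and deceleration part boils down to a per-frame speed update along these lines. This is a sketch, not the demo's code, and the constants are made-up tuning values.

```javascript
// One simulation step for a car's speed. All constants are made-up tuning values.
const ACCELERATION = 2; // speed gained per step while accelerating
const FRICTION = 1;     // speed lost per step while coasting
const MAX_SPEED = 10;

function stepSpeed(speed, accelerating) {
  if (accelerating) {
    return Math.min(MAX_SPEED, speed + ACCELERATION); // clamp at top speed
  }
  return Math.max(0, speed - FRICTION); // coast down, never below zero
}

let speed = 0;
for (let i = 0; i < 3; i++) speed = stepSpeed(speed, true);  // hold the key for 3 steps
const afterAccelerating = speed;
for (let i = 0; i < 4; i++) speed = stepSpeed(speed, false); // release for 4 steps
console.log(afterAccelerating, speed);
```

Calling `stepSpeed` once per animation frame and moving the car by `speed` along its heading gives the familiar wind-up and coast-down feel.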

I took this idea a little further and thought ‘hey, wouldn’t it be cool if you could control that with your phone…’ Turns out that’s pretty hard to do with ajax polling, but I had a go.

It’s best to open these in 2 separate browser windows, and make the controls into a small window:

Game: http://jonnyfairfull.co.uk/msts/

Controls: http://jonnyfairfull.co.uk/msts/ctrl.htm The controls are really slow to respond, and it’s dead easy to lose the car off the side of the screen, but it’s still pretty cool controlling one browser window from the other.

You can just view source to get most of the code for these games (Chrome developer tools are great).

 

More projects to add later…

Categories: Web Development

New Blog

May 18th, 2011 No comments

My original blog was a little lacking, so I have finally installed WordPress. So far so good: the source seems to be well documented and it’s been damn easy to set up.

As this is the first post, I feel I should set the tone for the rest of the blog. In here I will write about computing topics and things which interest me, nothing too personal… that’s what Facebook is for…

However, as I’m in the process of updating and moving old content across, the dates/timestamps on the next couple of posts will be wrong.

The next post will be me documenting all the little projects I’ve made for fun/competitions, but that’s going to take some work as my file structure has gotten a bit messy.

Categories: Personal