Hacker News
The Photographic Science of Detecting Fake Lottery Tickets (hackerfactor.com)
254 points by ecaron 1669 days ago | 49 comments

Is it possible to reverse engineer the image such that it passes all tests? Knowing the various algorithms that could be used, what would be involved in constructing the modification such that it doesn't reveal evidence of any modifications?

One thought would be to actually render an entire scene in something like POV-Ray. Imagine if something like this rendering http://www.povray.org/community/hof/chado.php contained a winning lottery ticket on the table. If you save the rendered image using the same compression algorithm and the same EXIF header information as a camera, how would one tell the difference?

I imagine it would be hard for a renderer to fake all the subtle qualities a digital camera introduces. Things like "colors too clean", "noise too uniform", "geometric distortion doesn't match any existing lens" would turn up.

In this specific case I think I'd print a fake ticket with a dot matrix printer and take photos of it.

I have worked so hard to completely replace dot matrix that I am actually trying to think where I could get access to one and am coming up blank! Many receipts are now the crappy thermal paper that magically disappears after being left on the dash. It's like magic in the retailer's favor... and just about everything else is laser. Where would you get the dot matrix?

I imagine that getting one's hands on blank lotto ticket paper stock roll is probably beyond the scope of this "hack".

From the comments, you can reprint the ticket with a receipt/laser printer and take pictures of the copy, losing the artifacts. It depends on how much trouble you want to go through to reproduce a hard-copy document.

Simply blowing the contrast out in Photoshop makes the fakery pretty apparent too: http://i.imgur.com/6niwJ.png

What's funny about this to me is that the "04 02" reveals the fake without any kind of photographic analysis whatsoever. The numbers are supposed to be sorted smallest to largest. A valid ticket would read "02 04".

The author mentions that anomaly in the article, but goes on to assume that the sorting is picked by the state printing the ticket (probably for the benefit of being able to continue the case study).

I also noted that the differing color of the "04 02" in that image can be spotted with the naked eye if zoomed in - no advanced analysis techniques necessary.

I think the techniques used were far more interesting than the particular case study - while the media and non-camera-original nature of the images made the author's most basic techniques somewhat ineffective, the fakes were overall very poorly executed.

If it were the case that the numbers on this ticket were unsorted, you would expect all of the rows to be unsorted, whereas everything except 04 & 02 are in order, including the numbers that follow in the same row. Still interesting to see the "forensic" approach to this.
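
The sortedness argument is easy to check mechanically. A minimal Python sketch, assuming each row is just a list of its main numbers; `unsorted_rows` is a hypothetical helper, and only the leading "04 02" pair comes from the article (every other number below is an invented placeholder):

```python
def unsorted_rows(rows):
    """Return the indices of rows whose numbers are not in ascending order."""
    return [i for i, row in enumerate(rows) if list(row) != sorted(row)]


# Placeholder data: only the leading "04 02" pair is taken from the article;
# all other numbers are made up for illustration.
rows = [
    [3, 11, 24, 37, 45],
    [4, 2, 17, 23, 38],
    [8, 19, 21, 40, 52],
]

suspicious = unsorted_rows(rows)
print(suspicious)
```

If the state really printed picks unsorted, you would expect several rows to be flagged, not exactly one.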

To fake a lottery ticket for posting online, it'd be much easier just to print a fake ticket and then photograph it. I suppose, however, that the purpose of doing it digitally is to practice skills and "because I can."

To fake a lottery ticket for claiming a prize ... well, that'd be fraudulent and would result in loss of freedom for quite some time.

I find it utterly frustrating that a scientific-seeming article would end with this conclusion:

"A single algorithm can trigger false-positive or false-negative results... if something is really real, then it should pass everything."

How are you going to be so thorough about detecting a crappy Photoshop job and then trip over your own words in the conclusion?

Not for nothing, but if you merely zoom in on the picture you can tell that the 'winning' line is a different color with the naked eye. And if you want to 'prove' this, a 10-second color replacement in photoshop does the job: http://i.imgur.com/7Mg4z.png

There is nothing scientific about any of these tests. What counts as a 'high' or a 'low' ELA? The thing that triggers a 'positive' seems to be only the author's intuition. While some of the tests might show something, I'm pretty sure ELA is absolute garbage: you can't separate the effect of the image's content frequencies from the number of times it has already been resaved.

One basically looks for roundings caused by quantization and places in the image where the compression (again quantization) is either inconsistent or non-optimal.

A nice tutorial (and a lot of articles) can be found on the Dartmouth site of Hany Farid.


Personally, I have had very mixed results with this method, and never managed to model it correctly. Interpreting the results was always a very human job.

I understand the theory, and I certainly agree that resaving JPEGs will reduce the error level. If you have an undoctored image to compare it to, you could probably use that to determine which parts have been changed. But given only an image that may or may not have been doctored, the error level will vary so much with the image content that it won't be meaningful.
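
For anyone who wants the quantization argument in concrete form, here is a toy Python sketch. It is not real JPEG and not the article's tool: plain rounding to a hypothetical step size stands in for JPEG quantization, and all the values are made up.

```python
def quantize(values, step):
    """Round each value to the nearest multiple of step (a stand-in for JPEG quantization)."""
    return [step * round(v / step) for v in values]


def error_level(values, step):
    """Absolute change each value would undergo if it were quantized once more."""
    return [abs(v - q) for v, q in zip(values, quantize(values, step))]


STEP = 8  # hypothetical quantization step

original = [13, 50, 97, 130, 201, 77]  # made-up "camera" values
saved = quantize(original, STEP)        # first save: everything lands on the grid
edited = saved[:]
edited[2:4] = [101, 59]                 # pasted-in values that never went through STEP

print(error_level(saved, STEP))
print(error_level(edited, STEP))
```

Values that were already saved sit on the quantization grid and show no further error, while the pasted-in values do. The objection above also shows up in the toy: a pasted value that happens to be a multiple of the step would show zero error too, so the result depends on content as well as on editing history.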

What, may I ask, makes a test scientific?

For one, there should be a control. In this case, that means analyzing another ticket that has not been faked, and preferably he would not know whether the control was real or not.


You don't need to get technical about it, but I think most people would expect any test described as scientific to be objective.

Do you mean free of bias or reproducible?

I think the author was clearly free from bias, and his results are easily reproduced by applying the same algorithms.

The subjectivity which you complain about may be the conclusions drawn from the results of the tests, which I think are distinct. Specifically, you cite his "intuition" as the origin of the conclusion.

I think the author's intuition is reliable because, like "real" scientists, he's an expert and speaks publicly about his work.[1] Or at least he appears to be. Are you prepared to challenge him as an expert?

[1] http://www.hackerfactor.com/papers/bh-usa-07-krawetz-wp.pdf

I could, but shouldn't we demand he show some proof for his claims?

It seems to me that if you pull out a bit of maths and technical magic, the normal skepticism of the tech community melts away into a compliant bundle of gullibility.

He used PCA as well as ELA, making your first statement unfounded.

While the result of the PCA is quite well defined, there is nothing well defined about the inferences he draws from that. In the end it shows a marked difference in the suspicious area, but there is no proof that the technique will only show those results on tampered areas.

If you can simply run filters until the suspicious area 'looks a bit different' and that's your success criterion, you haven't proven anything.

Neal specializes in this sort of thing. This blog post wasn't written to convince you that PCA (or ELA) work. If you've read his other posts on forensics, you'd see that he always starts with the common tests, then works through the most likely explanations, trying to see what you can rule out and where you should focus. A lot of the time, you can rule things out with common sense (the out-of-order numbers jumped out at me), though he always goes on to ferret out what manipulation was performed, because that's what the whole blog is about.

Anyhow, people have done lots of tests with lots of different tools to see what they report in different circumstances. So this isn't the first time that someone has used ELA or something. You will notice if you read past explanations that he always tries to find the exact manipulations done, rather than just running a tool and declaring something to have been Photoshopped.

I'm no expert, but I've read his blog for long enough to know that he knows an awful lot about the various quirks of many different image editing programs (Photoshop being by far the quirkiest). But if you still have questions, you can always email him. He was kind enough to reply with a lot of useful information when I asked him something a long time ago. Actually, I think he even blogged about it.

It doesn't matter how good your intentions are: if you place human intuition into the loop, you will fool yourself.

That's true, but trivially so. Things that aren't infallible can still be useful, but understanding their limitations is important.

If you read that blog, you can find plenty of discussion of those limitations, for example, how the absence of markers of digital manipulation does not prove that an image is genuine. After all, there are plenty of staged photos out there, images that were framed in a misleading way, etc.

I'm not making a trivial point though. I believe the tools, techniques and expertise this guy professes in image analysis are modern day snake oil. I just think that he has deluded himself instead of being deliberately dishonest.

How then do you explain the fact that his methods have worked? He has outed doctored photos that appeared in news stories and they have been retracted after investigation.

That's not quite what I'm saying. I don't think his tests are scientific, which is not the same as saying they show absolutely nothing (although with ELA it is probably fairly close to nothing). I also haven't doubted his ability to detect fakes, but I do doubt the particular efficacy and explanation of his techniques. There is no doubt that using PCA can highlight unusual changes in an image, but there are also other explanations. His famous suggestion that terrorist videos had books inserted into them could simply have been a slightly different coloured spotlight.

If his techniques are unreliable, he should have some notable failures by now after having analyzed so many images publicly. Where can these failures be found?

No, not really: his techniques may be unhelpful and he still might be good at picking fakes. But analyzed so many images? There is a handful on the site and the outcome is almost never in doubt. You could test his ability to detect fakes, but even if it were supreme you really couldn't separate that into the portion provided by his skills and the portion provided by his techniques.

Much more interesting would be to find out what the techniques really show, which faking techniques show which signatures and what other natural occurrences mimic that.

What about the times he found cheating by the winners of photographic contests? Those were hardly obvious choices. And falsely accusing someone would have really hurt his reputation. Of course, investigation proved that he was right. He analyzes something every few weeks, it seems (I've read the blog for years now), so yeah, there's a long history for you to look at. The outcome is only "never in doubt" if you're using hindsight bias. Besides, even when he already knows an image is fake, he figures out what is fake and how it was faked. One example would be showing which of those lottery numbers was real: the whole row was fake... except for the last number. That helped explain why the 2nd and 3rd photographs were the way that they were.

He's quite up front about the fact that some tests give inconclusive results. I believe that he has discussed the limitations of the tests. But even a test that gives an inconclusive result half the time can be useful if it shows you which areas of the image need attention the rest of the time. That data is indeed useful. He has talked about it. And you could just politely email him if you wanted to know more.

And yes, it probably is hard to separate the success due to technique and skills. But it's hard to believe that he would be good at picking out fake images without understanding why he was good at picking them out.

Well, I enjoyed the discussion, but I don't think I'm making the point especially well anymore. It's not about picking fakes; it's about reliably saying something about the numbers returned by a mathematical function on an image. It needs more study and fewer experts making consulting money from their special magic.

This site has a GIMP script for doing Error Level Analysis: http://sites.google.com/site/elsamuko/forensics/ela (After saving it to your ~/.gimp-2.6/scripts folder, the tool hides in the Image menu.)

I just tried it on a couple of my own images, the results are very interesting.

Can you use Photoshop filters to do this, or are there more professional programs to do so?

Slightly related, I was wondering how the actual lottery tells if your ticket is real - do the machines record your ticket's serial number and number(s) chosen and then send the info to the lottery HQ? Or do the lottery machines compute an HMAC of some kind and encode it in the serial number on the ticket itself?
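
To make the second option concrete, here is a minimal Python sketch of what an HMAC-based scheme could look like. It is purely hypothetical: the key, the message layout, the serial format and the function names are all assumptions, and nothing here describes how any real lottery works.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-lottery-key"  # placeholder, not a real key


def ticket_tag(serial, numbers, key=SECRET_KEY):
    """Compute a short authentication tag over the serial and the chosen numbers."""
    picks = "-".join(f"{n:02d}" for n in numbers)
    message = f"{serial}|{picks}".encode()
    # Truncated to 16 hex characters so it could plausibly fit on a ticket.
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]


def verify_ticket(serial, numbers, tag, key=SECRET_KEY):
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(ticket_tag(serial, numbers, key), tag)


tag = ticket_tag("A123456", [2, 4, 23, 38, 46])
print(verify_ticket("A123456", [2, 4, 23, 38, 46], tag))
print(verify_ticket("A123456", [4, 2, 23, 38, 46], tag))
```

A scheme like this would let a terminal check a ticket offline, but anyone holding the key could mint valid tickets, which is one reason a central record looks like the safer design.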

Considering they can tell which vendor sold a winning ticket, they must have a central database logging the numbers chosen for each ticket. Also, considering the amounts of money involved, I can't imagine them doing anything less secure.

For "online" games (like this one), your picks are transmitted to the lottery office which stores them and then transmits the serial number back to the terminal, which prints the ticket. This prevents a ticket from being issued that the lottery's central computer doesn't know about.

When it is time to verify the ticket, they simply send the serial number to the lottery central computer and that computer sends back the payout command (and invalidates that serial number from being paid by any other terminals.) The retailer is supposed to do some basic sanity checking for small prize amounts (is the ticket really on lottery paper, etc.) and obviously for larger prize amounts, there are several other methods to verify the authenticity of ticket stock that aren't public knowledge.
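
The issue-then-redeem flow described above can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions: `CentralLottery`, its method names and the random serial format are hypothetical, and "winning" is simplified to an exact match with the drawn numbers.

```python
import secrets


class CentralLottery:
    """In-memory stand-in for the lottery's central computer."""

    def __init__(self):
        self._tickets = {}  # serial -> picks
        self._paid = set()  # serials that have already been paid out

    def issue(self, picks):
        """Store the picks first, then hand a serial back to the terminal."""
        serial = secrets.token_hex(8)
        self._tickets[serial] = tuple(picks)
        return serial

    def redeem(self, serial, winning):
        """Pay a known, unpaid, winning serial exactly once."""
        picks = self._tickets.get(serial)
        if picks is None or serial in self._paid:
            return False  # unknown (forged) or already paid
        if picks != tuple(winning):
            return False  # simplified rule: only an exact match wins
        self._paid.add(serial)
        return True


hq = CentralLottery()
serial = hq.issue([2, 4, 23, 38, 46])
print(hq.redeem(serial, [2, 4, 23, 38, 46]))
print(hq.redeem(serial, [2, 4, 23, 38, 46]))
print(hq.redeem("forged-serial", [2, 4, 23, 38, 46]))
```

Because the serial only exists once the central store has recorded the picks, a printed ticket the central computer doesn't know about simply fails the lookup.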

The lottery HQ has records of all tickets for a particular lottery, which gets checked against when you claim a prize.

I'm not sure whether the has-this-ticket-won-anything machines phone home to check it against those records, or whether they just assume the ticket is authentic and leave the proper verification for when a prize is claimed.

The lottery machine creates a random number that is attached to and encoded on your ticket, and both the numbers and the random bit are sent to HQ.

In California at least one ticket was verified by closed circuit video of the person buying it.

I don't know about this US lottery specifically, but in most systems everything is recorded at HQ: not just the numbers, but the place, the date and time, the serial number, etc.

That way they can also check with CCTV.

It's an interesting topic in database design, because the link to the store may fail at any time (remember, lotteries started years ago on dialup) and you must not have an issued-but-unrecorded ticket at any point in the process.

From my own experience with buying them, they solve that by saying "Machine is down, I can't print any tickets". Not a very high-tech solution, but it certainly solves that problem completely.
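
That "machine is down" behaviour amounts to a record-before-print rule. A minimal Python sketch, assuming the terminal is handed two hypothetical callables, `record_at_hq` (which raises `ConnectionError` when the link is down) and `print_ticket`:

```python
def sell_ticket(picks, record_at_hq, print_ticket):
    """Print a ticket only after headquarters has acknowledged the picks."""
    try:
        serial = record_at_hq(picks)
    except ConnectionError:
        return None  # "machine is down": nothing gets printed
    print_ticket(serial, picks)
    return serial


def link_down(picks):
    """Simulate a failed link to headquarters."""
    raise ConnectionError("no route to headquarters")


printed = []
result = sell_ticket([2, 4, 23], link_down, lambda s, p: printed.append((s, p)))
print(result, printed)
```

The ordering is the whole point: a failure before the acknowledgement leaves no ticket at all. The opposite failure (recorded at HQ but never printed) is still possible in this sketch and would need a separate void or retry step.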

Then why did it take so long for them to announce how many winners there were? First they said "at least one", and then they said "three". If everything was stored in a central database, they could have found out the exact figures (and a lot more) as soon as the last winning number was announced.

Maybe it's due to lack of coordination between states? Or maybe they need to do fraud checking, etc. by watching CCTV footage from the point of sale?

It's because of coordination between states. From the MegaMillions website:

> Unlike some multi-state or multi-country lotteries that have central offices, all Mega Millions duties are shared by member states as part of their membership in the game.

Each state handles their own tickets and they don't all use the same software systems to manage them.

I doubt they use recordings for fraud prevention. I'd be pretty upset if my winning lottery ticket were denied because the store lost the recording or never had one. It's just too costly to record every purchase.

They don't insist on CCTV but it's a useful tool.

Actually, the major source of fraud in most lotteries is the store owner. For small wins the ticket is taken back to the store to be scanned for a win; the store owner will tell the buyer that it lost, or that it only won a much smaller prize, pay that out of the till, and then claim the real prize. This happens especially in poor/immigrant communities, where many players may not speak English or have internet access.

There was a story on here about an analyst for an oil company who worked out the random number sequence for a lottery in Ontario. He also analysed the winning claims and discovered that certain stores were claiming a disproportionate number of middle wins.
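
One way to put a number on "disproportionate" is a binomial tail probability. This Python sketch is not the analyst's method, just a common first check, and the figures in the example are invented: a store with 1% of sales making 10 of 200 mid-tier claims.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Invented example: the store sells 1% of tickets, yet made 10 of 200 claims.
chance = binomial_tail(10, 200, 0.01)
print(f"{chance:.2e}")
```

A very small probability doesn't prove fraud on its own, but it tells you which stores deserve a closer look.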

Most of these lotteries date from the late 80s/early 90s and there isn't a lot of profit in updating them.

I went to some talks about the database design for the UK lottery where they admitted they didn't think it would be so popular and so hadn't really considered sharding.

They were also surprised that the numbers weren't uniformly distributed. They had assumed numbers would be picked totally at random, but the first winning draw was all numbers that could be birthdays, so there was an unusual number of winners and it took several days to work out how many.
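
A quick back-of-the-envelope check on the birthday effect, as a Python sketch. It assumes the 6-from-49 format the UK lottery used at launch (the comment above doesn't state the format), and `share_within` is a hypothetical helper:

```python
from math import comb


def share_within(limit, pool=49, picks=6):
    """Fraction of all possible tickets whose numbers are all <= limit."""
    return comb(limit, picks) / comb(pool, picks)


# Share of tickets that use only day-of-month numbers (1 to 31).
print(f"{share_within(31):.4f}")
```

With uniformly random picks only about one ticket in twenty would be "birthday-only", so if players favour birthdays heavily, a draw that lands in that range will be shared by far more winners than a uniform model predicts.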

In the US, each state hosts its own lottery program. Mega Millions and Powerball are programs where states have teamed up to pool their resources and gambling markets.

I've sold Washington State's "lotto" tickets, and you're entirely correct about the reliability of the system - if the device can't connect to servers in Olympia, tickets can't be sold.

CCTV is up to the retailers here, though it's entirely possible that CCTV is a requirement to be an eligible lottery vendor.

Not directly related, but this made me remember some reading I did a while ago on EXIF analysis.

A really cool tool for EXIF: http://www.sno.phy.queensu.ca/~phil/exiftool/
