Fuzzing between the lines in popular barcode software (trailofbits.com)
179 points by ingve 11 days ago | 55 comments





> You might ask: how do you know whether or not software has been fuzzed?

zbar has great barcode reading performance! I've seen far newer software that's nowhere near as good in terms of real-world performance.

But it seems the original developer hasn't updated it since 2009 [1] - and fuzz testing only rose to prominence in ~2012 with the rise of tools like afl-fuzz.

I would be absolutely astonished if it had ever been fuzzed.

> Cut out any unnecessary features to limit attack vectors. ZBar by default scans all code types, which means that an attacker can trigger a bug in any of the scanners. If you only need to scan QR codes for instance, then ZBar can be configured to do so in the code

Absolutely sensible, yes.

Not just for security, but also because packages sometimes have extra barcodes. If you're scanning an EAN-13 on a pack of pasta, decoding a QR code for a pasta recipe website is just going to confuse things :)

[1] https://sourceforge.net/projects/zbar/files/zbar/
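The "extra barcodes on the pack" problem can also be handled after decoding with an allow-list. What follows is a minimal sketch, assuming the decoder hands back (symbology, payload) pairs; `Decoded`, `keep_allowed`, and the symbology names are made up for illustration and are not ZBar's API. Filtering afterwards only helps with the confusion; it does not shrink the attack surface, because every decoder has already run by then.

```python
from typing import Iterable, List, NamedTuple, Set


class Decoded(NamedTuple):
    """Hypothetical shape of one decode result (not ZBar's real type)."""
    symbology: str
    data: bytes


def keep_allowed(results: Iterable[Decoded], allowed: Set[str]) -> List[Decoded]:
    """Drop every result whose symbology is not explicitly allowed."""
    return [r for r in results if r.symbology in allowed]


# A pack of pasta carrying both a product code and a recipe link.
scanned = [
    Decoded("EAN13", b"4006381333931"),
    Decoded("QRCODE", b"https://example.com/recipe"),
]
product_codes = keep_allowed(scanned, {"EAN13"})
```

For the security benefit the article describes, the unused decoders have to be switched off in the scanner's own configuration rather than filtered out afterwards.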


I've seen the "overzealous barcode scanner" issue happen with some gas station POS systems, to the point where the seasoned cashiers know to cover the QR codes with their fingers before attempting to scan an item.

Sounds like the POS software isn't controlling the reader well, maybe because it wasn't adjusted for this model of reader. Or the reader's firmware could have been configured differently from what that POS setup expects.

Modern reader firmware tends to have multiple modes and many options. Some modes are as simple as "scan whatever you see out of the many formats you support, and spit out the decoded value of something as USB Serial". Or, worse, "...as USB Keyboard".

You can imagine how easy those modes are to integrate with POS software, without implementing the proprietary protocol for that device, and you can also imagine how poorly that can work out.

If you owned a store with a POS setup with flaky reader behavior like this, and were stuck with it, you could try reconfiguring the reader (to, say, disable QR support). This reprogramming can sometimes be done via documented protocol, via sketchy Windows software, or via... barcode... Careful you don't make it worse.

(Our startup used modern readers (multiple 1D formats, QR, NFC) for a factory station, and had to do a lot of experimenting with different brands and models, to get the behavior and speed we needed. We even managed to brick a reader, just with configuration changes, not flashing firmware.)


The shop may use QR codes for coupons or loyalty programs even if the merchandise doesn't use it. So being able to scan these items without switching mode is often an important feature.

I went to a meeting the other day in a building with a touch screen registration system. The woman in front of me was struggling with it. Every time she tapped the register button the system decided that some part of her was a badly formed barcode, printed an error message and exited back to the menu. She eventually got it working by moving to the side until it wanted to take her picture.

Absolutely. I helped with a physical inventory count project using smartphones as the "terminals". The barcode app we used didn't allow us to selectively turn off symbologies. We ended up with a ton of links to recipes, websites, etc in the data.

Reminds me of the Jurassic Park novel where they ask the computer to find 10 velociraptors on the island and it finds 10. And they actually have 20.

The Jurassic Park novel had some of the best depictions of technical failures like that, IMO. The scene where they realize they've been running on backup power and everything goes down is another good one. I won't call them realistic per se, but they just felt right. Andromeda Strain, another Crichton work, was also pretty good at this.

The Andromeda Strain is the most underrated of Crichton's books, to my mind. The movie adaptation is very good, too.

They can run the whole park with minimal staff for up to 3 days. You think that kind of automation is easy? Or cheap?

It is when you outsource to one Nedry and his coke habit!

Can you really blame the computer tho? That sounds more like a case of PEBCAC, if you ask me...

More like bad requirements. The system knew how many dinosaurs of each species had been released into the park, and the inventory system was only supposed to figure out if any were missing. No sense in looking further than that; after all, where should the additional dinosaurs come from?

That was the main theme of the book. Everything was well designed with failsafes, but too many of the design assumptions turned out to be wrong. Expecting only the expected led to many small mistakes that were harmless individually but together snowballed into a disaster.


Crichton novels are excellent at that kind of technical dystopian/disaster.

(Well, not obviously dystopian, more ‘oh shit, that is how we’d be fucked isn’t it). Alien and Aliens also had a similar feel in their writing, except in real life there is rarely a Ripley there when you need one.


It's also a common annoyance in grocery store apps.

Kroger, for example, has an app that allows you to scan items to add them to a virtual cart as you shop and avoid scanning them at the register... however the same app is used to read QR codes on in-store coupons, which are "helpfully" placed very close to the price tags with UPC barcodes on them.

If I want to scan one of those coupon QR codes, I need to either start with the camera very close to the QR code or cover the barcode with my finger.


It appears to have been forked: https://github.com/mchehab/zbar

That's the repo Trail of Bits was working with; the PR they ended up submitting is at https://github.com/mchehab/zbar/pull/294.

I once reported a bug to a barcode decoding library, reporting that it crashed when the barcode contained a zero byte. They responded that they wouldn't fix it because barcodes aren't supposed to contain zero bytes.

"But it crashed. That's bad. I can't stop people scanning bad barcodes."


"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning." - Rick Cook

Combined with the all too human reflex of engineers to insist that it isn’t their implementation/design that is wrong, it is reality which is wrong. Clearly.

Because if we just didn’t do that, then it would all work.

In particular, see folks talking about Self Driving hah.


> They responded that they wouldn't fix it because barcodes aren't supposed to contain zero bytes.

Sad. What a poor understanding of our field.

The number one rule of them all is: "Never trust (user) input".

A slightly more powerful variation being: "assume all input is malicious until proven otherwise".

I mean: on one hand there are people who fuzz, who test, who think about edge cases, who think about security, who think about uptime, etc. And OTOH you have people saying "such input shouldn't happen". It's just really pathetic.


I think a difference between an application and a library (or module, etc) is that it is ok for the latter to expect sanitized input and be wrapped in try/catch blocks. The world is less finite than code and a module might be deployed in a variety of contexts which might make some checks undesirable.

In computing, the robustness principle is a design guideline for software that states: "be conservative in what you do, be liberal in what you accept from others". It is often reworded as: "be conservative in what you send, be liberal in what you accept". The principle is also known as Postel's law, after Jon Postel, who used the wording in an early specification of TCP.

https://en.wikipedia.org/wiki/Robustness_principle


If that's the case, the library should also have another function or method that can validate the barcode if the application should so choose. The library is the barcode expert, the app is the business logic expert. Expecting every app to now become barcode experts doesn't make sense.

Also, that law gets quoted a lot, and IMO it is a rather large design mistake.


The library also has the best chance to fix and prevent security issues systemically. I have played this game for a while now. Library engineers often want to pass the buck onto users of their tools. That is not good developer or user experience. Also crashing is the opposite of robust.

Malformed data is a fact of life. A parser should gracefully fail when this eventuality happens.

Do you by chance remember which library, and which barcode symbology? (barcode library developer here :-)

I do remember it was a large 2D barcode. Like QR but with a square in the middle. (AZTEC?)

I was trying random barcodes I had lying around to test my own component. The one with the zero byte happened to be a large one they had added to my passport when I visited the USA. It had "US-VISIT" printed next to it in big letters.

The device was a rugged industrial handheld device with a screen and a camera, designed for mailrooms and warehouses. This was around 20 years ago and I remember the OS (including the barcode component) was completely bespoke and it ran without any process protections. This meant that the barcode would crash the whole device and you had to perform a hard reset.


Square in the middle sure sounds like Aztec. It's used a lot for airline boarding passes. What's more common with zero bytes instead of crashes is truncation… some part of the code assumed the zero byte terminates a string. Thanks for replying!
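The truncation failure mode is easy to reproduce. Here is a small sketch using Python's `ctypes`, with a made-up payload standing in for barcode data: reading the buffer as a C string stops at the first zero byte, while carrying an explicit length keeps everything.

```python
import ctypes

# Made-up barcode payload with an embedded zero byte.
payload = b"VISA\x00HIDDEN"

# A C-style char buffer holding the payload plus a trailing terminator.
buf = ctypes.create_string_buffer(payload)

# .value treats the buffer as a NUL-terminated C string and stops early.
as_c_string = buf.value

# Slicing the raw bytes by an explicit length keeps every byte.
with_length = buf.raw[:len(payload)]
```

Any code path that passes decoded data through string functions instead of (pointer, length) pairs would silently drop everything after the zero byte in the same way.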

> Surprisingly, libFuzzer struggled to figure out that input should be of size 1024 and couldn’t start fuzzing.

Is this surprising? Does libFuzzer support Redqueen or laf-intel like AFL++ [0][1], which will pick up on any comparisons (like a comparison to size=1024) and fuzz with the intention of changing that comparison to become true or false (to put it overly simply)?

0: https://github.com/AFLplusplus/AFLplusplus/blob/stable/instr...

1: https://github.com/AFLplusplus/AFLplusplus/blob/stable/instr...


libfuzzer has features to solve comparisons including a comparison table and value profile. in either case, it should be pretty easy to find that a 1024 size input unlocks new coverage without any of those fancy features. i doubt that was the problem here.

If I wanted to learn more about fuzzing, does anyone have suggestions?

I'd love to get to a point I could fuzz a program but the gulf of execution is vast -- I enjoyed attempting OSCP, but I can't keep paying for lab extensions.

(I also have a gut feeling there's a lot of unfuzzed apps which people don't look at because they're utilitarian and don't use the network much. So if I can phish you, then leverage some innocuous tool for RCE or whatever... useful.)

But I've struggled to find resources on this topic -- anyone know of a book, course, or wiki?


The authors of this blog (FD: my company) have a testing handbook[1], which has a full chapter dedicated to fuzzing[2]. We're always open to feedback on it!

[1]: https://appsec.guide/

[2]: https://appsec.guide/docs/fuzzing/


This is great - thanks for posting!

I would start with the AFL++ documentation (https://aflplus.plus/features/), and an open source program that you want to fuzz. The easiest programs to fuzz with AFL are ones that parse a file format from the command line, the smaller the better and written in C or C++ (just for ease of recompiling with instrumentation).

Parsing network protocols and ABIs is possible, but usually requires a fair amount of coding.


>The easiest programs to fuzz with AFL are ones that parse a file format from the command line, the smaller the better and written in C or C++ (just for ease of recompiling with instrumentation).

Thanks, this is useful context -- it's easy to get overwhelmed and quit early on with these sorts of things. It looks like someone else posted a set of exercises[1] using AFL that seem to be aimed at smaller programs like you describe.

[1] https://github.com/antonio-morales/Fuzzing101


LLVM ships with a fuzzing library, docs at https://llvm.org/docs/LibFuzzer.html. I get the impression that AFL is considered better. The authors of llvm fuzz stopped working on it in favour of some other thing, which they then stopped working on in favour of https://github.com/google/fuzztest, which seems to be broadly useless as a fuzzer implementation. But whatever, the llvm fuzzer lives on and has uses in tree and occasional updates. I found it much easier to get started with than AFL.

I wrote a program that takes a byte array as input and drives the library under test with it, attached that to llvm's fuzzer and left it running. You end up with a lot of files containing some bytes that did something vaguely interesting with the program. Good experience overall.

You might get some meaning out of https://github.com/JonChesterfield/bigint/tree/trunk/fuzz_bi... but ymmv, I got sidetracked by interesting stuff at work ~3 months back and don't currently remember what state that repo was in when I paused work on it.
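The "function that takes a byte array and drives the code under test" shape can be sketched without any tooling. This is a toy illustration only: it uses blind random inputs rather than the coverage guidance that libFuzzer or AFL provide, and `parse_record` and `fuzz` are hypothetical names with a bug planted on purpose.

```python
import random
from typing import Callable, List


def parse_record(data: bytes) -> bytes:
    """Toy parser for [length byte][payload]; contains a deliberate bug."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    # Bug: no bounds check, so a too-large length raises IndexError
    # instead of a clean ValueError.
    _last_byte = data[length]
    return data[1:1 + length]


def fuzz(target: Callable[[bytes], object], runs: int, seed: int = 0) -> List[bytes]:
    """Feed random byte strings to target and collect the inputs that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(runs):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 9)))
        try:
            target(data)
        except ValueError:
            pass  # a clean rejection of malformed input is fine
        except Exception:
            crashes.append(data)  # anything else counts as a crash
    return crashes


crashing_inputs = fuzz(parse_record, runs=1000)
```

A real fuzzer keeps the inputs that reach new code and mutates those, which is what lets it get past checks that blind random input would almost never satisfy.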


> get the impression that AFL is considered better. The authors of llvm fuzz stopped working on it in favour of some other thing, which they then stopped working on in favour of https://github.com/google/fuzztest

Thanks, this kind of social stuff can be useful -- it looks like all the resources folks shared seem to favor AFL.



I'm learning about fuzzing too, and I just wrote a tutorial about what I learned so far.[0]

The issue I found with a lot of fuzzing tutorials is that they're difficult to reproduce because there's a lot of work in setting up the environment and toolchain. In my tutorial, you can kick off fuzzing with one command, but I also walk through how I created the workflow step by step.

[0] https://mtlynch.io/nix-fuzz-testing-1/


Andreas Zeller has written a great online fuzzing book covering different SOTA fuzzing techniques: https://www.fuzzingbook.org/

I don't quite follow the input - does this mean they created Barcodes or Data Codes that crashed the library? I.e. something that I can print out and that might break a few devices if printed on, for example, my luggage before checking it in?

Crashing the library - and potential arbitrary code execution!

However, zbar isn't used all that widely in industry. The airport's baggage handling system is much more likely to have a self-contained scanner from Cognex or Omron or Zebra running proprietary, closed-source software.


You got it. Crashing the device where the barcode is being interpreted (and possibly getting arbitrary code execution).

Secondarily, there's probably also a rich vein to be mined scanning barcodes like "'); DROP TABLE Item" that would exploit systems further up the chain. That's not what this article is covering (since they're just looking at the barcode scanning library).

There would be some fun in carrying around a bunch of "edge case" barcodes ("programming" barcodes for various kinds of scanners, SQL injection attacks, etc) and feeding them to unsupervised barcode scanners "in the wild" to see what happens.
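For the "further up the chain" case, the usual defence is to pass scanned data as a bound parameter and never splice it into the SQL text. Below is a minimal sketch with Python's built-in `sqlite3`; the `Item` table and `code` column are assumptions chosen to match the example payload above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Item (code TEXT)")

# Hostile payload as it might come out of a barcode scanner.
scanned = "'); DROP TABLE Item; --"

# The ? placeholder sends the payload as data, never as SQL text.
conn.execute("INSERT INTO Item (code) VALUES (?)", (scanned,))

# The table still exists and holds the payload as an ordinary string.
stored = conn.execute("SELECT code FROM Item").fetchall()
```

Building the statement with string formatting instead would let the quote and parenthesis in the payload end the intended statement early.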


My interpretation of the original article is they use the fuzzer to find an arbitrary very small bitmap input which when passed to the library causes it to crash. It’s unclear if the input image is even a valid bitmap image format that would correctly open in an image viewer.

This is definitely still a problem because there might be situations where you’re allowing an end user to pass an image file in and are then passing it unmodified to this library to interpret the barcode in it, but it’s not the same as some special barcode that encodes data that crashes the library.

So for example this blog entry does not describe a situation where you can just print out a barcode and when you scan the barcode then the library crashes or has the opportunity for arbitrary code execution. That would be a very exciting exploit. They don’t actually rule out the possibility, but they didn’t get anywhere near fuzzing at that level in this blog post.


I'm working with barcode scanners and the difficulties of handling a variety of inputs.

My boss keeps telling me "it's not that difficult". I keep telling him "it's more difficult than you believe".


I think this really demonstrates how valuable nixpkgs is. It’s the Wikipedia of building packages, and 10 years ago I wouldn’t believe it could exist, or be this good.

Only slightly related but on the topic of barcodes and security I'd like to recommend this excellent talk by Felix Lindner, it is quite a few years old but I'd guess stuff like barcode scanners are not the most frequently updated things:

Toying with barcodes - https://www.youtube.com/watch?v=QCtdEYnlykA


Kind of sad to see that the library "custodian", as it were, is seemingly uninterested in fixing the software in question. This may not affect most commercial scanners, but the fact that it is even out there in the wild is a bit disconcerting, to say the least. Just another "brick in the wall" insofar as supply-chain (in)security goes....

This is extremely common. Otherwise licenses wouldn't include clauses like

> 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

They're not required to fix anything, and by including that disclaimer they imply that they don't necessarily intend to fix anything. They disclaim liability, and you, the user, "ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION".

Proprietary software pretty much always has similar clauses too. It's not an issue with open-source, it's an issue with software in general.


There could be any number of reasons for that apart from negligence. AFAIK it’s a single person, so „bus factor“ comes to mind.

Fork and steal users, and pull their new changes until the totality of patching new pulls into the new project becomes too arduous, then let the original project and author float into the sunset as you are the new big kid on the block and have the bully pulpit!

[flagged]


Regardless, it's still an interesting glimpse into fuzzing, for those of us who know little to nothing about it.

See, I've never tried to do barcode decoding in software via images - I've always used an imager with internal decoding.

That just punts to software in the device. That... could be better if it's contained to the device, but that's a big if, and even then the problem can still occur, it's just that you hope the damage is limited to needing to restart the device or so.

it usually is? you still need to do some manner of input validation on the decode. It helps when the barcodes you're reading have a known structure - then you can validate for the structure and it's pretty easy.
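As an example of validating for a known structure, an EAN-13 is exactly 13 ASCII digits with a check digit at the end. Here is a short sketch of that check; `is_valid_ean13` is a made-up helper name, and the sample code is only there to exercise the checksum.

```python
def is_valid_ean13(code: str) -> bool:
    """Return True only for 13 ASCII digits with a correct check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # The first twelve digits are weighted 1, 3, 1, 3, ... from the left.
    weighted = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - weighted % 10) % 10 == digits[12]


ok = is_valid_ean13("4006381333931")
bad_checksum = is_valid_ean13("4006381333932")
not_a_product = is_valid_ean13("https://example.com/recipe")
```

Anything that fails the check, including stray QR payloads or data with control bytes in it, gets rejected before it reaches the business logic.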


