Hacker News
Null (popey.com)
726 points by JNRowe on Jan 14, 2021 | 191 comments



> I went through a phase a while back of holding down keys to see what they did.

Back when I was a gamedev at EA, one of the things QA would do is button-mash test the games. Just smash as many buttons as they could at the same time at all sorts of random points in the game. This was a constant source of bugs. It was surprisingly easy to get the game into a state where it was totally hung because of this.

One of the main culprits was transitions between screens in the UI. So much of the UI code assumed that the initial state of a screen is that no buttons are currently pressed. But if you mash a bunch down in the middle of a transition, the screen can end up receiving a button up event that was not preceded by any button down event. If the screen's code assumed every up has a preceding down, it could get into a broken state.
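As a sketch of the defensive pattern this implies (all names here are hypothetical, not any engine's API), a screen can track which buttons it has itself seen go down and treat a key-up with no matching key-down as noise rather than as a state change:

```python
from enum import Enum, auto


class Edge(Enum):
    DOWN = auto()
    UP = auto()


class ButtonTracker:
    """Tracks held buttons for one screen, tolerating unmatched events."""

    def __init__(self) -> None:
        # Buttons this screen has seen go down and not yet come back up.
        self.held: set[str] = set()
        # Key-ups that arrived without a matching key-down (e.g. the button
        # was pressed during the transition, before this screen existed).
        self.ignored_ups = 0

    def handle(self, button: str, edge: Edge) -> bool:
        """Return True if the event should reach the screen's own logic."""
        if edge is Edge.DOWN:
            if button in self.held:
                return False  # duplicate down (e.g. auto-repeat): ignore it
            self.held.add(button)
            return True
        if button not in self.held:
            self.ignored_ups += 1  # up with no down: count it, don't act on it
            return False
        self.held.remove(button)
        return True
```

Dropping unmatched key-ups is only one policy; where the platform can report the current button state directly, a screen could instead poll it when it takes over input. Either way, the screen stops assuming it starts with every button released.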

I never did see any clean systematic solution to this problem. I still think about it a lot when I do UI programming. In the back of my head, I'm always wondering, "what will happen if the user presses X in the middle of this animation?"

Programmers are particularly prone to these bugs because we have unconsciously trained ourselves to baby our own software. We're careful to wait for transitions to complete and only send input when the app is in a known state.


There seems to be no good generalized solution for this. The best you can do is specialize per situation and leverage all the assumptions you can, while being careful not to lean on the assumptions you can’t.

This problem contains lots of inherent complexity, and it doesn’t help that folks do the opposite of the above and add a ton of accidental complexity on top. For example, for a situation with a crossfade or morphing set of 2 buttons, you should:

- leverage the assumption that there are only 2 buttons, and that they are static.
- not assume that only 1 button is present at any time.

Some end up inclined to create a “reusable” abstraction on top and end up doing the opposite:

- generalize to handling n dynamic buttons, without realizing that some important properties of that specific situation are lost.
- (usually due to a lack of experience/focus/interest in the problem) oversimplify and assume there’s only 1 button present at any time.


If it were up to me, there would be no transitions. I am happy I can disable them on Android... Everything that does or manipulates animation is terrible. Scroll hijackers, image carousels, icon transitions: they all waste CPU cycles to please UX designers!


I'm glad there's at least one other person vocal about doing this. I use a several-year-old Android phone, and when anyone new sees me use it, the first thing they ask is what phone it is and if I just bought it.

Those animations were for a time and place where phone processors were genuinely slow. That time and place is nearly a decade ago, and most of the world is stuck with lower productivity due to this vestigial cruft.

Think of all of those teens that can type a billion letters a minute without batting an eye, and yet have to unwittingly endure all of these silly transitions...

But of course, it breaks a whole bunch of poorly tested apps, even mainstream ones like Uber and Lyft.


Sounds right, I've even had a lot of modern games crash or hang when the action gets hot and I'm spamming inputs (Path of Exile was frequent, but even Hades crashed the other day like this).


Couldn't you just stop receiving keypresses when a UI transition begins and then re-enable input once the new UI state has been established?


Yes, but it still means that when input is re-enabled later, the buttons are in an unknown state and you don't know if you'll be receiving unexpected button up events.

Also, it's very frustrating for users when input gets dropped on the floor at times where they don't expect. Determining when to queue that input and when to forget it is a real art. Humans have subtle, complex expectations there and there's no simple solution.


The problem is "key down" and "key up" are usually separate events. If you write software in a way that you always expect a "key up" to be preceded by a "key down", you'll have problems like the above mentioned.


Seems like a great place to use state machines. That’s not a total solution, of course: if “hold X” and “hold A” have very different behaviors, developers still have to code those states to be exclusive...unless they shouldn’t be. Ain’t easy, that’s for sure.

A robust solution should probably use state machines coupled with generative testing, so that all of those unexpected combinations can be generated and handled before the testers try button-mashing, without relying on developers to think of and write all the individual test cases.
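A minimal sketch of that combination, assuming a toy three-state screen (the states and event names are invented for illustration): the transition function is total, so every event has a defined result in every state, and a test enumerates every event sequence up to a small length, checking that the screen can always get back to the menu. Exhaustive enumeration only works for short sequences; a property-based testing library that generates random sequences scales further.

```python
from enum import Enum, auto
from itertools import product


class Screen(Enum):
    MENU = auto()
    TRANSITION = auto()
    GAME = auto()


EVENTS = ("press_start", "press_back", "anim_done")


def step(state: Screen, event: str) -> Screen:
    """Total transition function: every (state, event) pair has a defined result."""
    if state is Screen.MENU:
        return Screen.TRANSITION if event == "press_start" else Screen.MENU
    if state is Screen.TRANSITION:
        # Design choice: button presses during the animation are ignored.
        return Screen.GAME if event == "anim_done" else Screen.TRANSITION
    if state is Screen.GAME:
        return Screen.MENU if event == "press_back" else Screen.GAME
    raise AssertionError(f"unhandled state: {state}")


def can_recover(state: Screen) -> bool:
    """Property: finishing any animation and pressing back reaches the menu."""
    for event in ("anim_done", "press_back"):
        state = step(state, event)
    return state is Screen.MENU


def check_all_sequences(max_len: int) -> int:
    """Exhaustively try every event sequence up to max_len; return how many."""
    checked = 0
    for n in range(max_len + 1):
        for seq in product(EVENTS, repeat=n):
            state = Screen.MENU
            for event in seq:
                state = step(state, event)
                assert can_recover(state), f"hung after {seq}"
            checked += 1
    return checked
```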


Of course, but that doesn't prevent a case where a button was already held down when input gets re-enabled, which means a button up event would still fire without a corresponding button down.


>While I’m not a QA or security professional, I have developed a knack for doing “stupid” things with software which causes it to malfunction.

A person after my own heart.

I've had many a dev go "why would you do that"

To which I answer "it doesn't matter, but if you accept my input it's your job to ensure the app doesn't crash"


We have an excellent (and big) QA department, but 13 years ago when I started at this company we were only just beginning to hire dedicated testers. We had a mature product which was a communication handset and it worked well and was stable. Our software engineers had pressed every button they could think of in every menu and there weren't any problems.

Then we hired Kevin.

Kevin had the handset for 40 minutes before piping up "crashed it". The lead comes over to have the sequence explained to her, and says "huh, nice edge case". Half an hour later "crashed it again" (in a completely different way). Explains the sequence to the lead again. An hour later this happens again and he explains the sequence and she finally bursts out "Why would you even do that?! How did you think of pressing those buttons like that with that timing?!!".

Good testers just think differently than software engineers.


I'm pretty sure I subconsciously use my software safely and as intended because I don't want to crash it. Said out loud, I know this is dumb, but why would I want to break something I created?!

There aren't any bugs as long as I don't look for them!


A good tester looks at all the expected cases, and then infers all the cases that exist in between those. They explore the negative space between what we're supposed to do.


> Good testers just think differently than software engineers.

I'm sure in some cases it's useful to detect all possible crashes, e.g. to make an app as secure as possible. In other cases I'd watch out for diminishing returns; instead of "thinking differently than software engineers", it might be enough to "think the way product users do".


There's a consideration about number of users. If 100k users are using your product a lot, in a similar way to the 'million monkeys' thing they're accidentally going to find bugs.

Software engineers tend to use products in a consistent way (based on how they know it's meant to be used) whereas good testers explore the space of possible inputs in a much more 'creative' way.


> good testers explore the space of possible inputs in a much more 'creative' way.

My point is that the extent of testing should depend on the actual product; eliminating all bugs is not the primary goal of every team.

Some companies might decide on other goals and prioritize, e.g., only paying users (giving them better support to resolve issues), acquiring new users, or something else.


For a long time I've had my full name as the user name on my machine, which means my user profile path contains both a space (evil) and a non-ASCII character (even more evil, although at least it's in Latin-1). A lot of software breaks on one or both of those, and at times it's a bit annoying to deal with. Some bug reports have also been closed as "Won't fix, just don't do that. Who needs spaces in paths, anyway?". I haven't tried using non-Latin Unicode in my user name and profile directory, which would break everything that uses the old ANSI APIs on Windows instead of the Unicode ones; that's probably way too much breakage. But it's broken nonetheless, and that includes a lot of new, recent, and still-maintained software :-/
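A common way spaces in paths break things is a command line built by string concatenation and later split on whitespace. The sketch below uses Python's `shlex` to show that failure under POSIX shell rules (Windows quoting rules differ and are not covered); the path is hypothetical.

```python
import shlex

# Hypothetical profile path containing a space and non-ASCII characters.
path = "/home/Jörg Müller/notes.txt"

# Naive: concatenate, then let shell-style splitting break on the space.
naive_args = shlex.split("cat " + path)

# Quoted: shlex.quote wraps the path so it survives as a single argument.
safe_args = shlex.split("cat " + shlex.quote(path))
```

Passing an argument list to `subprocess.run` (for example `subprocess.run(["cat", path])`) sidesteps shell parsing entirely, which is usually safer than quoting.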

In a similar vein, I've also used U+2212 as the minus sign in my regional settings. There's a lot of software that refuses to parse numbers it previously happily emitted.
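A sketch of a tolerant parser for this case, assuming the caller knows the decimal separator it formatted with: Python's `float()` accepts only the ASCII hyphen-minus as a sign, so U+2212 has to be normalized first. The function name and the `decimal_sep` parameter are illustrative, and the sketch does not handle thousands separators.

```python
def parse_localized_number(text: str, decimal_sep: str = ".") -> float:
    """Parse a number that may use U+2212 MINUS SIGN and a non-'.' decimal separator."""
    s = text.strip()
    # Normalize the typographic minus to the ASCII hyphen-minus float() expects.
    s = s.replace("\u2212", "-")
    if decimal_sep != ".":
        # Assumes no thousands separators are present in the input.
        s = s.replace(decimal_sep, ".")
    return float(s)
```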

I've given up on that too by now, though. The only thing I still do is use English as the UI language (so I don't have to deal with bad translations of software), but German for my regional settings (with ISO 8601 dates). There's a lot of software out there (I think GNU gettext is broken in that way on Windows) that assumes the way I want my dates and numbers formatted has some bearing on the language I want to see in an application. Many others ignore the regional settings and use the UI language to format dates, times, and numbers too. That's annoying, but at least nothing breaks, so that's the only deviation from a standard setup I still keep, so I can still get work done.


Mine too.

I write a lot of javascript and the string "null" is pretty harmless in most code. But there's all sorts of fun bugs (and often security vulnerabilities) you can find if you make an identifier "__proto__". (If code ever uses that as the key in an object, you're off to the races!)


Nice, I never thought of testing with "__proto__"! "constructor" is also a fun one.


Why do you require software to be more resilient than other things?

If I pour water in the gas tank of my car, it will also fail to drive. Or gas in the sprinkler tank. So the car should somehow prevent the end user from putting the wrong thing in the tank?


The answer to this should vary by the functionality of the software. If I have a program that calculates projected weight loss and it's only used by me on my computer, crashing or displaying obviously incorrect values from accepting negative values for weight or body fat percentage only impacts me so it's mostly a user friendliness issue. If I'm entering my billable time into my employers time tracking system, anything that could corrupt data impacts more than just me. While I could intentionally lie about my billable hours without detection, I shouldn't be able to accidentally or intentionally enter that I worked -5 hours or 5000 hours on a particular day. Accepting either is undesirable and impacts others.
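The time-tracking case could be guarded with a validation layer like the sketch below. The 24-hour ceiling and the `TimeEntry` type are assumptions for illustration; a real policy would come from the business rules. The range check also rejects NaN, because every comparison with NaN is false.

```python
from dataclasses import dataclass

# Assumed upper bound for one day's entry; a real policy might be stricter.
MAX_HOURS_PER_DAY = 24.0


@dataclass(frozen=True)
class TimeEntry:
    project: str
    hours: float


def validate_entry(project: str, hours: float) -> TimeEntry:
    """Reject values that cannot be true before they reach shared data."""
    if not project.strip():
        raise ValueError("project must not be empty")
    # Chained comparison is False for NaN, so NaN is rejected too.
    if not (0 < hours <= MAX_HOURS_PER_DAY):
        raise ValueError(f"hours must be in (0, {MAX_HOURS_PER_DAY}], got {hours}")
    return TimeEntry(project.strip(), hours)
```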

I'm confused why the parent comment is being downvoted. It's a valid question. It might sound naïve to some but it's still worth discussing.


If cars were used like software, fire engines would be dragging around truckloads of cars turned sideways with their gas tanks filled with water. Long-range drones would be flying around filling tanks of random parked cars with glue/sugar/squirrels/nuts covered in glue/sugar/other cars/fire engines/the tank itself.


We require software to be resilient because it is used as a building block for large (sometimes extremely large) systems. The deeper in the stack something is, the more costly its failure. A human may not input any of those strings, but another program (which, unlike a human, does not reason about the data it is processing) may input anything.


If you look at the bugs in the article, they're not that foolish. They're basically perfectly valid things to be able to do: entering text in a text field, or pressing buttons to do things. There is a specific set of valid input for your gas tank, but anything that is text should be accepted for text input.

The analogy would be something like this:

- if I throw spaghetti on my windshield, my car shouldn't break down

- if I hold the wiper's stick to the position that runs it once (instead of putting it in the position to continually run) my car shouldn't break down


If I pour water in my gastank and it gives me the private social media posts of a million people, that might be a problem for more than just me.


> Why do you require software to be more resilient than other things?

Because software by virtue of not requiring physical access is much easier for bad actors to mess with.

Abuse of such also seems to be classified very differently than abuse of physical systems by human brains e.g. almost no rando would think of putting sugar in your gas tank while walking near your car, but nobody blinks at fucking with your input fields.


I think the answer is: because software can. And most of the time, software can be made more resilient with a trivial amount of resource usage.

If there was a physical device that could filter gas and non-gas liquids that could be installed in a car we would expect car manufacturers to do that.

I have seen software that puts the onus on users to use it correctly "Hey user, don't enter more than 5 items in this list". Because the software can't be bothered. And if the user enters 6 items in the list and the software crashes and the user loses all their work, everybody can point at the user and tell them it's their fault for not following directions. But personally I'd be embarrassed if that was my software.


> If there was a physical device that could filter gas and non-gas liquids that could be installed in a car we would expect car manufacturers to do that.

There is one thing they could do to stop people putting the wrong liquid in their car: key the nozzle to only fit cars that support its fuel, so petrol pump nozzles only correctly fit petrol cars and diesel pump nozzles only fit diesel cars.

Completely agree with you on your main point: software should be made correct and resilient (more than other things) because it can be.


Having owned a diesel car, I can confirm this is actually the case. The nozzles where I live are keyed to prevent mishaps.


Holding down the button for your hazard lights should not break your car. Holding down a key should not crash an app.


Holding down a pedal probably is going to crash your car.


Only 1 in 3 chance


I see you don't drive an automatic.


I think I can sit in my car with a foot on the brake pedal all day long and it won't break the car.

Your proposed problem will probably only affect bad spellers, who have a break pedal.


Because software has hundreds or thousands of tanks, compartments, cogs, levers, buttons and switches. And if it's not designed robustly, using any one of those things slightly wrong (or even right, but there's a bug in the software) can make the entire thing disappear in a cloud of smoke, taking your groceries with it.


The funny thing about this comment is that it also applies to the software in the car. The amount of software in today's cars is mind-boggling, and it grows exponentially as more and more features make it onto public roads.


Absolutely! And we're even less tolerant of software errors (even ones without safety implications) in vehicles (and other appliances) than on computers.


“Using it wrong” is usually quite obvious with real-life machinery. Software has many more ways of using it wrong that are very unobvious, that the user has no reason to suspect could be dangerous, and that have never been seen before.


Pumps for different types of fuel have differently shaped nozzles here, making it very hard/impossible to fill up with the wrong kind.

Of course if you're trying to break it, everything is possible.


I don't know where your "here" is, but here in the USA, you cannot put diesel in a vehicle's gas tank, but you can put gas in a vehicle's diesel tank. Putting gas in a diesel vehicle is very bad.

Reason: The gas nozzles dispensing unleaded gas were made smaller to prevent people from putting leaded gas into a vehicle that required unleaded gas (which would poison the catalytic converter). The diesel nozzles remained unchanged and leaded gas (with the big nozzles) went away.


Consumer diesel vehicles often have a mechanism to (try to) prevent the insertion of a gas nozzle.

Source: I had one of those “evil” Jettas


As imoverclocked mentioned, some vehicles (mostly VWs, AFAIK) have contraptions in the filler tube to try to prevent filling with unleaded gas. Of course, they don't quite work (at least the retrofit ones don't). I know this because some diesel pumps have the unleaded-size nozzle, and I've filled up with those; it just takes a lot longer.



Sugar is a fuel and it just slides right in to the tank. Try it.


Maybe not relevant here, but my first thought on your question, out of context, is scale. My house can easily be entered by anyone determined to get in. They can bust the door down, break the windows, or crash a vehicle into it. And yet almost no one is actually trying to get into my house. Conversely, thousands of people and possibly hundreds of thousands of bots are trying to break into any software they can that is exposed on the internet (or exposed in other ways: I have no idea whether every app on my PC/Mac/phone/tablet is scanning my network for devices with known exploits).

So, the security of my house (at least where I live) does not have to be so resilient but the security of much of my software does.

https://www.youtube.com/watch?v=VPBH1eW28mo


With physical things, if I do something stupid, I blame myself. With software, if it allows me to do something stupid, I blame the software. Unfortunately, the same mindset carries for clients, employers, and other various people using software who will report said stupidity to me, my client, or my employer.


Like, putting water in the gas tank is obviously wrong, as it's called a gas tank, and gas cars don't tend to run on water.

Clicking a button that was enabled for me and does many things in the background is not so obvious.

Also, cars have been accessible to everyone way longer than PCs and smartphones. Most people alive today (in developed countries/areas) saw their dads driving when they were kids. A person in their 60s didn't have a PC when growing up.


I think not even most people in their fifties did. Forties, maybe.


There are also warning labels all over the various fill caps on a vehicle: "Unleaded gasoline only". Those are like real-world input validation. It's just not feasible to check the fluid before allowing it in. But it is feasible to add a data validation layer to your application.


The car equivalent would rather be something like entering "null" as address in the car's GPS and this causing the entire town's traffic light control to crash. Wouldn't be that funny, would it?

Then there's just annoying stuff like the case where someone paid to have "null" on their car's license plate, which suddenly caused him to get all traffic tickets that could not be correctly addressed.


Because with software you can’t generally predict the consequences, even more so as software tends to not be static, but evolves and starts to interact with more and more other software.

Validate your inputs. Be very careful with in-band special values and escaping syntax. Don’t make any assumptions about what is “reasonable” input. If you have to make assumptions, document them and validate all input for conformance. Always check what requirements and preconditions the code you call has on the values you pass to it. Don’t just make assumptions about it.
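The "null" license plate is a textbook in-band special value. As a sketch (function names hypothetical), compare a format that encodes "no plate" as the text `null` with JSON, which keeps a missing value (`null`) distinct from the string `"null"`:

```python
import json
from typing import Optional


def naive_encode(plate: Optional[str]) -> str:
    # In-band sentinel: a missing plate is written as the text "null".
    return "null" if plate is None else plate


def naive_decode(text: str) -> Optional[str]:
    # Any plate that reads "null" (in any case) is mistaken for "no plate".
    return None if text.lower() == "null" else text


def json_roundtrip(plate: Optional[str]) -> Optional[str]:
    # JSON encodes None as null and the string as "null" (quoted),
    # so the two never collide.
    return json.loads(json.dumps(plate))
```

With the naive format, a real plate reading `NULL` comes back as "no plate", which is exactly the kind of record that ends up matched against every unaddressable ticket.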


Because it's just software. You can just change it and it costs nothing. Redesigning the powertrain in your car to be resilient when non-gasoline is introduced into the combustion chamber is expensive. Redesigning software is just changing some code; you just need to say something like 'Zoom. Enhance.' or 'It's a Unix system! I know this!' and it will take care of itself after a dozen rapid random keystrokes.


> If I pour water in the gastank of my car, it will also fail to drive.

But I'm pretty sure the manual says you shouldn't do that. Like microwaves say you can't put living animals in them.


Because often times crashes in an app lead to information exfiltration or remote code execution.


Security implications, I would imagine.


There's a few answers here.

First off, we expect all things to be as resilient and reliable as makes sense under a cost-benefit analysis. If it's cheap to fix and expensive not to fix, we expect it to be done. If it's expensive to fix and cheap to ignore, we expect it not to be done. And of course, if it's impossible to fix, we definitely expect it to be ignored. :)

So, we expect that cars should NOT catch fire when rear-ended, because it's possible to design them not to do so, it's not that expensive to design them not to do so, and innocent people could be seriously harmed through no fault of their own.

But water in a gas tank? I can't think of a way to stop someone from doing that. And it would just disable the car if you did it. And since cars have locking fuel tank covers, your ability to maliciously harm other people's cars is really limited.

So in the case of cars "explodes when rear ended" is not okay but "stops driving when you fill the tank with water" is okay.

Software, by its design, is often more fixable than other things. You can filter the inputs to a log in form whereas you can't really filter "things people might put in their kitchen blender".

Second, note that software is, bluntly, a lot less resilient than most things. I've got a hammer sitting in my garage, and it's just going to sit there until I do something with it. It won't randomly stop working, it won't auto-update to a version that is incompatible with my nails, it won't be remotely hijacked by Russian scammers to break into local businesses, it doesn't need patching. There will never be a CVE for this hammer. :) I've had it for many, many years, and I'll have it for many many more, and it will be just as good a hammer in 10 years as it was when I got it. We can't say the same thing for software. And since software is just way more of a dumpster fire than "normal" things, we have to expect that more work will need to be done to counteract that.

(There is, as always, at least one relevant XKCD here. In this case, I think https://xkcd.com/2030/ is on point. The more you know about software engineering, the more you'll realise the entire thing is held together with baling wire, duct tape, luck, and an intern trying to live-edit the production database to fix the data errors before anyone notices.)

Third, and very much related to the last two points, consider the scale. Your car's gas tank or your building's sprinkler tank is vulnerable to various attacks, but not to being attacked remotely and untraceably by almost anyone on earth via a number of low-skill attacks. And of course, software can also yield larger rewards. If the local corner store has a dodgy lock, maybe you could break into it (at significant risk to yourself!) and steal some cash from the till. If you can compromise the head office network of a major retailer, you could steal millions of dollars.

Edit: Also, I'm aware of a nationwide outage for a pizza chain caused by a phonebook. There was an internal webpage that some stores looked at occasionally to show stock levels or something similar. It was quite a slow/expensive page to load, but because it was loaded quite rarely, and only by a small number of internal users, it didn't have a lot of caching or rate limiting on it. Someone in one store shifted something on their desk, then walked off. This caused a phonebook to shift and depress their F5 key. That caused their browser to start refreshing this page very, very rapidly, multiple times per second. The load from this actually overloaded the central servers, and the entire system went down, stopping orders from being placed or printed. So it might seem silly to say "hey, what happens when I do this thing I shouldn't do", but over time, all sorts of things that "shouldn't happen" will happen for some reason or other. If the result is that every store of a nationwide chain goes offline, that is... not great. And software is just way more prone to these sorts of things than other things. If you tell me there's a vulnerability that lets people easily open the door of any hotel room at a large chain, I'll instantly bet you $20 it involves smart locks, NOT traditional mechanical keyed locks. (And indeed, that's happened more than once, and it's always been a smart lock, to my knowledge.)
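Either caching or rate limiting would have contained the phonebook incident. A minimal token-bucket sketch of the rate-limiting half, assuming requests can be keyed by some client identifier (`client_id`, `should_serve`, and the chosen rate and burst size are hypothetical):

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so a first burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


_buckets: Dict[str, TokenBucket] = {}


def should_serve(client_id: str) -> bool:
    """Hypothetical gate in front of the expensive page, one bucket per client."""
    if client_id not in _buckets:
        # Assumed policy: bursts of 3, then one render every 5 seconds.
        _buckets[client_id] = TokenBucket(rate=0.2, capacity=3)
    return _buckets[client_id].allow()
```

The injectable `clock` makes the limiter testable without sleeping. Under the assumed policy, a stuck F5 key would cost a burst of three renders and then one every five seconds, instead of several per second.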


A favourite technique of one of my colleagues was just to mash the keyboard randomly to see if the app breaks.

It's very crude and not at all foolproof. For its lack of sophistication, it's shockingly effective at highlighting a huge number of the assumptions we make about how software is or can be used.


This even has a name, "monkey testing". Basically some software that pretends to be a monkey in front of a computer and mashes random buttons and keys to make the application behave badly. Usually you only care about the application not breaking in monkey tests. Can be used similarly to "fuzzing" but for UIs as well, see Gremlins.js: https://marmelab.com/blog/2020/06/02/gremlins-2.html
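Gremlins.js is one ready-made tool for web UIs. The idea itself fits in a few lines; the sketch below is a Python illustration, not Gremlins' API. It assumes the app under test exposes a hypothetical `handle_key` method, and uses a seeded RNG so any crash comes with a reproducible key sequence.

```python
import random
from typing import Callable, List, Optional, Protocol

# Keys the monkey may press; alt-tab and similar are left out on purpose.
KEYS = ("a", "b", "enter", "esc", "up", "down")


class App(Protocol):
    def handle_key(self, key: str) -> None: ...


def monkey_test(make_app: Callable[[], App], steps: int = 1000,
                seed: int = 0) -> Optional[List[str]]:
    """Mash random keys; return the sequence that crashed the app, or None."""
    rng = random.Random(seed)  # a fixed seed makes any crash reproducible
    app = make_app()
    pressed: List[str] = []
    for _ in range(steps):
        key = rng.choice(KEYS)
        pressed.append(key)
        try:
            app.handle_key(key)
        except Exception:
            return pressed
    return None


class ToyMenu:
    """Hypothetical app with a planted bug: 'esc' at the root pops an empty stack."""

    def __init__(self) -> None:
        self.stack: List[str] = []

    def handle_key(self, key: str) -> None:
        if key == "enter":
            self.stack.append("submenu")
        elif key == "esc":
            self.stack.pop()  # IndexError when already at the root
```

As in the comment above, the only check is "did it crash"; the returned sequence is the bug report.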



This is how, at age 12 or so, I discovered in the old Windows game “Chip’s Challenge” that you could cheat and unlock (nearly all) levels.

If you tried ctrl+n (I think), you could advance to the next level, but only if you had beaten the current level or had the pass code for the level. In a bout of frustration, I rando-mashed the keyboard and advanced to the next level, and could then ctrl+n to the last level! I could reproduce the effect, but never worked out the actual combination that unlocked it. Good times :)


One of my professors back in college loved to do that. "Shockingly effective" is exactly the right term. Learned a lot from that little exercise.


This makes me wonder if there's a tool that can auto-generate random inputs and send them to the application. (maybe excluding Alt-Tab to avoid switching away from the application)

Fuzzing is a pretty popular testing technique for libraries, but GUI software has not seen the same attention.


GUI fuzzing is a thing, yes. I've seen it mostly on mobile apps, but I'm sure it exists for other types of GUIs.

Example of tools: https://www.fuzzingbook.org/html/GUIFuzzer.html


GUI fuzzing was a thing way back on the original Macintosh: https://www.folklore.org/StoryView.py?project=Macintosh&stor...

Something similar should be reasonably easy to build these days using AutoHotKey or the like. I bet it's been done.


>In which I answer "it doesn't matter, but if you accept my input it's your job to ensure the app doesn't crash"

This is how try/catch alls get added :(


There was, and still is, a website called The Daily WTF dedicated to especially funny bugs and programmers' mistakes. In 2012 I registered on it as user "undefined" to comment on JavaScript oddities under one of their articles and almost forgot about it. Then they migrated their comments and forums to Discourse, and in 2015 I got a bunch of email notifications about people mentioning me, because suddenly all "likes" in the forum were linked to my profile:

https://what.thedailywtf.com/topic/17637/undefined-liked-thi...


I worked on an API that regularly got requests from the mobile app for GET /users/(null). I think that's Swift, or Obj-C's way of to-string'ing a null?

I have a generational suffix on my name. I often include it, and quite often as the proper Unicode character, e.g., "Ⅲ". (Assuming HN displays it after I post this, try to select it; that's one character.) That wreaks a fair bit of havoc.

When I was in high-school, I took physics. I was assigned to room, say, 309, to a teacher whose name I didn't recognize. But I knew the teacher in room 309, and she even taught physics. So, I approached her, and asked, "I've been assigned 'Ms. Stewart', but it lists her as being in your room. Do you know what the correct room number is, Ms. Cook?" Right room; it was her maiden name, of course.

In my company's HR system, we have to note some contacts, for things like life insurance payouts. My fiancée is one. Then we transitioned to a new system, and the data from the old system was migrated over. Now she's my "fiancée". (And in a separate system, she's a he, because there was no option for "fiancée", only "fiancé".) Similarly (and a long time ago) I had to fix a contact/directory system when it escaped a '. E.g., it would emit "Marie O\'Conner". PHP magic quotes… shudders
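Output like "Marie O\'Conner" is what you typically get when input is escaped for one layer (SQL) but then stored or displayed somewhere that doesn't expect the escaping. Letting the database driver bind values avoids escaping by hand. A sketch using Python's built-in `sqlite3` with an in-memory database (the table and contact are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT NOT NULL)")

name = "Marie O'Conner"  # illustrative contact name containing an apostrophe

# The ? placeholder sends the value separately from the SQL text,
# so the apostrophe needs no manual escaping and is stored verbatim.
conn.execute("INSERT INTO contacts (name) VALUES (?)", (name,))

stored = conn.execute("SELECT name FROM contacts").fetchone()[0]
```

Most database drivers, PHP's included, support placeholders like this, which makes blanket input escaping unnecessary.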

(Character encodings and anything outside of ASCII, in particular, are an unending fountain of bugs.)

Just today, Azure's support system can't handle (among many things) the outlandish characters of "<" or ">". Which is great fun, since it's not like anyone would file a highly-technical support request with Azure… right?

The missing hour in the DST spring-forward and the duplicate one on the fall-back are great hunting grounds for bugs, too. E.g., Google Calendar has issues with them.

We have a git branch prefix at work that triggers a special CI action. Let's call it "branchprefix/". Every now and then a dev will make a branch with "BranchPrefix/" and the OS X machines all start having issues since OS X's file hierarchy isn't case sensitive. (We've also had issues w/ two files, same name different case. git supports it, but OS X can't cope.)

(All the names in this post are changed from their originals, of course. But you get the idea.)
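A check for the branch-prefix problem could run before pushing, or in CI. The sketch below (a hypothetical function, not a git feature) flags ref path prefixes that differ only by case, since loose refs are stored as files and directories, and a case-insensitive file system maps such prefixes to the same directory.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def case_collisions(refs: Iterable[str]) -> List[List[str]]:
    """Find ref path prefixes that differ only by case.

    "branchprefix/a" and "BranchPrefix/b" need two directories that a
    case-insensitive file system treats as one.
    """
    spellings: Dict[str, Set[str]] = defaultdict(set)
    for ref in refs:
        parts = ref.split("/")
        # Check every directory prefix as well as the full name.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            spellings[prefix.casefold()].add(prefix)
    return sorted(sorted(s) for s in spellings.values() if len(s) > 1)
```

The same function works on file paths, which covers the two-files-differing-only-by-case situation too.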


> Just today, Azure's support system can't handle (among many things) the outlandish characters of "<" or ">". Which is great fun, since it's not like anyone would file a highly-technical support request with Azure… right?

Someone in Azure is definitely using the Windows reserved characters for filenames.


Reminds me of the fact that filenames in OneDrive are even more restrictive because of some legacy SharePoint software that was representing filenames in URLs.

"?" and "#" were not allowed in OneDrive for Business for a long time because they have a special meaning in URLs: https://techcommunity.microsoft.com/t5/microsoft-sharepoint-...

The Rclone documentation has a full list of problematic OneDrive characters: https://rclone.org/onedrive/#restricted-filename-characters


I think misguided XSS protection is the most likely culprit here.


> Every now and then a dev will make a branch with "BranchPrefix/" and the OS X machines all start having issues since OS X's file hierarchy isn't case sensitive. (We've also had issues w/ two files, same name different case. git supports it, but OS X can't cope.)

FWIW macOS is perfectly fine with it. The FS (both HFS+ and APFS) can be configured to work in case-insensitive (CI) or case-sensitive (CS) mode. The default is CI. Since git uses the FS for part of its data storage, things break.

That’s more of an issue with Git not supporting CI FS, really.


A lot of software actually breaks when running macOS in case-sensitive mode, including Adobe stuff.


True, I was going to put a note on that but apparently forgot or removed it while editing: some software, mainly cross-platform software from Windows (which is CI), will break on CS file systems, while Unix software is more likely to break on CI.


I was going to say the same thing. Every time I format my hard drive I have to look up which mode is the one that doesn't break things...


Just as an aside, you can reformat the file system to be case-sensitive on macOS. I think at $dayjob it's more or less policy to do so.

The only downside I have seen so far is that some software only runs on case-insensitive file systems. For example Photoshop did this last I checked.


Neither does the Steam application (on macOS, not on Linux AFAIK). Thanks to that I could start to live without it. ^^


I still see \’ appearing on large well trafficked websites, like espn.com and cnn.com.


That’s how C usually prints it. Swift will print “nil”.


I was curious, so I tested that[0], and 3 of the 4 major compilers (clang, gcc, and ICC) all output `(nil)` for a `%p` on a null pointer, but `(null)` for a `%s`. Godbolt doesn't support executing MSVC for some reason.

However, it is my understanding that passing a non-null pointer to printf for `%s` is undefined behavior.

[0]: https://godbolt.org/z/7K5qWv


> Godbolt doesn't support executing MSVC for some reason.

That's because the MSVC compiler doesn't actually run on the godbolt servers (unlike all the other ones). The compilation is done on Microsoft servers. It's understandable that neither side is looking to have Windows sandboxes to execute arbitrary code in.

(Disclaimer: At least that's how it worked when Matt introduced the MSVC compilers. Not sure if things are substantially different by now.)


All three are probably using glibc, which does that, yes: https://github.com/ahjragaas/glibc/blob/82cfac84c7e24be587bb.... On Darwin Apple’s libc prints “(null)”: https://github.com/apple-open-source-mirror/Libc/blob/5e566b.... I should also note that passing a non-null pointer to printf is the only correct way to use it ;)


Huh. I feel like I distinctly remember it being both one of our mobile clients, and it being the string "(null)".

Java, perhaps? (That was our Android app, of course.)

Although we did have a desktop app (weirdly) and that was in C++.


Java just prints “null” IIRC. To be fair, the “(null)” you’ll usually get from passing a null pointer to printf is implementation defined (really, undefined, but no implementation I’ve seen has done bad things with it).


> Huh. I feel like I distinctly remember it being both one of our mobile clients, and it being the string "(null)".

Objective-C will do that with the format string "%@" if you pass it nil.


'Just today, Azure's support system can't handle (among many things) the outlandish characters of "<" or ">".'

My favorite way of breaking things is to go the other way... oh, you won't allow < or >? Well, how about &lt; and &gt;? That's ok then? Great!

One I've done several times is encounter a field that "can't be left empty", and is smart enough to filter out the ASCII whitespace before the check... but isn't smart enough to filter out the Unicode zero-width space. "A computer wizard never says too much or too little, he says precisely what he means to."
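A minimal Python sketch of an emptiness check that would not be fooled that way. It assumes "empty" should mean "nothing visible", and treats both whitespace and Unicode format characters (category Cf, which includes the zero-width space U+200B and the word joiner U+2060) as invisible. It is not exhaustive: some invisible fillers are classified as letters. `is_effectively_empty` is a hypothetical name.

```python
import unicodedata


def is_effectively_empty(value: str) -> bool:
    """Return True if `value` contains nothing a person would see.

    str.strip() only removes characters Python considers whitespace, so a
    zero-width space (category "Cf") survives it. This check also skips
    every format character before deciding.
    """
    visible = [
        ch for ch in value
        if not ch.isspace() and unicodedata.category(ch) != "Cf"
    ]
    return not visible


print(is_effectively_empty("   "))     # True
print(is_effectively_empty("\u200b"))  # True, even though "\u200b".strip() is not empty
print(is_effectively_empty("Null"))    # False
```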


> When I was in high-school, I took physics. I was assigned to room, say, 309, to a teacher whose name I didn't recognize. But I knew the teacher in room 309, and she even taught physics. So, I approached her, and asked, "I've been assigned 'Ms. Stewart', but it lists her as being in your room. Do you know what the correct room number is, Ms. Cook?" Right room; it was her maiden name, of course.

I'm reasonably certain one of those two 'Ms.' instances should be 'Mrs.'.


Traditionally yes, but there's no legal requirement and many people don't feel the need to make the change (my wife being one).


So, you're saying your wife did change her last name, but not the honorific? That's an interesting combination I hadn't encountered before. Probably worth adding to the "Things Programmers Believe About Names" list.


Ah no I missed your implication there. She didn't. It isn't unheard of though, some people stay with Ms as a default if they don't think their marital status is anyone's business. It's a less political version of Mz I guess, which is something one teacher I had used and made sure we knew about.


I suppose I assumed that anyone inclined to leave the honorific unchanged would also be inclined to leave their last name unchanged as well.

But we all know what they say about assumptions...


When I lived in the US I was amazed how many systems couldn't handle my (English) surname, which has a dash in it.


People like that are the reason this list was created

https://github.com/minimaxir/big-list-of-naughty-strings/blo...

My personal favorite is this one though

  "If you're reading this, you've been in a coma for almost 20 years now. We're trying a new technique. We don't know where this message will end up in your dream, but we hope it works. Please wake up, we miss you.",


what a great github repo.

I enjoyed:

    #   Strings that may occur on IRC clients that make security products freak out
    DCC SEND STARTKEYLOGGER 0 0 0
and everything under:

    # Innocuous strings which may be blocked by profanity filters (https://en.wikipedia.org/wiki/Scunthorpe_problem)


Found some GitHub issues [1] with something similar: an enterprise firewall blocking a repo because it contained the string "arglebargleglopglyf" [2] in some tests.

The text was flagged as malicious because of its presence in the repo github.com/wireghoul/htshells [3]. However, the whole point of the word in the htshells repo is that it's an invalid command that breaks Apache, so it could have been almost any random string.

[1] https://github.com/search?q=arglebargleglopglyf&type=issues

[2] https://mume.org/help/arglebargle

[3] https://github.com/wireghoul/htshells/blob/master/dos/apache...


This one from link 3 caught my eye:

    "".__class__.__mro__[2].__subclasses__()[40]("/etc/passwd").read()
Looks to be a Python 2-specific way of trying to read a file in a sneaky way. I say Python 2-specific because Python 3 strings only have 2 supertypes now, so __mro__[2] is out of range. __mro__[1] is 'object', and I'm guessing they were going for a file-like class, but right now object.__subclasses__()[40] points at "mappingproxy".

And the only subclasses of object I can find with a read classmethod are these:

    109 <class 'codecs.StreamReaderWriter'>
    110 <class 'codecs.StreamRecoder'>
Found with:

    for i, x in enumerate("".__class__.__mro__[1].__subclasses__()):
        if "read" in dir(x):
            print(str(i) + " " + str(x))


FWIW it’s looking for the `file` class, which does not exist anymore and was a direct subclass of object: `open` now creates a TextIOWrapper<BufferedReader<TextIO>>.

You can still reach TextIO through _IOBase: in Python 3.9 it’s at index 101 of object’s subclasses, then 0, then 0.

In 3.8 it’s 99, 0, 0.


This is pretty fascinating!

It's a shame subclass numbers do change from version to version, so there is no "one-size-catch-all" injection string.

Someone in this thread posted a solution with next() that iterates over subclasses to find the correct one. But an injection containing spaces won't work as well when injected into Jinja2, which is something the original, space-free injection manages in Python 2.


> It's a shame subclass numbers do change from version to version, so there is no "one-size-catch-all" injection string.

Yeah going through subclasses is not trivial, but that's the way exploits work really. And usually once you find a target the version is going to be reliable.

Another big injection source in Python is having modules available in the evaluation context. That's way more risky than exposing classes, functions, and objects, due to Python's transitive import nature: anything you import becomes an attribute of your module, meaning that if your module is visible, so are the modules it imports. And very often `sys` is imported somewhere within transitive reach. Once `sys` is available, they're out of the interpreter.
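A short Python 3 sketch of two points from this subthread: looking a class up by name makes the version-dependent subclass indices mostly irrelevant, and emptying `__builtins__` in `eval` does not make a sandbox, because attribute access on a literal still reaches `object`. `find_class` is a hypothetical helper; it only locates a class and reads nothing.

```python
import io


def find_class(name, root=object):
    """Depth-first walk of the live subclass tree, looking for `name`.

    Searching by name instead of by index is why shuffled subclass numbers
    between Python versions are only a minor obstacle.
    """
    seen = set()
    stack = [root]
    while stack:
        cls = stack.pop()
        if id(cls) in seen:
            continue
        seen.add(id(cls))
        if cls.__name__ == name:
            return cls
        # type.__subclasses__(cls) also works when cls is `type` itself,
        # where cls.__subclasses__() would raise a TypeError.
        stack.extend(type.__subclasses__(cls))
    return None


# No builtins at all, yet the expression still climbs from "" up to `object`.
root = eval('"".__class__.__mro__[-1]', {"__builtins__": {}}, {})
print(root is object)  # True
print(find_class("FileIO", root), io.FileIO)
```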


    $ python2
    >>> "".__class__.__mro__[2].__subclasses__()[40]("/etc/passwd").read()
    'root:x:0:0:root:/root:/bin/bash\n ...
yep, there it is.


Python 3 `system` example

  next(sub for c in "".__class__.__mro__ for sub in c.__subclasses__() if '__init__' in dir(sub) and '__globals__' in dir(sub.__init__) and 'system' in sub.__init__.__globals__).__init__.__globals__['system']('cat /etc/passwd')


I'm surprised about Scunthorpe, if any name with a profanity substring would trigger the filter I'd have thought this issue would be more common than Scunthorpe.


Yeah, such a filter would be a mbuttive problem due to all the mimanures.


This could exists a fun puzzle genre, replacing substforbidds of words with synonyms. Perhaps with muloanle dened itefractionns. Could even have replacementb agrosb lade boundaries.


My favorite of these was the discussion section of a popular lefty blog where any mention of "socialism" or "socialist" was routed to moderation and had to be unblocked manually. Not because of profanity, but a particularly pernicious spam problem at the time.


I am surprised that the EICAR test string is not here:

    X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
https://en.wikipedia.org/wiki/EICAR_test_file


From README —

Likewise, please do not send pull requests which compromise manual usability of the file. This includes the [EICAR test string](https://en.wikipedia.org/wiki/EICAR_test_file), which can cause the file to be flagged by antivirus scanners, and files which alter the encoding of `blns.txt`. Also, do not send a null character (U+0000) string, as it [changes the file format on GitHub to binary](http://stackoverflow.com/a/19723302) and renders it unreadable in pull requests. Finally, when adding or removing a string please update all files when you perform a pull request.


+++ATH0

That one normally only worked reliably if you could figure out some way of introducing a short delay between the +++ and the ATH. There may have been some crap modems that didn't require the delay, but that wasn't the spec.

(the 0 was not necessary, btw, as 0 is the default for the ATH command)
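A simplified Python model of why the pause matters: the modem only treats "+++" as an escape when the line has been idle for a guard time before and after it, so "+++ATH0" arriving in one burst is just data. This is a sketch, not real modem firmware; the one-second guard is an assumed value (Hayes-compatible modems make it configurable), and `detect_escape` and `stream` are hypothetical helpers.

```python
GUARD = 1.0  # assumed guard time in seconds


def detect_escape(events, guard=GUARD):
    """Return True if `events` contains a guarded "+++" escape.

    `events` is a list of (time_in_seconds, char) pairs. Three consecutive
    '+' count only if the line was idle for at least `guard` seconds before
    the first '+' and after the last one.
    """
    for i in range(len(events) - 2):
        if [ch for _, ch in events[i:i + 3]] != ["+", "+", "+"]:
            continue
        before = events[i - 1][0] if i > 0 else float("-inf")
        after = events[i + 3][0] if i + 3 < len(events) else float("inf")
        if events[i][0] - before >= guard and after - events[i + 2][0] >= guard:
            return True
    return False


def stream(text, start=0.0, step=0.01):
    """Hypothetical helper: characters sent back to back, 10 ms apart."""
    return [(start + n * step, ch) for n, ch in enumerate(text)]


# "+++ATH0" inside one burst of data (e.g. a ping payload): no escape.
print(detect_escape(stream("+++ATH0")))  # False
# "+++", a pause, then the command: escape recognized.
print(detect_escape(stream("+++", start=5.0) + stream("ATH0", start=7.0)))  # True
```

A modem that skips the guard-time check would, in this model, treat the first case as an escape too, which is exactly the behaviour the +++ATH0 trick relied on.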


Why is "nop" your personal favorite? I don't get it.


I have to confess, this freaked me out. Well played!


I can see why they test for "Jimmy Clitheroe" and "Horniman Museum", but can't think of a reason for "Linda Callahan".


In February 2006, Linda Callahan was initially prevented from registering her name with Yahoo! as an e-mail address as it contained the substring Allah. Yahoo! later reversed the ban.

https://en.wikipedia.org/wiki/Scunthorpe_problem
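A minimal Python sketch of the difference between the naive substring check that caught "Linda Callahan" and a whole-word check. The one-word blocklist and both function names are assumptions for illustration. Word boundaries only narrow the problem: they still miss deliberate misspellings and do little for languages written without spaces.

```python
import re

BLOCKLIST = ["allah"]  # the substring from the Yahoo! case above


def naive_filter(text: str) -> bool:
    """Flag text if a blocked word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def word_filter(text: str) -> bool:
    """Flag text only if a blocked word appears as a whole word."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in BLOCKLIST
    )


name = "Linda Callahan"
print(naive_filter(name), word_filter(name))  # True False
```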


that was unexpected. but I guess I should have expected it given how much Islam is iconoclastic. it's probably Muslims protesting the use of the name of Allah in email addresses that caused it (alternate explanation: the word was raising too many false positives in Xkeyscore)


1. That’s not what iconoclastic means. If you actually care about iconoclasm in Islam, the Saudi government has unfortunately destroyed almost 90% of Muslim holy sites without a word from other countries.

2. The Second Commandment Christians follow is “Thou shalt not take the name of the Lord thy God in vain” and I can tell you a lot of Christians follow that.

3. You have it backwards. It’s not Muslims’ fear of the name of God, it’s Yahoo’s fear of literally just the Arabic word for “God.”


It's their fear of people constructing offensive names (<deity>-sucks, etc.) that will spark another Charlie Hebdo incident.


Perhaps it's 'allah'?


Any idea why "Lightwater Country Park" is on the list?


Twat, maybe?


"twat" substring probably.


Most of the things in this list I can see what they're testing for but your favorite stumps me. Can you give me a hint? Thanks.


I assume it's a joke to test if reality is a simulation or a dream.


When Elon Musk was talking about reality vs simulation, I couldn't help but think he's onto something

If you were going to play a simulation game, you would not be a normal participant. You would not play a normal person, you would play the successful guy at the top launching spaceships and making money.

So - the chances of Elon Musk being in a simulation are very high compared to normal people.


I had the exact opposite take, that him talking about that is (further) evidence he's a bit off his rocker. Not that being off his rocker is necessarily a problem. In fact, it might be the only reason he's doing interesting stuff with his wealth instead of just trying to turn it into more wealth for the sake of a high score.


The entities running the simulation knew you would think like that, which is why they put Elon Musk there so the simulated you thinks "nah, if I were simulated my life would be awesome".


But I'm not Elon Musk, and I'm not super rich and successful, so therefore I am real and he is as well and not a simulation. I don't know what things are like in your simulation, however.


His mistake is assuming all games are the same. Sometimes I play a game that is kind of 'boring' and relaxing, like a bit of solitaire. Other times I play a fragfest with 15 other people looking to take my head off. Sometimes I intend to play something exciting and end up playing max/min stats. Sometimes I just suck at the game...


I would assume most simulations would be more academic/business related, like how we have tools that simulate a wind tunnel, rather than for entertainment purposes


That makes complete sense and it went right over my head. Thanks


Reminds me of the first time my DnD group tried out roll20.net. The chat box allows players to type things like "/roll 1d6" or "/roll 2d12" to simulate rolling dice (in these cases 1 6-sided die and 2 12-sided dice). I quickly tried "/roll 1dNaN", crashed the chat, and we went back to physical dice for the rest of the session.


Also fun is `/roll 9999999999d2` which IIRC Roll20 blocks but many virtual tabletops just hang on.
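A Python sketch of the two defences these commands call for: accept only digits in the spec, and bound the numbers before doing any work. The limits, the regex, and the `roll` function are assumptions for illustration, not how Roll20 actually parses rolls.

```python
import random
import re

MAX_DICE = 100    # assumed limits for illustration
MAX_SIDES = 1000

# Digits only (re.ASCII stops \d matching other scripts' digits), so specs
# like "1dNaN" or "1d-6" are rejected before any arithmetic happens.
DICE_SPEC = re.compile(r"(\d+)d(\d+)", re.ASCII)


def roll(spec: str, rng: random.Random) -> list[int]:
    """Roll a spec such as "2d12", refusing malformed or oversized input."""
    match = DICE_SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a dice spec: {spec!r}")
    count, sides = int(match.group(1)), int(match.group(2))
    # Check the size before generating anything, so "9999999999d2" costs nothing.
    if not (1 <= count <= MAX_DICE and 1 <= sides <= MAX_SIDES):
        raise ValueError(f"out of range: {spec!r}")
    return [rng.randint(1, sides) for _ in range(count)]


rng = random.Random()
print(roll("2d12", rng))
for bad in ["1dNaN", "9999999999d2", "0d6"]:
    try:
        roll(bad, rng)
    except ValueError as err:
        print(err)
```

Rejecting the input with an error message keeps the chat usable, instead of letting one bad command hang or crash it for everyone.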


Perl 5 has a taint mode built into the language. If enabled, it forces the developer to untaint every bit of user-controllable data (by running it through a pattern match) before doing anything dangerous with it. I can't believe that this isn't a standard feature in all languages.


That’s because it’s both more annoying than warranted and completely insufficient, even ignoring that “running through pattern match” is not great (see: parse, don’t validate).

* “untainting” is highly context-specific, that something was cleaned up for HTML does nothing for SQL

* which also means that the boundary is incorrect, just because you’re getting something out of storage does not mean it’s safe for anything (not even storing it back)


You are right, the programmer still has to think. But with builtin taint it's harder to overlook something because it forces you to consider everything.

I have a friend who is a long time C++ developer. Every time we discuss C++ he tells me that memory errors can be easily avoided in C++ if you have a certain level of competency. Someone still developed Rust because of this issue and it is popular.


You can do this with any language with a type system by wrapping reads with a 'Tainted' type.

ie:

    fn safe_read(path: str) -> Tainted<IO> { Tainted(unsafe_read(path)) }

And then you can apply functions to Tainted<IO> or whatever type that convert it into something structured / validated.

So long as your functions only take in those validated types (ie: you do not write functions that take str) you can ensure that new reads will fail to typecheck without first parsing.

To be honest this is how most programs I see work anyways, at least in typed languages. Few work directly on strings. But they do it naturally, without enforcement - so like, a function might take a 'str', but the 'str' passed in was parsed into a wrapping structure already.
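The same idea sketched in Python, with the caveat that Python only enforces it at type-check time (for example with mypy), not at runtime. `Tainted`, `read_env`, `parse_port`, `Port`, and `listen` are all hypothetical names. The point is that the only function that looks inside `Tainted` parses the value into a new type instead of merely approving the string.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Tainted(Generic[T]):
    """Wraps anything read from outside the program. Deliberately not a str."""
    raw: T


@dataclass(frozen=True)
class Port:
    """A parsed value: code downstream accepts Port, never a bare string."""
    number: int


def read_env(name: str, env: dict[str, str]) -> Tainted[str]:
    # Hypothetical "safe_read": every external read comes back wrapped.
    return Tainted(env.get(name, ""))


def parse_port(value: Tainted[str]) -> Port:
    """The one place that unwraps Tainted, and it parses rather than approves."""
    text = value.raw.strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a number: {text!r}")
    number = int(text)
    if not 1 <= number <= 65535:
        raise ValueError(f"port out of range: {number}")
    return Port(number)


def listen(port: Port) -> str:
    # A type checker rejects listen("8080") or listen(read_env(...)).
    return f"listening on {port.number}"


print(listen(parse_port(read_env("PORT", {"PORT": "8080"}))))  # listening on 8080
```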


You can do some actually useful stuff with a real type system, instead of replicating Perl's stupidity.

For example, you can convert the input into a safe representation, suitable for the exact place you'll be using the string, instead of "validating" it.


Exactly. What you need is not a “tainted” type and assume everything that’s not it is safe. That’s not the case. Html-escaping a string does not make it safe for SQL or whatever.

What you need is a safe type for each use case, and ways to convert values to that (or mark them as that depending on your TS).


This is what I was referring to at the end of my post:

> Few work directly on strings. But they do it naturally, without enforcement - so like, a function might take a 'str', but the 'str' passed in was parsed into a wrapping structure already.

ie: Most programs in typed languages already do what you're saying - they parse the data directly into a structure, and therefore they validate some aspects of it naturally, so even when you do see a 'str' in typed code it's very often already gone through some sort of parsing phase.


Hum... Your sibling has a nice explanation of what I was trying to say.

You don't sanitize user input when parsing, you do it on usage. "Robert'; drop table users--" is a perfectly fine name and you shouldn't mangle it in your data fields.
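A minimal Python sketch of "store it as-is, escape at the point of use", using only the standard library `sqlite3` and `html` modules. Each output context gets its own mechanism: a bound parameter for SQL, `html.escape` for markup. The table and markup are made up for illustration.

```python
import html
import sqlite3

# Stored exactly as typed: nothing is stripped or "sanitized" on the way in.
name = "Robert'; drop table users--"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT)")
# SQL context: pass the value as a bound parameter, never via string formatting.
db.execute("INSERT INTO users (name) VALUES (?)", (name,))
(stored,) = db.execute("SELECT name FROM users").fetchone()
print(stored == name)  # True

# HTML context: escape at the moment the value goes into markup.
print(f"<p>Hello, {html.escape(stored)}</p>")
# <p>Hello, Robert&#x27;; drop table users--</p>
```

Neither step changes what is stored, so the same value can later be escaped differently for a CSV export, a shell argument list, or anything else.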


We're saying the same thing.


Taint mode is a terrible misfeature and modern code should not use it. It's one of those things that makes people think "it's annoying and makes me do additional work so surely it improves the security". No, it doesn't.

For those who are unaware what taint mode exactly is: when it's enabled, a string may have a hidden "tainted" flag. Passing a tainted string to many (but not all) built-in functions will result in an exception. Many built-ins return tainted strings, additionally all strings in @ARGV (cli parameters) and %ENV (env variables) are tainted. You can get an untainted string by accessing a tainted string through a regex capture group ($1, $2 etc.). Taint mode is global, so it affects everything, including third party modules.

You may ask "how do I even validate my environment variables? What's the difference between valid and invalid PATH?". Well, you can't. That's why programs using taint mode are often littered with code like:

    my($untainted) = $foo =~ /^(.*)$/ 
The worst thing is that you never know whether a function from a third party (CPAN?) module will return a tainted string or not. It may differ between platforms! For example, File::Spec is sometimes returning tainted strings on unixes, but not on Windows (or the other way around, I'm not sure!). In practice that means you will have to run your program, check if it throws an exception, and if it does, you have to use the above no-op "validation" regex.

Well, that assumes that the said third party code works in taint mode. If it wasn't tested with it, it's possible that it won't work at all and there's nothing you can do about it.
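A toy Python model of that catch-all regex, assuming a hypothetical `TaintedStr` marker class and a stand-in `run_command` sink. Like Perl, it only hands back an untainted value through a capture group; unlike Perl, it does not propagate taint through string operations. It shows that `(.*)` approves anything, while a pattern that states what a filename may contain actually rejects the input.

```python
import re


class TaintedStr(str):
    """Marks a string that came from outside (argv, env, files, ...)."""


def run_command(cmd: str) -> str:
    # Stand-in for a dangerous built-in: it refuses tainted input.
    if isinstance(cmd, TaintedStr):
        raise PermissionError("refusing tainted argument")
    return f"would run: {cmd}"


def untaint(value: str, pattern: str) -> str:
    """Hand back capture group 1 as a plain str if the whole value matches."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError(f"rejected: {value!r}")
    return str(match.group(1))  # a plain str, so the marker is gone


user_input = TaintedStr("report.txt; rm -rf ~")

# The catch-all pattern "validates" everything, so nothing was checked.
print(run_command("cat " + untaint(user_input, r"(.*)")))
# would run: cat report.txt; rm -rf ~

# A pattern describing what a filename may contain actually rejects it.
try:
    untaint(user_input, r"([\w.-]+)")
except ValueError as err:
    print(err)
```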


Popey, I think I see you around these parts from time to time. If you're reading this:

You and Martin Wimpress are constant sources of inspiration for me and many others, who want to keep on discovering the world of FOSS software. Thanks for the many hours of entertainment in your podcasts and the help you provide to people on the forums and mailing lists. Excellent work!


Thank you. That's very kind of you to say.


Kids, a story from the Old Days, c.1981.

DRI (since absorbed into McGraw Hill) had EPS, an advanced economic/financial analysis scripting language, provided via timesharing (mainframes on the East Coast of the USA). I was a customer support programmer in San Francisco the day that they rolled out a powerful arrays feature on the testing mainframe (no clients, but lots of real work going on).

One could put anything as an element inside an array. So I tried:

    X=array(123, "abc")
    Y=Array(X)
and it worked. You know where this is going, right?

    i=loop from 1 to 1000
    x(i+1) = array(x(i))
It crashed the mainframe at i=67, if memory serves.

So far, so good, excusable as "clever programmer tests the limits". And then I ran it again.

Same result, plus, 2 minutes later, a call for me from my friend Kevin, who was a lead developer on EPS in DRI HQ: "Chris, what the ^&^&^!@@ are you doing?"


I once named a fat32 USB pendrive ЯBK

This data must have corrupted some firmware section or so because the drive was gone afterwards.

Couldn't format, couldn't dd, anything.

Fits the category, I think. Only less funny :( Well, depends on the observer :)


I like it :)


Reminds me of a QA buddy. One day at the crosswalk, he decided to, I believe, hold the button. For the whole wait. He apparently broke the entire intersection’s lights and a repair crew came out. He was unable to reproduce it after.


Maybe it was working as intended? If a crosswalk button becomes stuck, it's reasonable to switch the intersection to "safety mode" so people can still cross the street.


Reminds me of https://www.wired.com/2015/11/null/

Numerous cases of encoding out of band data as a special case of in-band data.
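A small Python illustration of that pattern: a hypothetical encoder that writes a missing value as the text null cannot round-trip a person whose surname really is Null, while JSON keeps `null` and the string `"Null"` as different values. `encode_naive` and `decode_naive` are made-up names.

```python
import json


def encode_naive(value):
    """Hypothetical encoder that spells "no value" as the text null."""
    return "null" if value is None else str(value)


def decode_naive(field):
    return None if field.lower() == "null" else field


# A missing surname and a real person whose surname is Null.
for surname in [None, "Null"]:
    print(repr(decode_naive(encode_naive(surname))))
# None
# None   <- the real surname has been turned into "no value"

# JSON keeps them apart: null is its own token, "Null" is a string.
for surname in [None, "Null"]:
    text = json.dumps({"surname": surname})
    print(text, json.loads(text)["surname"] == surname)
# {"surname": null} True
# {"surname": "Null"} True
```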


My browser is set to not accept cookies from sites I don't have a relationship with, because … well, frankly, I don't know why the rest of y'all still let shady people on the Internet use your hard disk for their ad tracking.

So, that link (like so many), just plain doesn't work. It just loads a white screen.

There are many sites like this on the Internet. Twitter waffles between working and "Ooops! something went wrong!". I've sent patches to Rust's documentation so that it works with cookies disabled. (But it won't persist any settings you change, of course, which is what it uses the cookies for.)

That link has double the fun: not only is the page completely white, it's logging errors to the JS console as fast as it can.


> But it won't persist any settings you change, of course, which is what it uses the cookies for.

Seems dumb not to use local storage (possibly with a cookies fallback though I wouldn’t even bother).

You can also disable LS, but I don’t know that that’s possible on a per-site basis so it’s an unlikely configuration (and you can fallback same as if cookies were disabled, probably).


Sorry, I was generalizing a bit for simplicity. I believe the Rust docs do use localStorage. The "cookies" settings in Chrome/Firefox control all the browser's various forms of storage: FF calls it "Cookies and site data". If you disable it, the site can't set cookies or localStorage. (As being able to set localStorage would defeat the point of what the setting is really attempting to control.)

> but I don’t know that that’s possible on a per-site basis so it’s an unlikely configuration

Yeah, if you were using the "Block" settings in "Cookies and site data", for example, that would disable LS on a per-site basis. (I essentially block by default, and have a list of allowed sites as exceptions in that FF setting.)

(I also have an extension that I wrote to produce a fake, good-for-this-page-load-only localStorage, with two settings: hold the values in RAM, or just /dev/null them. Most sites do not handle localStorage denying access, and essentially crash, so it's handy there, as that works around the poor programming on those sites.)


You convinced me. I just disabled cookies globally so I have to allow them individually for every website.


You should try using direction-changing Unicode code points like U+202E in your name. That will probably break many things.
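A short Python check for the explicit bidirectional formatting characters (U+202A to U+202E and U+2066 to U+2069), using `unicodedata.bidirectional`. `find_bidi_controls` is a hypothetical helper, and it deliberately ignores implicit marks such as U+200F, which are often legitimate in right-to-left text.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate characters.
EXPLICIT_BIDI = {"LRE", "RLE", "PDF", "LRO", "RLO", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(text: str) -> list[str]:
    """Return the Unicode names of any explicit bidi controls in `text`."""
    return [
        unicodedata.name(ch)
        for ch in text
        if unicodedata.bidirectional(ch) in EXPLICIT_BIDI
    ]


# U+202E (RIGHT-TO-LEFT OVERRIDE) makes many UIs draw the rest reversed,
# so this name can display as "holidayexe.png".
name = "holiday\u202egnp.exe"
print(find_bidi_controls(name))    # ['RIGHT-TO-LEFT OVERRIDE']
print(find_bidi_controls("Null"))  # []
```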


I found something like this when I managed to accidentally break the Drupal.org git parser by adding emojis to a commit message. It wasn't on purpose, I was just on a 2015 emoji kick.

That said, it did uncover a bug that obviously hadn't been tested for which gave the infra team more impetus to solve utf8mb4 support for the database.

https://www.drupal.org/project/infrastructure/issues/2531884 https://github.com/govCMS/govCMS7/commit/ab5da5fd0cb3d7e1d33...
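A quick Python check for the case that bit here: emoji and other characters outside the Basic Multilingual Plane need 4 bytes in UTF-8, which MySQL's legacy 3-byte `utf8` character set cannot store and `utf8mb4` can. `needs_utf8mb4` is a hypothetical helper.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the BMP (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


for message in ["Fix typo", "Fix caf\u00e9 menu", "Ship it \U0001F680"]:
    widest = max(len(ch.encode("utf-8")) for ch in message)
    print(repr(message), needs_utf8mb4(message), widest)
# 'Fix typo' False 1
# 'Fix café menu' False 2
# 'Ship it 🚀' True 4
```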


I laughed out loud when he said he held down the print screen key until it started repeating. That’s exactly the kind of thing a user would do but a developer would never think of!


That’s a fun article!

I’m big on Quality. Comes from 27 years, working for a corporation that is pretty much synonymous with the word.

“Abuse testing” is very important, and almost impossible to automate. A good monkey tester will have a “sense” of where to go, as this chap indicates.

I worked with an enormous team of people like this, and they would regularly find things like sync bugs (he talks about one). Those take a lot of work (and RSI risk) to find.


Somebody should write a book like the original "Programming Pearls" for weird hacks and anecdotes like the author's. Pretty fun stuff.


If I'm honest my reason for submitting this was a hope that any traction it received would be met with other odd stories in the comments. So yeah, there is some market for your book suggestion.


Thanks for submitting it!


> A year or so ago, at a company sprint I gave a lightning talk in which I wanted to make the tiniest possible snap

What is a snap in this context?


I would assume it's a Snap package [1], the new (?) packaging system that was somewhat controversial when included in Ubuntu 20.

[1] https://snapcraft.io/


Thanks for identifying this missing info. I've added a link to explain in the article.


A means of software distribution, created by Canonical

https://snapcraft.io/


> When snaps are uploaded, there are security and sanity checks which run against the snap. My use of the (probably reserved) word null seemed to fool the backend checks script, live on stage, in front of my peers. That’s the way to end a lightning talk, I think!

The backend crashes and instead of getting an error message you are forced to watch a spinner forever?


Not anymore. You're welcome! :D


If you want to find a bug in your software, make a live demonstration.


I experienced the Thunderbird bug mentioned in the article first-hand and freaked out for a moment. "Where does that damn turtle come from?" And had to search quite a bit until I recognized that it was part of the subject. Unfortunately, I no longer have a screenshot of it.



If these kinds of errors occur there might very well be an SQL injection going on.


Declarative programming is something that helps deal with weird edge cases like this, right? I'm learning Elixir currently and the subtle semantics around the 'traditional' assignment operator (=), which is really a match operator, are quite cool. It means you can ditch most if blocks and provide a list of pattern-matchable functions (matched on their arity) to define logic, which helps deal with edge cases a little better.

I'm curious how such a declarative paradigm _may_ help with the wacky usage of software old mate Mr. Null endeavours in. No one paradigm solves all problems I feel but perhaps some allow us to harvest some low hanging fruit for free?


I don’t see how a string “null” would break anything besides a very stupidly written program (e.g., one which tries to eval() the input) or an ordinary program written in a very stupid language (e.g., one which tries to coerce strings to other types—PHP, is that you?).

I’m a big fan of pattern matching (especially statically verified pattern matching, so sorry, Elixir), but I don’t see how it would help here.


Because there was this period of time where the prevailing wisdom was "be liberal in what you accept, and conservative in what you emit" to enable computer systems to handle a wider array of cases. Some people still adhere to it.

I don't. If I want to extend a system that's currently in English to also accept Arabic or Farsi / Persian, you better believe I'm sanitizing that input carefully. Otherwise I'm opening up my application to zero width non-joiners[0] and all sorts of random fingerprinting for my English speaking users. I know it's a pain, but I'd rather just do it right.

[0] https://www.zachaysan.com/zero


Yes, implicit type casting is the work of the devil. Even the limited case in C is a popular source of trouble, never mind implicitly casting between strings and other types. Similar case with SQL's "NULL".


Even PHP finds null==“null” to be false.


And at least you get to strongly compare by adding an extra = (e.g., ===, !==). I just wish the extra character (i.e., the not-as-default versions) weakened instead, like a tilde.

But yes, all strings are truthy. Except an empty string! And maybe some little-used nullish characters? Doubtful, but...


> But yes, all strings are truthy. Except an empty string! And maybe some little-used nullish characters? Doubtful, but...

Or "0"


But not in PHP 8


http://sandbox.onlinephpfunctions.com/code/d6df91536669985a6...

Set it to PHP 8 and click "Execute code," then set it to PHP 7 and Execute again. They give the same result, which is that "0" is falsy in both versions.


They changed a lot of the weirder string conversion stuff in PHP 8, but I'm pretty sure "0" is still falsy.


Nice to hear about issues that actually got fixed. I tend to find some edge-case issues regularly too, but most of my bug reports end up in limbo.


I understand Gary Null has had some experiences.


Fun story: At work, we were learning how to use a new browser tool that was put together at the last minute. Plenty of people were having trouble logging in, so the instructor had one of the people share their screen so he could help. The user's profile showed: "name: null", which prompted someone to observe "I see your name is null, too". I interrupted with "yes, it is a family name". Good laughs were had.


Weird hacks, love it!


I know finding bugs is undeniably a good thing but I can't help but feel someone as obviously bright as this should be making more things

It's like the people who spent a lot of their time finding ever more pedantic inaccuracies and continuity errors in films.

The mute LED on your ThinkPad sometimes goes out of sync? fascinating


Since the mute LED is a user-facing security feature, it is in fact a pretty serious bug if users can't rely on it to be correct. Witness the wide variety of incidents over the last year where people ended up in serious trouble because they thought they were muted but weren't.

I feel like of all the examples in the blog post, this was the one with by far the biggest potential for actual harm to people.


What gives you the impression the author doesn't also make things?


I know they do

But it just reminds me of this: https://youtu.be/2Z8pgV74_Hw?t=148


Not everyone feels it's their moral right to be as productive/effective as possible.


This could allow a malicious party to trick the user into thinking their microphone is not recording while it actually is.

If this action can be performed using the hardware button, it can likely also be triggered in software, which would make it a useful trick for malware.


Breaking things is far easier than making things. While this type of poking around might feel fun, it will mostly result in low value work to fix something very few (if any) actual users would experience.


Unless they have cats.



