I haven't, like, done a study, but I have done a lot of security code review, and my instinct is that the basic issues here are straightforward: bugs are hard to spot when they're subtle (i.e., they involve a lot of interactions with other systems outside the one under review) and when they involve a large amount of context (like object lifecycle bugs).
Which, another issue with this study: I get why Chromium OS is an interesting data set. But I don't think it's a particularly representative case study. Another factor in effectiveness of security review: how much of the whole system you can fit in your head at once.
This is why I think using pure functions as much as possible, i.e. part of the functional programming mindset, is so important for making code reviewable. Yes, you can make nice abstractions in OOP, but at least in my experience, OOP with its stateful objects interacting makes you need to know a lot more about the system than pure functions do.
And yes, it's not a panacea: large allocations may take too long to copy, which is why the next best thing is "mostly functional". Most techniques don't work in every case.
Uncontextualised security is bad security. The frequency of bugs is not a good indicator of their severity.
That reminds me about an old saying on complexity: a complex system has no obvious bugs, a simple system obviously has no bugs.
Although a lot of people seem to reach for abstraction to "solve" complexity, I don't think that's the solution either --- because it can hide bugs too: https://news.ycombinator.com/item?id=25857729
IMHO abstraction is really a tool of last resort, for when you can't reduce the overall complexity of the system any further.
It's an issue that's brought up consistently: the majority of bugs are found in large releases. The smaller the release, the less likely you are to introduce a bug (this is part of why some methodologies moved away from waterfall).
I wonder if the future of security practices will just be "deliver small pieces of functionality / changes"
There was a period for me when I 'solved' this code review problem the way many of us solved class participation in grade school - it's not a competition, it's a learning environment. It doesn't matter to the class if you know the answer, but it's really awkward if nobody knows the answer. So you say the answer in your head, wait to see if anybody else volunteers that information, and then put your hand up if they don't. Or you count to five. (I suspect based on hands and grades, at least a couple of my classmates knew all the answers but were too shy to say anything, and this also appears to be true in code reviews.)
Code reviews are typically but not always asynchronous, so you can choose to wait for someone else to point out the obvious stuff and then just mention the things nobody else saw. The danger there is if an argument starts about one of these issues and distracts from others. Like a real conversation, you can run out of time for it before the issue you cared about gets sufficient airtime.
Are you finding a lot of little things in the PR itself? Many of the usual suspects can and should be automated away (linting, codegen, test coverage reports, static analysis, etc). Others can easily be added to a checklist before PRs are submitted for review, eg "Have you updated the relevant documentation".
Usually newer employees will get more comments on their PRs, as they're not as familiar with the system and might fall into more traps that others know about from experience. This can be mitigated somewhat by pairing/mentoring.
Sure some people just aren't going to cut it on your team, but most engineers when corrected on an individual coding mistake won't repeatedly make the same mistake, in my experience. If you just never mention the mistake, they'll keep making it in every PR and be completely unaware. Sometimes it helps to communicate out of band (ie, not in "official" comments on the PR itself) if there are some low hanging gotchas you want to make the engineer aware of.
I wonder if we overestimate how sensitive people are in this case. If someone raises ten objections on my review and makes it clear what I have to do to get their approval, I'm happy; I'll get right to work on fixing it. If this happens in front of a group (especially involving skip-levels or managers of other teams), I take it as a signal that I should have run the change by a teammate or my direct manager before involving so many people.
The only thing I actually dislike is when people comment with vague concerns that leave it unclear whether I actually have to address them in that change to get their approval. Sometimes this causes a circular waiting deadlock among the reviewers, which is especially frustrating.
In an environment that places stress on those who underperform, anything that calls attention to performance problems will be hated. Anyone that points out those problems will be disliked. Even the best people will fall to this stress.
In a more relaxed environment that isn't constantly counting points towards raises and promotions, people can relax and not worry about the problems that are raised. However, not everyone can manage this. Some people have just had too many jobs of the previous type, or other experiences like that. They just can't adjust.
IMO, it also means they have a much harder time fixing their own problems because they can't tolerate other people pointing out those problems.
As for code reviews, I always call it like I see it. It's my job to make sure the code will work and be secure. I'm not going to let things slip by that violate that, no matter how much I want that person to be happy. And I'll still point out matters of style as well, though I usually won't insist they be corrected, unless the person has repeatedly had the same problem. (My people are generally great, though, and will fix even my suggestions. So it's generally not a thing.)
You'd probably hate that last bit of my code reviews. If we were on a team, my advice would be to simply discuss that part with me in real time. We could decide together if it was worth the effort to make those changes. Async communications like PRs and emails simply aren't fit for that kind of discussion.
Seems to me that this might be the problem rather than the code review itself!
> We could decide together if it was worth the effort to make those changes. Async communications like PRs and emails simply aren't fit for that kind of discussion.
Indeed. I've definitely found myself falling into a pattern of failing the review for serious issues with comments on the PR, but just approving the review and slacking the comments for small issues.
Back in my day (hold on while I go grab my cane...) software developers would just flap their lips and sound would come out at each other.
Nowadays there seems to be this underlying assumption that actually talking to each other is a waste of time, that it should all be documented with no curation, and that it should always happen in PRs and the like.
I'm all for documenting decisions, but I feel like it's not too much effort to swing by someone's office and have a discussion with them. At that point, maybe you don't even need to document if the updates are small issues. Is it really so all-encompassingly important?
But maybe I'm just old. Most of the time I'd rather talk to a human when I call into tech support too (unless it's something obviously automated like making a payment).
Well there is a modern trend that's supposed to help with this too: open plan offices. No need to swing by their office, they're already right there (this has indeed been my working environment for most of my career).
It's all a bit different at the moment of course, but my team regularly video calls each other to discuss things that would take too long over text.
This is so missing the point of development.
What do developers do? They write code and they ship code. (we could say that they deliver features or business value rather than code but that's not the point).
Code review effectively prevents them from working. A bunch of comments on a PR, with no approval in sight, very effectively blocks any progress on their work and makes their project late.
Of course they're going to hate that.
Why must developers write and ship code? To deliver a certain goal / feature / business value. If all they're willing to do is write and ship code without considering how that software will work in the future, what problems it might have, and how hard it will be to make changes to it, they're not being responsible enough about their work.
Software in general (except for throwaway code) is not supposed to work only NOW — it is expected to work for a long time, with an acceptable degree of quality, and with changing requirements.
If, to build a wall, it takes longer than just laying some bricks, it is because a solid wall serves many more purposes than some layers of bricks.
Code review is one tool in a large set for getting closer to perfect. But if the business doesn't care to get there, why are you doing it at all?
Of course, if you care at all about quality, code review is one of the better bang-for-the-buck tools, so maybe you need to change your culture around peer correction so it can be used. The choice is your company's.
Of course the desired level of quality will be different at each company/team, and each team should adjust for its own reality.
In the same team there can be times for throwaway code with low standards, and times for high quality code with strict standards. It is part of the developers' responsibility to tell them apart and make contextualised choices.
I once saw a 3-4 line fix turn into a rewrite of an entire page that took several full-time days to finish, because the developer didn't like the original fix. I've seen a former DBA turned software dev insist on using nulls throughout the codebase rather than empty strings, and dismiss the argument that this isn't considered good practice for a software developer, because they themselves are "now software developers".
And the rebuttal is always the same, "well you're just doing it wrong". Maybe, but it gets done wrong more often than right.
I'm not against code review per se, but the number of times I've seen silly and arbitrary changes because of one developer's sensibilities is way too high.
In short, your very words prove that you are wrong: code review is about getting closer to perfection (along with types, formal proofs, tests, static analysis, and a few dozen other tools that are often used).
Developers are supposed to be producing working software, not just code.
Code reviews help with that. They don't detract from it.
These people didn't last because that wasn't their job. Their job was to improve the product, and tickets were a way of explicitly telling them what their priorities were. Instead, they ended up creating more problems than they solved. Some of those problems we're still dealing with to this day.
Coding is something they do to improve the product, but it's not the only thing. They were expected to report other bugs they found, or just fix them, depending on the situation. And I'm not even talking "write up a full bug report with repro steps." Just a comment in the ticket would have been fine.
Code reviews are the least time-consuming thing that we do, and even if they took a whole day, they can start working on a new ticket until then.
The conflict I was talking about before is the difference between the goals of the company and the goals of the employee. The company wants a better product, in line with their vision for their product. The employee wants money and to advance their career. These aren't fundamentally incompatible, but some people's attitudes towards their career actually end up hampering their own goals.
The idea that "coders just code" is one of those ideas. That might work for junior developers, but mid and senior developers need a lot more skills.
You could talk to the person directly and point out those issues without dragging them in front of the group.
When there's a LOT wrong it usually indicates that their mental model isn't quite correct, and that is gonna be hard to correct without a conversation anyway.
The tradeoff is that now you're constantly reviewing code where you have no idea how it fits into the rest of the design, which I find a lot less useful. It's good for catching things that a linter probably should have caught anyway, but it's really bad for critiquing how concurrency is done or other higher-level constructs.
There's also the ones who've learned that correct answers aren't welcome, and consequently don't even bother to try.
0: Read: pedantic, excessively verbose, and/or makes it clear that the teacher/project manager is full of shit.
Wish I'd learned that lesson. Instead I got this idea that people will like me because I'm smart, so show everyone how smart you are, which is so not true.
Also, many attacks today target security protocols rather than, for example, cracking passwords.
Another huge impediment to creating secure systems is agile, which discourages up-front design. The average sprint duration of two weeks is too short for meaningful security testing, and such testing isn't feasible as a nightly build or pre-release sanity check, either.
Product owners or customer representatives are too often non-technical stakeholders who have the authority to make technical design decisions. That is never a good idea, and it has bad consequences for architecture and particularly for security.
Looking for things you can spot means you read the code as if it were a novel, and you discuss only the things that stand out or that seem to be important.
Many people interpret "doing the code review better" as reading the novel more slowly and spotting more things that stand out.
Imagine bridge engineering plans were reviewed this way. Somebody would look at the numbers and react only if for some reason the numbers looked incorrect.
Asked to do the review more thoroughly, they would just stare at the plans for longer, but never actually perform the calculations, look up the tables, consult the standards, etc.
"Why do no records get dropped in this step?"
Some programmers will have an argument resembling a logical proof: "No records get dropped because for every record, we call a method to write the record into the outgoing artifact, and if that call fails, the entire stage aborts. The flow of control never reaches the point where the outgoing artifact is added to the database for delivery. The lack of a successful delivery record for the batch triggers an alert after an hour and ensures the system will keep reprocessing the batch until it succeeds, or until the batch gets manually flagged as bad."
Other programmers will say something like: "Well, all the records get passed in, and we write them all to the artifact. We had some bugs in this code, but they haven't been hard to fix, and anyway, if something goes wrong, the process is retried later."
This has been a big thing on my mind recently because I've been asked to do a design review for a system created by some high-SLOC-productivity people who built a complex system with a sophisticated-looking architecture, but who keep giving me answers like the second one. There's a bunch of cool tech but no coherent explanation of why they believe the system will work reliably. It's just: well, there have been some bugs, and of course it doesn't work when there are bugs, but we're fixing them.
Until you have an argument for why a system works, there's no coherent way to understand why it doesn't work, or what the effort will be to achieve different aspects of reliability. It's all just "bugs." With such an argument, you can target specific aspects of reliability instead of playing the eternal "it'll work when the bugs are gone" game of whack-a-mole.
Granted, this isn't very rigorous, but 99% of the time, it's the closest you'll get to "proof" in application code.
I've recently found myself in the middle of one such team, except without the high-productivity part. Sometimes I think I'm the only sane person in a sea of madness, and at other times I wonder if maybe I'm just dumber than I think and simply don't understand.
I've designed and built systems that get deployed around the world, but if you look at my code and architectures, they're relatively unsophisticated. I just don't see the need for sophistication unless it's solving a specific problem, and I find these highly sophisticated designs just make everything harder (for me anyway).
Ideally, of course, everything should be hidden behind leak-tight abstractions.
Unfortunately, that is rarely the case.
I recently had a developer reuse a function that generates IDs for some other type of ID.
That ID-generating function happened to use the OS entropy pool directly. It had previously been called once per application startup, so that was deemed OK.
But now with high demand for IDs that is no longer acceptable.
I noticed it because I followed the code to see how the function actually generates the IDs. That would not have happened in a web-based tool, because there simply isn't a convenient way to follow references from the new code.
That is just a simple example, I get things like that every other day.
And I often get frustrated because I see you call foo(), but I need to see how foo works with the given arguments.
Though to be honest, most of the time I catch myself playing grammar checker: if the style isn't violated and you didn't hit any of the dozens of checklist items, you pass. (I work in C++, so I often catch people writing C++98 code when there is a C++14 construct that is better.)
I've seen more bugs than I like OK'd because the single line change looked innocuous in the web UI review tool.
Tools like Pull Panda are worse still, for game-ification of PRs, with leader boards for most PR reviews and fastest reviews. :(
Just as with rewarding developers for lines of code written, rewarding them for marking off as many PR lines as quickly as possible is so totally moronic that I can't begin to comprehend who would think it might be a good idea, or why.
Did the author manually modify 30k LoC or is this result of some automated process?
If it is for example a refactoring run from IDE (which is completely fine in itself), then the better way to handle it is to separate that single huge change into its own commit that explains how the change was generated.
For example, I insist that there are three categories of changes: reformats, refactors, and functional changes; a commit may contain only a single type of change, and it must be immediately apparent which type it is. So if you see a 2k-LoC reformat, you skim over it and don't expect functional changes to be hiding.
On the other hand, if the code consists of actual manual modifications, it means somebody has spent a huge amount of work on it, and it is not unreasonable that the code review will take a proportionally huge amount of work.
My longest code review took me a week, and it was for a much smaller change in LoC than that.
There are probably better ways to handle a change like that. My preferred way is to pair up and assume that if two people actually worked on it, it is as good as a code review.
And for every one you notice, you miss three. Because the system is too complex. This is why we need to program defensively.
> It was used before, once every startup of the application, so it was deemed OK.
And that's where someone went wrong. There should have been a test, or runtime failure, or even better (but more difficult) a compilation failure if someone misuses something in that way.
> And that's where someone went wrong. There should have been a test, or runtime failure, or even better (but more difficult) a compilation failure if someone misuses something in that way.
I don't subscribe to the idea of creating so much additional code for a relatively simple constraint.
Code should be dense. Dense means a lot of important constraints implemented in a small amount of code.
Of course this is not yet a guarantee that the code is readable and easy to maintain, but as a general rule, a codebase that has 5 times the amount of code to implement the same functionality will more likely than not be less maintainable. That is simply because it is more difficult to work with that much code.
> There should have been (...) or even better (but more difficult) a compilation failure if someone misuses (...)
In this particular case, what I suggested was to move the original function, with its generic name, from a generic utility package to a specific module package with a specific name.
See? Compilation failure, not difficult at all.
Sorry, but I can't believe how wrong this is. "Dense" is a really poor metric for code quality. "Simple" is a much better metric. And one aspect of "simple" is "not interwoven".
So the question here is, could the new code I'm advocating be implemented in a way that is simple and not interwoven? That is, not adding complexity to any of the rest of the code base. It is nothing more than a safer leaf.
And the answer is, in this case, yes. This does not increase the complexity of the code base. The amount of code added is commensurate with the need to make sure it's not a foot gun.
Clever naming, or visibility restriction, is a poor solution. You may prevent the function from being called explicitly more than once, but you don't prevent a future refactor from causing the function that calls it to itself be called more than once.
The crucial bit of humility that is necessary is to realize that there's no way you can foresee how these problems will come up in the future. The only thing you can do is be defensive.
I agree wholeheartedly and I wish more developers felt this way.
I hate opening new projects and seeing the potential for null reference exceptions everywhere when you could instead just check the constraints directly and give a more meaningful response.
My review flow now is to check out the branch under review, read it in my IDE with git annotate on, and do all my actual review work there, where I can see the full context. Then I add line-by-line comments on GitHub where I can. Comments relating to older parts of the system that should change in tandem with the current PR have to go in as "general" comments, with file/line numbers included manually.
You can't see that the code they changed also lives in another function that they missed, or that in fact the whole thing should be factored out into a function and fixed once.
I think most of us who understand code reviews know about this problem and push back against it.
> Surprisingly, the likelihood of missing a vulnerability during code review increased with a developer’s reviewing experience.
Hold up. I need more information about 'experience'
> reviewer’s coding experience and reviewer’s reviewing experience,
still no criteria...
> Another threat is the measure we take to calculate the developer’s experience. We can interpret the term “experience” in many ways. And in many ways, measuring of experience will be complex. For example, we cannot calculate the amount of contribution of a developer to other projects. Although a different experience measure may produce different results, we believe our interpretation of experience is reasonable as that reflects the amount of familiarity with current project.
Wrong "experience", and still no criteria. If I may proceed from the notion that they mean the same class of experience, even though they didn't define either of them until almost the conclusion: experience here is measured as time on Google projects.
New people are trying to establish themselves. Finding a bug is 'counting coup' and raises their status. They also bring in outside notions of good code reviews that may be more diverse or rich than those inside the group.
An 'experienced' reviewer may be suffering from any or all of the following:
* under-training (outside experience/echo chamber)
* over-training (confirmation bias)
* over-extension (too many reviews, competing priorities)
* lack of familiarity
* lack of motivation
It's almost always possible to rewrite a shitty-performing algorithm, but it's almost impossible to undo a leak's or a hack's damage.