Hacker News
Nobody's just reading your code (akkartik.name)
352 points by chauhankiran on Feb 27, 2018 | 181 comments

One of the things that sets a good programmer apart is the willingness to fearlessly dig into someone else's code. I've heard it said before that you're responsible for every line of code you ship to your users. It follows that you shouldn't treat your dependencies as black boxes. Getting a stack trace that's 3 or 4 levels deep in Django/React/<insert library here>? Dig under the covers and understand what's happening. Include that information in a bug report, and better yet, open a PR offering to fix it.

You'll learn a whole lot and be more effective if you let your curiosity expand beyond code you/your team has written.

> ... willingness to fearlessly dig into someone else's code.

I've done this many times. The problem is that it takes a lot of time, and there's no way you can dig through more than a fraction of a large codebase. You gotta pick your battles.

Pragmatism wins the day, for sure. I'll happily read someone else's code (e.g., an open-source GitHub repository I rely on) if it's necessary to troubleshoot issues or implement a feature. There needs to be a reason (i.e., motivation) to read code.

On GitHub and in the node/JS ecosystem, this is how it will play out:

  1. Spend hours digging through layers and layers of crufty JS.   
  2. Find the issue (and the package responsible for the issue).   
  3. Find out there is a Github Issue for the issue.   
  4. Find out there is a pull request for the issue.   
  5. Find out the pull request is sitting in position #35 of #128 pull requests going back to 2015.   
  6. Go binge drinking.

7. (next morning) Fork the project, apply your PR and any others you like, and use your fork in your codebase.

8. (optional) Thank the creators and maintainers for the time and effort they put into it.

This is not always an optimal solution, because by doing this you have essentially become a maintainer of a dependency you initially wanted to 'just' use.

This is sadly why I keep my dependencies to a minimum.

After coding for almost 20 years now, full time, I have never regretted not using a dependency. But I have regretted using one multiple times.

(the regret is like a walk of shame during the "separation and cleanup" phase at the end, which makes me question the "got work done faster" phase at the beginning....)

In 20 years you've never started work on something and after a bit said damn it I think I'll use that library that other guy developed?

If you've ever switched out dependency X for dependency Y I suppose you regretted not using Y to begin with.

If you've ever stopped working on your own solution to a problem and instead used a solution provided in a library didn't you regret not just using the library from the beginning?

>In 20 years you've never started work on something and after a bit said damn it I think I'll use that library that other guy developed?

nope, I am just saying what I said, I never regretted not using one, and have regretted investing in a few that later caused more headaches than they were worth.

When we switched from inhouse code to a dependency, we never regretted doing it on our own to start with because there was so much insight gained from this and many other side benefits (like direct control, intuitive understanding of how our code worked, etc...).

But when you add a dependency that you later have to work around, you simply can't fix it the same way you can your own code, and you just hate the mess in a totally different way.

When it's your own code, you can fix anything. And replace it a piece at a time if need be. With a dependency, there are usually catches, hacks, workarounds, or conflicts built up over years that have finally come to a head. And it sucks to fix, and in many cases if I had waited even a little while, or done more research at the time, I would have picked a different dependency or none at all.

There are a lot of dependencies we use; I don't think you can run a business properly these days without them. They just are not a part of our core systems anymore. We use them for tertiary systems and addons, things that are replaceable. Then our core systems can't be hijacked by stuff like the latest NPM debacle.

Yeah, this also matches my (+10 year) experience as a professional dev ... with one exception, though: the "maintenance-can-of-worms" packages. These are packages which, by design, can never be considered finished, and periodically need to be updated to stay relevant in your application.

There are three reasons for this:

1) Those packages implement an ever-growing pile of tricks/heuristics to convincingly solve problems from the domain. They include: physics engines, SMT solvers, compilers/jitters/optimizers, computational geometry packages, video encoders ...

2) Those packages implement a unification layer over an ever-growing/evolving set of underlying APIs/protocols/formats. They include: SDL, ncurses, curl, ImageMagick ...

(these are not to be confused with "bug-factories" packages, which might not solve a complex problem, but still require constant updating to fix the current bugs, and benefit from the new ones).

> There are three reasons for this:

> 1) [...]

> 2) [...]

Where is the third reason?

Classic off-by-one error.

> In 20 years you've never started work on something and after a bit said damn it I think I'll use that library that other guy developed?

I can't speak for him and I don't know about you, but while deciding whether I would start to work on something, I'm actually deciding if I'm capable of doing it and looking at all the pieces needed to do it. I'm not starting work until I know I have all the pieces in place. Sometimes I may doubt that some piece is missing or might be hard to do, so I make a very small prototype of that system only (this usually takes a couple of hours tops). Once I make a decision and start working, I never go back.

Just like him, I have never regretted doing it myself, but have regretted using another developer's library many times.

Sometimes yes, sometimes no. I've abandoned writing something to use a dependency in place that got the job done, but could have a number of improvements.

I enjoy Clojure dependencies. They are usually more succinct than other languages.

one of the reasons that I switched from using node.js to ringo.js

I have only been coding for 1.5 years and this kind of comment is validating what I worry is a very bad habit. I hate using dependencies unless they have reached a certain level of legend in the community.

>...unless they have reached a certain level of legend

That is a similar tactic I use now for anything core. I know I commented that we don't have any dependencies in our core systems, but as an afterthought, this isn't entirely accurate. We do, but as you said they are "legendary", and don't change a whole lot anymore.

I think you are on the right track. :)

Edit: Keep business practicality in mind, if using a 3rd party library means you feed your family or are productive for the business, you need to weigh these considerations carefully.

It's likely you need to use a dependency to get your work done, and then later when you are more profitable, you can clean it out if need be. Maybe that is how I should have worded my first comment in retrospect.

Some of this conversation isn't as exclusive as the tone it has.

Many of these dependencies are open source. Nothing prevents you from copy/pasting the code and then maintaining it yourself. Then you alleviate the "hard to change" problem and you can vet it thoroughly inside your own team.

This is a normal feeling. But one weakness of this approach for javascript is that it's easier to become popular because you have a much larger base of developers.

That makes sense. I use Python for the most part. I feel like I have seen this happen with old libraries where they used to be very popular and maintained, then became abandoned a few years ago.

That can be okay. Depending on the nature of the library it might legitimately be "done."

Personally, I would rather see this than a bug fix being committed every week. That makes me question how many more bugs there are and how much I'm going to have to babysit updating the dependency.

That's a great point. Hmmm... I wasn't considering that. Oops. I'm glad I mentioned it! Thanks!

It's ASIC vs FPGAs...

Well, that's kind of the point - (almost) everybody wants to just use the dependencies, but someone must maintain them for the community to work.

That's a last resort, but it's important to remember that the option is on the table.

Then consider ditching this dependency (either by switching to a different one or with your own code) if the maintainers are not active enough.

7. Find out that the sole maintainer of the repo is a "well-respected influencer" in the JS community, and actually has a blog post stating that GitHub should remove the "issues" feature and instead only allow contributions via Pull Requests.

And then there's the problem that you can hardly convince the maintainer that something is an issue unless they have the issue on their OS and configuration.

In my last 4 years of working in node, I've had this issue once. I think you people need to pick better dependencies.

webpack. My dependency was webpack.

It was a dependency of webpack that was broken (you can pick your bride, but you can't pick your bride's family). And the package owner, in a Github Issue said "not my problem, this package should never be used with webpack and should only be used with Browserify." Looking at the date, that Issue discussion occurred in 2015. Which makes sense. But a WONTFIX on a package that obviously is broken (I looked at the code), and is still an issue in 2018 is clearly a problem. The node world is a circular firing squad where no one wants to take responsibility for fixing their shit because each person thinks they are doing the right thing. When, thanks to the mess of npm, no one really knows who the "user" of their package is and you can't just dictate that so-and-so shouldn't be using your package that way. Because someone will use your package that way.

WONTFIX, issues just suddenly "closed" without reason (why, dammit??), or pull request queues 100+ long and obviously unmaintained are way too common for me. I've lost track of the number of times this has occurred.

This is the only reason (at least for now) that makes me dig into a library code. And I always got some benefit from it.

I think we often treat libraries as voodoo black boxes, but they're more often than not created by normal people with great skills. Sometimes it's a great way to learn a way to code, or even better, to understand the behaviour of a library.

I found myself doing it more often when the IDE itself downloaded and linked the sources. (Lazy, I know).

> I found myself doing it more often when the IDE itself downloaded and linked the sources. (Lazy, I know).

Same here. It's about trivial inconveniences. I sometimes can be bothered to download and read through the codebase of a dependency, but most of the times I end up in third-party code, it's because it's just a single jump-to-definition keypress away - the same keypress (M-.) which I use all the time for my own code. That seamless browsing really blurs the difference.

Absolutely, but I find in many cases, people are just too lazy to want to dig into another piece of code, and just give up prematurely.

Or they look at the tradeoff of:

1. I could dive into the code, find the bug, submit the report and hope the developer accepts the patch. Hopefully this bug isn't there because it is required by some other feature the developer actually cares about.

2. That module wasn't doing that much, I can just redo it by hand and dump the dependency.

3. I could hunt around for a different module that does basically the same thing and use it instead. This might be necessary if the module is doing a lot of work, like an ASN.1 parser or something.

This. I'm not afraid of it, but it can take a lot of time. It's also extremely hard to estimate so gets avoided in scrum.

The thing is, "digging into someone else's code" is a skill, so you can get better at it, and it takes less time as you get better. It makes sense to invest some time at this skill.

> It's also extremely hard to estimate so gets avoided in scrum.

So, one more time Scrum seems to be unable to support actual development needs.

The inability to justify time spent in Scrum doesn't make the time not get spent, it just gets hidden, defeating the whole point. I abhor Scrum, but I would also argue that in this case the development task weight just gets inflated for "unknowns" on the task. Instead of a 5 maybe it's an 8, or whatever.

I sometimes wonder what fully-honest Scrum would look like.

I mean... 3 points: dealing with ticketing system. 5 points: looking into bugs/issues which will become new tickets. 5 points: taking questions and talking process on other people's tickets.

The list goes on. I like agile, and I don't hate Scrum, but it's unpleasantly easy to end up running a planning/scheduling process which is completely distinct from one's actual schedule.

Ticket-based development cannot work very well.

We need tickets (hell, I have to write them for myself on personal projects, otherwise I get locked), but we need something else too. I also wonder how we can formalize that something else - time-boxing it is harmful, full attention to it is harmful, having to explain it too much is harmful.

I have a strong suspicion that the reason points and time-boxing are so bad is that they're trying to weld two distinct processes together.

Tickets are great; they bound tasks, provide a clearinghouse for all relevant information on an issue, and help monitor and prioritize needed work.

The instinct to turn the "what work exists at what priority?" tool into a "let's organize our work" tool is totally understandable given all that, but I think it's a serious mistake. Tickets define tasks, but their times are highly variable (timeboxing is ugly, storypointed "investigation" tickets are uglier) and they exclude a whole bunch of non-ticket work: communication, bureaucracy, unstructured inspection, and so on.

If you make tickets the basic unit of scheduling, you're suddenly building "buffer" tickets and padding ticket durations to account for "something always comes up". It's Goodhart's law in action; your metric - storypoints - becomes a target and stops being an accurate metric.

The inability to justify time makes people avoid doing the work. Some people will strongly avoid it, other people will just weakly avoid it, but nobody will be more inclined to do it than if the time was taken into account.

If your methodology makes people avoid doing important work, it's a bad methodology.

My experience too. Velocity trumps everything else. IMHO, Scrum is like Communism. Okay in theory, but never implemented properly. And it's always the people at the bottom who suffer.

Or it could be handled as a Spike.

It's not perfect, but it captures the idea of uncertainty by having a time-boxed investigation, the end result of which is an estimate of how long it will take to fix.

https://www.leadingagile.com/2014/04/dont-estimate-spikes/
https://www.scrumalliance.org/community/articles/2013/march/...

Picking your battles may in fact be (one of) the things that'd set one apart. You can get bogged down in the weeds of dependencies. You can also end up with kludgy half-solutions to persistent problems by treating them like black boxes.

But before you can pick battles, you need to treat such things as possibilities, something you've done before and feel comfortable doing.

Also, when you're using 3rd party proprietary code, you don't get the option of fixing it. So delving into it can feel silly.

But there are often times when you can trick it into doing what it’s supposed to do.

And a really targeted bug report can often get a bug fixed pretty quickly. Several times I’ve been working at a place where the goddamned IP lawyers were so power mad that I could not file a patch for a bug even though I knew how to fix it, but filing a bug with an exact line number and description of what’s wrong with it can get it fixed anyway.

Filing a razor sharp bug report seems to be a rare skill, but if you can learn it your quality of life is better.

10 points for tricking it. oftentimes it's not completely broken, but may have been written with different assumptions, be failing on an avoidable edge case, or have an issue with a dependency.

knowing and deciding you can't reasonably fix it is a much better position to be in than shrugging and hoping it gets fixed in a later release, trying lots of versions or trying to refactor around it.

bsimpson still raised a good opportunity to do this: when you get a stack trace. Another idea is whenever you have no idea how something works... Or when the documentation for a certain piece does it no justice, it sure wouldn't hurt to try and look under the hood to see what it's doing; sometimes you might even find comments that do a better job than the docs.

A nice middle ground is to step through with the debugger when you're in your own code and see what other code it's calling that isn't yours. You don't/can't read all of it as you said. But perhaps knowing the bits you're interacting with helps a lot.
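A rough sketch of the same idea without a debugger, assuming Python: when an exception surfaces, split the stack trace into frames from your own file and frames from code that isn't yours, so you know which bits of the dependency are worth opening. The names `third_party_frames`, `parse_config`, and `demo` are made up for illustration, and `json` only stands in for "some dependency".

```python
import json
import traceback


def third_party_frames(exc: BaseException) -> list[traceback.FrameSummary]:
    """Return the traceback frames that live outside this file."""
    # Assumption: "our code" means "defined in the same file as this function".
    own_file = third_party_frames.__code__.co_filename
    frames = traceback.extract_tb(exc.__traceback__)
    return [frame for frame in frames if frame.filename != own_file]


def parse_config(text: str) -> dict:
    # Our code: a thin wrapper around a dependency (json stands in for one).
    return json.loads(text)


def demo() -> list[str]:
    """Trigger a failure inside the dependency and list where it went."""
    try:
        parse_config("{not valid json")
    except json.JSONDecodeError as exc:
        return [
            f"{frame.filename}:{frame.lineno} in {frame.name}"
            for frame in third_party_frames(exc)
        ]
    return []


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Each printed line is a file and line number inside the dependency that your call actually passed through, which is a much smaller reading list than the whole codebase.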

Along those lines are there any tools that are really good for that? I'm looking for a program that can do some things like IDA does for assembly where there's the ability to label blocks/variables and then see where it's used elsewhere etc.

I always get an uneasy feeling when this is not (practically) possible. Whenever I add a dependency to a project I will at least skim through the source to get an idea of what it is I’m adding, and it pleases me a lot when a library is well written and self contained enough that this does not mostly leave me confused or overwhelmed.

There’s a similar reason why I have such mixed feelings about some build tools and compilers, especially in the JS world - the output is often hideous and so unlike what a human might write. I know you can say that what the output looks like doesn’t really matter, as long as it works as it should. But it just makes it so much harder to feel confident that it actually does.

The output of gcc and clang probably don't look much like hand-written machine code either, though. Or, for that matter, the output of V8 and SpiderMonkey. You can't avoid the compilers :)

Of course! I’m not saying it’s rational. If I had experience hand-writing machine code maybe I would feel the same about that...

I've done bits of assembly, but I've also studied compilers. They felt less magical to me when I learned some of how they work.

Then again, I'm primarily a C++ developer, so I'm used to the output of my build tools being non-trivial to understand.

What frustrates me about the JS world is that all too often, what you get from NPM is the output, not the input. Maven might not be universally loved, but you do at least get to navigate through the original Java source and stand _some_ chance of finding what you're looking for.

I don't follow, the code in node_modules is readable and allows to navigate through the original sources. Is it not always the case?

I think they're saying that code on NPM may be written with a bunch of experimental Babel features or in some niche language like Elm, and then translated to vanilla JS. In those cases, you might find code that's been machine translated or minified in node_modules, not the original source.

Exactly :).

That's one reason rollup is better than webpack; the output looks almost hand rolled.

> as long as it works as it should.

Maybe confirmation bias, but I can’t think of a single library where this stayed true for the entire life of a project. Software is written by humans.

“To err is human. To really fuck up requires the aid of a computer.”

That's one thing that bothers me about modern Electron-style apps which depend on an entire web rendering engine -- actually digging through all the dependencies to track down a bug can be almost impossible. For example, I was hitting a bug in an Electron app a while back and wanted to track it down. I got the app building with a locally-built version of Electron, but it became clear the real bug involved Chromium's loader code. My poor 4-year-old laptop ran out of disk space before I could even download the whole libchromiumcontent repository, let alone build it!

I don’t think Library designers have adapted yet to a library heavy world. We still write them like we are using dozens but we have hundreds.

Every library demands a fraction of our attention greater than its relative fraction of the code. Complex calling conventions, deep call stacks.

Why am I even using a single purpose library with multiple levels of abstraction in it? At this point a library should BE the abstraction, not contain them. I have to trace code through my code into yours into your dependencies. Just stop already. Keep It Simple Stupid.

> I don’t think Library designers have adapted yet to a library heavy world. We still write them like we are using dozens but we have hundreds.

The other day I was debugging an app that - no joke - pulled in 600M of libraries that were all so it could use one file parsing function that could have been re-implemented in <100 lines. Crazy!

Take all my upvotes and offer a newsletter for subscription please.

The lack of digging is a frustration I have with a lot of new guys who join the team. For example, they might create something new and tack on some new, untested piece of code without reading whether something already exists within the same codebase. Even in a startup environment where documentation is scarce, it's worthwhile to ask the question of, "Did someone else here think of this problem here before me? What did they do?"

It comes down to unknown unknowns. Should I poll my supervisor/co-worker about every problem to see if there is already a solution? As the new guy, I have no idea who or what I should poll people about.

The second problem is that people will keep stuff hidden. Have that sweet script that creates 1000 users for testing? You might share it with the team, you might also think it's too trivial to bother sharing.

That being said, I can see how it can be frustrating. We have packages upon packages of existing functionality, languishing somewhere in source control, used maybe only once. In the meantime, someone is going through the cycle, ignoring not only what someone put on GitHub, but also what has already been tested and QA'd by their own company.

> Should I poll my supervisor/co-worker about every problem to see if there is already a solution? As the new guy, I have no idea who or what I should poll people about.

It helps to have a company culture that encourages openness and asking questions. My current company goes so far as to have a Slack channel, #askanything, where folks can ask anything. We'd much rather you ask early in the development process than pursue a path that will lead to duplication.

Don't let survivorship and confirmation bias cloud your judgment.

What sets many successful people apart is not any one thing, but that they were successful. Many are willing to dive into other people's code in methods that are far deeper than I can comprehend, and yet they have failed at whatever task they were going about.

So, yes. By and large, you shouldn't stop at your dependency boundaries. However, your job is to get results. For most of us, even if the dependency is bad, you are better off changing your code. Why? Because you can more quickly get a change in your codebase than you can in the dependency. And heaven help you if you think it is worth forking.

This is not to say don't update the base place. By all means, please do. But don't wait for that change to land before you do something on your end, too. Even if that means not using your actual fix.

Forking is not that bad, unless upstream has a lot of activity yet takes a long time to accept patches. Many projects are mostly dormant, so keeping a patchset and occasionally rebasing is easy enough.

We use git-aggregator[1] for easily applying pending PRs/MRs over the upstream branch, and for the most part, conflicts are rare.

[1] https://pypi.python.org/pypi/git-aggregator

The thing I find with python/ruby ecosystems - is that the versions are generally pinned, so if you do fork a change for your specific need; you'll end up using it for the project. If the change isn't merged - then at least you may use it yourself.

These are both valid choices. I did not mean to imply that it is not possible.

However, in both cases you now have a new line of work. It is common in my experience to find folks are behind in their first job, so adding a new one is dangerous.

Forking is perfectly doable. The Rust project forks LLVM, adds patches, and forward-ports them when upgrading the LLVM version. It's manageable.

If you are a large enough team, definitely. And not just in people, but in momentum.

That is, yes, it can be done. It is more work, though. Account for it.

No mention of time in the article or in your comment.

With infinite time we could fix a lot of things.

Of course you can expand this to be unreasonable - you could also make a strawman argument about how you need to understand all the compilers, OS APIs, and chip architectures that support your product. Obviously, that's not what I'm saying.

I'm making a few assumptions: the code you're depending on was written by people like you (e.g. other open source volunteers with expertise in the same language as you); the limitation you're encountering is problematic for your users, product, or business; and that you'd rather have it work correctly than design around it (e.g. remove the affected feature). If those are all true, yeah, it's probably worth a week or two to figure out what's wrong and fix it (especially since you'll gain comfort, familiarity, and efficiency in that codebase over time - it might even inspire other enhancements).

I realize that nobody has infinite time and I trust you to make reasonable decisions about what to spend where. Still, I implore you to not let yourselves be afraid of Other People's Code. It's probably not as scary as you think it is; after all, it was written by other people like you.

> It's probably not as scary as you think it is; after all, it was written by other people like you.

Most interesting code has been written by people smarter than me (the kernel, compositing window manager, network stack, web rendering engine, virtual machine, etc.), and while reading that code probably isn't intractable, it would take me decades of effort to understand. At the same time, I think I can be an effective developer by building on top of those technologies (i.e., being a great plumber). And that's OK.

>it's probably worth a week or two to figure out what's wrong and fix it

Ha! I'm convinced that people who say stuff like this haven't worked where their paycheck comes from actual paying customers, as opposed to VC dollars or subsidies from other parts of the business's revenue.

> Still, I implore you to not let yourselves be afraid of Other People's Code.

Wtf? Nobody said they were and what is with the Condescending Proper Noun?

>It's probably not as scary as you think it is

Nobody said it was.

Sure, but I think many programmers underestimate the investment value of that time.

Diving into codebases is a skill that gets stronger with use, such that you can eventually do it radically faster. That makes a much larger set of problems economically practical to fix.

> I've heard it said before that you're responsible for every line of code you ship to your users.

This is a great way of thinking about it. I wish that more people took it to heart when choosing third-party dependencies to adopt, especially in the Node.js ecosystem where people are tragically cavalier about adding literally hundreds of under-scrutinized transitive dependencies to their projects.

> willingness to fearlessly dig into someone else's code.

That's kind of an illusion. If you start digging into some really complex code, like a distributed processing core, a meta-programming framework, or low-level optimizations, it can take you years to understand all that is going on. Often it's more economical just to rewrite it, even if the previous system was far better than what you'll produce. Whether companies like it or not, projects live with their original team members, and once they move on, it gets complicated to maintain or extend them.

The inherent danger in rewriting dependencies always seems to be greatly underestimated, though. Especially when this is your first time tackling the problem space in earnest.

* Getting the solution 80% of the way there is often pretty fast (which ends up deceiving everyone depending on it "it's almost done!")

* You are nearly guaranteed to encounter a bunch of bugs that were already solved in the original dependency.

* Due to the previous points, there's a big chance that your initial design had a few fatal flaws, and requires significant refactoring to address (making the project take longer, and if not careful making it much harder to understand & maintain)

Rewriting dependencies can definitely be worth it from a code ownership/maintenance point of view; but more often than not, you'll end up creating a cycle where the new version of the dependency is just as obtuse as the previous—and once you leave, the same problem all over again.

If a dependency is well documented, has a clear architecture, and follows consistent coding practices, it will stand the test of time (including loss of authors/maintainers). Most people/companies aren't willing to put the time/effort into that level of polish.

To be sure.

There's also a trade off to doing it too much as the time it can take to really go deep can effectively paralyze the development.

You need to be able to properly use abstraction layers and be able to treat code as black boxes when needed, you only need sufficient knowledge to work with a system. Of course we shouldn't stop until we reach that level and too often developers give up. But it's also common, especially for very skilled or perfectionist developers, to go too far.

I've only found time to do this with CherryPy really, mostly cause every time someone asks for help they say if something is missing in the documentation you could always read the source which is also commented and readable. It really is approachable and not too big. I never have time to go through the source of every single library but you raise a good time to do so: stack traces. Although when doing front-end JS code it's a bit less approachable if your files are already minified (client hands source like this) and the error becomes a lot less useful.

"...open a PR offering to fix it..."

Perhaps I've been asking the wrong people (at the wrong time) - forgive me - but (as far as I can tell) git has no easy / reasonable way to repo'tize a (vendor/) dependency. That means you have to bend over backwards (read: make too much extra effort) to patch such a library locally and then eventually PR that change.

This feels like a fairly significant flaw in the current OSS "ecosystem." It encourages the wrong behaviour, and discourages the right one.

Or is there something I'm just not understanding?

Use git submodules. git submodules interact tolerably neatly with GitHub's PR functionality; if you make changes to your submodule, then it's straightforward to turn that into a PR that you can submit to the upstream repo. And since your updates are a git branch like any other, if the upstream repo won't accept your changes then you can just keep doing fetch+merge/rebase to keep on top of the changes at their end.

Of course, this is git, so the UX is a total disaster, and it's very easy to get yourself in a pickle. But Stockholm syndrome will set in after a while and you'll begin to find it tolerable.
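A minimal sketch of the submodule workflow described above. Everything here is a stand-in: the "upstream" repo is a local throwaway directory (in practice it would be your fork's URL), and `vendor/lib`, `local-fix` and the file names are hypothetical. The `protocol.file.allow` override is only there because the demo clones from a local path.

```shell
set -eu
# Throwaway identity so the commits below work on a fresh machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the upstream library (hypothetical; normally your fork's URL).
git init -q "$work/upstream"
echo 'v1' > "$work/upstream/lib.txt"
git -C "$work/upstream" add lib.txt
git -C "$work/upstream" commit -q -m "upstream: v1"
upstream_branch=$(git -C "$work/upstream" symbolic-ref --short HEAD)

# Your project, with the dependency tracked as a submodule under vendor/.
git init -q "$work/project"
cd "$work/project"
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" vendor/lib
git commit -q -m "add lib as a submodule"

# Patch the dependency on its own branch; this is the branch you would
# push to your fork and open a PR from.
cd vendor/lib
git checkout -q -b local-fix
echo 'patched' > PATCH.txt
git add PATCH.txt
git commit -q -m "local fix"

# Upstream moves on...
echo 'v2' > "$work/upstream/lib.txt"
git -C "$work/upstream" commit -q -am "upstream: v2"

# ...so fetch and replay the local patch on top of it.
git fetch -q origin
git rebase -q "origin/$upstream_branch"

# Record the new submodule commit in the parent project.
cd "$work/project"
git add vendor/lib
git commit -q -m "bump vendored lib"
```

The parent repo only stores a pointer to a commit in the submodule, which is why the last step is needed after every change inside `vendor/lib`.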

Yes. But I've occasionally seen situations where vendor is not gitignore'd. It's tracked like everything else. Submodules don't help then.


Assuming I've understood correctly, if you want the easy workflow then you just have to have the 3rd party code in its own repo and add that repo as a submodule. Best if that repo is one you look after, so that you can commit local changes to it if you like.

If you've not done this from day one, and you've got changes made to the 3rd party code, that's a bit of a pain, but provided you tracked (or can figure out) the version you were working from you can solve this manually. It's a bit of donkey work though.

(You have to decide from the start whether you're going to just put each dependency in the repo, or whether you're going to keep each dependency as its own submodule. I do think git works a lot better if you keep each 3rd party dependency as its own submodule: easier to patch your repo's copy, easy to make PR from your patches, easier to keep your patches merged in on top of later updates, and so on. But you have to plan for this a bit, and I won't deny that the submodule UX is pretty awful.)

It's not really so much that _git_ is lacking the functionality -- it's actually much more possible to do reasonable vendoring with git than with (say) subversion, as you can use submodules to point at your fork.

To my mind the issue is more with tooling: you don't typically want to vendor your code, most of the time you want to use a published module. And the process for publishing your fork is too complex to be worthwhile.

Also, the work to keep the fork itself up-to-date when you realise it's necessary is minimal, but by switching away from the original you lose tooling support for telling you when you need to make that update.

I understand there are workarounds, but they are just that. I mean, for example, I'm working away but hit a bug in a dependency. Now I have to fork, submodule, etc.

I'm not expecting frictionless per se. But it just feels like there's more overhead than there needs to be.

Also, it's possible (in rare cases) the vendor folder is not .gitignore'd. Now what? Submodules are out.

Again, given the nature of most modern dev (i.e., dependencies), git feels underprepared.

Re: "...but by switching away from the original you lose tooling support for telling you when you need to make that update..."

Yes. But doesn't this actually support the idea (mentioned above) not to treat your dependencies like black boxes? That is, no pain no gain? Yeah, perhaps some sort of simple notification your fork needs an update. Just the same, ALL code should, in theory, be treated as your own. You wouldn't blindly merge/commit a colleague's work, so why shouldn't your dependencies be semi thoroughly reviewed as well?

Are we too busy to do the right thing(s)?

Here are approaches I use:

a) Make changes in the minified source to understand the problem, then go make changes in the original; or:

b) Link a local clone of the dependency into the project:

  git clone dependency.git
  cd dependency
  yarn link
  cd project
  yarn link dependency
  # make changes in dependency
  yarn run build
  # test my app
  # repeat until fixed
  # clean up, commit, and PR

I prefer B, but I'll do A first if the build step is annoying or slow.

>Include that information in a bug report, and better yet, open a PR offering to fix it.

In my experience, this never happens on permissively licensed projects. "Contribute time and code to open source on company time? Are you mad? Fork it and keep that as our competitive advantage!" Never mind how dumb this approach actually is, it's what management thinks. "GPL? Oh, we have to? Oh, well, I guess if we have to."

It's why (in theory) I love Go; as opposed to the JS hell mentioned by another commenter, it puts a lot more emphasis on consistency and, in a way, predictability, so there are fewer surprises and distractions (e.g. code style) when you browse other people's code. Also no "black box" dependencies (usually): you pull every lib from source and have it on your machine.

Isn't it the purpose of blackboxes, to abstract me from doing this? I'm here to get the work done and hopefully having to debug only code of my colleagues, why would I want to waste time on debugging something that should have been battle tested by myriads of people before me?

An argument like this almost justifies leaky abstractions as something to be praised.

Isn’t the purpose of open source also to empower you, the developer using the library, to fix the bug you find so that everyone may benefit?

I can only say Amen to this. If you ship it, you either accept responsibility as far as you can, or you're "that" person who redirects you to another department because it's not technically their job.

Plus, as you mentioned the opportunity for learning is immense.

I agree with the spirit but in practice more often than not that's a complete waste of time and energy, unless you're trying to figure out the cause of a specific bug when you've already excluded your code as its source.

Do you consider the OS a "library"? Who checks the OS's code before shipping a program? What about the programming language's code? Do you code in assembly to reduce the amount of dependencies?

A really good programmer is, imo, set apart by the ability to decide when to dig into others' code and when not to.

I just write that the library's function at hand is unreliable and can't be used this way.

Reading code just for fun was a thing in the early and mid years of UNIX (e.g., 7th edition or System V). People eagerly passed around faint 10th generation photocopies of Lions' printout of and commentary[1] on the UNIX source code. The annual USENIX conference had a popular short course in which they went through the entire UNIX kernel line by line. Why was that a thing back then and not now? Certainly UNIX was an amazing piece of work, but there are amazing works today -- the modern browser for example.

I think that the difference is that the totality of the UNIX kernel was comprehensible for a single mind. The UNIX kernel in 1983 was less than 20,000 lines of code, almost all in C, and more than 75% was not machine-dependent.[2] The amazing software of today, something worthy of reading for fun, is literally millions of lines of code, written in multiple languages you don't know, and evolving so fast that whatever you read might be completely different a few months later. There's no joy in knowing a tiny part of that. It would be like reading 10 pages from the middle of a novel, a novel so big that you know you'll never finish it.

[1] https://en.wikipedia.org/wiki/Lions%27_Commentary_on_UNIX_6t...

[2] https://en.wikipedia.org/wiki/History_of_Unix#1980s

I learned yesterday that Ken Thompson used to host readings of Unix source code at Berkeley where they would go through the code line-by-line (https://www.salon.com/2000/05/16/chapter_2_part_one/):

> And from the very beginning, Unix benefited from a communal vibe that spread directly from its creators, Ritchie and Thompson.
>
> Fabry recalls grasping the hidden wonders of Unix one week in 1975 when Thompson conducted a "reading" of Unix over several successive nights.
>
> "The first meeting of the West Coast Unix User's Group had about 12 or 15 people," recalls Fabry, a mild man, now 60 years old, who clearly delights in his 25-year-old memories. "We all sat around in Cory Hall and Ken Thompson read code with us. We went through the kernel line by line in a series of evening meetings; he just explained what everything did ... It was wonderful."

This is still done today, just with other software.

I've read blog posts that go through the entire redux javascript library line by line, or review an entire chunk of React.

Even just yesterday here on HN was a submission that just contained shaders from Wolfenstein.

There's a category of articles out there nowadays that do "$library in $num lines of code", reducing popular libraries or frameworks down to their core - filtering out all the edge cases, so to speak. I've seen them for Angular, React, Redux, etc. I think that a lot of frameworks and libraries would do well to have a "core" codebase somewhere: something pure but unoptimized, without edge cases, that is easily digestible and explains the core idea.

Do you mind linking me to the React one? I was unable to find it, but maybe my Google-fu is just not good today.

I wonder if part of the reason this worked so well is that the Unix kernel was written in a fairly straightforward top down style? It's a lot harder to do a code reading if you're hitting different abstractions on every other line and having to make digressions to track down exactly what that abstraction is doing.

I suspect in a lot of cases reading the Lions commentaries was not purely for fun. The original audience (Lions' students) were doing it as part of their university course. Many of the contemporary readers would have been reading it because they needed to understand, fix bugs or add features to their unix systems (I think one of the introductions in the reprint book is from somebody who was in this position). But you're right I'm sure that the key was the combination of (a) real world and genuinely useful code that the readers were using (b) size such that you could realistically understand all of it (c) excellent accompanying code commentary.

I read through the Lions book in the late 1990s when the reprint came out, essentially "just for fun" -- it remains probably the only sizeable piece of code I've read for fun.

Is there any material around still from the USENIX course that went through UNIX line by line?

It looks like the Lions commentary also does that to a certain extent as well. There's even a version of the book online [0].

[0] http://www.lemis.com/grog/Documentation/Lions/index.php

I'm a younger developer and I had never heard of Lions' Commentary. It sounds amazing and I'll order a copy, but is there a more updated version or a similar book out there that may be more relevant?

Reading code always felt awkward, slow and ineffective. The cognitive load is huge: you have to compute some things while memorizing others and at the same time keep track of flows. That's why I always have pen and paper while diving into a new big code base; you simply can't keep it in your head. But something we forget often is... the goal of reading code is almost always to understand a system or some part of it. And reading code for me is a terrible way to do it.

Code is just one level of abstraction and one way of seeing things. We should have multiple ways of "reading" a system depending on what level we want to understand it. No, I don't think that rushed, out of date "system documentation" you wrote will do. I still think this is not solved in a satisfying way. We could find inspiration from things that work well: just imagine how Google Maps lets you look at the world at many different levels, from continents down to street level. It also provides different views of each level: street maps are useful for tourists, terrain maps for hikers and satellite images for someone who wants to study vegetation. How you will "read" a system depends on what you are trying to understand.

> But something we forget often is... the goal of reading code is almost always to understand a system or some part of it. And reading code for me is a terrible way to do it.

Good point. A couple famous quotes that speak to this[0]:

Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious.

-- Fred Brooks, The Mythical Man-Month

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

-- Linus Torvalds

[0] https://news.ycombinator.com/item?id=9487819

Assuming I can, I always study code with a debugger and various simplification techniques (say throw in `abort()` or raise an exception to get a stack trace or throw in `printf()` on interesting objects). Usually I just start it up with `gdb ./program` and step through and see where it takes me. If the code is complex enough, I'll even start ripping out parts to see where it's required (often it's not in whatever simple case you're interested in).

If I can't use a debugger (i.e. my initial assumption fails), I usually run for the hills...
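The same "throw in a stack trace at an interesting point" trick also works in shell scripts, as a rough sketch. This assumes bash (the `FUNCNAME` and `BASH_LINENO` arrays are bash-specific), and `parse_entry`, `load_file` and `run` are hypothetical stand-ins for the code being studied.

```shell
#!/usr/bin/env bash
# Requires bash: FUNCNAME and BASH_LINENO are bash-specific arrays.

# Print the chain of callers, innermost first, then carry on as normal.
stack_trace() {
  local i
  for ((i = 1; i < ${#FUNCNAME[@]}; i++)); do
    # BASH_LINENO[i] is the line from which FUNCNAME[i] was called.
    echo "  at ${FUNCNAME[$i]} (called from line ${BASH_LINENO[$i]})"
  done
}

# Hypothetical call chain standing in for the code under study.
parse_entry() { stack_trace >&2; echo "parsed: $1"; }
load_file()   { parse_entry "$1"; }
run()         { load_file "$1"; }

run "job-1"
```

The trace goes to stderr so the script's normal output is unchanged; once the call path is clear, the `stack_trace` line can simply be deleted again.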

As the author of the post, just wanted to reply and say I'm totally open to the notion that the solutions I proposed will not solve the problem. I'm much more sure about the problem than I am about the solution.

A google maps zoom system for code sounds awesome, although I have no idea how that might work.

Brackets inline editor comes to mind. Just replace CSS definitions with function definitions (I think it might already do this for ?)

The zooming functionality you're thinking about probably wouldn't be very practical. (Zooming in down to opcode or out to high level framework calls.)

But simply being able to inspect definition inline would do.

Solid upvote and the map analogy is great. I think this is a very important point. We often think of a 'program' as a flat blob of text, but we really should have tools to explore various projections of both static and dynamic constructs that form the program. We should be able to view the high level system view of the running program, and query various interconnections.

I find it useful to write comments about what's happening in the code as I am trying to figure it out.

One of the reasons I love golang* is how the entire source code for the standard library is just a few easy clicks away from the documentation on how to use it.

As an example: When I want to do something and the interfaces I'm seeing provide friction (XML parsing not /quite/ handling normal, slightly incorrect, HTML) I can get a better idea of what the library is doing behind the scenes. I got an example of how to use other exposed interfaces of the library as a tool kit for my own iteration (which added some state tracking and cleanup of that mess) instead of relying on it to decode everything and dump it back for me.

Other parts of the library are similar, maybe it does 90% of what your use case needs, and you can extend those interfaces with your own library to add the corner cases that are required in your specific use case.

*(even if it can be tedious at times because it's sort of a mid-level language; it abstracts some REALLY bread and butter things that probably can't be lived without, but it doesn't shield you from true complexity pitfalls)

Rust's library documentation has this feature as well; each item has a [src] link that takes you straight to the code being documented; see for instance [0], and you'll find the link to the right side of the heading. This even works for crates not in the standard library because it's actually a feature provided by rustdoc, the standard documentation generator; for example, you can get to the source for functions in serde (the most popular Rust serialization library) directly from its documentation [1].

[0] https://doc.rust-lang.org/std/vec/struct.Vec.html#method.tru...

[1] https://docs.serde.rs/serde_json/ser/fn.to_string_pretty.htm...

Haskell does this well on hackage, which I personally use for all the documentation - there's a source link next to every function. Also great if you only want one function from a library rather than taking the whole dependency.

Being able to browse the Go source so easily and clearly is a fantastic way to both learn and grow in understanding of the language. It's definitely helped me as I've learned and written in Go over the years.

Smalltalk did it first and I have always been able to do it with C++, Java and Python as well.

> One of the reasons I love golang* is how the entire source code for the standard library is just a few easy clicks away from the documentation on how to use it.

Not just a few clicks. In my editor, I just navigate into /usr/lib/go/src/<package-name> which is where my distribution (Arch) puts the stdlib sources. For example, /usr/lib/go/src/io/pipe.go for the implementation of io.Pipe() which I was looking at the other day.

Rust does this too. Since docs are generated from sources every docs page has a [src] button that has the context the docs were generated from, which means the function source. Super easy and intuitive to switch between the two.

It's good that most newer / modern languages have learned how valuable both easy documentation and pairing the docs with the code are.

This is a bit western centric maybe, because I see Chinese developers reading tons of code, to the point that I receive an incredible amount of Redis PRs about conceptual bugs that can never happen in practice, since some Chinese developer is reading the code and doing the math in her/his head.

That's interesting, but also not so surprising. China has had decades of directing primary engineering/manufacturing efforts towards reverse engineering western products, or executing specific designs specified from western clients. The attitude of looking to someone else's engineering to learn from it is probably something the rest of us could do a bit more of.

I wish one of these pull requesters would comment on this post! I'd love to hear how they're going about reading your code and whether they're actually using Redis or just reading it for enjoyment.

I've heard similar things anecdotally from a developer that went to IIT in India. Had to read code first, with clarity, before they were able to sit down and code.

Any case you could share a link to one of these, if possible? A matter of curiosity.

First PR I saw on redis GH: https://github.com/antirez/redis/pull/4714

That same user also made https://github.com/antirez/redis/pull/4685

This may be? https://github.com/antirez/redis/pull/4622

This was apparently critical, but also kind of an edgecase: https://github.com/antirez/redis/pull/4568

& one more from someone else: https://github.com/antirez/redis/pull/4568

Probably soloestoy recently (but he also does a lot of Real world Redis) and Sun He in the past hold the records, but there are several...

I love reading code written in Elm.

I always dive in and have a look. The reason I love it is that Elm is quite restrictive, there is a way of doing things and you can't deviate too much. Therefore reading someone else's code is usually pretty easy.

In addition in Elm you can clone and run an Elm package and see the examples, do some "time travelling debugging" all within 30 seconds :-), again because there is one way of building and debugging things. No "Oh no browserify! I normally use webpack" or "WTF all that global npm shit I need to install" moment.

Just once per computer:

   npm install elm

Then do this:

   git clone ...

And open your browser to localhost:8080

I'd probably say Elm is more Zen than Python!

We read code for all kinds of purposes other than to make an intended change. Like

How the heck does this behavior (good or bad, expected or not) come about?

Is this correct behavior assured in every case?

What are all the possible values of these poorly documented configuration parameters or other inputs?

Will this apparently working operation break under concurrency?

Why is this so slow for these inputs, or under these conditions? / Why is this so unexpectedly fast?

Is this doing nasty covert things I don't want, in addition to the overt functionality that I want?

Is this using a particular library or OS feature to do something obvious, or did they roll their own?

How will this scale in time, memory use or whatever for certain large inputs that are too impractical to actually supply just to answer this question?


Hi kazinator, thanks for your comment (I'm the post's author). When I originally wrote the post, I actually had a whole section about "reading for questions", which discussed how the best way to read without modifying a program is to read to answer focused questions you have about the codebase. It kind of got lost when I shifted focus to active interaction versus passive exploration, but I agree that reading with questions in mind is far superior to just reading the code to understand it generally.

I’d like to spend more time reading great code. I think I’d get something out of it.

Possibly more important, though, I’ve always had a fearlessness to dive into a codebase. I’ve built a little thing called Cronitor and recently I was giving a talk on building our first server agent in Go. Long story short, there were a few people surprised that I studied a couple of crond implementations to figure out some important details for proper monitoring, and I was surprised they were surprised. I don’t write much C or any C++, but as long as you approach it with some fearlessness and a simple plan you can learn from unfamiliar codebases. Spend up to an hour figuring out how a project is structured (hopefully a lot less) and then just skim/grep to find a spot probably close to what you’re looking for. Spread out from there.
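A rough sketch of that "get the lay of the land, then grep" plan, using only standard `find`, `wc`, `sort` and `grep`. The `survey` helper and the two tiny demo files are hypothetical, made up here just so the snippet runs on its own; point it at a real checkout instead.

```shell
# survey DIR PATTERN: show the largest files under DIR (a cheap hint at
# where the bulk of the logic lives), then every line matching PATTERN.
survey() {
  dir=$1
  pattern=$2
  echo "== largest files (lines) =="
  find "$dir" -type f -exec wc -l {} + | grep -v ' total$' | sort -rn | head -n 5
  echo "== matches for '$pattern' =="
  grep -rn -- "$pattern" "$dir" || echo "(no matches)"
}

# Hypothetical two-file "codebase" so the example is self-contained.
demo=$(mktemp -d)
printf 'int main(void) {\n  return load_crontab();\n}\n' > "$demo/main.c"
printf 'int load_crontab(void) {\n  return 0;\n}\n' > "$demo/cron.c"

survey "$demo" load_crontab
```

The matches give you the starting spot (definition and call sites); from there you spread out by repeating the search with whatever names you run into.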

The best way to learn a new language is to build something useful and learn what you need along the way. I think studying code can be done the same way.

Another way to approach this: review other people's code at work (e.g. code in another language or for another product). Even if you don't know the syntax of a language, you can probably follow enough of it to learn something interesting. A commit is going to be a much more easily-digestible unit than a whole codebase, and you can go ask the author to explain anything that you don't understand. Plus, over time you might understand enough to help out on that project, or help integrate it with yours.

I agree with you - there are many details that are either missing from the documentation or the books on the topic. Actually reading the code usually provides the necessary _practical_ structures used in the real projects.

One of the counter-examples for me was the implementation of Egalitarian Paxos: when I first read the paper, I thought that I hadn't understood much of it. By reading through the code written by the actual author, I realised that the paper by itself described something quite far from a practically working prototype. So it wasn't my understanding of the paper that was the problem in the first place.

How about some good _annotated_ code? :)

http://www.kohala.com/start/tcpipiv2.html - 15K line implementation of TCP/IP stack from BSD. Fantastic book, reads like a novel.

http://www.pbrt.org/ - fairly detailed write up on building a pretty decent 3D renderer

Studying codebases is one of my rewarding hobbies. I’ve learned more from that practice (https://medium.com/@markpapadakis/interesting-codebases-159f...) than from most technical books I’ve read (by the way, most technical books aren’t particularly great).

I agree most programmers are too quick to write before they read, but I'm not sure why it has to be "for fun". Reading code takes effort, just like understanding any new complex thing. The primary virtue here is patience. It's a lot like debugging actually: the goal is to understand.

Setting aside the "for fun", of course I read my co-workers' code, but also I read my dependencies' code quite often. In particular when you have something that is meant to be extended, the docs never completely cover all the use cases, and the quickest way can be to start reading.

Some examples from my own recent work are reading code from the Ruby gems devise, devise_token_auth, and spree, and also reading Django class-based views (all things meant to be extended). One trick I've learned in Ruby is `cd $(bundle show spree_core)`. From there you can grep, find, read, even add your own `puts` or whatever.

Actually I have been reading some code for fun: Postgres! I have an extension that adds temporal foreign keys, and I'm slowly porting it from plpgsql to C. Reading the code for standard foreign keys is very helpful. Same thing for a lot of other extensions I've written in the last few years. It has helped me a ton to read others' extensions and the core code.

This is what I've been working on for the past few months. I've been writing a lot of spaghetti code in the past and come back to find how poorly I organized it. I'm suppressing the urge to immediately write code, but rather, look at the big picture first, digesting the information, even if it means prolonging writing anything for a day or several hours.

I usually write pseudocode first before doing anything now. Either on paper, as code comments, or as a list. Sometimes I find myself writing tests on paper as well, it helps out a lot too. A programming language is just a means to an end, what they all have in common is a set of logic to follow.

Then I debug. I just fork the code, comment things out, add a little bit of functionality, etc. When I have a grasp on it, I write some tests, TDD-style.

I find the easiest codebase, and one I refer to a lot, is TodoMVC, or any similar project that has different implementations (in different languages) of the same general solution. All the core logic between languages is mostly the same, so it's easy to dig through and understand a different programming language and how it's generally organized.

One thing I remember learning from experienced polyglot programmers: it's extremely helpful to have an existing codebase you've written (such as TodoMVC) and to port that same example to a different language to learn that language's nuances.

"we usually read when we want to edit... That hacking produces better comprehension than passive, linear reading fits with what we know about learning... solid understanding emerges from active exploration, critical examination, repetition, and synthesis. Hacking beats passive reading on three out of four of these criteria."

This doesn't just apply to learning programming languages - also learning Chinese.


I've had much more success since writing my own tool to translate the text I'm interested in, rather than being spoon-fed some patronising phrases to recite from a textbook.

The thing about code reading for me is you have to have decent tooling.

It's pointless reading code on GitHub, especially for a codebase you are not familiar with; you need to get the codebase open in an IDE to be able to dive in quickly and back again to build the map out in your head. I don't think you can read a portion of code from top to bottom without tooling, unless the piece of code is really short and just one or two files.

The one online tool that I use for this kind of work is Woboq Code Browser. It's only C/C++ projects and not many are freely available. The Linux kernel is though and I spend a lot of time in that code base. glibc, gcc and llvm are also available.


Which projects would you like to see on code.woboq.org? Maybe we can add them.

Absolutely this. You can always tell a PR reviewed via web. Unless it's very localized, I always open up the branch in an IDE and follow the flow through the abstractions and see if the changes make sense, and if there are additional opportunities or less clear gray areas of operation.

It will be even better when web tools catch up. It could be better still if they could navigate across repos in different languages without even checking out a single file.

Sometimes reading code saves your butt. I've had to make very minor changes to a code base to make it compatible with Python 3, and I've had to add things to HTML libs to prevent execution of JavaScript.

Taking an open source library at face value can sometimes put you in a precarious situation. For me, it's an exercise of following the errors or trying to break the code. I'm not really sure if that counts as reading. Reading a large code base, in my experience, calls for a pencil and paper. It's a lot more work than passively reading and trying to hold all the information in my head. Reading code isn't the same as reading a novel. Perhaps better guidance on what actively reading code means is in due order?


"Two-and-a-half months later I received an email from someone who not only managed to find the comment, but also managed to guess the code had to be rot13'ed."

On this spectrum, there's also "read by refactoring" (https://www.jamasoftware.com/blog/read-by-refactoring/). You'll need good refactoring tooling and good versioning (as you'll almost certainly want to discard many of your trial refactorings), but I've found that the approach helps me evaluate how my mental model matches up with reality. It generally leads to a more readable codebase too.
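One way to get the "good versioning" part cheaply is a throwaway git branch: refactor freely while reading, then delete the branch. A minimal sketch, assuming a hypothetical one-file repo created on the spot and a made-up branch name `scratch/read-by-refactoring`:

```shell
set -eu
# Throwaway identity so the commits below work on a fresh machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical repo standing in for the codebase you are reading.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'original' > module.txt
git add module.txt
git commit -q -m "baseline"
main_branch=$(git symbolic-ref --short HEAD)

# Refactor on a scratch branch while you read.
git checkout -q -b scratch/read-by-refactoring
echo 'renamed things while reading' > module.txt
git commit -q -am "trial refactoring"

# Done reading: go back and throw the experiment away.
git checkout -q "$main_branch"
git branch -q -D scratch/read-by-refactoring
cat module.txt
```

If a trial refactoring turns out to be worth keeping, the same branch can be cleaned up and turned into a PR instead of being deleted.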

I must disagree with this sentiment, though I understand where it is coming from:

"Clean, solidified abstractions are like well-marked, easy-to-follow paths through a forest — very useful if they lead in the direction we need to go, but less useful when we want to blaze arbitrary new paths through the forest."

Using the same analogy, if you want to blaze a new trail through the woods, you need a map of the territory, and a map for this territory takes the form of higher-level abstractions with clear, explicit semantics and minimal but effective, and unambiguous, interfaces.

One of the biggest problems in modifying or reusing existing code is the risk of breaking higher-level consistency, for example by violating an implicit constraint that is necessary for correctness (the biggest Ethereum errors have been of this form, where the implicit assumptions included when and how often initialization is performed, and that a certain library will be present.)

If the clean code movement has been delivering code that works as step-by-step instructions for getting from A to B, but not as a map, it may be because breaking code down into small functions and classes is easier, more visible and more measurable than making coherent abstractions with consistent and minimal interfaces at a higher level.

To push back on this point a little bit, I think we agree that a map is one of the most useful things you can have but disagree on what the right type of map is.

While I'm not anti-abstraction, I think people often impose poorly chosen high-level abstractions on top of messy lower-level components (akkartik calls this Authoritarian High Modernism in a comment (https://lobste.rs/s/gtxi5y/nobody_s_just_reading_your_code_h...) on Lobsters). This approach leaves readers having to understand a bad abstraction and its internals, a map, its territory, and inconsistencies between the two.

As I understand it (please correct me if I'm wrong), you want people to do the work to find better high-level abstractions. I agree that it would be good if we could do this more often. I just also think we can find other ways to give people maps that work with our existing, messy codebases that don't have good, minimal abstractions already.

Yes, the start of your last paragraph states what I would like to see, and the statement I took issue with specifically dismisses 'clean, solidified' abstractions, not messy, failed attempts at it, yet I am also very well aware, from personal experience, that we often have to deal with code that is not like that (even though, in some cases, written by people who thought they were doing SOLID work.)

The question is what to do about it, especially given that abstraction and the separation of concerns are not suddenly going to fix things after all these years of being more honored in the breach. I do not think that there is a programming style that leads to the writing of understandable code without having a coherent vision of the big picture, because I do not think you can understand the purpose of code in the small without knowing its place in the big picture (except insofar as the author has successfully separated the independent concerns, which brings us back to my original point about the value of abstraction.)

One can still modify such code, but the less you understand about it, the more likely it is that you will make mistakes, and there is also a real possibility that the result will be harder to understand than the original - this vicious circle is one reason why much-modified code is usually difficult to understand.

While I don't hold out much hope for programming style to save us from this situation, I think we might do better with tools to help us understand code. In particular, I sometimes find myself doing program slicing by hand (what ways are there to get from A to B that modify X, for example), and there is some tool support for doing so, though availability is spotty at best.
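To make the by-hand slicing question a bit more concrete, here is a minimal sketch of one first step: listing which functions in a file assign to a given variable. It is not a real program slicer (it ignores control flow, aliasing, attributes and calls), and the names `functions_modifying` and `SAMPLE` are made up for illustration. It only uses Python's standard `ast` module.

```python
import ast

def functions_modifying(source, name):
    """Return names of functions in `source` that assign to variable `name`.

    Crude approximation: only direct assignments to a bare name are seen.
    """
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                is_store = (
                    isinstance(inner, ast.Name)
                    and inner.id == name
                    and isinstance(inner.ctx, ast.Store)
                )
                if is_store:
                    hits.append(node.name)
                    break
    return hits

# Hypothetical code under study.
SAMPLE = '''
def a():
    x = 1
    return x

def b():
    return x + 1

def c():
    global x
    x += 2
'''

print(functions_modifying(SAMPLE, "x"))  # ['a', 'c']
```

This narrows down where to look; deciding which of those functions actually lie on a path from A to B is still manual work.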

"But you can’t expect a map to tell you what questions to ask, and it makes no sense to read a map linearly from top to bottom, left to right."

But you can read/view a map as an illustration to get an overall view of the layout of things and how they are connected.

I read code. It surprises me that people deem themselves an expert in some 3rd party lib without actually understanding the internals.

I had to use a js library for something today.

npm install downloaded 113 new packages. I'm not even a JavaScript programmer; I'm saddled with being "full-stack".

I'm supposed to read all that every time I need a new library? Or write it myself, when I have absolutely no idea how to build something that complicated even if I had the time, negating the whole point of OSS in the first place?

The point made is to read code to become a better programmer. That doesn't mean you have to read all code you ever use.

OP makes no prescription about reading code. It merely points out what successful attempts to understand strange codebases have in common, and discusses what writers can do to help readers succeed.

(I proof-read drafts of OP. And I have made prescriptions about reading code. But that's a separate story.)

In a similar vein, I often wonder just how many people actually check cryptographic algorithms. We'd all probably agree that it is good not to roll your own crypto algorithms, or even your own implementation of known crypto such as AES. But how many experts in the world are there for AES and secure hashing? Much of the major MD5 and SHA-1 work came from Marc Stevens' team. How many experts are there in this field, really? "Provable security" is another issue: Shoup, Menezes and Koblitz have all expressed concern about the current state of the art, I think.

I'm far from an expert in cryptography but I've had to dig into OpenSSL's codebase on more than one occasion and every time it left me with a deep sense of uneasiness. It's really not what I would deem good code: too many macros, too many potentially confusing API conventions (inconsistent return values in case of error, unclear resource lifetimes, ...), not enough comments and documentation...

It's not the worst C codebase I've ever seen (far from it) but for something as critical I'd put the bar very high in terms of code quality and OpenSSL doesn't even come close.

I wouldn't have expected to be in the minority in saying that I actually read a lot of foreign code out of curiosity. Especially when dealing with architectural decisions, I like to dig into codebases I consider to be good and professional (a rarity in the professional/proprietary development world, unfortunately). For instance, reading the Chromium source code, or Mozilla's for that matter, gives you a lot of good input, since both codebases are sufficiently complex and deal with a lot of layers and stacks.

How do you know a codebase is good before you invest time digging into it?

I don't. There are a few indicators, though, like the overall structure of the source files and how they are laid out on the file system, or minor details like inconsistent formatting. Typically, if one or more of those indicators looks off, I'm pretty sure the rest of the codebase is not in a good state either. I still skim through some files to confirm my expectation, though.

One of the best parts of the Plan 9 system is that all of the kernel/command/library source is on hand and readily available to consult at any given time.

Jointly, an important detail is that the complexity of said source is kept to a sane minimum and general style trends mean that most, if not all, of the source is formatted similarly and legibly. The system is so compact that you can keep most of the system source in your head at one time if you really need to.

A fun tidbit from the Plan 9 compilers that you can really only find by reading the source: https://groups.google.com/forum/#!msg/comp.os.plan9/uMF7A4gk...

One thing that helps me make reading code in code reviews less passive is to add occasional temporary print statements and then rerun the unit tests that cover the code I am reading. Seeing variable values satisfies some curiosity. Sometimes I will use a debugger to step through code I am reviewing, but for some reason I prefer print statements to satisfy any curiosity I have about how the code works.
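A minimal sketch of that workflow, assuming a Python codebase tested with `unittest`. The function `normalize_scores` and its test are hypothetical stand-ins for whatever code is under review; the only point is the temporary `print` line.

```python
import unittest

def normalize_scores(scores):
    """Hypothetical function under review: scale scores so they sum to 1."""
    total = sum(scores)
    # TEMPORARY, for the review only: show the values the test actually feeds in.
    print(f"[review] scores={scores!r} total={total!r}")
    return [s / total for s in scores]

class NormalizeScoresTest(unittest.TestCase):
    """Existing test that happens to cover the code being read."""

    def test_sums_to_one(self):
        result = normalize_scores([1, 1, 2])
        self.assertAlmostEqual(sum(result), 1.0)
```

Running the tests (for example with `python -m unittest` on the file) shows the `[review]` line alongside the test results. The print is removed again before the review is finished.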

Do I sit back in my comfy chair on an evening and just read through some code repo? Well, actually, yeah, I do... Though not terribly often. Do I sit back and read through programming bloggers explaining their code? Yes, pretty often, and based on HN, that's pretty true of a lot of programmers.

Still, I'm more likely to find myself browsing Wikipedia for math and CS.

Peter Seibel, the author of the Coders at Work book mentioned in the post, actually discussed this in a blog post: http://www.gigamonkeys.com/code-reading/

It's not quite true that none of the programmers interviewed in the book routinely read code for fun. In his blog post, Seibel mentions the exceptions:

> First, when I did my book of interviews with programmers, Coders at Work, I asked pretty much everyone about code reading. And while most of them said it was important and that programmers should do it more, when I asked them about what code they had read recently, very few of them had any great answers. Some of them had done some serious code reading as young hackers but almost no one seemed to have a regular habit of reading code. Knuth, the great synthesizer of computer science, does seem to read a lot of code and Brad Fitzpatrick was able to talk about several pieces of open source code that he had read just for the heck of it. But they were the exceptions.

The Knuth exception is notable: he does routinely read code for fun. In the book he mentions things like

> I’ve got lots of collections of source code. I have compilers, the Digitek compilers from the 1960s were written in a very interesting way. They had their own language and they used identifiers that were 30 characters long but very descriptive, and their compilers ran circles around the competition at the time—this company made the state-of-the-art compilers of 1963 or ’64. And I’ve got Dijkstra’s source code for the THE operating system. […] I collected it because I’m sure it would be interesting to read if I had time. One time I broke my arm—fell off a bike—and I had a month where I couldn’t do anything much, so I read source code that I had heard had some clever ideas in it that hadn’t been documented. I think those were all extremely important experiences for me.

And then there is a passage quoted in the post about how he reads code, with a Fortran compiler as the example (not repeating it here; it's in Seibel's blog post: http://www.gigamonkeys.com/code-reading/ ), after which the interview (and the book) ends with Knuth saying:

> don’t only read the people who code like you.

(The examples in the book of people who last seriously read code years ago are also interesting, e.g. Douglas Crockford mentions some programs he read, and there are multiple pages in the book about Guy L. Steele reading the TeX program.)

Anyway, back to Knuth: I think the way he reads and digests code (or even papers: http://blog.computationalcomplexity.org/2011/10/john-mccarth...) addresses a lot of the points the linked post makes. Even when “just reading” code, or anything for that matter, you are supposed to be doing the things the post mentions in “hacking versus passive reading”: active exploration, critical examination, synthesis.

For example, Knuth read the code of ADVENT (aka Colossal Cave Adventure), and loved the code (not just the game itself) so much that he rewrote it to share with others to read, in his own preferred literate-programming (CWEB) style that matches the way he thinks (http://www.literateprogramming.com/adventure.pdf). This definitely doesn't sound like “passive reading” to me.

Nevertheless, great post.

Upvoted for careful scholarship and a useful addendum (I'm OP)!

I sometimes wonder whether a lot of Knuth's greatness comes from doing more of the stuff everyone knows they should do but don't. If you read this interview (https://github.com/kragen/knuth-interview-2006) with Knuth, he talks about how he was nervous he wouldn't be able to learn calculus so just decided to do all the problems instead of just the assigned ones. Unsurprisingly, partly because he's Knuth and we all know Knuth can do math, he ends up really good at calculus:

> But Thomas’s Calculus would have the text, then would have problems, and our teacher would assign, say, the even numbered problems, or something like that. I would also do the odd numbered problems. In the back of Thomas’s book he had supplementary problems, the teacher didn’t assign the supplementary problems; I worked the supplementary problems. I was, you know, I was scared I wouldn’t learn calculus, so I worked hard on it, and it turned out that of course it took me longer to solve all these problems than the kids who were only working on what was assigned, at first. But after a year, I could do all of those problems in the same time as my classmates were doing the assigned problems, and after that I could just coast in mathematics, because I’d learned how to solve problems. So it was good that I was scared, in a way that I, you know, that made me start strong, and then I could coast afterwards, rather than always climbing and being on a lower part of the learning curve.

Yes exactly. It's always inspiring to see how (in his case) you just start doing things simply and methodically with focus, and eventually you get very far and become better than anyone else.

Well, Knuth is a powerful exception. He doesn't compare with the actual copy-paste monkey programmers who don't even bother to read the whole SO post, even though people usually do explain the code in the solution. Sometimes the insight in the post and comments is more valuable than the code itself.

Somebody read my code once and commented on my improper use of 'mod' or something like that. It then turned into a long discussion on how they might be incorrect, and so on and so forth on Reddit, back and forth on Lisp.

I love Lisp. It's a great language with great programmers.

I've been thinking about this idea for a while and love the reimplementation idea. I really want to start live-streaming myself coding (or pair programming) and have been trying to come up with some ideas that would end up being interesting to watch.

One idea was taking a popular open source tool with a good test suite and trying to get the tests to pass, but I think checking code out at a point before a new feature is implemented and reimplementing it could end up being really interesting. Could even make it more interactive by sharing the checkout point so viewers can do an implementation themselves and doing a stream walking through different interesting implementations.

A few years ago in my previous company there was a major push to convert a lot of the codebase to Python. A byproduct of this conversion was an easier way to build UI using predefined widgets. Yet no one was using this functionality due to the mindset of years of TCL. I honestly hated TCL. So I took it upon myself to explore the entire new code base and go about building UI panels. Initially I just couldn't understand how the code mapped out until one day I just printed out some of the code and read it repeatedly till it made sense. The whole exercise was an eye opener.

Best way to figure out someone else's codebase, IME, is to just dive in heavy and start breaking things. Figuring out what's connected to what and why by removing chunks of code that you need to understand.

I read the source code for the libs all the time, mostly to see what methods and properties they expose. I learn something new every time. This is JavaScript/NodeJS. I don't miss compiled .dll's

Once I proposed that a team read the code of Redux before adding it to the codebase (if you don't know, Redux is a very small library, very easy to read). But no one seemed to care; they were just saying things like "Well, I'll read this blog post that teaches us how to use it", and I was like: "BUT if you read the code, you will know exactly how to use it! Why not go to the source instead of reading other people's opinions?!"

Reading code is difficult and time-consuming. To really understand a function you need to try it with different inputs, either with interactive testing or in a debugger; AND you need to know what the functions it calls do.

The second part, knowing what the code the function is calling does, is nearly impossible when they are other functions in the program itself.
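The first part, trying a function with different inputs, can be sketched roughly like this. Both `probe` and `wrap_index` are hypothetical names invented for the example; the idea is just to feed a function ordinary and edge-case arguments and record what comes back, including exceptions.

```python
def probe(fn, inputs):
    """Call fn on each argument tuple; record the result or the exception type."""
    observations = []
    for args in inputs:
        try:
            observations.append((args, fn(*args)))
        except Exception as exc:  # record the failure instead of stopping
            observations.append((args, type(exc).__name__))
    return observations

def wrap_index(i, n):
    """Hypothetical function being read: wrap index i into range(n)."""
    return i % n

# A typical value, a negative value, zero, and a degenerate size.
CASES = [(5, 3), (-1, 3), (0, 3), (1, 0)]

for args, outcome in probe(wrap_index, CASES):
    print(args, "->", outcome)
# (5, 3) -> 2
# (-1, 3) -> 2
# (0, 3) -> 0
# (1, 0) -> ZeroDivisionError
```

The negative and zero cases are where reading alone tends to mislead: Python's `%` returns a non-negative result for a positive modulus, which is not true in every language.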

I love to dig through people's code. Unfortunately I know myself well enough to know that it can be a giant time sink if I let myself go.

In Swedish there's a phrase, "snöa in", literally to get snowed in, and that's what happens to me.

So after enough years in the biz I try to hold myself back from this habit.

As a corollary, this is also why it's so important, and so terrifying, to release open source projects.

You're often wondering who is going to dig into this code of yours and judge your coding, especially when you're not following best practices and TDD.

If you want to exercise reading concise and easily understandable standard C code, grab a toy project from suckless.org. Even though I write my code more defensively in some ways, that stuff is really straightforward to hack on.

Interesting read, but I find comparing code to a map too simplistic. I would at least say that code is more like a puzzle of a map. You may or may not understand each puzzle piece and where it fits in. That is my experience with reading code.

Code is like ogres, and ogres are like onions. They have layers of complexity :P

I read code to learn all the time. And I don’t get paid to program.

The only way to be sure you understand everything is to learn quantum mechanics, and perhaps string theory too, just to be safe. But in practice modesty dictates that we limit ourselves to learning as much as we need to get the job done.
