Hacker News
Learn to Read Code (kylesletten.com)
195 points by kbsletten on June 29, 2017 | 57 comments



I really do think curiosity is the most important bit out of all this.

If I see some bit of our code doing something a bit odd, I'm nosy; I'll go and look at why. Why does one of our classes use an IDisposable? Why is that property using a custom class? Why is that code using async/await when it looks like it should take nanoseconds?

If it's for a good reason, you'll invariably have found a gotcha, or gained a little more knowledge about the domain your code works in.

But surprisingly often you'll find that it's for a bad reason, or an outdated reason, or someone not really understanding how the functionality they're using works.

In the early days of a language/framework especially, when people really don't understand how it works, what you see is a lot of cargo-cult code. Doing without understanding. .NET, jQuery, Ruby, JavaScript, TypeScript, Node: I've seen it happen many times (and of course done it myself too! I remember making loads of pointless singletons in one project a decade ago).

People do something because they don't have the full picture. They make up totally wrong rules like "If I add IDisposable the GC must work better as I believe IDisposable has something to do with the GC so it can't hurt".

A great example right now, in the .Net world, is people are littering their code with async/await. They can't understand it's making their code worse, not better. They don't understand that the performance gain is almost always negligible, but the code complexity increase is expensive.


Async/Await can definitely increase the complexity of the codebase and make stack traces difficult to interpret (especially if you aren't clear on what is happening under the hood), but if you work on any sort of web-based/distributed system, with lots of back-end database/API calls, the performance gain is hardly negligible.

Sure, you won't see much of a difference in response times, but resource usage gets dramatically better, and with it the ability to scale.

If you're working with ASP.Net and you're not using Async/Await, you're doing yourself a disservice.

If that's not your scenario, well then, yeah, dealing with Async/Await is probably not worth the headache.


I do work on web based systems and I can tell you, without a doubt, that the performance gain is bugger all.

I flatly disagree with you, this is exactly the cargo-cult programming I'm talking about.

You're thinking about it totally wrong. Async/await is useful if you have a million pipes but need 2,000,000. That's millions of requests per second. Virtually no ASP.Net programmer has that need. They usually need 10 or 20 pipes per second, but they need really, really fat ones. Async/await does not make fatter pipes; it lets you use a single pipe as multiple pipes. But for decades IIS has already, by default, supplied loads of pipes.

To make it clear, practically speaking, for virtually every asp.net programmer in the world, async/await is utterly pointless.

It's ok for external website calls, or something you really need to multi-thread, like the once-in-a-blue-moon tasks.

I work in ASP.Net. I rewrite bad async/await code, get rid of it AND improve the performance by orders of magnitude.

Performance problems are almost always, in my experience, SQL bottlenecks, bad EF queries or stupid loops. And I have fixed serious performance problems for various businesses. Only once in 12 years have I seen IIS running out of threads, and that was in 2006 when CPUs sucked. And that was fixed by, amazingly, fixing a bad SQL query.

Littering several hundred 40ms controller calls with asynchronous EF calls that run in 3ms won't make your site faster. Fixing that shitty report that locks 20 tables, takes 2000ms to run, causes multiple deadlocks, and gets called 5 times every time someone opens your dashboard because you suck at JavaScript will. Put all the async/awaits you want into that bugger; it's still going to screw your site over.

Async/await is a load of utter nonsense for the vast majority of MS customers. And yet now I see people using it for the most asinine calls that clearly do not need it.


Using async/await with desktop GUI apps is basically a no-brainer. Nobody likes applications to randomly lock up and this is a way easier way to solve the problem than the old ones.


Amen to that. Where I work I see people adding await/async all over the place because "performance". So now when you get exception stack traces they are hard to follow or flat out useless.


>They can't understand it's making their code worse, not better. They don't understand that the performance gain is almost always negligible, but the code complexity increase is expensive.

Yeah, sorry, give me an easy-to-use benchmark for deciding when to use async/await. Knowing when to use async/await in .NET has been the biggest minefield so far. And yet you see it in ASP.Net and UWP tutorials.


You use async/await when you are doing any sort of IO in an application that needs to be responsive. Example: your ASP.NET thread pool is starved because all your request threads are waiting on IO. It's useless to have those threads waiting there for IO when they could be processing other requests to your server.

You're obviously going to see some overhead from the generated state machines, but that's negligible compared to the gains you'll see in throughput and code readability (compared to the older ways of doing async code).

The same reasoning extends to UWP: don't block your main thread with IO so it can still process input. Obviously there are other ways to handle async in a desktop app, but async/await just makes it so much easier to reason about your async code. It looks exactly like you're writing synchronous code.
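To make the thread-pool point concrete, here's a minimal, self-contained console sketch. The names are invented (there's no real database here; `FakeDbCallAsync` simulates IO with `Task.Delay`), but the mechanics are the ones described above: at the `await`, the thread goes back to the pool instead of sitting blocked.

```csharp
using System;
using System.Threading.Tasks;

public class AwaitDemo
{
    // Simulated IO-bound call: no thread is consumed while the delay runs.
    public static async Task<int> FakeDbCallAsync(int id)
    {
        await Task.Delay(100);  // the thread is handed back to the pool here
        return id;              // may resume on a different pool thread
    }

    public static async Task Main()
    {
        // 50 concurrent "requests" finish in roughly 100ms total, not
        // 50 * 100ms, because no thread sits blocked during the waits.
        var tasks = new Task<int>[50];
        for (int i = 0; i < tasks.Length; i++) tasks[i] = FakeDbCallAsync(i);
        await Task.WhenAll(tasks);
        Console.WriteLine($"completed {tasks.Length} calls");
    }
}
```

These are the same mechanics that let an ASP.NET request thread serve other requests while a query is in flight; whether that matters in practice depends, as this thread argues, on whether thread-pool starvation is actually your bottleneck.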


It's a total nightmare to introduce async into an existing, non-async codebase, so I'm not sure it's a good case for incremental refactoring based on benchmarks.


How do you balance reading code with getting things done? I'm thinking of the "move fast and break stuff" attitude that doesn't seem to leave much room for reading a lot of code. I believe that reading and understanding the code will help you move faster eventually, but you're slower to start. So, how do you know when you've read enough?


I just accept I'm a sprinter when it comes to coding, not a marathon runner. I spend 2 or 3 hours poking the code with a stick, proverbially speaking, reading it, maybe pruning crappy code, maybe faffing around, then bash out the actual code in an hour or two.

I believe what happens is that I form a mental map of the business process and a mental map of the code and then when I eventually come to write something, it just pours out.

Psychologically, it took me a long time to accept that only actually coding 2 or 3 hours a day is fine. For years I beat myself up for being 'lazy'.

EDIT: I mean, it's obviously different if you're just adding an extra column to an existing table or something (usually) trivial like that.


IMHO, the answer is to frequently ask yourself this question as you're working on things. There's no perfect balance; you'll have to make a decision depending on the context and priorities of the moment. It might be ok to gloss over a concept you're unfamiliar with when you're debugging a production issue and have evidence that the root cause of the bug is in another area of the code altogether. But then make some time during feature development (or when writing a COE about the production issue) to go back to that piece of code and dig deeper into why it's written that way.

To expand on the OP's analogy about unfamiliar words in a novel: I personally don't look up every single word I come across when I can get an idea of what it means from context, but only after I encounter it a few times.


Good points. Thinking about the unfamiliar-words analogy, I would look up a word if I see it often enough or if it's key to understanding. Knowing whether an unfamiliar word is important is tricky though; I can't define the process I use there.


It's clearly a trade-off. I have found that as I gain experience, reading code becomes faster, even for domains that I'm not familiar with, and it's nearly always worthwhile to understand the problem.

Moving fast and breaking things adds to the technical debt. The word debt implies that you will have to repay it (or at least, someone will have to repay it). So moving fast now means slowing down later. Sometimes this is desirable, sometimes not.


Related: dig into your language's source code and the code of major libraries! Your standard lib is probably the best example of idiomatic, efficient code that you're going to have access to, written by the authors of the language itself. You'll notice bugs, inconsistencies and poor docs, which gives you opportunities to contribute upstream, while at the same time making you read more and learn how particular features work under the hood, which might give you new perspectives in your own code.

In Java, this might be the java.util.Collections class or the java.util.concurrent package; in C++ that might be Boost. In Elixir, I start with Kernel.ex [0].

https://github.com/elixir-lang/elixir/blob/master/lib/elixir...


I agree with the advice to read your standard library, because I think there's huge value to your general understanding of lower-level concepts and to knowing the available functionality of your language, but I disagree that it is the best example of idiomatic code, necessarily. For example, C++'s STL was written to be portable, efficient, and general beyond all other goals. Modern C++ style and abstractions are nowhere to be found, and if you follow the STL style, you may over-abstract (templatize), use manual memory management, and architect for non-relevant circumstances.


It certainly depends on the language and where in the standard library you are, though you're completely right to always be cognizant that it's all been taken through battles in production at scale and has been optimized accordingly. One could also argue that that kind of code might be idiomatic for that kind of problem (general, portable, and efficient), and a good guideline on how that might be done, though that might not be a super satisfying argument depending on how conservative your definition of "idiomatic" is.


BTW, if you're going to read a C standard library, skip glibc and go for musl: https://git.musl-libc.org/cgit/musl/tree/src


There was an article last week-ish about reading OpenBSD source daily.

https://news.ycombinator.com/item?id=14521386


Why do you say that?


Because libcs other than glibc are less bloated and probably easier to read.

  ~ $ cat empty.c
  int main(void) {return 42;}

  ~ $ gcc -static empty.c -O3
  ~ $ strip a.out
  ~ $ size
     text	   data	    bss	    dec	    hex	filename
   677598	   5792	  10464	 693854	  a965e	a.out

  ~ $ diet gcc -static empty.c -O3
  ~ $ strip a.out
  ~ $ size
     text	   data	    bss	    dec	    hex	filename
      901	     32	     76	   1009	    3f1	a.out


It's just not very readable.


I would modify this slightly, and suggest finding out who some of the most highly regarded programmers are for your language, and reading their libraries.

Standard libraries are often a very mixed bag because things are added but rarely taken away. Some of it may be very new, and some of it may have been written before the idioms of a language were even finalized, and then frozen by compatibility needs. If you don't already have a strong sense of good style (or of what is currently considered good practice for a language) you may find it difficult to identify what is good and what is a relic of history.


Of course being able to read and analyze code is important, but I think the author picked a bad example to make his point. Removing all the using statements would be a bad idea! They don't hurt anything, the verbosity is minimal, and someone could add code to the Dispose() method later. If you are dead-set on changing something, you should make the XReader no longer implement IDisposable. And if the XReader is not under your control, then you should absolutely respect the author of the class by following the correct disposal semantics.
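As a sketch of that point, here's a hypothetical stand-in for the article's XReader. The caller's using block is correct regardless of whether Dispose currently does anything, which is exactly why removing the usings (rather than the interface) is the wrong fix.

```csharp
using System;

// Hypothetical stand-in for the XReader discussed in the article.
// Dispose is a no-op today, but that's an implementation detail that
// could change in a later version.
public class XReader : IDisposable
{
    public string Read() => "data";
    public void Dispose() { /* empty, for now */ }
}

public class Demo
{
    public static string ReadOnce()
    {
        // The caller honours the IDisposable contract. If XReader later
        // acquires a file handle or connection, this code is already correct.
        using (var reader = new XReader())
        {
            return reader.Read();
        }
    }

    public static void Main() => Console.WriteLine(ReadOnce());
}
```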


Yeah. If you ever watched any of the old MIT lectures with Abelson, the most fundamental point they make is that it's only possible to make a complex system by essentially treating components as a black box. Having a bunch of crap implementing IDisposable when you don't intend that it should ever be disposed, and then chastising people for not going to the implementation to see that the Dispose method is empty, illustrates a sort of confused mindset.


>If you are dead-set on changing something, you should make the XReader no longer implement IDisposable.

I'm sure that's what he meant...at least I hope


You're correct in your assessment, but the parent is also correct in their criticism. I'll try to rephrase: the point stands, even though this particular example isn't of particular importance.


For me, the best articulation of why coding is so hard comes from a 2000 Joel Spolsky article [1]:

It’s harder to read code than to write it.

I continue to use that line to explain coding to non-coder friends. This article's author seems to understand that as well, but it's worth making explicit.

[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...


The "dispose" sample in the blog is a bad example. IDisposable is a contract, and if someone defines that contract, it's because the contract should be used. The fact that the current implementation is "empty" doesn't mean that remains true in the future, or in another implementation of the same interface. So, based on this sample, I could just as well say: don't read the source!


YAGNI.

Just like the author found out, it's been there for years and no one's used it.


Ok, so remove IDisposable from the class declaration and fix the compilation issues. My 2 cents.


I completely agree. That's what we ended up doing to "remove all those usings." I do admit it was poorly phrased.


I just wrote a similar comment but I think yours is better. "Contract" is a great word to use here.


This whole "read code" thing ("write code that's easy to read", "learn to read code") is imprecise at best. What's important is to learn code. You want to learn to write code that's easy to learn (not just easy to read). And you want to learn to learn code.

If you write code that's easy to learn, and if you learn to learn code, you'll simply remember more code. The more code you know, the more productive you're going to be. That's because at one end of the spectrum, you know nothing about the code and you simply can't do or solve anything. And at the other end of the spectrum, you know all the code, in which case you're able to do whatever you want by definition: analyse it, expand it, rewrite it, refactor it, talk about it, fix it, etc.

This also explains why you want to write code with fewer lines (or fewer AST nodes): the more lines (or AST nodes), the more you have to learn. The more you have to learn, the less you know, and the less you know, the closer you are to the bad end of the spectrum.


Any tips on how to learn code?


The way I make myself familiar with new code is to find some process of interest and write out a pseudocode stack trace from the trigger through execution. So: function names and relevant params; leave out error handling and irrelevant branches. If a step is async, reset indentation and annotate the "stack" with "and later..."

This process can collapse large amounts of code into, hopefully, a single page of text that can be well understood. Without it, I find it very difficult to keep all of the relevant code in my head when it's spread across the code base.


I like it! I'll have to try that the next time I encounter some new code.

A few questions:

* Do you worry about file names?

* Do you use names from the code or your own shorthand?

* How do you handle conditionals?


No filenames. Just functions. Collapsing info from multiple files is a big part of the goal. IDE search works well enough that I rarely know what file I'm in.

I use names from the code. I do want to be able to locate the code later.

I try to omit most conditionals. The goal is to show the path of interest, not all possible paths. Some conditionals are interesting though: very likely errors, process cancellations, scary bug-tempting edge cases.
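Putting those conventions together, a trace for a hypothetical checkout flow might look something like this (all names invented):

```text
CheckoutController.Submit(cartId)
  CartService.Load(cartId)
  PaymentService.EnqueueCharge(cart.Total)   // async boundary
  OrderRepository.Save(order)

and later... (payment worker)
PaymentWorker.Handle(chargeId)
  OrderRepository.MarkPaid(chargeId)
  EmailService.SendReceipt(order.Email)      // error handling omitted
```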


Some ideas: spend time learning how to do the users' jobs, understanding how the users use the system. Go into the code, pick one spot, and understand what it does very well. For that one spot, make a list of all the external things it uses that you don't know about. Go to each spot in that list and make notes about how that thing works, indented under the list item. Repeat that process recursively for a level or two, or for as many as you have patience for.

Pair program with someone else who knows the system. Write unit tests.


Reading automated tests and doing extensive searches in the codebase (with recursive `grep`s) is often helpful. If you use an IDE with a go-to-definition function, it can replace grep.
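A few grep invocations of the kind described above (the symbol name and directories are made up):

```shell
# Every reference to a symbol, with file names and line numbers:
grep -rn "parseConfig" src/

# Just the list of test files that exercise it:
grep -rl "parseConfig" test/

# Case-insensitive, in case the codebase is inconsistent about naming:
grep -rni "parseconfig" src/
```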


The most damaging thing about modern web frameworks is that they make it difficult, sometimes prohibitively difficult, to read code. You can sometimes step into framework code if it's well-written (i.e. Django REST is pretty easy to read) but if the framework does anything "magical" you're basically stuck.

Complex frameworks are good for getting a project started quickly, but become more and more problematic as a project ages and needs to be debugged.


.NET MVC is really bad when it comes to this. I'm not sure what problem they were trying to solve, but referring to a specific controller action with Url.Action("Action", "Controller", new { area = "Area" }) was a really bad idea imo. It means that Visual Studio's 'find references' functionality doesn't work, and the project I'm working on has dozens of unused controllers (I think). It also means that developers can do stupid stuff like build links dynamically from the database ('Controller' and 'Action' fields that never change).

I also despise the whole concept of 'validators' in .NET MVC. Not only does it prevent you from using distinct validation for particular actions using the same model, it completely hides the fact that validation is used at all (if you trace the controller code). Why not just write a class that validates the model and use it in the controller? Is that 'too much code/too much coupling' for the controller? Ugh.
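One common mitigation (a sketch, not the only option) is to route those strings through nameof, so that 'find references' and rename refactorings see the call sites. The controller and the URL-building helper below are hypothetical stand-ins for the MVC types, just to keep the example self-contained:

```csharp
using System;

public class OrdersController
{
    public void Details() { }
}

public static class Demo
{
    // Hypothetical stand-in for Url.Action: just formats a path.
    public static string BuildUrl(string action, string controller) =>
        $"/{controller}/{action}";

    // Magic strings: invisible to 'find references' and to the compiler.
    public static string WithStrings() => BuildUrl("Details", "Orders");

    // nameof keeps the call site discoverable and rename-safe.
    public static string WithNameof() =>
        BuildUrl(nameof(OrdersController.Details),
                 nameof(OrdersController).Replace("Controller", ""));

    public static void Main()
    {
        Console.WriteLine(WithStrings()); // /Orders/Details
        Console.WriteLine(WithNameof());  // /Orders/Details
    }
}
```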


>This means that visual studio's 'find references' functionality doesn't work, and the project I'm working on has dozens of unused controllers

I know it's an annoying extra step to learn and do, but when doing anything with refactoring controller actions (e.g. changing action names or required parameters), Ctrl+Shift+F is really your friend. I have noticed references not getting picked up by the refactoring quick actions too, but it was sporadic.


Oh I know -- powershell's my weapon of choice here. And it doesn't help when the previous developers decided to store action/controller names in the database...lol.


This may also be a contributor to the fact that a lot of devs these days seem to make a clear distinction between the type of code they allow themselves to write and code they won't write. A lot of people stay within a narrow range of what their chosen framework allows and don't really experiment with doing complex stuff themselves. In my view someone who has worked with React or Node (for example) for a while should have a fairly good idea how they are implemented, but it seems many people accept them as "magic" and don't look behind the scenes.

To me reading source of these frameworks helps realizing that whoever developed them are just human beings like the rest of us and write regular code that's really not special.


More than that, reading framework source often reveals that the code is very, very bad.


Because a lot of working code is very bad. Reading other people's code is a good way to calibrate expectations for code quality.


It's ambiguous what you mean there, expectations could be beliefs ("I expect code to be bad generally, so I don't assume anything") or what you'll accept ("I expect code in my system to be good, so I rejected the PR that added that garbage library").


The how-to-read-books advice from grandpa that is buried in this article is where the majority of its value lies.

If you skip over unknown words, or just acquire their apparent meaning by guessing from context, you don't really know those words and won't be able to use them effectively, or at all.

It's perfectly fine to gloss over code. You don't have time to understand at a maintainer level every nook and cranny of everything you have to work with.


I think this is one of the best features of e-readers. You can get the definition right there as you are reading, decently seamless.


I use such a Firefox extension for reading Japanese: Rikaichan. Written by the same dude who put together the Tomato firmware for routers.


It's easy to imagine changes being made by folks who don't truly understand what's going on, being made on top of changes that were made by folks who didn't truly understand what's going on.


I don't get this. When I use a using block to wrap EF DB calls that output to a DataGridView in C#, I often get errors that mean the connection was closed before the data was grabbed.

The solution is easy: call ToList() in the DB call to force it to download everything first. But to me that shows the using is working. Now we find that the black box of using is empty?
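What's being observed there is LINQ's deferred execution, and it can be reproduced without EF at all. In this self-contained sketch, FakeContext is a hypothetical stand-in for a DbContext whose "connection" dies at the end of the using block:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an EF DbContext: enumerating Orders() after
// Dispose() fails, just like a query executed on a closed connection.
public class FakeContext : IDisposable
{
    private bool disposed;

    public IEnumerable<int> Orders()
    {
        foreach (var id in new[] { 1, 2, 3 })
        {
            if (disposed) throw new ObjectDisposedException(nameof(FakeContext));
            yield return id;
        }
    }

    public void Dispose() => disposed = true;
}

public static class Demo
{
    // Deferred: the query escapes the using block and only runs afterwards.
    public static bool LazyReadFails()
    {
        IEnumerable<int> lazy;
        using (var db = new FakeContext()) lazy = db.Orders();
        try { lazy.Count(); return false; }
        catch (ObjectDisposedException) { return true; }
    }

    // ToList() materialises the rows while the context is still open.
    public static List<int> EagerRead()
    {
        using (var db = new FakeContext()) return db.Orders().ToList();
    }

    public static void Main()
    {
        Console.WriteLine(LazyReadFails());   // True
        Console.WriteLine(EagerRead().Count); // 3
    }
}
```

So ToList() isn't a workaround for a broken using; it executes the query before the context is disposed, which is the whole point of the contract.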


This sounds like active listening. A person can be a good listener in the sense of paying attention and being present, but it gets even better when the listener helps guide the speaker by asking for clarification of ambiguity or for greater detail on specific topics.

Surrounding yourself with the tools and materials the article suggests makes it more likely you'll turn code-reading into a conversation.


Every once and a while I would come upon a word in the book that was highlighted or underlined.

I realize this is a tiny nitpick, but it is "once in a while" not "once and a while"; the latter isn't even grammatical.

The article is otherwise very good.


This is funny; I just wrote a post on the same topic: https://zarkzork.com/code-reading.html


This post (and the one by Jessica Kerr) has inspired a new witticism for my LinkedIn profile: "A 10X engineer after my destructive behavior has caused most of your team to quit."


> This was timely for me because I've recently become (by atrophy) one of the most productive developers on my small team.

Does the author ironically mean... by attrition?



