Ask HN: What code samples should programmers read?
310 points by _Microft 167 days ago | 148 comments
What are examples of code that every programmer should have read and analysed because they show a particularly deep insight into the problem, a creative attempt at a solution or otherwise outstanding proficiency?

Goal is to learn something from excellent pieces of code, much like it is practised in art by studying Old Masters, pieces of literature or musical compositions.

PS: I admit to exaggeration in the headline / first paragraph ("should have") btw ;)




Here's an overlooked one: your dependencies.

Every programmer should read the (presumably open source) code they depend on, but almost nobody does it. Some look at documentation, but not everyone even does that. You may find that some of the code you depend on is garbage, which may even motivate you to improve it.

This may be less feasible for giant messes such as most front-end tooling and frameworks that exist today.


Paraphrased from a professor when he substituted in my intro programming course:

Code is a valuable gemstone covered in layers of shit. Always aim to create diamonds, not CZ, and keep at least one face of that diamond shit free.

I always thought that was a very pessimistic way to view code until I landed my first job. Algorithm exercises and toy projects for capstone can be shit free diamonds every time. Working with finicky clients, legacy code bases, and peers who don't care about quality quickly proved that sometimes you're destined to make topaz double-dipped in dung.

Dependencies are no exception to this. I've worked with a mapper that works pretty well out of the box. I ran into an issue with how Strings were mapped to boolean values. It turned out the default mapping values for those conversions were yes, no, y, n, and possibly some other variation of those String values. It didn't include true and false by default.
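To make that failure mode concrete, here is a minimal Python sketch of that kind of string-to-boolean mapping. The names (`to_bool`, `DEFAULT_TRUE`, `DEFAULT_FALSE`) and the exact default sets are hypothetical and only modeled on the behavior described above; this is not the real mapper's API.

```python
# Hypothetical defaults, modeled on the behavior described above:
# only yes/no style strings are recognized out of the box.
DEFAULT_TRUE = frozenset({"yes", "y"})
DEFAULT_FALSE = frozenset({"no", "n"})


def to_bool(text, true_values=DEFAULT_TRUE, false_values=DEFAULT_FALSE):
    """Map a string to a bool, raising on anything not explicitly listed."""
    key = text.strip().lower()
    if key in true_values:
        return True
    if key in false_values:
        return False
    raise ValueError(f"cannot map {text!r} to a boolean")


# Extending the defaults so that "true" and "false" are accepted as well.
EXTENDED_TRUE = DEFAULT_TRUE | {"true"}
EXTENDED_FALSE = DEFAULT_FALSE | {"false"}
```

With the defaults, `to_bool("true")` raises, which is exactly the kind of surprise you only spot by reading the mapper's source or by getting bitten in production.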


I had a fun one recently. Just the other day in fact. The elastic beanstalk cli from aws was refusing to accept my branches when I tried to refer to a code push from a codecommit repo. Turns out branches with / in their names can't be used because it conflicts with the way the cli parses its parameters.

Basically it expects a parameter along the lines of

codecommit/RepoName/branchName

And then it splits the string and if there are more than 3 items as a result it returns from the function saying invalid value.

I had to correct it to use it myself but I found it fascinating since it suddenly made the somewhat mystical innards of aws seem a little more human. At least that's how I felt :D.
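Roughly, the behavior and one possible fix look like the sketch below. This is not the actual EB CLI source: the function names are made up, and the lenient version assumes the repository name itself never contains a slash.

```python
def parse_source_strict(source):
    """Mimics the described behavior: any extra '/' makes the value invalid."""
    parts = source.split("/")
    if len(parts) != 3:
        raise ValueError(f"invalid value: {source!r}")
    provider, repo, branch = parts
    return provider, repo, branch


def parse_source_lenient(source):
    """Split only on the first two slashes so the branch may contain '/'."""
    parts = source.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"invalid value: {source!r}")
    provider, repo, branch = parts
    return provider, repo, branch
```

With the lenient version, `codecommit/RepoName/feature/login` parses into the branch `feature/login` instead of being rejected.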


I would submit a bug report. I've reported bugs in aws cli tools before, and they were patched.


They also accept PRs sometimes.


I program a lot of Python using PyCharm, and one of my favorite features of this combination of language and IDE is how easy it is to hop into the source of your dependencies. Anything you have imported into your code you can bring up the context menu for and click "go to declaration" (or use the relevant keyboard shortcut).

Python dependencies are source code, not compiled blobs, and the import syntax makes it very explicit where definitions come from.

If I'm unsure how to do something with the library I usually look at the source of the relevant module before looking up online docs.
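Even without an IDE you can do much the same from a script or REPL with the standard `inspect` module. A small sketch, using `json` from the standard library as a stand-in for a dependency; the helper names `where_is` and `show_source` are mine. It only works when the source is available as Python files, so modules implemented in C will not give you anything readable this way.

```python
import inspect
import json  # stand-in for any pure-Python dependency


def where_is(obj):
    """Return the path of the file that defines a module, class or function."""
    return inspect.getsourcefile(obj)


def show_source(obj, max_lines=20):
    """Return the first few lines of an object's source code."""
    lines = inspect.getsource(obj).splitlines()
    return "\n".join(lines[:max_lines])


if __name__ == "__main__":
    print(where_is(json))
    print(show_source(json.dumps))
```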


I think the ease with which your IDE lets you get at and debug into the source code of your dependencies (assuming your language lets you do this) is definitely something I overlooked in my earlier development career. It is pivotal to better understanding not only the dependencies, but also how to think about software from other people's POV, which, when done often enough, gives you a better intuition for where and why different bugs manifest. JetBrains indeed makes some awesome products.


Java, Maven and NetBeans are almost as easy, but since dependencies are compiled I have to click one more time, -download sources-, and there I have it.


Eclipse will automatically attempt to download sources from Maven when you jump into a dependency.


I suspected it was easy in Eclipse as well and should possibly have mentioned that.

That said let me add that IntelliJ probably has similar systems.


If you don't have the source, IntelliJ will decompile the binary and I must say it does a terrific job.

It also handles downloads via Maven, but I have only done it once and don't remember the details.


This is why I love C++ and C: even though you can't find the source, you can read the headers.


What programs or libraries are these for which you can't find the source but have the headers available to read?


All proprietary third party libraries which one might depend on. And depending on the domain that could be a lot.


Anything proprietary? Trivial example: Windows or MacOS system headers.


Doh! Of course. Being on Linux almost all the time has given me tunnel vision.


This is especially necessary for driver code from chip manufacturers. Most of that stuff is written to provide demonstrations and test the part, and it should not be trusted.


> Every programmer should read the (presumably open source) code they depend on, but almost nobody does it.

Has this ever been beneficial? I generally just check if a library is popular and maintained and that's always been enough for me. If the code really is horrendous nobody is going to be using it and if you always read the hundreds of thousands of lines of code you depend on you'd never get anything done.


Reviewing dependencies avoids dragging shitty, obviously broken and/or horrifically inefficient code around.

You don't want to use a data serialization library that is so poorly written that it likely corrupts stuff sooner or later, do you? Or a crypto library that documents almost nothing and requires you to review x86 assembler to find how parameters are actually interpreted (cough)?

Taking at least a cursory glance at the code is one part of vetting dependencies before using them. Other activities would be:

checking use of best practices ("Your parser written in K&R C is probably fine without fuzzing and any tests"),

availability of documentation ("source is there, no?"),

checking for known vulnerabilities and bugs (First open issue: "Library segfaults when parsing long lines of []{}"),

maintainership ("Last commit: 2004") and

portability ("(void*)a + (int)5.7").


> You don't want to use a data serialization library that is so poorly written that it likely corrupts stuff sooner or later, do you?

Python's cPickle comes to mind. There's a race condition in there somewhere, but I've never been able to reproduce it with a small program. [1] Using the version of pickle written in Python works.

This sort of thing is why I'm somewhat negative on the argument that "if Python is too slow, write the important parts in C". It's too easy to break Python's memory model in C code.
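For anyone wanting to try the same workaround: in Python 2 it amounts to `import pickle` instead of `import cPickle`. In Python 3 the C accelerator is used automatically, but CPython still ships the pure-Python implementation under private names. A sketch, assuming CPython 3 and accepting that `pickle._dumps` and `pickle._loads` are undocumented internals that may change:

```python
import pickle

data = {"name": "example", "values": [1, 2, 3], "nested": {"ok": True}}

# pickle._dumps / pickle._loads are the pure-Python implementations inside
# CPython's pickle module. They are private names, used here as an assumption.
blob = pickle._dumps(data, protocol=2)
restored = pickle._loads(blob)

# The public entry points normally use the C accelerator (_pickle);
# both implementations should agree on the result.
assert restored == pickle.loads(blob) == data
```

This does nothing about the underlying bug, it just swaps speed for the implementation that is easier to read and debug.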

[1] http://bugs.python.org/issue23655


On the other hand it's fairly easy to not break Python's memory model if you don't unlock the GIL :)

Edit: Ah that kind of bug can be ... tricky.


> You don't want to use a data serialization library that is so poorly written that it likely corrupts stuff sooner or later, do you? Or a crypto library that documents almost nothing and requires you to review x86 assembler to find how parameters are actually interpreted (cough)?

If it corrupts data and has terrible documentation it wouldn't be popular. It's usually very obvious from the community around the library if it's got major problems or not.


Popularity is not an indicator of anything other than popularity. McDonalds is the most popular burger sold in the world, that does not make it good.

Length is an important consideration and most never consider it, partly why software is notorious for getting bloated over time. At some point it would be easier to rewrite code from scratch with less complexity and overhead.


> Popularity is not an indicator of anything other than popularity. McDonalds is the most popular burger sold in the world, that does not make it good.

On the other hand, McDonald's is pretty reliable and you know you're getting something palatable that is probably safe to eat (in the short term, anyway). You don't need to pick the best-in-class library for everything.


You sound very close to advocating a "Not invented here" approach. (https://en.wikipedia.org/wiki/Not_invented_here)

There are some rare cases where that makes sense (Keyword: RARE), but most times I've seen developers take that stance, it leads to longer development times, less consistent code, more maintenance work, and generally pretty shitty outcomes.

Popularity is ABSOLUTELY an indicator of things other than just popularity. However you need to understand why the thing is popular and vet that against your needs.

Finally: McDonalds IS good. I don't eat there often, but they provide a consistently satisfactory experience for their customers. There's a reason you can find McDonalds nearly everywhere on the planet.

It's popular to diss on companies that have become large iconic chains, but they're large iconic chains for a DAMN good reason. Plus, they're solving supply chain issues you aren't even considering, much like large popular libraries are handling use-cases and problems you don't even know about.

Can I make a better burger? Sure. Can I make a better burger for the same price that McDonalds charges, as consistently as McDonalds does? Fuck no.

The same probably applies to your software.


> Finally: McDonalds IS good. I don't eat there often, but they provide a consistently satisfactory experience for their customers. There's a reason you can find McDonalds nearly everywhere on the planet.

McDonald's is consistent, predictable, cheap, generic, and ubiquitous. Sometimes those are good qualities. "Good enough" is usually good enough, after all.

Regionally, there are a half-dozen choices that I'll take over McD's: Less iconic world-wide, but as cheap or cheaper, just as consistent, and more pleasing to the palate. I think that's closer to what they were advocating: Not necessarily building it in-house, but also not taking popularity as a reliable indicator of code quality (beyond a bar of minimum acceptability).


If it hadn't been for the last sentence:

>At some point it would be easier to rewrite code from scratch with less complexity and overhead.

I'd agree, and probably wouldn't have commented.

But I think being all of these things:

>consistent, predictable, cheap, generic, and ubiquitous.

Means you have a damn good product. No one is claiming you can't get something better, but come on, I'd love to have software that's consistent, predictable, cheap, generic, and ubiquitous.

If those words describe a popular library, and you decide instead to build your own thing in-house... you better have a really REALLY good reason for doing it.


> Means you have a damn good product. No one is claiming you can't get something better, but come on, I'd love to have software that's consistent, predictable, cheap, generic, and ubiquitous.

Note that I never said "consistently good" or "predictably reliable". There are a lot of things in the world that are known to be low-quality, single-use crap that are still everywhere.

> If it hadn't been for the last sentence:[...]

I think it depends on their idea of where "at some point" lies.


I'd settle for consistent, predictable, and cheap. Generic doesn't matter if it solves my specific problem and ubiquity is a social problem. Enterprise pays a lot for bespoke solutions as well.

A very good reason to write your own software is if the problem is novel enough to warrant it. I'm not advocating that there should be n+1 JSON serialization libraries.


> Popularity is not an indicator of anything other than popularity. McDonalds is the most popular burger sold in the world, that does not make it good.

It doesn't make it good, but it does mean that it's unlikely to make you ill.


Popularity means more eyes have looked at it, more issues have been made, more contributors exist. Of course, I've had issues with dependencies that were popular but generally it holds that a library that's been around for a while and still finding usage means that it does a good job at it.

The difference between code and a burger is that anyone can consume a burger but code is consumed almost exclusively by professionals. A better comparison would be power tools where you'll often find that if a brand is popular, it tends to make solid, long lasting products.


The point was more about popularity having no correlation with anything else. There is tons of unpopular code that is really, really good, these are hidden gems. Likewise, there is popular code that is quite bad or mediocre regardless of number of contributors.

To rephrase: PHP is the most widely used server-side scripting language on the web. That does not make it good.


> Likewise, there is popular code that is quite bad or mediocre regardless of number of contributors.

Like what? If you're looking for a JavaScript library for example, you can't really go wrong picking one that has many contributors, an active issue tracker, thousands of stars, lots of forks etc. Unless it's some core part of your stack, I don't see how it's practical to spend a lot of time hunting down super well implemented libraries that aren't likely to be supported in the future because nobody uses them.

> To rephrase: PHP is the most widely used server-side scripting language on the web. That does not make it good.

I'm not a huge fan of PHP but you can still write good software in it.

Libraries are relatively easy to swap out compared to say operating systems and programming languages. Personally I find many of the best libraries tend to be the most popular ones.


I have a rather unpopular opinion that JS libraries get popular for reasons unrelated to merit, such as corporate backing (React), appeal to beginners (jQuery), and adding syntax sugar that has no value to end-users (Underscore/Lodash, Babel). They are symptomatic of organizational problems related to herding programmers at scale.

To beat the average, by definition you can't just do what everyone else is doing and expect outstanding results. If your problem is unique there won't be a library for it. If you find or create a hidden gem, it's your secret weapon, you are its user and responsible for it.


I use React, jQuery, Lodash and Babel and think they're great projects. What equivalent projects would you replace them with? What are you using that you consider a secret weapon? Going back to my original comment, I don't understand how you can pick these libraries as examples of things you should avoid if you read the source code.


I'm sure they weren't recommending you read every line and grok the whole code base of each of your dependencies.

I don't know how helpful reading source is for determining whether or not to use a dependency, but I find it very helpful to have at least a small understanding of how my dependencies work:

1. I find it useful for debugging. If a coworker or I have used a library incorrectly, understanding how the library works can help us figure out why it isn't doing what we wanted. Also, my IDE lets me do step-through debugging through dependency code as well, which can be helpful.

2. It helps with determining the capabilities of a library. If I know how a library works, then I can make reasonable guesses about what features the library has (or could easily add). I find myself thinking, "I'd like to do X, and from what I know of how library foo works, that seems like something it should be able to do." So I can go investigate that.

3. It helps with knowing the performance characteristics of the library. Just yesterday, I needed to implement some stats logic in java. I found an Apache Commons class that does exactly what I need, but by looking at the source code, I found that it uses an O(n) algorithm, which is fine for the cases they are probably targeting. In this case, I need more performance than that, and it's not hard to write this as O(log n). So I wrote my own instead. If my IDE (Intellij) didn't give me super easy access to dependency source code, this would have been more painful.
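The comment above does not name the class or the statistic, so the following only illustrates the general trade-off with a made-up example: counting how many samples are at or below a threshold, first by scanning, then by sorting once and binary searching. All names here are hypothetical.

```python
from bisect import bisect_right


def count_at_most_linear(values, threshold):
    """O(n) per query: scan every sample each time."""
    return sum(1 for v in values if v <= threshold)


class SortedSamples:
    """Sort once up front, then answer each query in O(log n)."""

    def __init__(self, values):
        self._sorted = sorted(values)

    def count_at_most(self, threshold):
        # bisect_right returns the index just past the last value <= threshold,
        # which equals the number of such values.
        return bisect_right(self._sorted, threshold)
```

The sorted version only pays off when there are many queries against the same data, which is the kind of thing you can judge only once you know what the library does internally.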


#2 is sort of a mixed bag, though: reading the source is great for finding out what it can do, but it also opens you up to depending on implementation details rather than the library's specified behavior.


Have you ever used a library and thought, "ugh, it should be written THIS way."

When you review a dependency, you answer why it should not be written THIS way. (Or why it's junk and shouldn't be used.)


It has been immensely beneficial. Reading the underlying framework probably made me write less code than I would have otherwise (plus the insights). The best example in my case was extending the classes of Django REST Framework.


>Has this ever been beneficial?

If you want to write secure code it's necessary. Otherwise you just end up with PHP web apps that have remotely exploitable memory corruption bugs because the developer didn't understand that he absolutely has to sanitize all user inputs to certain functions.

>If the code really is horrendous nobody is going to be using it

This just makes it pretty clear that you haven't ever looked.


Maybe if you're working for something that requires very high security but it's completely unrealistic that you can security audit all your dependencies. I'm pretty confident Angular, React, Vue etc. don't have any glaringly obvious security problems that all the other contributors missed for example.

Do you check the source code for Apache, MySQL, PHP etc.?

If I'm picking a library for a component where its flaws would be high impact (e.g. security) and there isn't a strong community behind that library I'm going to be very cautious though.


I read my dependency code partially as a way to learn the language. Several times in the past year I've been assigned to a project writing a language I don't know, and the major dependencies have often been a great example of well written, idiomatic code in that language.


You depend on so much code, how is this even remotely feasible?


This is expected in some fields such as embedded programming, it's reasonable to look at source code just to figure out how hardware works. Or web front-end where you shouldn't be sending huge payloads to the browser anyways. You don't always need to look at the lowest level, just direct dependencies.


I can really recommend this. I've made a habit of just casually browsing through the source of dependencies and just opening up anything that sounds interesting.


Crappy documentation for dynamically typed code has often made me read my dependencies.

Statically typed languages are much more likely to be self documenting, though occasionally, with Golang for example, they will return a structure with the same name as another structure in the program, and it's hard to figure out which of these is being returned.


Great advice, especially if the improvements are made to the open source package(s) so that everyone can benefit. Proprietary forks/upgrades to open source packages have a much narrower impact.


Just a brief study (~few hours) of mysql and postgresql sources allowed me to make a definite decision about which of the two is the better choice for all my web projects.


Seriously?

I am assuming you are not a database expert but basing this on the look and organization of the code and not the data structures and design. If that is not true then please share your findings instead of teasing us.

I'm curious: would you do the same with your editor (or IDE)? You should see the source code for those behemoths (with the exception of the rewrites like nvim); they are pretty nasty.

Are you going to pick a different editor just because Vim or Emacs (or whatever editor) has nasty code?

Guess what IDE has pretty clean code and is extremely modularized... Eclipse. I can tell you Eclipse is not a comfortable editor (and I use it all the time because I'm used to it not because the code is elegant).

Mature projects often have pretty ugly code but they are mature and stable. Ugly code also doesn't necessarily mean poor design (e.g. if your editor is single threaded and has nice looking code it's still single threaded).


There's a difference of category between your example and the parent's. Editors and IDEs will not make your code crash when it is deployed no matter how flaky or not they are. Databases on the other hand can.

I wouldn't imagine it being controversial to prefer well engineered code over less so for bits you will need to depend on.


Yes, but databases are not your normal application, library or tool. They require a fair degree of expert knowledge and, I agree, are an even more critical component.

I would much rather choose a battle tested database that has the features I want and performs well in my testing and profiling than "familiar" looking code. Because good looking code is really more often than not familiar looking code.

So unless the OP is an expert, I believe choosing the most critical component of your application based on whether the code is to your liking is sort of asinine.


Have you looked at MySQL's source code? Pretty tangled, and that's an excellent database.

Also vim is not flaky. The insides may be a nightmare, but it's very solid.


If there was a new editor with little history I think the quality of its code could help to decide whether it is worth switching and learning (Editor switch is very expensive for a programmer).

Mysql and PostgreSQL offer similar functionality, people use both systems for similar tasks. When you try to decide between the two, you find conflicting arguments all over the web. I think in such case looking at the quality of code can be a key signal to help you make a good decision.

I do agree that projects with ugly code can still be extremely useful and in many cases an optimal choice, but such projects are more likely to become obsolete sooner and to cause problems more often than projects with high quality code.


> Mysql and PostgreSQL offer similar functionality, people use both systems for similar tasks. When you try to decide between the two, you find conflicting arguments all over the web. I think in such case looking at the quality of code can be a key signal to help you make a good decision.

It would be pretty low on my check list. The number one for me would be profiling and testing the databases. IMO you should do that first. Even the ability to extend the database with plugins would be higher on my list than code quality (this is for a database). Probably 10 other things would be higher on my list than code quality (again for databases... for libraries that's a different story).


I'd disagree with your analogy. If your text editor crashes, it will cause a few minutes of lost work. If your database engine mangles your data, your site will be down for however long it takes you to restore from backups.

Looking over code would give you more than just an impression of its style; it would also give you a feel for the quality of the product. If most of the code you look at is of poor quality, it is very unlikely that the product, at a high level, will be any better.


> I'd disagree with your analogy. If your text editor crashes, it will cause a few minutes of lost work. If your database engine mangles your data, your site will be down for however long it takes you to restore from backups

So you are going to base the crashing of a database on the cursory non expert runtime of your eyes. Why not just test the database, read the docs, and ask other experts?

Secondly, most developers rarely extend their database, so there is very little need to know that codebase, and they probably should not change that codebase (given the criticalness).

You know what developers extend often... their editors and IDEs. The code quality (in terms of readability) of that should be much higher.

Now obviously if you have nothing but the code to base it on (like no doc, no benchmarking, no existing mind share, etc...) and are generally knowledgeable in the area, then yes, code quality would be a good decision criterion. But I remind you the OP said it was the major reason for choosing between Postgres and MySQL, which are probably written by greater experts than the OP.


>So you are going to base the crashing of a database on the cursory non expert runtime of your eyes.

You look over the source code to decide whether the database is worth proceeding with. I would assume the OP did other research as well, even if code quality was the deciding factor.

And again, I don't find the text editor analogy a compelling argument.


Perhaps Operating System would be a better example.

It is hard to map an analogy 100% (after all it is an analogy).

> I would assume the OP did other research as well, even if code quality was the deciding factor.

I wouldn't assume anything about the OP particularly given how poor the original comment was (including not telling what his/her findings were).


If I view the source code to an operating system, and it looks like it was written by an amateur, I'm probably going to try to find a better option.

I would say his comment implied he went with Postgres. I generally think of MySQL, which, if memory serves, was started by people with little database knowledge, as not being as high quality as Postgres, which came directly out of another relational database project.


Do tell :)


Please elaborate. Did you write a blog post or something like that to document the experience?


For those who understand networking: the Mirage TCP/IP networking stack [1] in pure OCaml is a must. It's an object of extreme beauty, and possibly the most eloquent argument for types, type inference and algebraic data types I can think of. The TCP state machine is mostly specified at type level [2], preventing numerous potential bugs in one fell swoop.

Reading this code is probably most enlightening if you have already written networking protocols.

NB: this has nothing to do with OCaml, other comparable languages with ADTs (Scala, Rust, Haskell, F#) would be similarly suitable.
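For readers who don't know any of those languages, here is a rough analogue of the idea in Python. It is not Mirage's code: the states are a heavily simplified subset of the TCP handshake, all class and method names are mine, and the guarantee only holds if you run a static type checker such as mypy over the code, whereas OCaml enforces it at compile time. Each state is its own type and only exposes the transitions that are legal from it.

```python
class Closed:
    """No connection yet; only opening is possible."""

    def active_open(self) -> "SynSent":
        return SynSent()

    def passive_open(self) -> "Listen":
        return Listen()


class Listen:
    def recv_syn(self) -> "SynReceived":
        return SynReceived()


class SynSent:
    def recv_syn_ack(self) -> "Established":
        return Established()


class SynReceived:
    def recv_ack(self) -> "Established":
        return Established()


class Established:
    """Only an established connection has a send() method at all."""

    def send(self, payload: bytes) -> "Established":
        # A real stack would queue the payload; this sketch only models the state.
        return self
```

Writing `Closed().send(b"data")` is then flagged by the type checker before the program runs, because `Closed` simply has no such method. That is the "bugs prevented in one fell swoop" effect, in miniature.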

[1] https://github.com/mirage/mirage-tcpip

[2] https://github.com/mirage/mirage-tcpip/blob/master/lib/tcp/s...


Only looked at link [2], but wow. That state machine is brilliant.

Don't know OCaml but am learning Rust and I see what you mean about the universality of how types make this possible.

Thanks for pointing this out.


Thank you. I haven't written networking protocols, yet. But hopefully I can still benefit from these.


Thanks!

I've been looking for something beyond basic tutorial apps to get a real feel for OCaml.


Check out 500 Lines or less https://github.com/aosabook/500lines

I just can't recommend it enough. All the projects are open source, so you can review the source code and still be walked through the code by the book. You'd learn why the programmers made certain trade-offs and how the applications became better off for it.


500 lines or fewer ;)


500 lines or fewer lines; but 500 lines or less code.


Yes! I often find myself making this point -- i.e. that the supposed problem indicates that the reader has made an incorrect assumption about an imaginary elision. Cf. "5 items or less [stuff]", "$5 or less [money]".


...500 code or less?


500 or fewer lines :)


Is it finished? I see lots of folders without a README, which leads me to think that it was abandoned.


Here is a direct link for those like me who are wondering where they can actually read the book: http://aosabook.org/en/index.html


This doesn't entirely match the criteria in your question but I think dipping into the standard library for your language (if it has one) is a good idea. It's not only useful to see what you're using but it can also show you how language experts use the language (which can be useful stylistically) and be an archaeological exercise in the history of the language and its priorities (particularly true with Ruby's stdlib, I found).


Agree. I sometimes read Python standard lib for fun.


I'm not a C person, but I've often heard that the Quake source code is good, practical-not-necessarily-elegant code that's worth emulating.

https://github.com/id-Software/Quake


It's impressive how the software renderer of Quake II achieves its results with just a 256 color palette and zero repainting of pixels (i.e. every pixel is correctly painted from the beginning).

http://fabiensanglard.net/quake2/quake2_software_renderer.ph...


Wasn't Quake the one where they did a really good approximation of the inverse square root because they needed it?
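That trick is usually associated with the Quake III Arena source rather than the original Quake. For the curious, a Python transliteration of the idea is sketched below. The function name is mine, and the Python version is for reading rather than speed: it reinterprets the bits of a 32-bit float as an integer, uses the well-known magic constant for a first guess, then applies one Newton-Raphson step.

```python
import struct


def fast_inverse_sqrt(x):
    """Approximate 1/sqrt(x) for a positive float via 32-bit float bit tricks."""
    # Reinterpret the float's bits as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic constant minus half the bit pattern gives a rough first guess.
    bits = 0x5F3759DF - (bits >> 1)
    guess = struct.unpack("<f", struct.pack("<I", bits))[0]
    # One Newton-Raphson step refines the guess.
    return guess * (1.5 - 0.5 * x * guess * guess)
```

For example, `fast_inverse_sqrt(4.0)` lands close to 0.5, within a fraction of a percent.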



The Quake source was a huge source of learning for me when I was getting started. Hacking OpenGL, making a red/blue stereoscopic mod, and discovering pools of particles quickly brought to life and rendered via a linked-list pool.. great fun and so valuable to stumble upon.
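For anyone who hasn't seen the pattern, a linked-list particle pool can be sketched roughly as follows. This is not Quake's code, just my reading of the general technique: all particles are allocated once, and spawning or killing a particle only moves it between a free list and an active list. Every name here is hypothetical.

```python
class Particle:
    __slots__ = ("x", "y", "ttl", "next")

    def __init__(self):
        self.x = self.y = self.ttl = 0.0
        self.next = None


class ParticlePool:
    def __init__(self, capacity):
        self.free = None    # head of the list of unused particles
        self.active = None  # head of the list of live particles
        for _ in range(capacity):
            p = Particle()
            p.next = self.free
            self.free = p

    def spawn(self, x, y, ttl):
        """Take a particle from the free list; returns None if the pool is empty."""
        p = self.free
        if p is None:
            return None
        self.free = p.next
        p.x, p.y, p.ttl = x, y, ttl
        p.next = self.active
        self.active = p
        return p

    def update(self, dt):
        """Age live particles and return expired ones to the free list."""
        prev, p = None, self.active
        while p is not None:
            nxt = p.next
            p.ttl -= dt
            if p.ttl <= 0:
                if prev is None:
                    self.active = nxt
                else:
                    prev.next = nxt
                p.next = self.free
                self.free = p
            else:
                prev = p
            p = nxt

    def active_count(self):
        count, p = 0, self.active
        while p is not None:
            count, p = count + 1, p.next
        return count
```

Nothing is allocated after start-up, which is the whole point of the pool.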


Yes Quake II which I expect to be similar is a major influence on anything I do.


This. Changed my coder life forever...



> We will not engage in (list of bad things) via cybertechnology or otherwise.

cybertechnology?


COMPUTERMAGIC


I am learning lisp and was recommended to read "Paradigms of AI Programming" as the examples are all given in lisp.

The later book is "Artificial Intelligence A Modern Approach" which is written with pseudocode examples. The site includes other languages than lisp; python, java, js, scala, and c#.

PAIP lisp code: http://www.norvig.com/paip/README.html

AIAMA code: https://github.com/aimacode


Following Handmade Hero is always a good idea.

http://handmadehero.org/


Wow, this is impressive. The dedication required is immense, over 380 episodes recorded already.


Yeah, there's a lot of them. I think the best parts so far (I'm at 300ish) were the dll hot reloading, the software pixel renderer with SSE instructions, and some of the stuff around how the debug machinery works.


Fabien Sanglard wrote great reviews of various codebases, including git, Doom 3 and others.

http://fabiensanglard.net/


The Architecture of Open Source applications: http://aosabook.org/en/index.html

These are very detailed, very well written articles from well-acknowledged developers about their OSS project.


Hashlife and anything else implemented by Norvig, he's one of the best programmers that I've had the pleasure of reading code from.

Plenty of tricks to be learned there, as well as fantastic structure.


Agreed, his code shows craft at every level.

Is a hashlife by him on the web? The first page of google results only turned up an older comment by you.


It used to be, let me see if I can dig it up.

Grr, nope :( That sucks.

Ok, so try this instead:

http://norvig.com/sudoku.html

Not quite as satisfying but it gives you the general idea, his whole approach is elegant and super direct.


Oh well, thanks! I'll read the Sudoku solver after I get around to writing my own -- it helps to keep you from just nodding along.


I sent an email to see if I'm either mis-remembering or it got taken down.

Ok, received answer from Peter Norvig, I must have mis-remembered who wrote what I read, but in the interest of completeness here is a life implementation by Peter Norvig (but not hashlife):

https://github.com/norvig/pytudes/blob/master/Life.ipynb


Nice. I saw one working like that in Clojure and rewrote it in my own Lisp: https://github.com/darius/squeam/blob/master/eg/bag-life.scm (the bag is like Python's Counter).
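For anyone who hasn't seen the bag-based approach: the idea is to count, for every cell, how many live neighbors it has by throwing all neighbor coordinates into a bag, then keep the cells with the right counts. A minimal Python sketch of that idea, which is my own and not copied from either of the linked implementations:

```python
from collections import Counter


def step(live):
    """Advance a set of live (x, y) cells by one generation of Life."""
    # Bag of neighbor positions: each live cell contributes one count
    # to each of its eight neighbors.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next turn with exactly 3 neighbors,
    # or with 2 neighbors if it is already alive.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}
```

The board is unbounded and only live cells are stored, so there is no grid to allocate or scan.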


All of the norvig Jupyter notebooks: http://norvig.com/ipython/README.html


Really, no one has said Linux yet? This is very surprising, given how often it is put on a pedestal for its excellent design...

Realistically this is because it is in fact a big messy pile of 'at least it works'.

Still, it is worth studying as an example of a work that was architected to support open source contribution from thousands of developers.


So which parts did you study and actually understand?

I have contributed to device drivers a few times, for example. And I wouldn't really recommend this part of Linux for learning. Maybe they work but code readability is often neglected and it's not unusual to see whole functions without a single line of comment or files/modules without even a short explanation of what they are trying to achieve.

Maybe the core is better.


I liked reading the kernel code when I was an undergrad. At the very least it taught proper coding habits, mainly to be consistent across my work (e.g. don't name a variable n_object in one function and then nameObject in another). The book that helped me understand the code was pretty good as well.


You should read the code of an open source project that you often use. You might not have the "ah ha" moment when reading something unfamiliar.


ffmpeg source code :-) It's beautiful C code and you will learn how to build a maintainable, modular system with just the tools that C gives you.


Doesn't ffmpeg have a reputation for bad code merge practices? I've heard libav[1], a fork of ffmpeg, has a much better focus on code quality. Note that ffmpeg frequently merges any new code from libav, so ffmpeg is almost a superset of both.

For anyone wondering why this is, here is a good explanation.[2]

[1]:https://www.libav.org/ [2]:http://blog.pkh.me/p/13-the-ffmpeg-libav-situation.html


A metacircular interpreter for Scheme. You can even watch a lecture where this is presented (https://ocw.mit.edu/courses/electrical-engineering-and-compu... see also part 2).


Thanks! By the way, there's a trailing semi-colon on the link that needs to be removed for it to work.



That's actually just a bug in the autoformatter; the url was simply embedded in a sentence with a semicolon in it.


Fang.[1]

Fang is a utility program for UNIVAC 1108 computers, written in 1972. UNIVAC's EXEC 8 had threads and async I/O for user programs, decades before UNIX. The machines were shared-memory multiprocessors. FANG uses those capabilities to parallelize copying jobs. The UNIVAC mainframes had plenty of I/O parallelism and many I/O devices, so this was a significant performance win.

See especially "schprocs". Those are the classic primitives from Dijkstra: P, V, and bounded buffers. That technology predates Go by 40 years. Here's Dijkstra's P function:

    .
    .
    .         DIJKSTRA P FUNCTION
    .
    .
    .         LA,U      A0,<QUEUE>
    .         LMJ       X11,P
    .         <RETURN>                      X5 DESTROYED
    .
    P*        TS        QHEAD,A0            LOCK THE QUEUE
              LX        X5,QN,A0            LOAD QUEUE COUNT (note: load)
              ANX,U     X5,1                BACK UP THE COUNT (note: Add Negative, i.e. subtract)
              SX        X5,QN,A0            REPLACE THE COUNT IN THE QUEUE (note: store)
              TN        X5                  DO WE NEED TO DEACTIVATE HIM ? (note: Test Negative)
              J         PDONE               NO.  SKIP DEACTIVATION (note: Jump, i.e. branch)
              ON        TSQ=0               (note: this is an assembly-time ifdef)
              LX        X5,QHL,A0           LOAD BACK LINK OF QUEUE
              SX        X5,QHL,X4           PUT INTO BACK LINK OF ACTIVITY
              SX        X4,QFL,X5           CHAIN ACTIVITY TO LAST ACTIVITY
              SA        A0,QFL,X4           CHAIN HEAD TO NEW ACTIVITY
              SX        X4,QHL,A0           MAKE THE NEW ACTIVITY LAST ON QUEUE
              CTS       QHEAD,A0            RELEASE PROTECTION ON QUEUE HEAD
    SCHDACT*  DACT$     .                   DEACTIVATE PROCESS (note: system call)
              OFF
              ON        TSQ                 (note: for later version of OS with alt wait fn)
              C$TSQ     QHEAD,A0            WAIT FOR C$TSA (note: system call)
              OFF
              J         0,X11               RETURN AFTER ACTIVATION
    .
    PDONE     CTS       QHEAD,A0            UNLOCK THE QUEUE (note: not a system call, just a macro. Stores 0.)
              J         0,X11               RETURN
(Notes:

Instruction format is

   OPERATOR  REG,OFFSET,INDEXREG
The "TS" instruction is "Test and Set". That's atomic. If the flag is already set, an interrupt occurs and the OS does a thread switch. CTS just clears the flag. Later versions of the OS support C$TSQ and C$TSA, where the OS queues waiting test and set operations.

X4 is the "switch list", the local data for the thread.)

[1] https://www.fourmilab.ch/documents/univac/fang/
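
For readers who don't read 1108 assembler, here is a rough sketch of the same primitives in Python. It is an illustration of the textbook P/V and bounded-buffer pattern, not a translation of the FANG source: this version blocks while the count is zero instead of letting the count go negative and chaining the activity onto a queue, and the class names `Semaphore` and `BoundedBuffer` are my own.

```python
import threading
from collections import deque

class Semaphore:
    """Counting semaphore with Dijkstra's P (wait) and V (signal)."""

    def __init__(self, count=0):
        self._count = count
        self._cond = threading.Condition()

    def P(self):
        with self._cond:
            # Block until a unit is available, then take it.
            while self._count == 0:
                self._cond.wait()
            self._count -= 1

    def V(self):
        with self._cond:
            # Return a unit and wake one waiter, if any.
            self._count += 1
            self._cond.notify()

class BoundedBuffer:
    """FIFO buffer holding at most `capacity` items."""

    def __init__(self, capacity):
        self._items = deque()
        self._lock = threading.Lock()
        self._free = Semaphore(capacity)  # counts empty slots
        self._used = Semaphore(0)         # counts filled slots

    def put(self, item):
        self._free.P()                    # wait for an empty slot
        with self._lock:
            self._items.append(item)
        self._used.V()                    # announce a filled slot

    def get(self):
        self._used.P()                    # wait for a filled slot
        with self._lock:
            item = self._items.popleft()
        self._free.V()                    # announce an empty slot
        return item
```

A producer thread calling `put` and a consumer thread calling `get` will then pace each other without either one busy-waiting.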


I don't think there's an "every", because not every programmer can read every language and there isn't even a common union of languages that you can universally expect.

In that regard Donald Knuth's work in the fictional assembler MIX for TAOCP is worth reading - it's at one remove from any real system. Knuth's TeX source is also quite unique.


As a Linux user, I find it very useful to set up the source and debug symbol repositories for the distribution that I am running. This way I have a large body of code easily accessible which matches what I am actually running. I usually grep for error/log messages, or examine core dumps or hung/misbehaving programs using gdb.


I don't think there is such a thing. Learning about common algorithms is a good idea but programs aren't novels. People pay a lot of lip service to reading code but practically nobody does it.

http://www.gigamonkeys.com/code-reading/


I disagree in part, I read lots of code.

Typically if I don't know how to solve a problem / use a library, I'll go onto github and search out projects either by their dependencies or structures I know must be present in the code I want. Then I'll pick out 20-30 projects and compare and contrast. Funnily enough, I read most all of the source code to Etsy's 'Artsy' iPhone app because I wanted to get a sense of how a professional shop structured their iPhone code.

As for understanding the code, the author is very correct in that it's tough to implicitly understand what's happening in the code. But to that point I use Reveal to peek inside iPhone apps, or node-nightly + chrome devtools to walk through js code. Since my goal isn't to understand the whole of a program, rather the parts I'm interested in, it's worked out quite well.

I've always wanted to read code for very secure applications (like Signal or SecureDrop), but I fear I don't understand crypto/infosec enough to understand the context the code was written in.


I don't know, perhaps my perspective is warped by starting in Windows development, but I pretty much never do anything like that to evaluate a library.


Wow, this was a great read. I wholeheartedly agree that very few people read source at a deep level, but your link explains why.

We should read source code sometimes (maybe) to deepen an understanding of how someone might think about the problem differently. In the link, Donald Knuth says he enjoyed the challenge of exploring a piece of software that had no documentation in order to figure it out.

In light of this, my answer to "what code should people read" is that you should read (really, decode) the source code of libraries competing to solve the same problem. This would lead to the greatest yield in terms of uncovering core shared concepts and core unique concepts.

You could compare Vue and React to understand how two different parties thought about componentizing the front-end. Or you could compare Django and Flask to see how a batteries-included MVC framework compares to a lighter alternative.


(Almost) any project that is in maintenance mode requires a lot more reading of code than writing.


Reading existing code with a view to modifying it isn't really what people mean when they say you should "read code."


it was a response to "nobody actually does it".


A half joking response (only half), I would say your own code you wrote a year ago.


Every time I've done that I can never quite figure out if the guy who wrote it is a tremendous genius or a complete fool.


For me that has motivated the biggest improvement in my own coding - writing and maintaining a system for a few years. If I go back to code I wrote a few months ago and it doesn't immediately make sense, then I have no one else to blame but myself (I usually refactor it when this happens).


Ghaa my eyes. It burns.


Read something that matters to you, some library or snippet that you use a lot or depend on. Context matters a lot in getting motivated to comprehend things. Otherwise you will just browse some random code with little to no connection to it and no understanding of how the authors got there.

One codebase I look into from time to time is Three.js: https://github.com/mrdoob/three.js/ to discover details beyond what the documentation covers.




> Why you're at it:

Not to sound pedantic, but did you mean "While you're at it:"?


Tabs in C source code. Ew!


One of the pieces of code I read through that helped me the most: Beautiful Soup. If you're a Python developer, I wholeheartedly recommend you read through it -- although I do think Leonard has removed some of his more disdainful comments from BS2. When I originally read it, I loved how you could really feel how much he hated all the XML/HTML parsing gotchas. I think it's the only time I've laughed out loud reading through code because of humorous comments and TODO notes to self.


Golang itself. It's soooo readable and contains so much wisdom.


Thanks for bringing this thread of thought to light! Sometimes I look through Stack Overflow not only for solutions to particular problems, but particularly elegant solutions in a proactive manner. I often just browse through profiles of people with tons of karma.


Fast inverse square root as seen in Quake (as mentioned in other comments)

https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...
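
If you'd rather poke at the trick than just read about it, here is a small Python transliteration. The original is C and reinterprets the float's bits through a pointer cast; this sketch uses `struct` to do the same reinterpretation, and the function name `fast_inv_sqrt` is my own. The magic constant 0x5F3759DF is the one from the Quake III source. It assumes a positive, finite input.

```python
import struct

def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for positive x using the Quake III bit trick."""
    # Reinterpret the 32-bit float's bits as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Shifting roughly halves the exponent; subtracting from the magic
    # constant negates it and corrects the bias.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits as a float again: the first guess.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step to refine the guess.
    return y * (1.5 - 0.5 * x * y * y)
```

Note that Python does the Newton step in double precision, so the last few digits will differ slightly from what the C version produces.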


The book Beautiful Code has some examples along these lines with interesting commentary too.


The D3 source code. Since this is made of around 30 modules, one might start with [d3-quadtree](https://github.com/d3/d3-quadtree).


I love this. Writers ask the same question. Teaching literature and composition starts with reading the works of greats who came before, learning what made them great, and incorporating their skills into your own tool box.




Lugaru Gametick. A successful indie game written by a high school student.

https://hg.icculus.org/icculus/lugaru/file/97b303e79826/Sour...


Have you seen Overgrowth? It's a spiritual successor by the same guys. They have very interesting development videos on YouTube.

https://www.youtube.com/user/WolfireGames/videos


Read the source code to Emacs. It has everything a growing programmer needs: portability, compilers and interpreters, user interface(s), language design, build systems...

http://www.gnu.org/software/emacs


https://www.amazon.com/Framework-Standard-Annotated-Referenc...

Even if you hate C#/Java, it's an amazing book on how to write solid, clean code.


Whenever I look at the Tcl source code, I'm impressed by how clean it is: https://github.com/tcltk/tcl


I remember having a few aha moments when reading through the Redis code.


Which version did you read? I would be interested in giving it a look through, but I expect the newer version must have grown in complexity.


Don't exactly remember but I think it was right before/around 2.X.


smallpt: Global Illumination in 99 lines of C++: http://www.kevinbeason.com/smallpt/


I am surprised no one mentioned Linus Torvalds's double pointer problem.


sox is kinda nice. Read a lot of it ages ago but I would like to revisit it now with my marginally improved math skills.


Find popular GitHub repositories for your tech and explore them for fun!



