You'll learn a whole lot and be more effective if you let your curiosity expand beyond code you or your team has written.
I've done this many times. The problem is that it takes a lot of time, and there's no way you can dig through more than a fraction of a large codebase. You gotta pick your battles.
1. Spend hours digging through layers and layers of crufty JS.
2. Find the issue (and the package responsible for the issue).
3. Find out there is a GitHub issue for the issue.
4. Find out there is a pull request for the issue.
5. Find out the pull request is sitting in position #35 of #128 pull requests going back to 2015.
6. Go binge drinking.
7. (optional) Thank the creators and maintainers for the time and effort they put into it.
After coding for almost 20 years now, full time, I have never regretted not using a dependency. But I have regretted using one multiple times.
(the regret is like a walk of shame during the "separation and cleanup" phase at the end, which makes me question the "got work done faster" phase at the beginning....)
If you've ever switched out dependency X for dependency Y, surely you regretted not using Y to begin with?
If you've ever stopped working on your own solution to a problem and instead used a solution provided by a library, didn't you regret not just using the library from the beginning?
nope, I am just saying what I said, I never regretted not using one, and have regretted investing in a few that later caused more headaches than they were worth.
When we switched from in-house code to a dependency, we never regretted doing it on our own to start with, because there was so much insight gained from it, along with many other side benefits (like direct control, an intuitive understanding of how our code worked, etc...).
But when you add a dependency that you later have to work around, you simply can't fix it the same way you can your own code, and you hate the mess in a totally different way.
When it's your own code, you can fix anything. And replace it a piece at a time if need be. With a dependency, there are usually catches, hacks or workarounds, or conflicts built up over years that finally come to a head. And it sucks to fix; in many cases, if I had waited even a little while, or done more research at the time, I would have picked a different dependency or none at all.
There are a lot of dependencies we use; I don't think you can run a business properly these days without them. They just aren't part of our core systems anymore. We use them for tertiary systems and addons, things that are replaceable. That way our core systems can't be hijacked by stuff like the latest NPM debacle.
There are three reasons for this:
1) Those packages implement an ever-growing pile of tricks/heuristics to convincingly solve problems from the domain. They include: physics engines, SMT solvers, compilers/jitters/optimizers, computational geometry packages, video encoders ...
2) Those packages implement a unification layer over an ever-growing/evolving set of underlying APIs/protocols/formats. They include: SDL, ncurses, curl, ImageMagick ...
(these are not to be confused with "bug-factories" packages, which might not solve a complex problem, but still require constant updating to fix the current bugs, and benefit from the new ones).
> 1) [...]
> 2) [...]
Where is the third reason?
I can't speak for him and I don't know about you, but while deciding whether to start work on something, I'm actually deciding if I'm capable of doing it, and looking at all the pieces needed to do it. I'm not starting work until I know I have all the pieces in place. Sometimes I suspect a piece is missing or might be hard to do, so I make a very small prototype of just that system (this usually takes a couple of hours, tops). Once I make a decision and start working, I never go back.
Just like him, I have never regretted doing it myself, but have regretted using another developer's library many times.
That's similar to the tactic I use now for anything core. I know I commented that we don't have any dependencies in our core systems, but on reflection that isn't entirely accurate. We do, but as you said they are "legendary", and don't change a whole lot anymore.
I think you are on the right track. :)
Edit: Keep business practicality in mind. If using a 3rd party library means you feed your family or stay productive for the business, you need to weigh these considerations carefully.
It's likely you need to use a dependency to get your work done, and then later when you are more profitable, you can clean it out if need be. Maybe that is how I should have worded my first comment in retrospect.
Many of these dependencies are open source. Nothing prevents you from copy/pasting the code and then maintaining it yourself. That alleviates the "hard to change" problem, and you can vet it thoroughly inside your own team.
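A minimal sketch of that vendoring move, in shell. Everything here is a hypothetical stand-in: "leftpad" is a made-up package name, and the /tmp paths play the role of a cloned upstream repo and your own project.

```shell
# Vendoring sketch: copy a dependency's source into your own tree and
# maintain it there. "leftpad" and all /tmp paths are hypothetical.
set -e

# Stand-in for the upstream package (normally a git clone or an unpacked tarball).
mkdir -p /tmp/leftpad-upstream
printf 'module.exports = (s, n) => String(s).padStart(n);\n' > /tmp/leftpad-upstream/index.js

# Copy it under vendor/ and commit it like any other first-party code.
mkdir -p /tmp/myapp/vendor/leftpad
cp /tmp/leftpad-upstream/index.js /tmp/myapp/vendor/leftpad/
git -C /tmp/myapp init -q
git -C /tmp/myapp add vendor
git -C /tmp/myapp -c user.name=demo -c user.email=demo@example.com commit -qm "Vendor leftpad (pinned copy)"
```

From there, upstream bug fixes become ordinary diffs you apply and review yourself; the code is just your code now, at the price of tracking updates by hand.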
Personally, I would rather see this than a bug fix being committed every week. That makes me question how many more bugs there are and how much I'm going to have to babysit updating the dependency.
Then consider ditching this dependency (either by switching to a different one or with your own code) if the maintainers are not active enough.
It was a dependency of webpack that was broken (you can pick your bride, but you can't pick your bride's family). And the package owner, in a GitHub issue, said "not my problem, this package should never be used with webpack and should only be used with Browserify." Looking at the date, that issue discussion occurred in 2015. Which makes sense. But a WONTFIX on a package that is obviously broken (I looked at the code), and is still an issue in 2018, is clearly a problem. The node world is a circular firing squad where no one wants to take responsibility for fixing their shit, because each person thinks they are doing the right thing. Thanks to the mess of npm, no one really knows who the "user" of their package is, and you can't just dictate that so-and-so shouldn't be using your package that way. Because someone will use your package that way.
WONTFIX, issues just suddenly "closed" without reason (why, dammit??), or pull request queues 100+ long and obviously unmaintained are way too common for me. I've lost track of the number of times this has occurred.
I think we often treat libraries as voodoo black boxes, but more often than not they're created by normal people with great skills. Reading them is sometimes a great way to learn a new way to code, or even better, to understand the behaviour of a library.
I found myself doing it more often when the IDE itself downloaded and linked the sources. (Lazy, I know).
Same here. It's about trivial inconveniences. I sometimes can be bothered to download and read through the codebase of a dependency, but most of the times I end up in third-party code are when it's just a single jump-to-definition keypress away - the same keypress (M-.) I use all the time for my own code. That seamless browsing really blurs the difference.
1. I could dive into the code, find the bug, submit the report and hope the developer accepts the patch. Hopefully this bug isn't there because it is required by some other feature the developer actually cares about.
2. That module wasn't doing that much, I can just redo it by hand and dump the dependency.
3. I could hunt around for a different module that does basically the same thing and use it instead. This might be necessary if the module is doing a lot of work, like an ASN.1 parser or something.
So, one more time, Scrum seems unable to support actual development needs.
I mean... 3 points: dealing with ticketing system. 5 points: looking into bugs/issues which will become new tickets. 5 points: taking questions and talking process on other people's tickets.
The list goes on. I like agile, and I don't hate Scrum, but it's unpleasantly easy to end up running a planning/scheduling process which is completely distinct from one's actual schedule.
We need tickets (hell, I have to write them for myself on personal projects, otherwise I get stuck), but we need something else too. I also wonder how we can formalize that something else - time-boxing it is harmful, giving it full attention is harmful, having to explain it too much is harmful.
Tickets are great; they bound tasks, provide a clearinghouse for all relevant information on an issue, and help monitor and prioritize needed work.
The instinct to turn the "what work exists at what priority?" tool into a "let's organize our work" tool is totally understandable given all that, but I think it's a serious mistake. Tickets define tasks, but their times are highly variable (timeboxing is ugly, storypointed "investigation" tickets are uglier) and they exclude a whole bunch of non-ticket work: communication, bureaucracy, unstructured inspection, and so on.
If you make tickets the basic unit of scheduling, you're suddenly building "buffer" tickets and padding ticket durations to account for "something always comes up". It's Goodhart's law in action; your metric - storypoints - becomes a target and stops being an accurate metric.
If your methodology makes people avoid doing important work, it's a bad methodology.
It's not perfect, but it captures the idea of uncertainty by having a time-boxed investigation, the end result of which is an estimate of how long it will take to fix.
But, before you can pick battles, you need to treat such things as possibilities - something you've done before and feel comfortable doing.
And a really targeted bug report can often get a bug fixed pretty quickly. Several times I’ve been working at a place where the goddamned IP lawyers were so power mad that I could not file a patch for a bug even though I knew how to fix it, but filing a bug with an exact line number and description of what’s wrong with it can get it fixed anyway.
Filing a razor sharp bug report seems to be a rare skill, but if you can learn it your quality of life is better.
Knowing and deciding you can't reasonably fix it is a much better position to be in than shrugging and hoping it gets fixed in a later release, trying lots of versions, or trying to refactor around it.
There’s a similar reason why I have such mixed feelings about some build tools and compilers, especially in the JS world - the output is often hideous and so unlike what a human might write. I know you can say that what the output looks like doesn’t really matter, as long as it works as it should. But it just makes it so much harder to feel confident that it actually does.
Then again, I'm primarily a C++ developer, so I'm used to the output of my build tools being non-trivial to understand.
Maybe confirmation bias, but I can’t think of a single library where this stayed true for the entire life of a project. Software is written by humans.
“To err is human. To really fuck up requires the aid of a computer.”
Every library demands a fraction of our attention greater than its relative fraction of the code. Complex calling conventions, deep call stacks.
Why am I even using a single purpose library with multiple levels of abstraction in it? At this point a library should BE the abstraction, not contain them. I have to trace code through my code into yours into your dependencies. Just stop already. Keep It Simple Stupid.
The other day I was debugging an app that - no joke - pulled in 600M of libraries that were all so it could use one file parsing function that could have been re-implemented in <100 lines. Crazy!
The lack of digging is a frustration I have with a lot of new guys who join the team. For example, they might create something new and tack on some new, untested piece of code without reading whether something already exists within the same codebase. Even in a startup environment where documentation is scarce, it's worthwhile to ask the question of, "Did someone else here think of this problem here before me? What did they do?"
The second problem is that people keep stuff hidden. Have that sweet script that creates 1000 users for testing? You might share it with the team; you might also think it's too trivial to bother sharing.
That being said, I can see how it can be frustrating. We have packages upon packages of existing functionality, languishing somewhere in source control, used maybe only once. In the meantime, someone is going through the cycle, not only of ignoring what someone put on GitHub, but what has already been tested and QA'd by their own company.
It helps to have a company culture that encourages openness and asking questions. My current company goes so far as to have a Slack channel, #askanything, where folks can ask anything. We'd much rather you ask early in the development process than pursue a path that will lead to duplication.
What sets many successful people apart is not any one practice; in the end, it's simply that they were successful. Plenty of people are willing to dive into other people's code at depths far beyond what I can comprehend, and have still failed at whatever task they were going about.
So, yes. By and large, you shouldn't stop at your dependency boundaries. However, your job is to get results. For most of us, even if the dependency is bad, you are better off changing your code. Why? Because you can more quickly get a change in your codebase than you can in the dependency. And heaven help you if you think it is worth forking.
This is not to say don't update the upstream. By all means, please do. But don't wait for that change to land before you do something on your end, too. Even if that means not using your actual fix.
We use git-aggregator for easily applying pending PRs/MRs over the upstream branch, and for the most part, conflicts are rare.
However, in both cases you now have a new line of work. It is common in my experience to find folks are behind in their first job, so adding a new one is dangerous.
That is, yes, it can be done. It is more work, though. Account for it.
With infinite time we could fix a lot of things.
I'm making a few assumptions: the code you're depending on was written by people like you (e.g. other open source volunteers with expertise in the same language as you); the limitation you're encountering is problematic for your users, product, or business; and that you'd rather have it work correctly than design around it (e.g. remove the affected feature). If those are all true, yeah, it's probably worth a week or two to figure out what's wrong and fix it (especially since you'll gain comfort, familiarity, and efficiency in that codebase over time - it might even inspire other enhancements).
I realize that nobody has infinite time and I trust you to make reasonable decisions about what to spend where. Still, I implore you to not let yourselves be afraid of Other People's Code. It's probably not as scary as you think it is; after all, it was written by other people like you.
Most interesting code has been written by people smarter than me (the kernel, compositing window manager, network stack, web rendering engine, virtual machine, etc.), and while reading that code probably isn't intractable, it would take me decades of effort to understand. At the same time, I think I can be an effective developer by building on top of those technologies (i.e., being a great plumber). And that's OK.
Ha! I'm convinced that people who say stuff like this haven't worked where their paycheck comes from actual paying customers, as opposed to VC dollars or subsidies from other parts of the business's revenue.
> Still, I implore you to not let yourselves be afraid of Other People's Code.
Wtf? Nobody said they were and what is with the Condescending Proper Noun?
>It's probably not as scary as you think it is
Nobody said it was.
Diving into codebases is a skill that gets stronger with use, such that you can eventually do it radically faster. That makes a much larger set of problems economically practical to fix.
This is a great way of thinking about it. I wish that more people took it to heart when choosing third-party dependencies to adopt, especially in the Node.js ecosystem where people are tragically cavalier about adding literally hundreds of under-scrutinized transitive dependencies to their projects.
That's kind of an illusion. If you start digging into some really complex code - a distributed processing core, a meta-programming framework, low-level optimizations - it can take you years to understand all that's going on. Often it's more economical just to rewrite it, even if the previous system was far better than what you'll produce. Whether companies like it or not, projects live with their original team members, and once they move on it gets complicated to maintain or extend them.
* Getting the solution 80% of the way there is often pretty fast (which ends up deceiving everyone depending on it "it's almost done!")
* You are nearly guaranteed to encounter a bunch of bugs that were already solved in the original dependency.
* Due to the previous points, there's a big chance that your initial design had a few fatal flaws, and requires significant refactoring to address (making the project take longer, and if not careful making it much harder to understand & maintain)
Rewriting dependencies can definitely be worth it from a code ownership/maintenance point of view; but more often than not, you'll end up creating a cycle where the new version of the dependency is just as obtuse as the previous one - and once you leave, the same problem starts all over again.
If a dependency is well documented, has a clear architecture, and follows consistent coding practices, it will stand the test of time (including loss of authors/maintainers). Most people/companies aren't willing to put the time/effort into that level of polish.
There's also a trade-off to doing it too much, as the time it can take to really go deep can effectively paralyze development.
You need to be able to properly use abstraction layers and treat code as black boxes when needed; you only need sufficient knowledge to work with a system. Of course we shouldn't stop before we reach that level, and too often developers give up early. But it's also common, especially for very skilled or perfectionist developers, to go too far.
Perhaps I've been asking the wrong people (at the wrong time) - forgive me - but as far as I can tell, git has no easy / reasonable way to repo'tize a (vendor/) dependency. That means you have to bend over backwards (read: make too much extra effort) to patch such a library locally and then eventually PR that change.
This feels like a fairly significant flaw in the current OSS "ecosystem." It encourages the wrong behaviour, and discourages the right one.
Or is there something I'm just not understanding?
Of course, this is git, so the UX is a total disaster, and it's very easy to get yourself in a pickle. But Stockholm syndrome will set in after a while and you'll begin to find it tolerable.
If you've not done this from day one, and you've got changes made to the 3rd party code, that's a bit of a pain, but provided you tracked (or can figure out) the version you were working from you can solve this manually. It's a bit of donkey work though.
(You have to decide from the start whether you're going to just put each dependency in the repo, or whether you're going to keep each dependency as its own submodule. I do think git works a lot better if you keep each 3rd party dependency as its own submodule: easier to patch your repo's copy, easy to make PR from your patches, easier to keep your patches merged in on top of later updates, and so on. But you have to plan for this a bit, and I won't deny that the submodule UX is pretty awful.)
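To make the submodule variant concrete, here's a hedged sketch using local stand-in repos under /tmp in place of real remotes. All names and paths are hypothetical; the `protocol.file.allow=always` override exists only because newer git blocks file-path submodules by default, and is harmless on older versions.

```shell
# Submodule sketch: keep a third-party dependency as its own repo inside yours.
# /tmp/dep-upstream stands in for the dependency's real remote.
set -e
rm -rf /tmp/dep-upstream /tmp/app

# Stand-in for the third-party project.
git init -q /tmp/dep-upstream
echo 'exports.greet = () => "hi";' > /tmp/dep-upstream/index.js
git -C /tmp/dep-upstream add .
git -C /tmp/dep-upstream -c user.name=d -c user.email=d@example.com commit -qm "v1"

# Your project records the dependency as a submodule pinned to a commit.
git init -q /tmp/app
git -C /tmp/app -c protocol.file.allow=always submodule add /tmp/dep-upstream deps/dep
git -C /tmp/app -c user.name=d -c user.email=d@example.com commit -qm "Add dep as submodule"

# Patching your copy is just committing inside the submodule checkout, then
# recording the new pointer in the parent repo; a PR upstream is a push from
# that same checkout.
echo 'exports.greet = () => "hello";' > /tmp/app/deps/dep/index.js
git -C /tmp/app/deps/dep -c user.name=d -c user.email=d@example.com commit -qam "Fix greeting"
git -C /tmp/app add deps/dep
git -C /tmp/app -c user.name=d -c user.email=d@example.com commit -qm "Bump dep to include local fix"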
To my mind the issue is more with tooling: you don't typically want to vendor your code, most of the time you want to use a published module. And the process for publishing your fork is too complex to be worthwhile.
Also, the work to keep the fork itself up-to-date when you realise it's necessary is minimal, but by switching away from the original you lose tooling support for telling you when you need to make that update.
I'm not expecting frictionless per se. But it just feels like there's more overhead than there needs to be.
Also, it's possible (in rare cases) the vendor folder is not .gitignore'd. Now what? Submodules are out.
Again, given the nature of most modern dev (i.e., dependencies), git feels underprepared.
Re: "...but by switching away from the original you lose tooling support for telling you when you need to make that update..."
Yes. But doesn't this actually support the idea (mentioned above) not to treat your dependencies like black boxes? That is, no pain no gain? Yeah, perhaps some sort of simple notification your fork needs an update. Just the same, ALL code should, in theory, be treated as your own. You wouldn't blindly merge/commit a colleague's work, so why shouldn't your dependencies be semi thoroughly reviewed as well?
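That "simple notification" can be rebuilt with plain git. Here's a sketch with a local stand-in for the original project (all paths hypothetical); with a real fork you'd add the original as an `upstream` remote and fetch that instead.

```shell
# Staleness check sketch: how far behind the original is my fork?
set -e
rm -rf /tmp/orig /tmp/fork

# Stand-in for the original project.
git init -q /tmp/orig
echo one > /tmp/orig/file
git -C /tmp/orig add file
git -C /tmp/orig -c user.name=u -c user.email=u@example.com commit -qm "first"

# Your fork, taken at this point.
git clone -q /tmp/orig /tmp/fork

# The original moves on after you forked.
echo two >> /tmp/orig/file
git -C /tmp/orig -c user.name=u -c user.email=u@example.com commit -qam "second"

# A nonzero count means your fork needs attention; wire this into CI and you
# get back the "time to update" nudge you lost by switching away.
git -C /tmp/fork fetch -q origin
behind=$(git -C /tmp/fork rev-list --count HEAD..origin/HEAD)
echo "commits behind upstream: $behind"
```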
Are we too busy to do the right thing(s)?
a) Make changes in the minified source to understand the problem, then go make changes in the original; or:

b)

    git clone dependency.git
    yarn link dependency
    # make changes in dependency
    yarn run build
    # test my app
    # repeat until fixed
    # clean up, commit, and PR
I prefer B, but I'll do A first if the build step is annoying or slow.
In my experience, never happens on permissively licensed projects. "Contribute time and code to open source on company time? Are you mad? Fork it and keep that as our competitive advantage!" Nevermind how dumb this approach actually is, it's what management thinks. "GPL? Oh, we have to? Oh, well, I guess if we have to."
An argument like this almost justifies leaky abstractions as something to be praised.
Plus, as you mentioned the opportunity for learning is immense.
I think that the difference is that the totality of the UNIX kernel was comprehensible for a single mind. The UNIX kernel in 1983 was less than 20,000 lines of code, almost all in C, and more than 75% was not machine-dependent. The amazing software of today, something worthy of reading for fun, is literally millions of lines of code, written in multiple languages you don't know, and evolving so fast that whatever you read might be completely different a few months later. There's no joy in knowing a tiny part of that. It would be like reading 10 pages from the middle of a novel, a novel so big that you know you'll never finish it.
Even just yesterday here on HN was a submission that just contained shaders from Wolfenstein.
I read through the Lions book in the late 1990s when the reprint came out, essentially "just for fun" -- it remains probably the only sizeable piece of code I've read for fun.
It looks like the Lions commentary does that to a certain extent as well. There's even a version of the book online.
Code is just one level of abstraction and one way of seeing things. We should have multiple ways of "reading" a system depending on what level we want to understand it at. No, I don't think that rushed, out-of-date "system documentation" you wrote will do. I still think this is not solved in a satisfying way. We could find inspiration in things that work well: just imagine how Google Maps lets you look at the world at many different levels, from continents down to street level. It also provides different views of each level - street maps are useful for tourists, terrain maps for hikers, and satellite images for someone who wants to study vegetation. How you will "read" a system depends on what you are trying to understand.
Good point. A couple famous quotes that speak to this:
Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious.
-- Fred Brooks, The Mythical Man-Month
Bad programmers worry about the code. Good programmers worry about data structures and their relationships.
-- Linus Torvalds
If I can't use a debugger (i.e. my initial assumption fails), I usually run for the hills...
A google maps zoom system for code sounds awesome, although I have no idea how that might work.
The zooming functionality you're thinking about probably wouldn't be very practical. (Zooming in down to opcode or out to high level framework calls.)
But simply being able to inspect definition inline would do.
As an example: when I want to do something and the interfaces I'm seeing provide friction (XML parsing not /quite/ handling normal, slightly incorrect HTML), I can get a better idea of what the library is doing behind the scenes. I got an example of how to use the library's other exposed interfaces as a tool kit for my own iteration (which added some state tracking and cleanup of that mess), instead of relying on it to decode everything and dump it back to me.
Other parts of the library are similar, maybe it does 90% of what your use case needs, and you can extend those interfaces with your own library to add the corner cases that are required in your specific use case.
*(even if it can be tedious at times because it's sort of a mid-level language; it abstracts some REALLY bread and butter things that probably can't be lived without, but it doesn't shield you from true complexity pitfalls)
Not just a few clicks. In my editor, I just navigate into /usr/lib/go/src/<package-name> which is where my distribution (Arch) puts the stdlib sources. For example, /usr/lib/go/src/io/pipe.go for the implementation of io.Pipe() which I was looking at the other day.
It's good that most newer / modern languages have learned how valuable both easy documentation and pairing the docs with the code are.
That same user also made https://github.com/antirez/redis/pull/4685
This may be? https://github.com/antirez/redis/pull/4622
This was apparently critical, but also kind of an edgecase: https://github.com/antirez/redis/pull/4568
& one more from someone else: https://github.com/antirez/redis/pull/4568
I always dive in and have a look. The reason I love it is that Elm is quite restrictive, there is a way of doing things and you can't deviate too much. Therefore reading someone else's code is usually pretty easy.
In addition in Elm you can clone and run an Elm package and see the examples, do some "time travelling debugging" all within 30 seconds :-), again because there is one way of building and debugging things. No "Oh no browserify! I normally use webpack" or "WTF all that global npm shit I need to install" moment.
Just once per computer:
npm install elm
git clone ...
I'd probably say Elm is more Zen than Python!
How the heck does this behavior (good or bad, expected or not) come about?
Is this correct behavior assured in every case?
What are all the possible values of these poorly documented configuration parameters or other inputs?
Will this apparently working operation break under concurrency?
Why is this so slow for these inputs, or under these conditions? / Why is this so unexpectedly fast?
Is this doing nasty covert things I don't want, in addition to the overt functionality that I want?
Is this using a particular library or OS feature to do something obvious, or did they roll their own?
How will this scale in time, memory use or whatever for certain large inputs that are too impractical to actually supply just to answer this question?
Possibly more important, though, I’ve always had a fearlessness to dive into a codebase. I’ve built a little thing called Cronitor, and recently I was giving a talk on building our first server agent in Go. Long story short, a few people were surprised that I studied a couple of crond implementations to figure out some important details for proper monitoring, and I was surprised they were surprised. I don’t write much C or any C++, but as long as you approach it with some fearlessness and a simple plan, you can learn from unfamiliar codebases. Spend up to an hour figuring out how a project is structured (hopefully a lot less) and then just skim/grep to find a spot probably close to what you’re looking for. Spread out from there.
The best way to learn a new language is to build something useful and learn what you need along the way. I think studying code can be done the same way.
One of the counter-examples for me had been the implementation of egalitarian Paxos: when I first read the paper, I thought I didn't understand a lot. By reading through the code written by the actual author, I realised that the paper by itself described something quite far from a practically working prototype; so it wasn't my understanding of the paper that was the problem in the first place.
http://www.kohala.com/start/tcpipiv2.html - 15K line implementation of TCP/IP stack from BSD. Fantastic book, reads like a novel.
http://www.pbrt.org/ - fairly detailed write up on building a pretty decent 3D renderer
Setting aside the "for fun", of course I read my co-workers' code, but also I read my dependencies' code quite often. In particular when you have something that is meant to be extended, the docs never completely cover all the use cases, and the quickest way can be to start reading.
Some examples from my own recent work are reading code from the Ruby gems devise, devise_token_auth, and spree, and also reading Django class-based views (all things meant to be extended). One trick I've learned in Ruby is `cd $(bundle show spree_core)`. From there you can grep, find, read, even add your own `puts` or whatever.
Actually I have been reading some code for fun: Postgres! I have an extension that adds temporal foreign keys, and I'm slowing porting it from plpgsql to C. Reading the code for standard foreign keys is very helpful. Same thing for a lot of other extensions I've written in the last few years. It has helped me a ton to read others' extensions and the core code.
I usually write pseudocode first before doing anything now. Either on paper, as code comments, or as a list. Sometimes I find myself writing tests on paper as well, it helps out a lot too. A programming language is just a means to an end, what they all have in common is a set of logic to follow.
Then I debug. I just fork the code, comment things out, add a little bit of functionality, etc. When I have a grasp on it, I write some tests, TDD-style.
The easiest codebase I find myself referring to a lot is todomvc, or any similar one that has different implementations (different languages) of the same general solution. All the core logic between languages is mostly the same, so it's easy to dig through and understand a different programming language / how it's generally organized.
One thing I remember learning from experienced polyglot programmers: it's extremely helpful to take an existing codebase you've written (such as TodoMVC) and port that same example over to a different language to learn that language's nuances.
This doesn't just apply to learning programming languages - also learning Chinese.
I've had much more success since writing my own tool to translate the text I'm interested in, rather than being spoon-fed some patronising phrases to recite from a textbook.
It's pointless reading code on GitHub, especially for a codebase you are not familiar with. You need to get the codebase open in an IDE to be able to dive in quickly and back out again to build the map in your head. I don't think you can read a portion of code from top to bottom without tooling... unless the piece of code is really short and just one or two files.
It will be even better when web tools catch up. It could be better still if they could navigate across repos in different languages without even checking out a single file.
Taking an open source library at face value can sometimes put you in a precarious situation. For me, it's an exercise of following the errors or trying to break the code. I'm not really sure if that counts as reading. Reading a large codebase, in my experience, calls for a pencil and paper. It's a lot more work than passively reading and trying to hold all the information in my head. Reading code isn't the same as reading a novel. Perhaps better guidance on what actively reading code means is in order?
"Two-and-a-half months later I received an email from someone who not only managed to find the comment, but also managed to guess the code had to be rot13'ed."
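For anyone unfamiliar with the reference: rot13 rotates each letter 13 places in the alphabet, so applying it twice gets you back where you started. Python's standard `codecs` module ships it as a text transform:

```python
import codecs

# rot13 rotates letters by 13; non-letters pass through unchanged.
secret = codecs.encode("Hello, world", "rot_13")
print(secret)                           # Uryyb, jbeyq

# Applying it a second time is the identity.
print(codecs.encode(secret, "rot_13"))  # Hello, world
```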
"Clean, solidified abstractions are like well-marked, easy-to-follow paths through a forest — very useful if they lead in the direction we need to go, but less useful when we want to blaze arbitrary new paths through the forest."
Using the same analogy, if you want to blaze a new trail through the woods, you need a map of the territory, and a map for this territory takes the form of higher-level abstractions with clear, explicit semantics and minimal but effective, and unambiguous, interfaces.
One of the biggest problems in modifying or reusing existing code is the risk of breaking higher-level consistency, for example by violating an implicit constraint that is necessary for correctness (the biggest Ethereum errors have been of this form, where the implicit assumptions included when and how often initialization is performed, and that a certain library will be present.)
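A toy sketch of what such an implicit constraint looks like (this is an illustrative Python analogy, not the actual Ethereum or Parity code): an `init` method that everyone assumes runs exactly once, with nothing in the code enforcing it.

```python
class Wallet:
    """Toy contract-like object. Implicit constraint: `init` must be
    called exactly once, by the deployer. Nothing enforces it."""

    def __init__(self):
        self.owner = None

    def init(self, caller):
        # Assumed to run once at deployment; a later caller can
        # silently seize ownership by calling it again.
        self.owner = caller

    def withdraw(self, caller):
        if caller != self.owner:
            raise PermissionError("not the owner")
        return "funds sent to " + caller

w = Wallet()
w.init("alice")    # intended one-time setup
w.init("mallory")  # violates the implicit "init once" constraint
print(w.withdraw("mallory"))  # mallory now controls the wallet
```

The constraint lives only in the heads of the original authors; nothing in the local code a modifier reads tells them it exists. That is exactly the higher-level consistency that gets broken.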
If the clean code movement has been delivering code that works as step-by-step instructions for getting from A to B, but not as a map, it may be because breaking code down into small functions and classes is easier, more visible and more measurable than making coherent abstractions with consistent and minimal interfaces at a higher level.
While I'm not anti-abstraction, I think people often impose poorly chosen high-level abstractions on top of messy lower-level components (akkartik calls this Authoritarian High Modernism in a comment (https://lobste.rs/s/gtxi5y/nobody_s_just_reading_your_code_h...) on Lobsters). This approach leaves readers having to understand a bad abstraction and its internals, a map, its territory, and inconsistencies between the two.
As I understand it (please correct me if I'm wrong), you want people to do the work to find better high-level abstractions. I agree that it would be good if we could do this more often. I just also think we can find other ways to give people maps that work with our existing, messy codebases that don't have good, minimal abstractions already.
The question is what to do about it, especially given that abstraction and the separation of concerns are not suddenly going to fix things after all these years of being more honored in the breach. I do not think that there is a programming style that leads to the writing of understandable code without having a coherent vision of the big picture, because I do not think you can understand the purpose of code in the small without knowing its place in the big picture (except insofar as the author has successfully separated the independent concerns, which brings us back to my original point about the value of abstraction.)
One can still modify such code, but the less you understand about it, the more likely it is that you will make mistakes, and there is also a real possibility that the result will be harder to understand than the original - this vicious circle is one reason why much-modified code is usually difficult to understand.
While I don't hold out much hope for programming style to save us from this situation, I think we might do better with tools to help us understand code. In particular, I sometimes find myself doing program slicing by hand (what ways are there to get from A to B that modify X, for example), and there is some tool support for doing so, though availability is spotty at best.
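A tiny sketch of what "slicing by hand" can look like when partially automated, here restricted to one narrow question: which statements rebind a given variable? This uses only Python's standard `ast` module; real slicing tools also follow data and control dependences, which this deliberately omits.

```python
import ast

def assignment_lines(source, name):
    """Return line numbers of statements that (re)bind `name`.
    A crude first step toward a slice: find all writes to X."""
    tree = ast.parse(source)
    lines = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Assign, ast.AugAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                for sub in ast.walk(target):
                    if isinstance(sub, ast.Name) and sub.id == name:
                        lines.add(node.lineno)
    return sorted(lines)

src = """x = 1
y = 2
x += y
z = x
"""
print(assignment_lines(src, "x"))  # [1, 3]
```

Even this crude write-set is a useful map overlay: it tells you which parts of the path from A to B can touch X at all, which is where hand-slicing usually starts.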
But you can read/view a map as an illustration to get an overall view of the layout of things and how they are connected.
I read code. It surprises me that people deem themselves an expert in some 3rd party lib without actually understanding the internals.
I'm supposed to read all that every time I need a new library? Or write it myself, when I have absolutely no idea how to build something that complicated, if I had time, negating the whole point of OSS in the first place?
(I proof-read drafts of OP. And I have made prescriptions about reading code. But that's a separate story.)
It's not the worst C codebase I've ever seen (far from it), but for something that critical I'd set the bar very high in terms of code quality, and OpenSSL doesn't even come close.
Relatedly, an important detail is that the complexity of said source is kept to a sane minimum, and general style conventions mean that most, if not all, of the source is formatted similarly and legibly. The system is so compact that you can keep most of its source in your head at one time if you really need to.
Still, I'm more likely to find myself browsing Wikipedia for math and CS.
It's not quite true that none of the programmers interviewed in the book routinely read code for fun. In his blog post, Seibel mentions the exceptions:
> First, when I did my book of interviews with programmers, Coders at Work, I asked pretty much everyone about code reading. And while most of them said it was important and that programmers should do it more, when I asked them about what code they had read recently, very few of them had any great answers. Some of them had done some serious code reading as young hackers but almost no one seemed to have a regular habit of reading code. Knuth, the great synthesizer of computer science, does seem to read a lot of code and Brad Fitzpatrick was able to talk about several pieces of open source code that he had read just for the heck of it. But they were the exceptions.
The Knuth exception is notable: he does routinely read code for fun. In the book he mentions things like
> I’ve got lots of collections of source code. I have compilers, the Digitek compilers from the 1960s were written in a very interesting way. They had their own language and they used identifiers that were 30 characters long but very descriptive, and their compilers ran circles around the competition at the time—this company made the state-of-the-art compilers of 1963 or ’64. And I’ve got Dijkstra’s source code for the THE operating system. […] I collected it because I’m sure it would be interesting to read if I had time. One time I broke my arm—fell off a bike—and I had a month where I couldn’t do anything much, so I read source code that I had heard had some clever ideas in it that hadn’t been documented. I think those were all extremely important experiences for me.
And then there's a passage quoted in the post about how he reads code, with a Fortran compiler as the example (not repeating it here; it's in Seibel's blog post: http://www.gigamonkeys.com/code-reading/ ), after which the interview (and the book) ends with Knuth saying:
> don’t only read the people who code like you.
(The examples in the book of people who last seriously read code years ago are also interesting; e.g., Douglas Crockford mentions some programs he read, and there are multiple pages in the book about Guy L. Steele reading the TeX program.)
Anyway, back to Knuth: I think the way he reads and digests code (or even papers: http://blog.computationalcomplexity.org/2011/10/john-mccarth...) addresses a lot of the points the linked post makes. Even when “just reading” code, or anything for that matter, you are supposed to be doing the things the post mentions in “hacking versus passive reading”: active exploration, critical examination, synthesis.
For example, Knuth read the code of ADVENT (aka Colossal Cave Adventure), and loved the code (not just the game itself) so much that he rewrote it to share with others to read, in his own preferred literate-programming (CWEB) style that matches the way he thinks (http://www.literateprogramming.com/adventure.pdf). This definitely doesn't sound like “passive reading” to me.
Nevertheless, great post.
I sometimes wonder whether a lot of Knuth's greatness comes from doing more of the stuff everyone knows they should do but don't. If you read this interview (https://github.com/kragen/knuth-interview-2006) with Knuth, he talks about how he was nervous he wouldn't be able to learn calculus, so he just decided to do all the problems instead of only the assigned ones. Unsurprisingly, partly because he's Knuth and we all know Knuth can do math, he ends up really good at calculus:
> But Thomas’s Calculus would have the text, then would have problems, and our teacher would assign, say, the even numbered problems, or something like that. I would also do the odd numbered problems. In the back of Thomas’s book he had supplementary problems, the teacher didn’t assign the supplementary problems; I worked the supplementary problems. I was, you know, I was scared I wouldn’t learn calculus, so I worked hard on it, and it turned out that of course it took me longer to solve all these problems than the kids who were only working on what was assigned, at first. But after a year, I could do all of those problems in the same time as my classmates were doing the assigned problems, and after that I could just coast in mathematics, because I’d learned how to solve problems. So it was good that I was scared, in a way that I, you know, that made me start strong, and then I could coast afterwards, rather than always climbing and being on a lower part of the learning curve.
I love Lisp. It's a great language with great programmers.
One idea was taking a popular open source tool with a good test suite and trying to get the tests to pass, but I think checking the code out at a point before a new feature was implemented, then reimplementing it, could end up being really interesting. You could even make it more interactive by sharing the checkout point so viewers can do an implementation themselves, then doing a stream walking through the different interesting implementations.
The second part, knowing what the functions being called actually do, is nearly impossible when they are other functions within the program itself.
In Swedish there's a phrase, "snöa in" (literally, to get snowed in), and that's what happens to me.
So after enough years in the biz I try to hold myself back from this habit.
You're often wondering who is gonna dig into this code of yours and what they'll think about your coding, especially when you're not following best practices and TDD.
Code is like ogres, and ogres are like onions: they have layers of complexity :P