One of the things that sets a good programmer apart is the willingness to fearlessly dig into someone else's code. I've heard it said before that you're responsible for every line of code you ship to your users. It follows that you shouldn't treat your dependencies as black boxes. Getting a stack trace that's 3 or 4 levels deep in Django/React/<insert library here>? Dig under the covers and understand what's happening. Include that information in a bug report, and better yet, open a PR offering to fix it.
You'll learn a whole lot and be more effective if you let your curiosity expand beyond code you/your team has written.
> ... willingness to fearlessly dig into someone else's code.
I've done this many times. The problem is that it takes a lot of time, and there's no way you can dig through more than a fraction of a large codebase. You gotta pick your battles.
Pragmatism wins the day, for sure. I'll happily read someone else's code (e.g., an open-source GitHub repository I rely on) if it's necessary to troubleshoot issues or implement a feature. There needs to be a reason (i.e., motivation) to read code.
On GitHub and in the Node/JS ecosystem, this is how it plays out:
1. Spend hours digging through layers and layers of crufty JS.
2. Find the issue (and the package responsible for the issue).
3. Find out there is a GitHub issue for the issue.
4. Find out there is a pull request for the issue.
5. Find out the pull request is sitting in position #35 of #128 pull requests going back to 2015.
6. Go binge drinking.
This is not always an optimal solution, because by doing this you have essentially become a maintainer of a dependency you initially wanted to 'just' use.
This is sadly why I keep my dependencies to a minimum.
After coding for almost 20 years now, full time, I have never regretted not using a dependency. But I have regretted using one multiple times.
(the regret is like a walk of shame during the "separation and cleanup" phase at the end, which makes me question the "got work done faster" phase at the beginning....)
In 20 years you've never started work on something and after a bit said damn it I think I'll use that library that other guy developed?
If you've ever switched out dependency X for dependency Y I suppose you regretted not using Y to begin with.
If you've ever stopped working on your own solution to a problem and instead used a solution provided in a library didn't you regret not just using the library from the beginning?
>In 20 years you've never started work on something and after a bit said damn it I think I'll use that library that other guy developed?
Nope, I am just saying what I said: I never regretted not using one, and I have regretted investing in a few that later caused more headaches than they were worth.
When we switched from inhouse code to a dependency, we never regretted doing it on our own to start with because there was so much insight gained from this and many other side benefits (like direct control, intuitive understanding of how our code worked, etc...).
But when you add a dependency that you later have to work around, you simply can't fix it the same way you can your own code, and you hate the mess in a totally different way.
When it's your own code, you can fix anything, and replace it a piece at a time if need be. With a dependency, there are usually catches, hacks, workarounds, or conflicts built up over years that have finally come to a head. It sucks to fix, and in many cases, if I had waited even a little while or done more research at the time, I would have picked a different dependency or none at all.
We use a lot of dependencies; I don't think you can run a business properly these days without them. They just aren't part of our core systems anymore. We use them for tertiary systems and add-ons, things that are replaceable. That way our core systems can't be hijacked by stuff like the latest NPM debacle.
Yeah, this also matches my (10+ year) experience as a professional dev... with one exception, though: the "maintenance-can-of-worms" packages.
These are packages which, by design, can never be considered as finished, and periodically need to be updated to stay relevant in your application.
There are three reasons for this:
1) Those packages implement an ever-growing pile of tricks/heuristics to convincingly solve problems from the domain. They include: physics engines, SMT solvers, compilers/jitters/optimizers, computational geometry packages, video encoders ...
2) Those packages implement a unification layer over an ever-growing/evolving set of underlying APIs/protocols/formats. They include: SDL, ncurses, curl, ImageMagick ...
(these are not to be confused with "bug-factories" packages, which might not solve a complex problem, but still require constant updating to fix the current bugs, and benefit from the new ones).
> In 20 years you've never started work on something and after a bit said damn it I think I'll use that library that other guy developed?
I can't speak for him and I don't know about you, but while deciding whether I would start to work on something, I'm actually deciding if I'm capable of doing it and looking at all the pieces needed to do it. I'm not starting work until I know I have all the pieces in place. Sometimes I may doubt that some piece is missing or might be hard to do, so I make a very small prototype of that system only (this usually takes a couple of hours tops). Once I make a decision and start working, I never go back.
Just like him, I have never regretted doing it myself, but have regretted using another developer's library many times.
Sometimes yes, sometimes no. I've abandoned writing something to use a dependency in place that got the job done, but could have a number of improvements.
I have only been coding for 1.5 years and this kind of comment is validating what I worry is a very bad habit. I hate using dependencies unless they have reached a certain level of legend in the community.
>...unless they have reached a certain level of legend
That is a similar tactic I use now for anything core. I know I commented that we don't have any dependencies in our core systems, but as an afterthought, this isn't entirely accurate. We do, but as you said they are "legendary", and don't change a whole lot anymore.
I think you are on the right track. :)
Edit: Keep business practicality in mind, if using a 3rd party library means you feed your family or are productive for the business, you need to weigh these considerations carefully.
It's likely you need to use a dependency to get your work done, and then later when you are more profitable, you can clean it out if need be. Maybe that is how I should have worded my first comment in retrospect.
The options in this conversation aren't as mutually exclusive as the tone suggests.
Many of these dependencies are open source. Nothing prevents you from copy/pasting the code and then maintaining it yourself. Then you alleviate the "hard to change" problem and you can vet it thoroughly inside your own team.
This is a normal feeling. But one weakness of this approach for JavaScript is that it's easier for a library to become popular there, because the base of developers is so much larger.
That makes sense. I use Python for the most part. I feel like I have seen this happen with old libraries that used to be very popular and maintained, then were abandoned a few years ago.
That can be okay. Depending on the nature of the library it might legitimately be "done."
Personally, I would rather see this than a bug fix being committed every week. That makes me question how many more bugs there are and how much I'm going to have to babysit updating the dependency.
7. Find out that the sole maintainer of the repo is a "well-respected influencer" in the JS community, and actually has a blog post stating that GitHub should remove the "issues" feature and instead only allow contributions via Pull Requests.
And then there's the problem that you can hardly convince the maintainer that something is an issue unless they have the issue on their OS and configuration.
It was a dependency of webpack that was broken (you can pick your bride, but you can't pick your bride's family). And the package owner said in a GitHub issue, "not my problem, this package should never be used with webpack and should only be used with Browserify." Looking at the date, that discussion occurred in 2015, which makes sense. But a WONTFIX on a package that is obviously broken (I looked at the code) and is still an issue in 2018 is clearly a problem. The Node world is a circular firing squad where no one wants to take responsibility for fixing their shit because each person thinks they are doing the right thing. Thanks to the mess of npm, no one really knows who the "user" of their package is, and you can't just dictate that so-and-so shouldn't be using your package that way, because someone will use your package that way.
WONTFIX, issues just suddenly "closed" without reason (why, dammit??), or pull request queues 100+ long and obviously unmaintained are way too common for me. I've lost track of the number of times this has occurred.
This is the only reason (at least for now) that makes me dig into a library's code. And I've always gotten some benefit from it.
I think we often treat libraries as voodoo black boxes, but they're more often than not created by normal people with great skills. Sometimes it's a great way to learn a way to code, or even better, to understand the behaviour of a library.
I found myself doing it more often when the IDE itself downloaded and linked the sources. (Lazy, I know).
> I found myself doing it more often when the IDE itself downloaded and linked the sources. (Lazy, I know).
Same here. It's about trivial inconveniences. I can sometimes be bothered to download and read through the codebase of a dependency, but most of the time I end up in third-party code because it's just a single jump-to-definition keypress away, the same keypress (M-.) I use all the time for my own code. That seamless browsing really blurs the difference.
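For anyone without that editor support, here is a minimal Python sketch of the same "jump to definition" move using only the standard library's `inspect` module. `where_is` is a made-up helper name, and `json.dumps` is just a stand-in for whatever third-party function you're curious about.

```python
import inspect
import json


def where_is(obj):
    """Return (source file path, first line number) for a Python-level object."""
    path = inspect.getsourcefile(obj)  # raises TypeError for C-implemented objects
    _, first_line = inspect.getsourcelines(obj)
    return path, first_line


if __name__ == "__main__":
    path, line = where_is(json.dumps)  # stand-in for a dependency's function
    print(f"{path}:{line}")
    print(inspect.getsource(json.dumps)[:200])  # peek at the start of the source
```

It only works for code that ships as Python source, so it won't help with compiled extensions.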
1. I could dive into the code, find the bug, submit the report and hope the developer accepts the patch. Hopefully this bug isn't there because it is required by some other feature the developer actually cares about.
2. That module wasn't doing that much, I can just redo it by hand and dump the dependency.
3. I could hunt around for a different module that does basically the same thing and use it instead. This might be necessary if the module is doing a lot of work, like an ASN.1 parser or something.
The thing is, "digging into someone else's code" is a skill, so you can get better at it, and it takes less time as you get better. It makes sense to invest some time at this skill.
The inability to justify time spent in Scrum doesn't make the time not get spent, it just gets hidden, defeating the whole point. I abhor Scrum, but I would also argue that in this case the development task weight just gets inflated for "unknowns" on the task. Instead of a 5 maybe it's an 8, or whatever.
I sometimes wonder what fully-honest Scrum would look like.
I mean... 3 points: dealing with ticketing system. 5 points: looking into bugs/issues which will become new tickets. 5 points: taking questions and talking process on other people's tickets.
The list goes on. I like agile, and I don't hate Scrum, but it's unpleasantly easy to end up running a planning/scheduling process which is completely distinct from one's actual schedule.
We need tickets (hell, I have to write them for myself on personal projects, otherwise I get locked), but we need something else too. I also wonder how we can formalize that something else - time-boxing it is harmful, full attention to it is harmful, having to explain it too much is harmful.
I have a strong suspicion that the reason points and time-boxing are so bad is that they're trying to weld two distinct processes together.
Tickets are great; they bound tasks, provide a clearinghouse for all relevant information on an issue, and help monitor and prioritize needed work.
The instinct to turn the "what work exists at what priority?" tool into a "let's organize our work" tool is totally understandable given all that, but I think it's a serious mistake. Tickets define tasks, but their times are highly variable (timeboxing is ugly, storypointed "investigation" tickets are uglier) and they exclude a whole bunch of non-ticket work: communication, bureaucracy, unstructured inspection, and so on.
If you make tickets the basic unit of scheduling, you're suddenly building "buffer" tickets and padding ticket durations to account for "something always comes up". It's Goodhart's law in action; your metric - storypoints - becomes a target and stops being an accurate metric.
The inability to justify time makes people avoid doing the work. Some people will strongly avoid it, other people will just weakly avoid it, but nobody will be more inclined to do it than if the time was taken into account.
If your methodology makes people avoid doing important work, it's a bad methodology.
My experience too. Velocity trumps everything else. IMHO, Scrum is like Communism. Okay in theory, but never implemented properly. And it's always the people at the bottom who suffer.
It's not perfect, but it captures the idea of uncertainty by having a time-boxed investigation, the end result of which is an estimate of how long it will take to fix.
Picking your battles may in fact be (one of) the things that'd set one apart. You can get bogged down in the weeds of dependencies. You can also end up with kludgy half-solutions to persistent problems by treating them like black boxes.
But before you can pick battles, you need to treat such things as possibilities, something you've done before and feel comfortable doing.
But there are often times when you can trick it into doing what it’s supposed to do.
And a really targeted bug report can often get a bug fixed pretty quickly. Several times I’ve been working at a place where the goddamned IP lawyers were so power mad that I could not file a patch for a bug even though I knew how to fix it, but filing a bug with an exact line number and description of what’s wrong with it can get it fixed anyway.
Filing a razor sharp bug report seems to be a rare skill, but if you can learn it your quality of life is better.
10 points for tricking it. Oftentimes it's not completely broken, but may have been written with different assumptions, be failing on an avoidable edge case, or have an issue with a dependency.
Knowing and deciding you can't reasonably fix it is a much better position to be in than shrugging and hoping it gets fixed in a later release, trying lots of versions, or trying to refactor around it.
bsimpson still raised a good opportunity to do this: when you get a stack trace. Another is whenever you have no idea how something works, or when the documentation for a certain piece does it no justice. It sure wouldn't hurt to look under the hood to see what it's doing; sometimes you might even find comments that do a better job than the docs.
A nice middle ground is to step through with the debugger when you're in your own code and see what other code it's calling that isn't yours. You don't/can't read all of it, as you said. But knowing the bits you're interacting with helps a lot.
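A rough Python sketch of the same idea without a debugger: take an exception's traceback and sort its frames by whether they live under a given directory. `split_frames` is a hypothetical helper. In the demo the root is the standard library's `json` package, standing in for a dependency; in practice you'd more likely pass your own project root and read the "outside" list.

```python
import json
import os
import traceback


def split_frames(exc, root):
    """Split exc's traceback frames into (inside root, outside root) labels."""
    root = os.path.abspath(root) + os.sep
    inside, outside = [], []
    for frame in traceback.extract_tb(exc.__traceback__):
        label = f"{frame.filename}:{frame.lineno} in {frame.name}"
        is_inside = os.path.abspath(frame.filename).startswith(root)
        (inside if is_inside else outside).append(label)
    return inside, outside


if __name__ == "__main__":
    try:
        json.loads("{")  # malformed on purpose, to force an error inside json
    except ValueError as exc:
        lib_dir = os.path.dirname(json.__file__)
        in_lib, elsewhere = split_frames(exc, lib_dir)
        print("frames inside the library:", *in_lib, sep="\n  ")
        print("frames elsewhere:", *elsewhere, sep="\n  ")
```

The labels give you file and line, which is usually enough to know where to start reading.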
Along those lines are there any tools that are really good for that? I'm looking for a program that can do some things like IDA does for assembly where there's the ability to label blocks/variables and then see where it's used elsewhere etc.
I always get an uneasy feeling when this is not (practically) possible. Whenever I add a dependency to a project I will at least skim through the source to get an idea of what it is I’m adding, and it pleases me a lot when a library is well written and self contained enough that this does not mostly leave me confused or overwhelmed.
There’s a similar reason why I have such mixed feelings about some build tools and compilers, especially in the JS world - the output is often hideous and so unlike what a human might write. I know you can say that what the output looks like doesn’t really matter, as long as it works as it should. But it just makes it so much harder to feel confident that it actually does.
The output of gcc and clang probably doesn't look much like hand-written machine code either, though. Or, for that matter, the output of V8 and SpiderMonkey. You can't avoid the compilers :)
What frustrates me about the JS world is that all too often, what you get from NPM is the output, not the input. Maven might not be universally loved, but you do at least get to navigate through the original Java source and stand _some_ chance of finding what you're looking for.
I think they're saying that code on NPM may be written with a bunch of experimental Babel features or in some niche language like Elm, and then translated to vanilla JS. In those cases, you might find code that's been machine translated or minified in node_modules, not the original source.
That's one thing that bothers me about modern Electron-style apps which depend on an entire web rendering engine -- actually digging through all the dependencies to track down a bug can be almost impossible. For example, I was hitting a bug in an Electron app a while back and wanted to track it down. I got the app building with a locally-built version of Electron, but it became clear the real bug involved Chromium's loader code. My poor 4-year-old laptop ran out of disk space before I could even download the whole libchromiumcontent repository, let alone build it!
I don’t think Library designers have adapted yet to a library heavy world. We still write them like we are using dozens but we have hundreds.
Every library demands a fraction of our attention greater than its relative fraction of the code. Complex calling conventions, deep call stacks.
Why am I even using a single purpose library with multiple levels of abstraction in it? At this point a library should BE the abstraction, not contain them. I have to trace code through my code into yours into your dependencies. Just stop already. Keep It Simple Stupid.
> I don’t think Library designers have adapted yet to a library heavy world. We still write them like we are using dozens but we have hundreds.
The other day I was debugging an app that - no joke - pulled in 600M of libraries that were all so it could use one file parsing function that could have been re-implemented in <100 lines. Crazy!
Take all my upvotes and offer a newsletter for subscription please.
The lack of digging is a frustration I have with a lot of new guys who join the team. For example, they might create something new and tack on some new, untested piece of code without reading whether something already exists within the same codebase. Even in a startup environment where documentation is scarce, it's worthwhile to ask the question of, "Did someone else here think of this problem here before me? What did they do?"
It comes down to unknown unknowns. Should I poll my supervisor/co-worker about every problem to see if there is already a solution? As the new guy, I have no idea who or what I should poll people about.
The second problem is that people will keep stuff hidden. Have that sweet script that creates 1000 users for testing? You might share it with the team, or you might think it's too trivial to bother sharing.
That being said, I can see how it can be frustrating. We have packages upon packages of existing functionality, languishing somewhere in source control, used maybe only once. In the meantime, someone is going through the cycle of ignoring not only what someone put on GitHub, but what has already been tested and QA'd by their own company.
> Should I poll my supervisor/co-worker about every problem to see if there is already a solution? As the new guy, I have no idea who or what I should poll people about.
It helps to have a company culture that encourages openness and asking questions. My current company goes so far as to have a Slack channel, #askanything, where folks can ask anything. We'd much rather you ask early in the development process than pursue a path that will lead to duplication.
Don't let survivorship and confirmation bias cloud your judgment.
What sets many successful people apart is not any one thing, but that they were successful. Many are willing to dive into other people's code in methods that are far deeper than I can comprehend, and yet they have failed at whatever task they were going about.
So, yes. By and large, you shouldn't stop at your dependency boundaries. However, your job is to get results. For most of us, even if the dependency is bad, you are better off changing your code. Why? Because you can more quickly get a change in your codebase than you can in the dependency. And heaven help you if you think it is worth forking.
This is not to say don't update the base place. By all means, please do. But don't wait for that change to land before you do something on your end, too. Even if that means not using your actual fix.
Forking is not that bad, unless upstream has a lot of activity yet takes a long time to accept patches. Many projects are mostly dormant, so keeping a patchset and occasionally rebasing is easy enough.
We use git-aggregator[1] for easily applying pending PRs/MRs over the upstream branch, and for the most part, conflicts are rare.
The thing I find with the Python/Ruby ecosystems is that versions are generally pinned, so if you do fork a change for your specific need, you'll end up using it for the project. If the change isn't merged, then at least you can use it yourself.
These are both valid choices. I did not mean to imply that it isn't possible.
However, in both cases you now have a new line of work. It is common in my experience to find folks are behind in their first job, so adding a new one is dangerous.
Of course you can expand this to be unreasonable - you could also make a strawman argument about how you need to understand all the compilers, OS APIs, and chip architectures that support your product. Obviously, that's not what I'm saying.
I'm making a few assumptions: the code you're depending on was written by people like you (e.g. other open source volunteers with expertise in the same language as you); the limitation you're encountering is problematic for your users, product, or business; and that you'd rather have it work correctly than design around it (e.g. remove the affected feature). If those are all true, yeah, it's probably worth a week or two to figure out what's wrong and fix it (especially since you'll gain comfort, familiarity, and efficiency in that codebase over time - it might even inspire other enhancements).
I realize that nobody has infinite time and I trust you to make reasonable decisions about what to spend where. Still, I implore you to not let yourselves be afraid of Other People's Code. It's probably not as scary as you think it is; after all, it was written by other people like you.
> It's probably not as scary as you think it is; after all, it was written by other people like you.
Most interesting code has been written by people smarter than me (the kernel, compositing window manager, network stack, web rendering engine, virtual machine, etc.), and while reading that code probably isn't intractable, it would take me decades of effort to understand. At the same time, I think I can be an effective developer by building on top of those technologies (i.e., being a great plumber). And that's OK.
>it's probably worth a week or two to figure out what's wrong and fix it
Ha! I'm convinced that people who say stuff like this haven't worked where their paycheck comes from actual paying customers, as opposed to VC dollars or subsidies from other parts of the business's revenue.
> Still, I implore you to not let yourselves be afraid of Other People's Code.
Wtf? Nobody said they were and what is with the Condescending Proper Noun?
Sure, but I think many programmers underestimate the investment value of that time.
Diving into codebases is a skill that gets stronger with use, such that you can eventually do it radically faster. That makes a much larger set of problems economically practical to fix.
> I've heard it said before that you're responsible for every line of code you ship to your users.
This is a great way of thinking about it. I wish that more people took it to heart when choosing third-party dependencies to adopt, especially in the Node.js ecosystem where people are tragically cavalier about adding literally hundreds of under-scrutinized transitive dependencies to their projects.
> willingness to fearlessly dig into someone else's code.
That's kind of an illusion. If you start digging into some really complex code, like a distributed processing core, a meta-programming framework, or low-level optimizations, it can take you years to understand everything that is going on. Often it's more economical just to rewrite it, even if the previous system was far better than what you'll produce. Whether companies like it or not, projects live with their original team members, and once those people move on, maintaining or extending the projects gets complicated.
The inherent danger in rewriting dependencies always seems to be greatly underestimated, though. Especially when this is your first time tackling the problem space in earnest.
* Getting the solution 80% of the way there is often pretty fast (which ends up deceiving everyone depending on it "it's almost done!")
* You are nearly guaranteed to encounter a bunch of bugs that were already solved in the original dependency.
* Due to the previous points, there's a big chance that your initial design had a few fatal flaws, and requires significant refactoring to address (making the project take longer, and if not careful making it much harder to understand & maintain)
Rewriting dependencies can definitely be worth it from a code ownership/maintenance point of view; but more often than not, you'll end up creating a cycle where the new version of the dependency is just as obtuse as the previous—and once you leave, the same problem all over again.
If a dependency is well documented, has a clear architecture, and follows consistent coding practices, it will stand the test of time (including loss of authors/maintainers). Most people/companies aren't willing to put the time/effort into that level of polish.
There's also a trade off to doing it too much as the time it can take to really go deep can effectively paralyze the development.
You need to be able to properly use abstraction layers and be able to treat code as black boxes when needed, you only need sufficient knowledge to work with a system. Of course we shouldn't stop until we reach that level and too often developers give up. But it's also common, especially for very skilled or perfectionist developers, to go too far.
I've only found time to do this with CherryPy really, mostly cause every time someone asks for help they say if something is missing in the documentation you could always read the source which is also commented and readable. It really is approachable and not too big. I never have time to go through the source of every single library but you raise a good time to do so: stack traces. Although when doing front-end JS code it's a bit less approachable if your files are already minified (client hands source like this) and the error becomes a lot less useful.
Perhaps I've been asking the wrong people (at the wrong time), forgive me, but as far as I can tell git has no easy, reasonable way to turn a (vendor/) dependency into a repo. That means you have to bend over backwards (read: make too much extra effort) to patch such a library locally and then eventually PR that change.
This feels like a fairly significant flaw in the current OSS "ecosystem." It encourages the wrong behaviour, and discourages the right one.
Use git submodules. git submodules interact tolerably neatly with GitHub's PR functionality; if you make changes to your submodule, then it's straightforward to turn that into a PR that you can submit to the upstream repo. And since your updates are a git branch like any other, if the upstream repo won't accept your changes then you can just keep doing fetch+merge/rebase to keep on top of the changes at their end.
Of course, this is git, so the UX is a total disaster, and it's very easy to get yourself in a pickle. But Stockholm syndrome will set in after a while and you'll begin to find it tolerable.
Assuming I've understood correctly, if you want the easy workflow then you just have to have the 3rd party code in its own repo and add that repo as a submodule. Best if that repo is one you look after, so that you can commit local changes to it if you like.
If you've not done this from day one, and you've got changes made to the 3rd party code, that's a bit of a pain, but provided you tracked (or can figure out) the version you were working from you can solve this manually. It's a bit of donkey work though.
(You have to decide from the start whether you're going to just put each dependency in the repo, or whether you're going to keep each dependency as its own submodule. I do think git works a lot better if you keep each 3rd party dependency as its own submodule: easier to patch your repo's copy, easy to make PR from your patches, easier to keep your patches merged in on top of later updates, and so on. But you have to plan for this a bit, and I won't deny that the submodule UX is pretty awful.)
It's not really so much that _git_ is lacking the functionality -- it's actually much more possible to do reasonable vendoring with git than with (say) subversion, as you can use submodules to point at your fork.
To my mind the issue is more with tooling: you don't typically want to vendor your code, most of the time you want to use a published module. And the process for publishing your fork is too complex to be worthwhile.
Also, the work to keep the fork itself up-to-date when you realise it's necessary is minimal, but by switching away from the original you lose tooling support for telling you when you need to make that update.
I understand there are workarounds, but they are just that. I mean, for example, I'm working away but hit a bug in a dependency. Now I have to fork, submodule, etc.
I'm not expecting frictionless per se. But it just feels like there's more overhead than there needs to be.
Also, it's possible (in rare cases) the vendor folder is not .gitignore'd. Now what? Submodules are out.
Again, given the nature of most modern dev (i.e., dependencies), git feels underprepared.
Re: "...but by switching away from the original you lose tooling support for telling you when you need to make that update..."
Yes. But doesn't this actually support the idea (mentioned above) not to treat your dependencies like black boxes? That is, no pain no gain? Yeah, perhaps some sort of simple notification your fork needs an update. Just the same, ALL code should, in theory, be treated as your own. You wouldn't blindly merge/commit a colleague's work, so why shouldn't your dependencies be semi thoroughly reviewed as well?
>Include that information in a bug report, and better yet, open a PR offering to fix it.
In my experience, this never happens on permissively licensed projects. "Contribute time and code to open source on company time? Are you mad? Fork it and keep that as our competitive advantage!" Never mind how dumb this approach actually is, it's what management thinks. "GPL? Oh, we have to? Oh, well, I guess if we have to."
It's why (in theory) I love Go; as opposed to the JS hell mentioned by another commenter, it puts a lot more emphasis on consistency and, in a way, predictability, so there are fewer surprises and distractions (e.g. code style) when you browse other people's code. Also no "black box" dependencies (usually); you pull every lib from source and have it on your machine.
Isn't it the purpose of blackboxes, to abstract me from doing this? I'm here to get the work done and hopefully having to debug only code of my colleagues, why would I want to waste time on debugging something that should have been battle tested by myriads of people before me?
An argument like this almost justifies leaky abstractions as something to be praised.
I can only say Amen to this. If you ship it, you either accept responsibility as far as you can, or you're "that" person who redirects you to another department because it's not technically their job.
Plus, as you mentioned the opportunity for learning is immense.
I agree with the spirit but in practice more often than not that's a complete waste of time and energy, unless you're trying to figure out the cause of a specific bug when you've already excluded your code as its source.
Do you consider the OS a "library"? Who checks the OS's code before shipping a program? What about the programming language's code? Do you code in assembly to reduce the amount of dependencies?