Hacker News
Kill Your Dependencies (mikeperham.com)
409 points by twampss on Feb 11, 2016 | 226 comments

I disagree, and this quote I've seen floating around the internet sort of sums the idea up to me (albeit with a music analogy):

> I thought using loops was cheating, so I programmed my own using samples. I then thought using samples was cheating, so I recorded real drums. I then thought that programming it was cheating, so I learned to play drums for real. I then thought using bought drums was cheating, so I learned to make my own. I then thought using premade skins was cheating, so I killed a goat and skinned it. I then thought that that was cheating too, so I grew my own goat from a baby goat. I also think that is cheating, but I’m not sure where to go from here. I haven’t made any music lately, what with the goat farming and all.

The idea in the OP is that if you're going to use the Amen Break, don't require "Hiphop-all", "Breakbeat-all" or, hell, if we're going by some of the Wikipedia examples, "Futurama-all".

Just import "AmenBrother-drums" or something and start from there, because obviously you're not going to use Zoidberg's leftmost tentacle in your cool new sound.


I don't think that's what he's advocating: he's just saying that every dependency you have is another thing you have to worry about, so why not try to limit them as much as possible? Obviously it's impractical sometimes, and that's OK, as long as you understand the consequences.

Every dependency you have is another thing that people besides yourself can worry about with you, or for you.

Need to do X? If you write your own library that does X, chances are you'll be the only one to ever work on it. Need a new feature? You have to stop working on your actual project and implement that feature. Found a bug? No one else will fix it for you.

If you depend on a library that thousands of other people also use that does X, and you do find a feature you need that it doesn't have, open a ticket; someone will likely do that work for you. More often than not, that feature already exists, and all you have to do is read the docs. If you find a bug, report it and wait for it to get fixed, but there's also nothing stopping you from fixing it yourself and submitting a pull request.

A better way to view this is that if libraries are kept small, then you can pick a more granular set of dependencies than, say, a larger library that includes the kitchen sink.

I agree with most comments here that it's not a good idea to reinvent the wheel everywhere, but when all you want is a wheel, and not the entire car, it's ideal that we have a way to just include the wheel.

But... That would require you to read somebody else's code. /s

Classic case of NIH syndrome.

Every line of code you write is something you have to worry about. Every dependency you use is something that (in theory) other people are also worried about.

Charming quote. The same argument could be made about making your own food "from scratch." But it's not a solid refutation of the essay. Let's say that if one end of the spectrum is unlimited dependencies and complete indifference to the complexity and size of the project, and the other end of the spectrum is raising your own goats, there must be an ideal somewhere in the middle.

I completely agree, but the extremism in the essay is ridiculous. If I'm writing some software, I don't want to have to roll my own version of every little thing when there are battle tested libraries out there that already do it and benefit from many people using it. If my use case is vastly different, sure, but if it's the same use case as everyone else then I see little benefit.

if they are so battle-tested, then they are probably not like the email gem the author spoke of, which was using 10 MB more memory than necessary because of its dependencies...

node-land is also getting a bit silly this way... to say nothing of java etc...

since we are into silly quotes, I'll riff and probably get this wrong, but: "you wanted a banana but you got the gorilla holding it and the whole jungle too" - Joe Armstrong (Erlang) on class-based OO inheritance etc...

10 megabytes of memory more than necessary! Because that's a massive waste of memory in 2016.

That is 10 megabytes per ruby application per server. I don't know how many ruby instances there are in the world, but the total memory reduction across the world could easily be in the order of petabytes. Small change with large impact.
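A quick back-of-envelope check of that claim (the process count here is purely hypothetical, just to see what "petabytes" would require):

```javascript
// Back-of-envelope: how many Ruby processes would a 10 MB per-process
// saving need before the total reaches a petabyte? (counts hypothetical)
const savingPerProcessMB = 10;
const petabyteMB = 1e9; // 1 PB = 10^15 bytes = 10^9 MB
const processesNeeded = petabyteMB / savingPerProcessMB;
console.log(processesNeeded); // 100000000, i.e. 1e8 processes per PB
```

So "petabytes" assumes hundreds of millions of Ruby processes worldwide; at 10^5 processes per terabyte saved, "terabytes" is the safer version of the claim.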

Allocation of that memory still costs /something/, even if RAM is so plentiful as to be free.

Making your own library also costs something (your time), which is probably more scarce than RAM.

no disagreement - better to use someone else's - but if you're using one tiny part of a gigantic library of functions, it may be more cost-effective to find a better (smaller) library, or write your own.

The kind of thinking that got us web browsers using multi GB of RAM...

Have you read A Modest Proposal by Jonathan Swift? Satire at its best! https://en.wikipedia.org/wiki/A_Modest_Proposal

As a middle-ground, one could leverage existing source and send patches to fix the inefficiencies.

This reminds me of Objectivist-C:


"In Objectivist-C, software engineers have eliminated the need for object-oriented principles like Dependency Inversion, Acyclic Dependencies, and Stable Dependencies. Instead, they strictly adhere to one simple principle: No Dependencies."

'If you wish to make an apple pie from scratch, you must first invent the universe.'

  — Carl Sagan

If we’re talking recording, it doesn’t matter that much how exactly you arrive at some fixed audio sequence. Indeed, use loops or goats, borrow $xxxxx gear from friends, pull multi-terabyte libraries of perfectly sampled cello—as long as it floats your boat (and your target audience’s. If you make stock progressive house for clubs, goat involvement probably isn’t worth it). By way of knob tweaking and some applied randomness you arrive at a take you like and then you delete everything else and let the goat roam free.

Things are different if we’re talking live performance. You’d be fine with that multi-terabyte cello library on stage, but not if sampling plugin brought opaque nth-party dependencies you can’t vouch for and have no time to properly wrap your head around. You don’t want surprises and you want to know for sure there’s enough performance leeway on your concert laptop.

There are steps you’d take to reduce the error margin while you perform that have no parallels in software engineering realm. Maybe you’d bounce your cello loops to audio to never have that computation happening in real time. Maybe you’d bring analog synths, which are bulky and pricey but also simpler conceptually, self-sufficient, well-tested and never give blue screen of death.

Your song is essentially being created every moment your program is running or being developed. You’re putting even more trust in what you’re pulling—the black box abstraction boundaries around your dependencies can contract, the input you receive from audience is more direct, and importantly there’s no you actually playing instruments and directing the performance because the whole behavior is defined by algorithms that often end up partly delegated to dependencies.

that's a silly/wrongful quote IMO... I've produced/recorded music for over 20 years and loops from other people are not only cheating, they are not your damned recordings and you sound like every other loop-arranger out there... it's not really that difficult to record real-world sounds and guess what!? you can use those same audio tools (your daw + plugins etc) to manipulate and sweeten YOUR recordings just as easily as you do with others recordings aka loops. making music with sample-pack loops is basically just dj'ing/remixing... which I also have done for over 20 years so I know the difference... Using boughten drums is not cheating, as it's generally about the way you tune them and play them that make the difference, versus loop manipulation that literally is about modifying otherwise sample-for-sample copies... Guitar tonality = 80% in the fingers and the ear/mind of the player... The pickups and amp/pre-amp combination DO make the other 20% perhaps... As for making drums from goat-skins, well it's not exactly rocket science...

now, onto the problem I have with the actual issue at hand: it's not like the author is saying to write your libraries in assembler. he's saying that maybe you don't need to include gems upon gems that themselves reference other gems, as the dependencies pile up and get ridiculous and the performance suffers, which is actually a major concern with Rails and other monolithic frameworks with single-threaded processing etc. for crying out loud, he's even giving you permission to specify your behaviors with a ridiculously high-level concise language aka Ruby... how much more direct and obvious can a point be and still be missed?

>I've produced/recorded music for over 20 years and loops from other people are not only cheating, they are not your damned recordings and you sound like every other loop-arranger out there...

Actually there are songs using samples that are 10 times more original and unique than songs other people have totally recorded and played themselves.

You can be inventive and original using samples and you can be a copy-cat bore writing your own stuff.

For example, let's compare a cheesy, but still huge classic and immediately recognizable "U can't touch this" with tons of stupid-ass Michael Bolton cliched ballads or MOR rock.

There's always going to be a balance between reusing code someone else has written and writing new code. They both have their pros and cons.

I'm usually against counter-arguing by referencing fallacies, but this sums up your comment perfectly:


The idealized goat farmer will make a better musician than the one who simply uses others' samples. To bring the analogy back home, the programmer who understands silicon and circuits is better equipped than the one who relies on massive dependencies.

I don't think many embedded systems engineers could write a PaaS

I used to think embedded systems (software) engineers were the be-all-and-end-all until I started looking into how my router worked (or didn't work - dhcpd had crashed).

How can you take Buildroot [1] and break it so badly? A mishmash of Broadcom SDK and ODM Makefiles that will only compile on RHEL5 on a Tuesday [2].

1: https://buildroot.org/

2: https://dlink-gpl.s3.amazonaws.com/GPL1500418/DIR890A1_GPL10... - ~450MB compressed / 1.2GB uncompressed.

Regardless of how perfectly this does or doesn't match the context, I approve as an electronic musician and programmer and I will use this quote forever! Thank you! www.soundcloud.com/decklyn

as an excuse for not doing original recordings? collage art is collage art, not painting... sorry:)

Some of the greatest painters, from Picasso and Dali to Warhol also did collage art...

Besides, painting is also not ballet or architecture, so?

yet, both are valid ways to express yourself.

A large number of dependencies is only a problem in environments that aren't amenable to per-function static linking or tree-shaking. These include dynamically typed languages like Python, Ruby, and JavaScript (except when using the Google Closure Compiler in advanced mode), but also platforms like the JVM and .NET when reflection is allowed. Where static linking or tree-shaking is feasible, the run-time impact of bringing in a large library but only using a small part of it is no more than the impact of rewriting the small part you actually use.

Edit: Dart is an interesting case. It has dynamic typing, but it's static enough that tree-shaking is feasible. Seth Ladd's blog post about why tree-shaking is necessary [1] makes the same point that I'm making here.

[1]: http://blog.sethladd.com/2013/01/minification-is-not-enough-...

The run time resource usage risks are only one element of deep dependency trees, and certainly the one I worry about least. The biggest risks are the fact that you've handed every person in your dependency tree developer status in your project. That includes not just potential maliciousness, though that is a factor, but potentially disappearing the project entirely, potentially taking it in an unexpected direction, potentially introducing bugs (oh that all projects merely monotonically got better), potentially introducing security vulnerabilities in deep parts of the stack you wouldn't even think to audit, diamond dependencies, difficult-to-replicate builds, etc. There are then tools that can strike back at some of these, but there's something to be said for avoiding the problem.

For that matter, there's no guarantee that tree-shaking would even have fixed the referenced issue; if the library preloaded 10MB of stuff, like a Unicode definition table, that you didn't use, but the tree shaker couldn't quite prove you never would, you'll still end up with it loaded at runtime. (For that matter, you may very well be using such code even though you don't mean to, if, for instance, you have code that attempts to load the table, and uses it if it loads for something trivial, but will just keep going without it if it is not present. The tree shaker will determine (correctly!) that you're using it.)

Basically, tree shaking only sort of kind of addresses one particular problem that deep dependencies can introduce, and that one not even necessarily reliably and well.
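The "table the shaker can't drop" scenario can be sketched in JavaScript, with a toy stand-in for the big data table (all names hypothetical):

```javascript
// A tiny stand-in for a multi-megabyte Unicode-style data table that
// lives in a dependency. Because normalize() statically references it,
// a tree shaker must (correctly) keep the whole table in the build,
// even though most inputs never touch it at runtime.
const bigTable = { 'é': 'e', 'ü': 'u' }; // imagine 10 MB of entries

function normalize(ch) {
  // The "trivial" use: fall back to the input when the table has no entry.
  return bigTable[ch] !== undefined ? bigTable[ch] : ch;
}

console.log(normalize('é')); // 'e' (needs the table)
console.log(normalize('a')); // 'a' (table unused here, but still loaded)
```

No static analysis can tell that, for your actual inputs, the table path is almost never taken; only a human reading the code can decide to load it lazily or drop the dependency.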

You're right; I over-stated the benefits of tree-shaking. My comment came out of long frustration that so many of the languages and platforms we use don't even support tree-shaking, so if we care about binary size, we have to do micro-management of dependencies that the machine should be able to do for us. But you're right; the other issues with dependencies remain.

About your 10 MB Unicode table example, did you have in mind the ICU data table that's included with Chromium and, more recently, all the Electron-based apps?

"About your 10 MB Unicode table example, "

The precise thing that I have personally experienced is Encode::HanExtra for Perl "getting around" in my code base, and being able to save a couple of megabytes by loading it only when necessary. But the principle is the same, I'm sure. Full databases on all the characters in Unicode and such get quite large!

Not always. Dependencies were a huge problem at Google, even in C++ (perhaps especially in C++), because they mean that the linker has to do a lot of extra work. And unlike compiling, linking can't be parallelized, since it has to produce one binary. At one point the webserver for Google Search grew big enough that the linker started running out of RAM on build machines, and then we had a big problem.

There's still no substitute for good code hygiene and knowing exactly what you're using and what additional bloat you're bringing in when you add a library.

That's a pretty significant special case though. I'd be willing to go with the advice "If you get as big as Google's codebase, be sure to trim the dependencies on your statically-bound languages too." But you probably have a ways to go before that's an engineering concern for your project.

(... note that one could make a similar argument for more runtime-dynamic languages. I won't disagree, other than to observe that as a lone engineer, I've managed to code myself into a corner with dependencies in Rails ;) ).

^ THIS. I wish I had more upvotes.

The amount of time I've seen wasted trying to scale to Google's level is insane. People should worry about what Google does when they work for at least a billion-dollar company.

For most projects, import as many dependencies as you can; you are getting free labour. Sure, once in a while you'll fuck something up and waste a week or two, but it pales in comparison to the months you didn't spend reinventing the wheel.

No one ever really notices that it's all the companies with boatloads of cash that have massive technical debt. Even with the example at Google, the first thing I'd try is jamming more memory into those machines, and keep going until the linker needs more than 256 GB.

Fuck, Facebook still uses PHP; the stock market doesn't seem to care.

Was it running out of memory because of templates? For instance, parts of boost like boost serialization generates an obscene amount of symbols due to the way they do metaprogramming.

Templates were a problem but not a huge one. They aren't used extensively in the webserver, and in any case they bloat compile-time moreso than link-time.

The bigger problem was that we'd adopted a dependency strategy of "lots of little libraries" instead of "one big library with lots of source files". This offloads a lot of the work from the compiler to the linker. There are various advantages of this strategy - it speeds up incremental rebuilds, it encourages you to explicitly track all your dependencies, it simplifies IWYU, it's easier to parallelize - but linker RAM usage is not one of these advantages.

Wow. Was that with or without link-time optimization?

It was debug builds, which put an additional strain on the linker in that they keep around all the debug symbols. Regular builds were slow but manageable, so it's not like we couldn't release or develop, but it meant tracking down any sort of crash or serious bug became very difficult until we got the dependencies under control.

I forget the exact compiler settings - wasn't my department - but I think it included link-time optimization, and also FDO.

Ran out of RAM or address space? I've run out of RAM trying to do ridiculous stuff like native builds on tiny embedded systems (due to whack code bases that wouldn't cross-compile). Though, in the end I overcame this with even more perverse solutions, like adding swap space via USB1 flash.

That aside, that is the third time in a couple of months that I have heard people mention situations where they ran out of RAM without explaining why swap could not at least work as a stop gap measure.

Ran out of RAM for cloud builds - Google generally does not use virtual memory for anything in the cloud, because it leads to unpredictable, massive delays which can cause cascading failures in services.

It was still "possible" to build on your workstation, but the locality patterns in linking and the subsequent thrashing made this an extremely small value of "possible". I recall once during this period I kicked off a local debug build on my workstation on Friday afternoon, went home for the weekend, and it was still running when I got into work on Monday morning. By Tuesday, I had given up and killed it.

Tree shaking is helpful but not enough. It makes dependencies more fine-grained and binaries smaller by removing some false sharing. But library maintainers still have to be careful about true sharing, where a function calls another function, which in turn pulls in something big (like a lot of data stored in a constant).

You need both tree shaking and a community dedicated to keeping code small.

JavaScript has the latter; it's not universally true, but lots of JavaScript libraries pride themselves on small code size and few dependencies.

That's great. But you can't stop doing that work just because you have a tree-shaking compiler. For example, there's a lot of work going into making Angular 2 apps reasonably sized and dart2js doesn't magically make it go away.

You're right; tree-shaking doesn't eliminate all instances of dependency bloat. Come to think of it, I've even seen a counter-example in C++. On Windows, a hello-world application using the wxWidgets GUI toolkit is ~2.4 MB, even statically linked. I think the problem, or at least part of it, is that the WindowProc implementation uses a big switch statement to handle all of the Win32 window messages that wx supports. So, for example, the handler for the WM_PRINTCLIENT message has to be included even if your application doesn't do any printing. Same for drag and drop. It would be better if the WindowProc implementation looked up message handlers in a table, and the application could ask wx to register the handlers for just the features that are actually used. I wouldn't be surprised if similar concerns apply to frameworks like Angular, even in Dart.
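A minimal sketch of the table-based dispatch idea, in JavaScript rather than C++ for brevity (all names hypothetical):

```javascript
// Table-based message dispatch. With a big switch, every handler is
// statically referenced by windowProc and must be linked in; with a
// registration table, only the handlers the app registers are
// reachable, so a linker/bundler can drop the rest.
const handlers = new Map();

function registerHandler(message, fn) {
  handlers.set(message, fn);
}

function windowProc(message, payload) {
  const fn = handlers.get(message);
  return fn ? fn(payload) : null; // unhandled messages fall through
}

// The app opts in to only the features it uses; a print handler is
// never registered here, so its code need never be pulled in.
registerHandler('paint', (p) => `painted ${p}`);

console.log(windowProc('paint', 'canvas')); // 'painted canvas'
console.log(windowProc('print', 'doc'));    // null
```

The trade-off is that registration happens at runtime, so the framework can no longer guarantee statically that every message has a handler.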

Yes, UI frameworks are especially prone to this. Anywhere you have generic "display arbitrary thing on the screen" functionality, it logically depends on all the code you might need to fulfill that promise for arbitrary input.

Custom HTML tags typically can have arbitrary children, so the issue inevitably comes up. The reason a normal web page can be small is because the browser has already been downloaded.

I can't disagree with you, but you're also missing other issues related with having a big pile of dependencies. Maintainability being one. Runtime efficiency isn't the only problem that can be solved here.

Yeah, but it seems like you listed most of the platforms most people actually use, all in the X column. How do we get from 100MB Electron deployment and fat, partially used jars and dlls to this magical tree-shaking world?

You're right; most of the languages and platforms used for developing applications don't have reliable support for tree-shaking or fine-grained static linking. But there's hope for some of them.

When building Android applications, it's common to process the JVM bytecode with ProGuard before converting it to Dex bytecode. ProGuard includes a tree-shaking step. Sometimes it's necessary to tell ProGuard about specific classes or class members that it should leave alone, if they're accessed dynamically (e.g. using java.lang.reflect or Class.forName). But it's still better than nothing.

Likewise, .NET applications for the Windows Store are compiled to native code using .NET Native, and that compilation includes a tree-shaking step. This introduces some limitations on the use of reflection. I'm guessing similar limitations will apply to the native compilation option of .NET Core.

As for JavaScript, Google's Closure Compiler can do tree-shaking. But I don't know if the Closure Compiler's advanced mode works with any of the popular JavaScript libraries or frameworks, or just Google's Closure Library.

Good point about Android and .NET.

Regarding the Closure Compiler, ClojureScript touts whole program optimization as an advantage; the ClojureScript compiler just has to play by the Google Closure rules when emitting JS to take advantage. I'm sure it is harder for users of any arbitrary off-the-shelf library to gain the benefits.

It looks like Webpack 2 will have support for tree-shaking: https://github.com/webpack/webpack/pull/861#issuecomment-149...

There's also rollup.js (another module bundler) that supports tree-shaking: http://rollupjs.org/

Either way, it seems like the code needs to be using ES6 modules to make it all work.

JSPM already uses it for sfx builds


ES6 modules allow devs to easily specify and import only the parts of a library that are being used. The bundler takes care of pulling in the necessary parts. No magic required.
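As a sketch of the difference (library names are illustrative; the inline function just keeps the example runnable):

```javascript
// Whole-library import: the bundler must assume any export may be used.
//   import _ from 'lodash';
// Named import from an ES-module build: unused exports can be shaken out.
//   import { pick } from 'lodash-es';

// A tiny stand-in for such a named utility:
function pick(obj, keys) {
  const out = {};
  for (const k of keys) {
    if (k in obj) out[k] = obj[k];
  }
  return out;
}

console.log(pick({ a: 1, b: 2, c: 3 }, ['a', 'c'])); // { a: 1, c: 3 }
```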

The JVM isn't really penalized, because the concurrency model tends to be threads rather than processes, so you don't pay a per-worker cost for a large library. Plus the JVM will optimize only the actually-used code, including de-virtualizing.

It helps somewhat, but I feel that the "only is a problem" assertion is too strong.

Tree shaking doesn't help you when you are pulling in every HTTP client in existence transitively. It is still code that is being run, so can't be automatically optimized away, but it is unnecessary.

The causal link between tree-shaking-compatible languages and accepting a large number of dependencies is not that clear. It could be that those who use a large number of dependencies just prefer less dynamic languages, where dependencies are very explicit and manageable.

Perhaps I wasn't clear. What I meant is that in environments that do support tree-shaking, you can depend on large libraries, and/or many small libraries, and the run-time impact will be no more than if you had written or copied and pasted just the functionality you need.

This could be very true, but to show the causality it is necessary that the tree-shaking abilities of languages/tools on average preceded the widespread use of huge dependency trees. It could be the reverse. That is, when for some unknown reason multiple dependencies appear, tree-shaking tools follow, and it is just easier to create them for static languages.

> It could be that those who use a large number of dependencies just prefer less dynamic languages, where dependencies are very explicit and manageable.

Tell that to anyone using NPM.

Clarification - my point was that a sane person prefers dependencies to be explicit and manageable. It is easier to create tools for that with static languages. As for NPM, due to the dynamic nature of JS, the tooling is rather hard and just not there yet.

Dependencies (and their specific versions) are already managed explicitly via package.json.

The shift to a flat dependency hierarchy in NPMv3 will make managing dependencies of dependencies much more explicit and straightforward to manage.

JSPM already uses a flat structure and shows how simple dependency management can be.

I'm not sure what you mean by "the dynamic nature of the tooling". The JS development ecosystem doesn't attempt to provide an end-all-be-all monolithic core lib. It's a good thing and one of the primary reasons why advances in the JS ecosystem are happening at breakneck pace.

PHP doesn't have tree-shaking, as it isn't compiled, but it does have autoloading: the source code files for classes are only loaded from disk when they're instantiated. This is probably similarly beneficial.

As with everything, I think a bit of balance is needed.

You're doing a quick MVP to demonstrate that your idea is working? Fuck it, just throw in dependencies for everything, just care about solving the problem you're trying to solve and proving/disproving your point.

Once you've verified it, then go and kill your dependencies. But don't do it just because you want to do it. If in the end the users don't benefit from you optimizing your dependencies, why do it? (Speaking from a product side rather than an OSS project used by other projects)

Not sure KILL ALL DEPENDENCIES is helpful, but I'm not sure that MAKE EVERYTHING A DEPENDENCY is helpful either so...

That'd be good advice if those MVPs didn't so often become the actual product themselves. If industry and management understood that these things were proofs of concept, and realized that the actual product is going to have to be rewritten, then I'd agree with you.

Ruby is great for prototyping, because it's so easy to get things up and running.

The big thing is transitioning to any kind of final production code. The rules for clean code apply as much to Ruby as they do to any other language.

But it's a good post; it can easily be something you overlook, due to gems being so damn convenient.

Perfect. Get things done fast. Prove your product. If it succeeds, you will have time to tune every aspect and invent your own wheel that fits your needs. But until then, RAM is a lot cheaper than your own time spent writing from scratch your version of things that are very stable and widely used.

But the advice is really important for gem writers. As a gem author, I think you really need to think a little more about your dependencies, as you do with your public interface.

s/kill/understand/, which is useful advice for software engineering in general.

As time approaches infinity, the number of magic "I use this package and it does something in my code, and then it all just works" dependencies you pull in should approach 0.

To me, this seems more like an argument for optimizing beyond your own stack. Don't kill your own dependencies.

Your app uses too much memory? Improve a dependency; you have now improved other people's apps, too.

Your app uses too many dependencies in total? Try to get all your first-level dependencies to standardize on the best http-client. (Which he is partially doing with his post.)

Dependencies may have problems, but shared problems are better than problems only you have.

I agree 100% with this.

I used to bring in dependencies with the "don't reinvent the wheel" mentality. Then I realized how much trust I'm giving to the authors of all dependencies I pull in. Now I tend to do my best to understand the dependencies I bring so I can improve them if I can.

The only problem I find with this decision is when I make an improvement/fix a bug on a dependency, and the project is either inactive or the authors don't give a crap about your work.

    The only problem I find with this decision is when I make an improvement/fix a bug on a dependency, and the project is either inactive or the authors don't give a crap about your work.
True, but I think a temporary fork, which will eventually be merged back in, is still better than your own code with its own bugs.


Couldn't agree more. I think the title is unreasonably one sided, and saying "be part of the solution" is equally one sided.

Dependencies are great for the reasons you specified, and I saw nothing in that article suggesting otherwise. The part that feels the worst to read is:

> Can I implement the required minimal functionality myself? Own it.

This is largely a judgement call; "can I" and "minimal functionality" are subject to change based on many external circumstances. "Own it" also seems to imply owning it not as a dependency, based on the context, but rather as a part of a monolithic whole.

It is also interesting that the sidekiq product makes use of gem dependencies: five at the top level, excluding platform dependencies, which (mostly due to Rails) expand out to many more. The message should not be to "kill your dependencies", because that mindset is outdated and slow.

So tired of hearing about how bad dependencies or scripting languages are. Would be much more excited to hear about how to contribute to open source dependencies, and how to write efficient scripts.

Another benefit of minimizing your dependencies is security. The fewer external packages you are using (especially packages without active, security-conscious maintainers), the less likely you are to suffer a surprise vulnerability due to something deep down in your dependency hierarchy.

This goes for client-side JavaScript too. XSS holes are one of the worst web app vulnerabilities out there and could easily be introduced accidentally by a simple mistake in a library. And this stuff is incredibly hard to audit these days thanks to the JavaScript community's cultural trend towards deeply nested dependencies.

but otoh, if you try to reinvent something instead of using a tried-and-true library, you may well just add new bugs.

I.e. I'd 100% use libxml to sanitize XML rather than trying to reimplement XML parsing myself.

As always, trade offs.


OpenSSL has major security issues encountered on a relatively regular basis.

Do not do your users the disservice of rolling your own SSL implementation. ;)

I'm not 100% sure I agree with this as stated. Sure if the functionality is in core lib, use it but... it depends...

Consider these three statements:

- No code runs faster than no code.
- No code has fewer bugs than no code.
- No code is easier to understand than no code.

For a language like Scala, where there is no JSON processing in the standard lib, if there is a JSON library that is battle-tested, then by removing my own JSON code and leaning on that well-tried-and-tested code for serialization/de-serialization, I've removed a whole bunch of code from my own library. The whole point of having modules as abstractions is to keep concerns neatly tucked in their own places and to increase re-use. By subscribing to the idea that my module should implement all of the functionality it needs, we're losing the benefits of modularization.

I just went through this exercise myself in a library I maintain - I removed my own json code and put a library in. I removed a bunch of code and made the whole thing simpler by leaning on that abstraction.
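A Ruby-flavored sketch of the same move (the commenter's case was Scala, but the shape is identical): delete the hand-written serializer and lean on the parser that ships with the language.

```ruby
require 'json'

# Instead of maintaining hand-rolled serialization code, use the
# battle-tested JSON module from Ruby's stdlib.
record  = { "name" => "widget", "tags" => ["a", "b"] }
encoded = JSON.generate(record)
decoded = JSON.parse(encoded)

puts encoded            # => {"name":"widget","tags":["a","b"]}
puts decoded == record  # => true
```

The round trip is the whole test surface you used to own; now it's someone else's well-exercised code.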

I think the point of the article is: use external libraries thoughtfully. I don't think he was suggesting not using them at all.

Your example sounds like a situation where a dependency certainly makes sense.

> ... I've removed a whole bunch of code from my own library

You removed a bunch of code you understood, and added a bunch more code you don't understand, along with whatever technical debt, edge cases, and performance issues which are lingering in that library.

Adding a library is never removing code from your project, it's adding code you don't yet understand to your project. It can still be a net win, but it's not less code for you to maintain.

> > ... I've removed a whole bunch of code from my own library

> You removed a bunch of code you understood, and added a bunch more code you don't understand, along with whatever technical debt, edge cases, and performance issues which are lingering in that library.

Not all code you've written is good code. Hell, not all code you've written you actually understand. Libraries and dependencies make sense in many cases. Don't write yet another JSON parsing library unless you really need to.

> Adding a library is never removing code from your project, it's adding code you don't yet understand to your project. It can still be a net win, but it's not less code for you to maintain.

It's referencing code that you don't maintain. If the maintainer is bad, use a different library.

"The mime-types gem recently optimized its memory usage and saved megabytes of RAM. Literally every Rails app in existence can benefit from this optimization because Rails depends on the mime-types gem transitively: rails -> actionmailer -> mail -> mime-types."

It seems like this could also be cast as a major success for "semi" standard dependencies.

This article would be more accurately written as "prefer the standard library over 3rd party solutions" since all the examples given still required dependencies, but ones that are shipped as part of the language runtime (Ruby in this case).

However, when discussing languages with no specific standard library, or languages whose standard library is missing feature Y, it's quite understandable to use a battle-tested 3rd party dependency. In fact I'd go further and say it would be advisable to use a respected 3rd party library when dealing with code which handles security or other complex concepts with high failure rates.

Matter of fact, sometimes it's better to be using a respectable 3rd party library. Requests vs. urllib2 in Python springs to mind, and I'm sure there are more examples.
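On the Ruby side, though, a single GET rarely justifies a full HTTP gem; a sketch with stdlib Net::HTTP (the endpoint here is made up):

```ruby
require 'net/http'
require 'uri'

# Build a simple GET with the stdlib client -- no gem required.
# example.com/status is a hypothetical endpoint.
uri = URI('https://example.com/status')
req = Net::HTTP::Get.new(uri)
req['Accept'] = 'application/json'

puts req.method  # => GET
puts req.path    # => /status
# Actually sending it would be:
#   res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |h| h.request(req) }
```

The stdlib API is clunkier than a nice wrapper gem, which is exactly the trade-off being discussed: ergonomics vs. one more dependency.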

I find dependencies to be a very good indicator for how my code should be modularized. That is, rather than pulling a boatload of dependencies into "the application", pull a couple dependencies into a module, and then depend on the module. It makes it very easy for dependencies to be a "well, it gets the job done for now, and I can reimplement that myself if that changes" sort of thing.

I found this approach to work well for me as well. It has paid off many times. I try to wrap most of my dependencies so that if I later feel like I need to pick some low-hanging fruit, I can implement some of the functionality internally while maintaining the original API.

Also, this is probably the single biggest difference for me when working with a static vs. dynamically typed language. With Static Typing, there's a "translate the dependency's types into the application's types" step that pretty much screams "put a seam here!". With Dynamic Typing, it's a bit less loud.

That's pretty much a C/C++ problem though. I don't think I've ever had to translate types in C# or F#.

I was referring to the fact that, with Scala, I like to avoid leaking the innards of a JSON serialization library into the code for an API client, instead returning domain objects. Whereas with JS or Python, the initial approach is to sling around the blob of JSON.

Avoid shims.

There are lots of libraries that just put one interface on top of another interface. They don't do much actual work. Pulling in shims, especially if they pull in lots of other stuff you're not using, should be avoided.

If the dependency does real work you'd otherwise have to code, then use it.

cough Mongoose. Been moving away from it on my projects. While it does provide a nice interface, it just creates more work down the road.

Perl apps have thousands upon thousands of dependencies. It's intentional - reused code in CPAN means less downloading, more efficient code, and fewer bugs as the codebase gets refined. An app that relies mostly on dependencies is essentially an app with free support by hundreds of developers. That's the case with CPAN anyway; I don't know how Ruby people do things.

Bugs happen, though. If you see a bug in a dependency, it is your job to report it at the very least, if not make an attempt to fix it. Without this community of people helping to improve a common codebase, we'd all be writing everything from scratch, and progress would move a lot slower.

This reminds me of this article:


Apparently Microsoft's Excel team had even written their own C compiler.

Obligatory essay from PHK on the effect the author describes:


History continues to repeat itself. Fake reuse and proliferation of unnecessary bloat are two of those recurring themes. Fight it whenever you can. The old TCL, LISP, Delphi, REBOL, etc clients and servers were tiny by modern standards. They still got the job done. Strip out bloat wherever you can. Also, standardize on one tool for each given job plus hide it behind a good interface to enable swapping it out if its own interface isn't great.

Gems I use fall into three categories.

A lot of my projects are just wrappers around one main gem. Rails, Nokogiri, Roo, API wrapper gems. These are 'project gems'. If they give me problems, I'll re-evaluate the scope of the project and perhaps pick another gem to orient the project around. Once the project reaches maturity, I'll default to fixing the problem rather than re-engineering it unless the problems run deep.

Sometimes I'll use gems like Phoner to handle datatypes that are too tricky to do with regular Ruby. I'll call these 'utility gems'. When I include a utility gem, generally it has one job and one job only, it's invoked in exactly one place in the code and gets included in that file. I can generally replace a utility gem with stdlib Ruby code if I really need to.

I also have what I call 'infrastructure gems'. These are gems like pry, capistrano, and thor that I tend to include in every project where it seems they would be useful. These are gems that are worth getting to know very well because they solve really hard problems that you don't want to use stdlib for. If these give me problems I will do whatever I need to to resolve them and understand why the problem exists, because the costs of migrating off of them would be steep.

The decision to use a gem should not be taken too lightly, but nor should it weigh large on the mind. Be quick to try it out, but also quick to take it out.
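As a toy illustration of the "utility gem replaceable with stdlib Ruby" category above: a deliberately narrow, hypothetical stand-in for something like Phoner (the real gem handles vastly more cases; this just shows one narrow job may not need a gem).

```ruby
# Normalize a US-style phone number in plain Ruby. One job, invoked
# in one place -- the kind of thing a utility gem gets swapped for.
def normalize_us_phone(raw)
  digits = raw.gsub(/\D/, '')
  digits = digits[1..-1] if digits.length == 11 && digits.start_with?('1')
  return nil unless digits.length == 10
  "(#{digits[0, 3]}) #{digits[3, 3]}-#{digits[6, 4]}"
end

puts normalize_us_phone('1-415-555-0123')  # => (415) 555-0123
p    normalize_us_phone('not a number')    # => nil
```

The moment international formats show up, this stops being ten lines and the gem earns its keep; that's the re-evaluation point.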

I was just thinking about this today. But from the point of view of growing a community around a platform!

Would you want to have one namespace for "official" modules and heavily influence everyone to use them? That's centralization (of governance). But, it's not centralization of a process that requires high availability. So the "drawback" is only that you centralize control and can make certain guarantees to developers on your platform.

When you're starting an ecosystem, you can choose a "main namespace" as yum, npm etc. do, or you can choose the more "egalitarian" convention of "Vendor/product" as GitHub and Composer do. I think, in the end, the latter leads to a lot more proliferation of crap, and as the article said, multiple versions of everything existing side-by-side.

I have to deal with these issues when designing our company's platform (http://qbix.com/platform) and I think that having a central namespace is good. The platform installer etc. will make it super easy to download and install the "official" plugins. You can distribute your own "custom" plugins but the preferred way to contribute to the community would be to check what's already there first and respect that. If you REALLY want to make an alternative to something, make it good enough that the community's admins protecting the namespace will allow it into the namespace. Otherwise, promote it yourself, or fork the whole platform.

This is a great read that can be applied to node.js very much. I've seen apps that include 10, maybe 20 dependencies but when you flatten out the full dependency tree? Thousands. It's incredible and if one of those dependencies screws up semantic versioning or just screws up in general it can be a nightmare to debug and fix.

This is why in every 1.0 product I work on I include every dependency that speeds up my development. In 2.0 the first thing to do is prune all unnecessary dependencies and start minor rewrites when a dependency can be done in house (yeah yeah, reinventing the wheel is a problem, but most npm dependencies are small and many can be recreated internally without issue).

This is even more important if you're creating a library / module. My msngr.js library uses zero dependencies and yet can make http calls in node and the browser because it was easy to implement the minimal solution I needed without bringing in dependencies to support a single way of calling http.

The worst offender, IMO, is request. I've seen more than a few projects pull it in just to make a single HTTP call. Just look at its package.json:


I'd argue the offending party is the library pulling in Request for making a few simple HTTP calls.

Request itself in my opinion is a great piece of work, it does pretty much anything one would want to do with an HTTP client.

Yes, the npm "request" module is extremely bloated and badly written, to boot. https://www.npmjs.com/package/needle is a good alternative.

Can you elaborate? How do I know Needle is any better?

Why not just use node-fetch? It's tiny.

Could you elaborate on why you view this list of dependencies as a problem?

I've been using request and its many transitive dependencies for years without hitting any real problems due to this.

This sadly also happens in some Linux repositories that add too many dependencies to a few key packages.

On NixOS, last time I tried, installing mutt ended up bringing python as well.

In Nix, it's very easy to make a minimal mutt variant by removing the python input.

Yes, but these things shouldn't be happening. Default builds of packages should be kept more minimal.

I understand it's hard with Nix philosophy, and things are improving with different package outputs. For the record, python was pulled indirectly via the gnupg dependency I think.

It's a difference in philosophy. I think packages by default should be full-featured, and minimal variants can be created for those that want it.

> No code runs faster than no code.
> No code has fewer bugs than no code.
> No code uses less memory than no code.
> No code is easier to understand than no code.

The dependencies you decide to implement yourself in a minimal fashion are code though. I generally agree with the article, but in the end It Depends™

And they are generally worse tested, worse supported, and you have to maintain them on your own.

It sounds like this is advocating NIH syndrome. If a library is going to make my job easier I'm going to use it, unless there is a very specific benefit of doing it myself.

Perl takes a pragmatic take on this (as well as other takes...) with the collection of ::Tiny CPAN modules that just do one thing pretty OK. Things like Try::Tiny that help immensely with exception handling - something you don't really want to roll your own.

It itself does not have any dependencies that aren't in core:


So, I've been writing a home automation system using the .NET Framework (with Visual Basic, I'll wait until you finish laughing).......... Okay.

I've made a point not to add any third parties references and packages I can avoid. I went ahead and got a third party scheduling engine, and the SQLite provider, but beyond that, I'm writing everything else myself so far.

First of all, I'm learning a lot in having to write stuff myself. At the very least, it's a great educational experience. I've worked with a lot of code samples, so I'm not going totally from scratch, but they're all at the very least tailored to my needs.

But for me, the big thing is keeping everything thin. The program loads in milliseconds. Almost all of the reference data for what it's built on is in one place (the .NET Framework Reference). And the key is that the features my program supports are the features I want and need, not the features some dependency has told me to have.

The biggest dependency I have, Quartz.NET, is actually the most confusing part. It's not structured like the rest of my program is, its documentation leaves some things to be desired, and it does a lot more than I need it to. There's a lot of bloat I could cut out if I wrote my own scheduler, and maybe someday I will.

Double-edged sword. Deps are great! Functionality added quickly. Deps are terrible! They broke my app.

If your app has a long shelf life, the fewer deps you rely on, the easier it is to manage, from what I've seen.

For some reason Golang feels like it makes sense here. Pretty much everything you need is in core. *Disclaimer, I don't have any Golang apps in prod but I'd love to hear from those that do.

A lot of apps (old-timey Windows apps, for example) have this philosophy, leading them to reinvent things like crypto and image decoding. Naturally, this leads to tons of bugs, including security bugs.

I would revise this to: Don't bring in more code than you need. But if the choice is between writing something yourself and using someone else's well-tested, heavily-used library, always go for the latter.

As an architect, you need to be able to do a cost/benefit analysis of each option. That is what software architects do, why they have experience. For example:

  How much time will it take to implement each option?
  How much time will it take in the future to support it?
  What security risk does each option incur?
  What is the risk of the project being abandoned?
  What is the risk of the project changing in non-backwards compatible ways?
  What are the performance characteristics of each option?
NIH is a disease, but so is import-mania. With experience, you can make a good decision.

Also de-duplicating dependencies is pretty big. For instance, in Java land, I have a project built in Camel, and Camel is deeply in love with the Jackson JSON parser. So I use it there.

On the other hand, I've been learning to use GSON's parser in my Sponge plugin (a Minecraft server) because the SpongeAPI dependency pulls that in anyway.

Albeit, both libraries are dead simple to use so it's a bit contrived, but I see a lot of projects that would pull in Spring's RESTTemplate into a Camel project when they've already pulled in CXF or have Apache's HTTPClient readily available via other dependencies.

(And no, URLConnection is terrible. TERRIBLE.)

One thing I've gotten into the habit of doing is looking around the commit history and issue list for any package I import. Was it something somebody wrote in a hurry and hasn't really touched since? Is it something that has a solid set of regular contributors? Are there a lot of outstanding issues relative to how heavily used it is?

I also spend more time actually reading through specs to see how well they exercise the code.

That's probably standard procedure for a lot of people, but it's something that I had to learn to always do.

That's a good idea

Good engineering managers and architects are great at balancing both biases.

Balance is good. Unfortunately ideologues have taken over software as they have taken over politics.

Are there really people out there whose decision-making process goes like "well, we don't really need this library, but I'm gonna depend on it anyway to further the ideology of our movement"?

Ideologies are rarely presented as such. I have seen places where they wrote databases from scratch, and rewrote Windows scroll bars. Always 100 good reasons why using something else wasn't good enough. Other times I've seen people always want to buy the shiny new toy rather than write a few lines of code themselves.

I know plenty of people who want to port everything to JavaScript. When you ask "Why?" they don't know. That's an ideology.

In what sense are technical preferences you cannot explain or justify an "ideology"? Ideology is not unjustified/mistaken belief. Maybe you mean cargo culting in technology, of which there is sadly a lot? Or maybe even "let's do this in JavaScript, because it's the only tech I know and I can't be bothered to learn something new"? That happens, but it's not ideological.

You're defining ideology too narrowly.....an ideology is a system of ideas and ideals. It doesn't have to be mistaken, and it can be completely justified and correct.

"I think Javascript is better because I prefer to use technologies I know" can be part of an ideology.

It was the OP, not me, that implied ideology is something negative and linked to poor justifications. For the record, I think ideology is (or can be) a good thing in politics, but a misappropriated term in technology.

> "I think Javascript is better because I prefer to use technologies I know" can be part of an ideology.

But that's not what the OP said. Instead, it was "I think we should use Javascript because... uh... I dunno. Let's just do it!". That's not ideology by any possible meaning of the word.

"But if the choice is between writing something yourself and using someone else's well-tested, heavily-used library, always go for the latter."

Absolutely. However, there are plenty of situations where what you pull down from npm or rubygems isn't actually all that well-written or well-tested.

When I first started programming I kind of had this impression that if an open source library is published on a package repo and people are using it then it must be much better than anything I could write. I have learned the hard way over the years that is not always true.

I feel like I relearn this lesson at least once a month.

There's an important kind of compromise that isn't discussed as often: using an external library but behind an interface of your own design, an "anti corruption layer". If the external dependency is limited to one small bridge in your application, then it's so much easier to see what parts of the dependency you actually depend on, to upgrade the dependency when its API inevitably changes, to replace it entirely if it becomes a burden.

I'll link to the facade pattern, since last time I mentioned it people went a week or so before realizing it wasn't a term of my own invention:


The downside is that you bring in another layer of indirection and, as a result, greater cognitive load for yourself.

Mycode -> facade -> library

Navigating code becomes more cumbersome and stack traces longer. I do like this approach too but it isn't free.
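Concretely, though, the facade is usually just one thin module. A Ruby sketch, with stdlib JSON standing in for whatever library actually sits behind it:

```ruby
require 'json'

# An anti-corruption layer: the app talks only to AppJson, so the
# underlying library (stdlib JSON here, but it could be Oj, Yajl, ...)
# is named in exactly one file and can be swapped later.
module AppJson
  def self.encode(obj)
    JSON.generate(obj)
  end

  def self.decode(text)
    JSON.parse(text)
  end
end

p AppJson.decode(AppJson.encode({ "ok" => true }))  # => {"ok"=>true}
```

That's one extra stack frame and one extra file to navigate, which is the cost being weighed against a painless future library swap.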

That reminds me of another thing I'd like to tell library writers: please think of your stack traces as somewhat public, because I always end up 40 layers deep trying to debug something and it's just painful.

Hopefully a facade will introduce a small constant number of stack frames...

Anyway, with something like Lodash for JavaScript, I'd even want to "facade" that into a project-specific utilities thing. Right now we're on an old major version of it and upgrading would require changing hundreds of locations. When very many files in a project mention the same external dependency, that seems like a recipe for future sadness.

A facade should only add one layer of indirection.

As for project-level management of external dependencies, the tooling can be used to provide a facade for imports. I'm not sure about Webpack, but JSPM already does this using a config.js file that maps all of the dependencies to readable import names, sans version numbers, so the site doesn't break on future updates.

Ideally, once ES6 modules are used more widely it would be great to see libs start to adopt the facade pattern to provide finer granularity of control without deep linking into a project's source.

>That reminds me of another thing I'd like to tell library writers: please think of your stack traces as somewhat public, because I always end up 40 layers deep trying to debug something and it's just painful.

Unless you're using Spring, of course, then all bets are off. That's the biggest downside of IOC containers, they tend to ruin the usefulness of stack traces and the "step in/step out" functions of the debugger.

There was a code base where people wrapped glibc functions. Most of the time it was straight calls to the corresponding functions in glibc, but they were called x... instead, so malloc became xmalloc, free became xfree, &c.

At one time there were a lot of zombie processes lingering for a long-ish time until the parent terminated and the zombies were reaped by init. I didn't bother to look at the implementation of xpopen, as I assumed it was just a call to popen. Turned out it wasn't; it was fork/exec with a socketpair turned into a FILE* with fdopen. The child was not waited for in xpclose.

I think there can be times when the facade pattern makes sense. I think there can be times when importing the world makes sense. I think there can be times when the opposite is true too. I think talking about these things in an abstract way can miss the point of the very insanity in some concrete solutions out there.

Heh. Well, yeah, abstract opinions are always suspicious. In your case, making a facade around standard POSIX functions does seem weird, especially if the facade is itself buggy! For something that needs to be portable across many platforms, such a facade could be very useful though.

Hah I stopped worrying about stack trace length when I started writing Scala ;)

Usually only 1 or 2 lines of the trace matter, you learn to skip the rest pretty fast.

Here's an interesting story from the Java world - "Filtering the Stack Trace From Hell":


An even more breathtaking stack trace image can be found here:


And the PDF version:


While I completely agree with the sentiment[1], there is a bit of hyperbole (and/or literary license) in the suggestion to "Kill Your Dependencies". Modular, well-contained code is very good. It's usually a good idea to build on other people's work, though this is yet another trade-off decision that will always be part of the software design process.

A lot of the dependencies discussed in this blog post are libraries that aren't actually adding anything useful: the various JSON parsing libraries that should be replaced with the parser in stdlib, or rspec testing libraries that shouldn't have ever been a regular runtime dependency.

[1] Managing complexity and dependencies is probably the most important concern going into the future, not just in programming, but also in every other complex system.

> Don't bring in more code than you need.

I see it as a sliding scale. If I'm parsing 1 string with the same date format into 1 object, I'm not going to pull in some general purpose time parsing library - I'll write the 10 lines of code myself, a few unit tests, and be happy.

If in the future I start having to deal with different date strings and some need to do more than just throw up a single date on a page somewhere, I'll get a date/time library.
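And if the format really is fixed, stdlib often already has those ten lines written and tested. A sketch with Date.strptime:

```ruby
require 'date'

# One known, fixed format: no general-purpose time gem needed,
# and nothing hand-rolled either -- stdlib covers it in one line.
d = Date.strptime('2016-02-11', '%Y-%m-%d')
puts d.year  # => 2016
puts d.mday  # => 11
```

The sliding scale kicks in the moment you need fuzzy parsing, time zones, or arithmetic across DST boundaries.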

You could also use the third party library for your unit tests.

This is pretty much what Rob Pike advocates in Go: "A little copying is better than a little dependency." http://go-proverbs.github.io/

I pretty squarely disagree with Rob Pike on that one. Copying is how you introduce bugs and insulate yourself from upstream bugfixes. I'm suggesting that you should try to remove code first, add a dependency on well-trusted code if that doesn't work, and only copy/reinvent as a last resort.

  > I pretty squarely disagree with Rob Pike on that one.
TBH that's kind of an indication that you should rethink your position. Rob Pike has a lot of experience, he's seen a lot, and he knows what he's doing. I'm not saying you're wrong, just that you shouldn't snap to the defensive position.

Rob Pike's authority does not counter the negative experience I've had with vendoring dependencies, making local changes, and drifting so far from upstream that it becomes extremely difficult to merge in bugfixes. Or the positive experience I've had with package managers making it easy to bring in third-party code, easier than copying it in.

"Little" dependency is the keyword.

Little dependencies are fine. In fact, they're usually preferable: it leads to less code going unused.

> Rob Pike has a lot of experience, he's seen a lot, and he knows what he's doing.

So does pcwalton. Ever heard of https://www.rust-lang.org/ ?

It doesn't matter. If I were Einstein, and von Neumann disagreed with me, I would take that seriously, even though I were a super-genius etc.

I do take it seriously. I'm no super-programmer. Rob Pike is a better programmer than I am.

I just disagree with him on this one thing.

Deference to authority is a funny thing eh? I find that in life, and especially in running my business, I pretty frequently lock horns with people who disagree with me on account of a seemingly differing opinion of some other Experienced Person(R). It's like, such and such person has lots of experience in the industry, therefore they know the answer to this specific question better than you do, regardless of anything. When I hear these appeals to authority I always think that if I were actually sitting across the table from this Very Smart Person--as opposed to the person citing them--and they were willing to listen, we would probably end up in agreement. Like you say, the person might be WAY smarter than me in general, but... LOOK AT THE FUCKING DATA.

There's a reason smart people are good at solving problems, and it's not because they always defer to what other people 'know' to be true!

For me, that means I listen carefully for wisdom but have to speak louder when countering their occasional bullshit. Copying is considered a code smell by the likes of Fowler due to the problems it leads to. Also leads to bloat and performance issues. Better to cleanly, simply package up reusable solutions to problems like JSON or protocols (eg HTTP) then just keep importing the same one. You get lean apps plus a greater understanding of what they're doing.

And so does the person you hire down the line to extend that app. Never forget that part when talking copying and tweaking code. :)

An appeal to authority? To Patrick Walton?

That may be true for Go, a language which still does not have a great dependency management story. I'm not sure this proverb is universal, though.

Nice list. He clearly doesn't like reflection:

  > Clear is better than clever.
  > Reflection is never clear.
Of course, everything has its place, even reflection.

Reflection is a sign that your system is inadequate. In some (maybe even all) languages it may be necessary, but for a language designer it is a failure.

Eh, I am using reflection in a small (C#) project at work - I've had to implement a unit test system of sorts (yes, NIH, reinventing the wheel etc etc) and reflection lets me find all methods that return a particular type very easily. I agree it can become spaghetti very quickly but it is quite useful at times.
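A rough Ruby analogue of that trick (Ruby can't reflect on return types, so this matches a naming convention instead; all the names here are made up):

```ruby
# Discover test methods by reflection rather than registering each
# one by hand -- the same convenience the C# comment describes.
class Suite
  def alpha_test; :alpha; end
  def beta_test;  :beta;  end
  def helper;     :nope;  end   # not picked up: doesn't match
end

tests = Suite.instance_methods(false).map(&:to_s).grep(/_test\z/).sort
p tests                                        # => ["alpha_test", "beta_test"]
p tests.map { |n| Suite.new.public_send(n) }   # => [:alpha, :beta]
```

This is essentially how test frameworks like Minitest find `test_*` methods, which is why reflection, spaghetti risk and all, keeps earning its place.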

The funny thing being that the Golang stdlib uses reflection.

So many libraries in the wild aren't "well-tested, heavily used", though, and sometimes it is really hard to differentiate between popular and good.

At the end of the day, the only person who is responsible for the quality of your project is you. You have to figure out which parts of your project are key to your operation and which are just window dressing. If your job is to make a blogging platform, I would expect many features of formatting documents to be reimplementations of other people's work, because you have to know you are relying on yourself for your core purpose.

I also don't understand the heartburn other developers have over knowing "OMG L, THERE IS SOMEONE ON TEH INTARWEBS AND THEY ARE REINVENTING THE WHEEL". It smacks of fear, a fear that is ultimately rooted in insecurity. If a person was secure in their knowledge of their skills, their ability to understand problems and fix them, then there should be nothing to fear from a dozen or a million different libraries doing the same thing and running into one or two of them on one's next project. It is just a matter of course.

> If a person was secure in their knowledge of their skills, their ability to understand problems and fix them, then there should be nothing to fear from a dozen or a million different libraries doing the same thing and running into one or two of them on one's next project.

I actually do fear a million reimplementations of, say, RSA.

The impact of bad software is vastly overrated. Almost all of it is bad already, and yet we haven't killed ourselves off as a species yet.

The keyword is "well-tested, heavily used library". Especially in the web area I see a lot of libraries imported from GitHub or wherever for doing simple things. In the end you end up with dozens of dependencies you don't know how and when to update.

My rule of thumb is: if the stuff we need can be reduced to a few functions, it's better to copy, so you at least know which code you are using and don't have thousands of lines of code in your repo where you don't know if they are ever being used.

Copying doesn't make anything better; it just insulates you from upstream bug fixes. If copying is better than adding a line to your Gemfile or whatever, then that's a usability bug in your package manager. The entire reason for package managers' existence is to provide an easier-to-use, more reliable alternative to copying.

Copying gives you stability. I work in a regulated environment so every change has to be scrutinized. Updating a library like jQuery is a big deal. If you use 10 libraries of decent size you will most likely rarely update them because it's too much work and risk.

By isolating out parts you actually need you have a better chance of making updates with reasonable effort.

Not saying this is for everybody but a lot of dependencies can be a killer.

A good package manager that supports lockfiles addresses this issue.
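In Bundler terms, that looks like pinning in the Gemfile and letting the lockfile freeze the whole resolved tree (the gems and versions below are just illustrative):

```ruby
# Gemfile -- pin the versions you've vetted. After `bundle install`,
# the exact resolved tree (transitive deps included) is recorded in
# Gemfile.lock, so every machine runs identical code until you
# deliberately run `bundle update`.
source 'https://rubygems.org'

gem 'nokogiri', '~> 1.6'    # pessimistic pin: >= 1.6, < 2.0
gem 'rake',     '= 10.5.0'  # exact pin
```

That gives the stability the regulated-environment comment wants, without copying any code into the repo.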

Upstream updates can add bugs just as easily as bug fixes.

That's an argument for having an effective review and testing process. Change is inevitable and it's better to be good at doing it routinely than putting it off until an emergency.

This is overhead for every update of every library. In theory it's a great idea, but an expensive one, so of course nobody does it.

There are two mindsets in coding: this code needs to work right now, and this code needs to work in 20 years. Linked code is very likely to break in the second time frame. Public APIs are generally unstable, services go away, and people break things. But, if all you need is a toy demo, then feel free.

I'm talking about integration tests, not 100% coverage of someone else's code. If you need e.g. image decoding, you need to be able to update libjpeg, etc. ASAP after a security patch – and that only requires a simple integration test covering known input / output for the subset of features you support. Since it's automated, there's very little difference between multiple small releases and infrequent large ones from this perspective.

As for your second point, I think you're overly focused on the wrong area. Both linked and static code demonstrably have many problems over that time period: if you recompile, you have to maintain an entire toolchain and every dependency over a long period; if you don't, you're almost certainly going to need to deal with changing system APIs, hardware, etc. Linking doesn't do a thing to make a 20-year-old Mac app harder to run. In both cases, emulation starts to look quite appealing (IBM has, what, half a century with that approach?) and once you're doing that the linker is a minor bit of historical trivia.

Tests just give you a bug report early, they don't fix the bug.

Though they also ease deployment and updates.

Your programs shouldn't do things you do not understand. You do not have to be an expert in cryptography, memory allocation or b-trees, etc, but if this is what your app requires, then you should take the time to read up on it and carefully research what is out there if you suspect it is beyond your abilities to implement.

If you take the time to do your research, the choice between rolling your own, copying or adding a dependency will become clear. If it's not becoming clear, then you haven't finished your homework. Learning is a good thing, yes it takes time, but it's time well spent, and it's fun above all.

You may discover that this thing that you thought was hard and needed a dependency is really a few lines of code (a good example is a graph implementation). It might even change your career path. At least that's been my experience in the nearly two decades of writing software.

Incidentally, though the author says this can apply to any ecosystem, finding it applies to Ruby too often is what pushed me out of developing Rails apps. At least at the time I was using it heavily, the Rails space just wasn't stable enough to trust that I wouldn't have to learn an entirely new wheel to get my work done every time I went in to fix a relatively small problem.

"Can I implement the required minimal functionality myself? Own it" is advice one gives if one can't trust the libraries one depends upon to stay healthy, performant, and applicable to your use case. Nobody'd recommend re-implementing readline or printf; if you have some heavy-lifting mathematics to do in Python, use numpy.

How long did it take anyone to realize this nightmare? Since we are on the path to major discoveries, let's talk about runtime, runtime-dependencies and all that. Every Ruby app ever created is stuck somewhere on the time axis, before its origin.

Your app/library inherits the technical debt of all its dependencies.

There's a natural tension between code reuse and avoiding dependencies. If you can avoid a big dependency by writing a couple hundred lines of low-maintenance code, it's probably worth it.

I find this argument sort of related to framework vs. library and opinionated vs. agnostic.

Being an old fart Java developer I generally prefer things where you can plugin your own implementation (ie agnostic).

That is, killing your dependencies has an extreme at either end: copy'n'paste everything, or every library offers a plugin SPI (i.e. inversion of control), or a combo of both.

The problem with the dependency injection approach above (aka Spring prior to Boot) is that you get developers doing lots of custom crap, bloated/over-engineered libraries, increased ramp-up time, and configuration hell.

But I still think this is probably better than ole copy'n'paste... most of the time. I do hate dependencies though.

You're probably more experienced in java development than me. But maybe a good rule of thumb is that most Java libraries shouldn't use reflection or dynamically loaded classes (i.e. using Class.forName or ClassLoader). That rules out most dependency injection frameworks.

That is correct that the libraries should not have DI, but I should be able to wire up the library on my own and not let the library do its own static initialization. What is far worse than Class.forName and other crap is libraries' self-imposed singletons.

Take for example Hystrix. I'm just now fixing the fact that it loads up its own configuration framework (Archaius), which uses static initialization. Archaius needed like 10 other dependencies. This is all really because Hystrix uses a static singleton (HystrixPlugins), and many frameworks need this or else it is incredibly difficult to get an implementation up (i.e. using a pseudo-singleton to avoid excessive passing of a context).


Java makes plugin-like scenarios very hard. The kind of thing you'd do with a typeclass in languages that have them. There's OSGi (the horror, the horror) which maybe-kinda-sorta-works, or there's the SPI where you put the class name in a .service file (which will then be instantiated though... reflection). When those are the alternatives, DI frameworks aren't so bad.

One issue I had with Ruby on Rails was getting MySQL drivers to even cooperate on both Windows and Linux. At the end of the day I wound up sticking to Python and other languages instead. I don't mind using any language, but if the language is fighting me due to native dependency hell then I can't really do much. Even to just use SQLite was a bit of a painful experience, yet on Python SQLite works out of the box without any effort on my part (on Windows). Oddly enough.

I'm looking to getting back into Ruby at some point later this year, but I might ignore Rails altogether so I don't miss out on learning a new fun language.

This is not only an issue for running software but also a huge issue for compiling/building software. Each dependency adds the potential to break your builds in new ways. As the software your program depends on evolves the risk increases that it will change the way your program executes or cause it to fail to build. Many devs will insist you do not reinvent the wheel by writing things like JSON parsers but you always have to weigh the cost of adding a dependency. It's not free.

This reminds me of an example I ran into yesterday. I haven't used webpack yet, but I saw a question on SO from someone wanting to use a package called glslify. I thought I'd take a look and maybe learn webpack in the process.

From the description all glslify does is look for files with the extensions .glsl, .frag, and .vert and lets you get their contents with `content = require(filename)`.

Sounds like it would be at most 10-30 lines of code. Nope:

    npm install --save glslify-loader
    webpack-glsl-test@1.0.0 /Users/gregg/temp/webpack-glsl-test
    └─┬ glslify-loader@1.0.2 
      └─┬ glslify@2.3.1 
        ├─┬ bl@0.9.5 
        │ └─┬ readable-stream@1.0.33 
        │   ├── core-util-is@1.0.2 
        │   ├── isarray@0.0.1 
        │   └── string_decoder@0.10.31 
        ├─┬ glsl-resolve@0.0.1 
        │ ├── resolve@0.6.3 
        │ └── xtend@2.2.0 
        ├─┬ glslify-bundle@2.0.4 
        │ ├─┬ glsl-inject-defines@1.0.3 
        │ │ └── glsl-token-inject-block@1.0.0 
        │ ├── glsl-token-defines@1.0.0 
        │ ├── glsl-token-depth@1.1.2 
        │ ├─┬ glsl-token-descope@1.0.2 
        │ │ ├── glsl-token-assignments@2.0.1 
        │ │ └── glsl-token-properties@1.0.1 
        │ ├── glsl-token-scope@1.1.2 
        │ ├── glsl-token-string@1.0.1 
        │ └── glsl-tokenizer@2.0.2 
        ├─┬ glslify-deps@1.2.5 
        │ ├── events@1.1.0 
        │ ├─┬ findup@0.1.5 
        │ │ ├── colors@0.6.2 
        │ │ └── commander@2.1.0 
        │ ├── graceful-fs@4.1.3 
        │ ├── inherits@2.0.1 
        │ └─┬ map-limit@0.0.1 
        │   └─┬ once@1.3.3 
        │     └── wrappy@1.0.1 
        ├── minimist@1.2.0 
        ├── resolve@1.1.7 
        ├─┬ static-module@1.3.0 
        │ ├─┬ concat-stream@1.4.10 
        │ │ ├── readable-stream@1.1.13 
        │ │ └── typedarray@0.0.6 
        │ ├─┬ duplexer2@0.0.2 
        │ │ └── readable-stream@1.1.13 
        │ ├─┬ escodegen@1.3.3 
        │ │ ├── esprima@1.1.1 
        │ │ ├── estraverse@1.5.1 
        │ │ ├── esutils@1.0.0 
        │ │ └─┬ source-map@0.1.43 
        │ │   └── amdefine@1.0.0 
        │ ├─┬ falafel@1.2.0 
        │ │ ├── acorn@1.2.2 
        │ │ ├── foreach@2.0.5 
        │ │ └── object-keys@1.0.9 
        │ ├─┬ has@1.0.1 
        │ │ └── function-bind@1.0.2 
        │ ├── object-inspect@0.4.0 
        │ ├─┬ quote-stream@0.0.0 
        │ │ ├── minimist@0.0.8 
        │ │ └─┬ through2@0.4.2 
        │ │   └─┬ xtend@2.1.2 
        │ │     └── object-keys@0.4.0 
        │ ├── shallow-copy@0.0.1 
        │ ├─┬ static-eval@0.2.4 
        │ │ └─┬ escodegen@0.0.28 
        │ │   ├── esprima@1.0.4 
        │ │   └── estraverse@1.3.2 
        │ └─┬ through2@0.4.2 
        │   └─┬ xtend@2.1.2 
        │     └── object-keys@0.4.0 
        ├── through2@0.6.5 
        └── xtend@4.0.1 
> 4 meg of source files


update: I think maybe I misunderstood the description. glslify actually parses GLSL and re-writes it in various ways so maybe this is a bad example.

I've seen others though. Like 40k+ lines of deps for an ANSI color library, or 200k+ lines of deps and native node plugins for launching a browser from node.

I went and looked at the source, and while I find it hard to follow, it's clearly not just implementing the 10-30 line function you're hoping for. It's doing lots of other things.

An enormous portion of that code lives under static-module, which appears to do some kind of bundling/codegen/code swapping logic. So it depends on a parsing library for JavaScript.

I don't know if this is a good design, I don't follow the exact functions of this library, but I'm completely unimpressed by you having a drive-by reaction of "4 megs for this library?" It feels like you just scanned the library and decided it was wrong, without asking what it was trying to do.

In the same vein, what I want to see from this discussion is people delving into specifics. The original article did that a little, but I want to know more. What are the actual costs of tearing out dependencies? Which dependencies are worth suffering through? Perham uses the example of Net::HTTP. What do the various HTTP clients add to it? Are they just more terse, do they help to avoid various pitfalls, or do they actually "abstract" in such a way that they lead you to write bad code?

NodeJS developers ought to be embarrassed at how absurdly huge their dependency trees are.

I think they are a proud bunch.

They're web scale

I think there's an important difference between code used by a tool you employ, and dependencies you're actually bringing into your app that will be around at runtime. Your example seems like the former.

In webpack a loader is a dev dependency, not a production one. There is no need to optimize for it if it runs well.

Devs have to run `npm install`. It's run for every commit in CI as well. Slow npm installs are bad for your project.

Slightly related, I was packaging up a webapp in docker, and one thing the application did was form-fill PDFs. I had been using pdftk to do this, but it turns out pdftk is written in gcj, and gcj pulls in a lot for its runtime libraries. I wrote a small program using the mupdf libraries and cut the size of my docker image by over 400MB.

Congrats, you just violated the GPL. You were already violating the proprietary license if you're using his for commercial purposes without paying.

Not all free software is free.

I just saw the commercial redistribution clause for pdfTK if that's what you're talking about. It does not affect me since this was not a commercial application, but it would seem to me that that clause itself is a violation of the GPL, since pdfTK links to GPL software and it is also a contradiction of a separate place on the site that claims the same software is licensed under the GPLv2. Preventing commercial redistribution is not compatible with the GPL.

Actually mupdf uses AGPL, and so the program I wrote and linked with mupdf is also under AGPL.

There is a counter example of his reasoning in the Python world. There is a HTTP client library "urllib" in the standard library, but nowadays everyone rather pulls in the external dependency "requests" because the urllib API is terrible. It is mature, well tested, good documented code though.

Just as a data point: I don't use requests; I honestly prefer the urllib (was urllib2) API offered by the standard library.
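For what it's worth, the simple cases really are only a few stdlib lines. A hypothetical helper (names my own, error handling omitted):

```python
import json
from urllib.request import Request, urlopen

def get_json(url, timeout=10):
    """Fetch a URL with only the standard library and decode the body as JSON."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

requests certainly earns its keep for sessions, auth, and retries; the point is only that a plain GET doesn't force the dependency.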

Looks like the mime-types upgrade has some sort of hard dependency on Rails 5? I seem to be stuck on 2.99

It's the other way around actually: ActionMailer 4.2 depends on mail ~> 2.5, >= 2.5.4, so you'll get 2.6.3 now. That version depends on mime-types <3, >= 1.16, so you'll get 2.99.

The big change in mime-types 3 is using the columnar store by default, which is where the memory savings come from. It's opt-in from mime-types 2.6 onwards because it's a breaking change. Mail and afaik most other gems have opted in already.

Excellent article. I tried to develop on GitLab once but the sheer amount of gems it pulls in (~100 directly declared, 350+ including dependencies if I remember correctly) with a bunch of installation problems made me decide it was not worth the hassle.

I'm sorry to hear you had installation problems trying to develop for GitLab. Have you tried the GitLab development kit https://gitlab.com/gitlab-org/gitlab-development-kit? If still are interested and experience problems please email support@gitlab.com and reference this comment for help.

I agree with the article, the fewer dependencies the better. GitLab's Gemfile.lock https://gitlab.com/gitlab-org/gitlab-ce/blob/master/Gemfile.... has over 1000 lines and GitLab uses a lot of memory. We try to be careful about what we pull in, but if anyone has suggestions for what can be removed please let us know. Recently we found out that we still had to remove RedCloth as a dependency; it will be gone in GitLab 8.5.

If there is an analogue in the standard library you should have a compelling reason to use an alternative. Wish this could be filed under "common" sense. Thanks for articulating and presenting this principle, among others. Great writeup.

if you're running multiple rails processes on a server like this, couldn't you somehow do the initialization in one process, then fork off the new processes? wouldn't that prevent the base libraries from being copied in memory?

Yes, http://unicorn.bogomips.org/ popularized this for ruby / rack / rails with its forking model and preload_app option. http://puma.io/ does the same thing, but additionally runs multiple threads in each process.

The garbage collector in Ruby 1.8 / 1.9 negated the benefits of copy-on-write forking, but that's fixed since Ruby 2.0

Yes. Many app servers do this. (Not being super familiar with the Ruby ecosystem, I am not sure specifically which ones.)

This is a huge mistake if applied without care. Building things from scratch necessarily will introduce more bugs, more maintenance costs and leave you with a codebase that suffers from a lack of maturity.

Building things from scratch necessarily will introduce more bugs, more maintenance costs and leave you with a codebase that suffers from a lack of maturity.

Unfortunately, in some programming language ecosystems where having many small and transitive dependencies on modules from an non-curated repository is common, none of those three things is necessarily true.

Code reuse is not a trivial problem, and you always have to weigh the benefits against the costs and risks to decide whether it’s worth it. If we’re depending on GitHub repositories with a dozen files and three subdirectories just to provide some simple functionality that any junior programmer could implement directly in five lines of code, we’ve probably lost the plot. On the other hand, if we have a full in-house implementation of encryption algorithms we use to throw sensitive customer data around between the browser and our servers, we’ve also probably lost the plot.
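For instance, the five-line kind of thing might look like this (a hypothetical example, not from the thread):

```python
def left_pad(s, width, fill=" "):
    """Pad s on the left with fill characters until it is width long.

    The sort of trivial functionality that is cheaper to own than to
    depend on: five lines, no transitive dependency tree.
    """
    return s if len(s) >= width else fill * (width - len(s)) + s
```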

I don't think the sentiment is 'implement everything from scratch', but rather that if something exists natively try to use that instead of pulling in other dependencies, like in the http client example.

I remember one nasty bug where someone has included one function from bootstrap.js library and someone else had included the entire library. So both functions were running causing an issue.

This is not just true for Ruby but also for the entire npm ecosystem.

I wonder how much traffic could be saved by optimizing npm packages... probably on terabyte scale at github alone, methinks.

Yes, this is the sort of thing that scares me away from Ruby.

I'm worried this sort of "screw it just add a library" is going to spread further in my language of choice: Java.

In my time doing open source programming on the side, I've found that it has become more common with the advent of things like mvn and gradle to just slather on layers to your stack even for the simple tasks.

Need a function to turn a byte buffer into a string? Download these 3 Apache commons libraries and their dependencies.

I understand if you are relying on a large portion of a library and you need to use it, but why bring an entire library in for one function.

There are ideas floating around that make it appealing to do just that. For example, the commons library might be considered "battle-tested," and who really knows what could happen with your own custom byte-buffer-to-string function? Maybe you missed something? Maybe there is some "best practice" that you didn't follow? Maybe the commons library is optimized? And writing your own thing doesn't add business value. Developer time is more expensive than dependencies. And so on.

Me, I very often prefer to write things myself, in a way that can get labelled as NIH. My inclination is based on bad experiences with trying to debug external libraries. Sometimes I look at open source library code and find staggering complexity that I have no need for. Yes, maybe the library is great, but if its combinatorial size is 10,000 times the functionality that we need, then depending on its correctness becomes scary to me. And when I need to customize it, due to some requirements alteration, I will find it difficult and tedious.

Black-box type libraries for isolated complicated tasks like codecs and crypto I will happily use.

Otherwise, I'm a fan of the "design patterns" approach to reuse, which is all about learning from others, but without creating reusable formal abstractions in library form. So if you teach me how to write an URL router, I can then use your insights without depending on your code base, and I can adapt the idea so it fits my application perfectly.
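To sketch that last point: once you know the insight (compile path patterns to regexes), a usable URL router is a handful of lines you fully own. A hypothetical minimal version:

```python
import re

class Router:
    """Minimal URL router: patterns like '/users/<id>' become regexes."""

    def __init__(self):
        self.routes = []

    def add(self, pattern, handler):
        # '/users/<id>' -> '^/users/(?P<id>[^/]+)$'
        regex = re.sub(r"<(\w+)>", r"(?P<\1>[^/]+)", pattern)
        self.routes.append((re.compile("^" + regex + "$"), handler))

    def match(self, path):
        for regex, handler in self.routes:
            m = regex.match(path)
            if m:
                return handler, m.groupdict()
        return None, {}
```

`match("/users/42")` returns the registered handler plus `{"id": "42"}`; when requirements change (say, typed parameters), you edit ten lines instead of fighting someone else's abstraction.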

I agree completely. I'm not the one who will sit down and attempt to implement my own RSA, or hashing algorithms.

It's like pornography. I don't know how to define it, but I definitely know when I see it.

There are correct times to use libraries. But pulling in 15 megs for some simple functionality is not good practice, in my opinion.

Urgh. I find design patterns the worst approach. It's just copy-paste at a slightly higher level. If you really understand a pattern, you should be able to express it formally - i.e. as code.

Richard P. Gabriel's book Patterns of Software has a chapter about that. (It's free and out of print.)

Bluntly, it's kind of like: if you really understand a style of house building, you should be able to deliver it as a prefab. Maybe true in some way, but also neglects the drawbacks of standardized components.

> I understand if you are relying on a large portion of a library and you need to use it, but why bring an entire library in for one function.

Why not? An extra 3MB of disk space? A few milliseconds more of compile time? Maven makes it very easy to work with dependencies (and you want to use maven anyway, even if you have no dependencies, so that you have a proper release process). At this point I think of basic utility libraries like guava-collections, commons-io, httpclient and junit as part of the standard library. If I need even one function from one of them I'll add the dependency. Better that than have two inconsistent ways of doing the same thing.

Interesting to note that the Stripe gem removed one of its dependencies seemingly in reaction to being called out at the end of this article.

This should be one of the perks of Go since the compiler won't let you include anything that you aren't using.

That's an orthogonal issue. You won't accidentally bring in something completely unrelated because you forgot to remove it from the "include"s, but nothing technically stops you from having a deep dependency chain. Culturally the Go community is aware of the issue, though.

Still, it isn't hard to bring in a chain accidentally. I have a program than needs to do a query against the local LDAP system to extract members of a specified group. The LDAP library brings in five more libraries for parsing all the various bits of LDAP. Since this isn't C, I'm a bit less nervous about pulling in, say, a BER decoding library, because at least Go is generally memory-safe, but, still, that's a somewhat large stack for such a simple query. (Traditionally in C, you might as well just expect any library that decodes anything remotely binary-esque will have buffer overflows. C is a DSL for writing buffer overflows.)

And yet, I'd be insane to try to implement some sort of just-barely-minimal LDAP client to do it myself.

Looking at my local godoc instance's full set of packages that have gotten pulled in one way or another is still sort of intimidating. Some of them are cases where I'm just pulling in a subdir and got an entire large repo (the golang experimental repos do that a lot), but, still, I've got a lot of stuff in there. If you're a go programmer and you haven't run godoc locally and had a look at the packages page, have a look. You may be surprised.

Does any of this sound familiar:

- Test gems loading in production.

That does not sound familiar! Is this a thing which happens with Rails?

No, but sometimes people are a bit careless in their Gemfile I guess.

mojolicious does a great job with this, supporting optional dependencies as progressive enhancement (installing EV will speed you up, but you don't necessarily need it)


I have one thing to say about all of this...


They should sell sonatype for this.


So to summarize, to get rid of an extra 10 (or even 100) megs in Ruby (or 0 megs in some other languages) of memory usage (and disk usage don't forget!) spend weeks rewriting, testing, and integrating your own code instead of using already written, tested, and integrated code. Now, that I've clarified the article's point, how can anyone not follow this "best-practices" advice? </sarcasm>

“Oh, I thought you said ‘dependents’” — Abraham

what's the point? none of them are realistic.

"no code" - well, it's there for a reason. "own it" - do i really want to write my own minimal implementation?

i understand that dependency is a pita but this post doesn't provide anything worthwhile.

The problem with dependencies is that developers approach it from a top-down approach. The question they answer is: I need an HTTP client, JSON API, monitoring tool, logging framework.

Never do they ask the opposite: what kind of foundations do I need? What elementary blocks do I need to have or learn in order to make a JSON parser in 5 lines of code? Is it possible to do logging without all the cruft? Can I write the library in the same amount of time as I can read the docs? Could the code I write be the docs?

Similar line of reasoning: can I leverage my OS to do scheduling/IPC/monitoring/security? If it can't, should we lobby for better OSes (that might scale over multiple machines?) Does Linux/Docker offer the right fundamentals?

Dijkstra was truly right: the art of programming is the art of managing complexity.
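On the "logging without all the cruft" question above: sometimes the answer is yes, and the code can be its own documentation. A hedged sketch, assuming plain timestamped stderr lines are all the app needs:

```python
import sys
from datetime import datetime, timezone

def log(level, msg, **fields):
    """Write one timestamped log line with optional key=value fields."""
    extras = " ".join("%s=%s" % (k, v) for k, v in sorted(fields.items()))
    line = "%s %s %s %s" % (
        datetime.now(timezone.utc).isoformat(), level.upper(), msg, extras)
    print(line.rstrip(), file=sys.stderr)
```

A real logging framework earns its keep once you need rotation, per-module levels, or structured sinks; until then, this is readable in full in ten seconds.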

