
Why I often implement things from scratch (2006) - clarry
http://armstrongonsoftware.blogspot.com/2006/09/why-i-often-implement-things-from.html
======
enriquto
There's a clear-cut "scissor statement" that you can use to divide programmers
into two opposite sides. You ask the following question:

"\--If you need to invert a 2x2 matrix in your program, do you write down the
algebraic formula for the solution by hand or do you call a linear algebra
library? (that you would not need otherwise)"

The answer to this question is _obvious_ to most programmers. Also, it is
obvious to them that the opposite answer is clearly wrong and absurd.
Unfortunately, not everybody gives the same answer!
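For the curious, the write-it-by-hand side of the scissor question is just the closed-form adjugate formula. A minimal Python sketch (the function name is mine):

```python
def invert_2x2(m):
    # Closed-form inverse of [[a, b], [c, d]]:
    #   inverse = (1 / det) * [[d, -b], [-c, a]]
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]
```

The library camp would call something like numpy.linalg.inv instead; the formula camp points out that the above is a handful of lines with no dependency at all.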

~~~
seisvelas
Okay, you got me. I am in the camp that says calling the library is (in the
general case) obviously the better choice.

My (camp's) reasoning:

1. Your handrolled solution is likely worse than the library anyway

2. You're wasting effort reinventing the wheel

3. You risk adding unnecessary opacity by littering the program with dense
math

4. Other programmers will assume there is a pertinent reason you handrolled
the algorithm instead of using the library. Now every time someone touches
that code, they have to reverse engineer your approach to see what it does
differently from the library such that you had to roll it yourself (surprise:
nothing!).

The two exceptions I imagine are 1) if you really do absolutely need some
slightly modified microoptimization of what the library does or 2) if you are
working for a platform where there is good reason to avoid even slight bloat
(embedded).

For the people who read OP's comment and believe I am (obviously) wrong, what
is your camp's way of looking at this?

~~~
throwsprtsdy
> Your handrolled solution is likely worse than the library anyway

This assumption led to a lot of O(n^2) left-padding happening over the years,
via the _left-pad_ package on npm.
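For anyone who hasn't seen the bug: the quadratic behavior comes from prepending one character per loop iteration, which copies the whole string each time on immutable-string platforms. A Python sketch of the two shapes (function names are mine, not the package's API):

```python
def left_pad_quadratic(s, width, ch=" "):
    # One character per iteration; each concatenation copies the whole
    # string, so padding n characters costs O(n^2) overall.
    while len(s) < width:
        s = ch + s
    return s

def left_pad_linear(s, width, ch=" "):
    # Build the entire pad in one step: O(n).
    return ch * max(0, width - len(s)) + s
```

Both produce identical output; only the cost differs, which is exactly why the flaw survived so long in a library with millions of downloads.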

Ideally libraries would be well-tested and of high quality, but at this point
there may be too many of them to vet.

My own 2 cents is that if it's easier to implement the code from scratch than
to evaluate and choose a library, perhaps the invent-it-here choice works
best.

~~~
danShumway
Parent's comment is getting downvoted to all heck, but it is _completely,
obviously_ true in the Javascript ecosystem. If you're a competent-to-expert
Javascript programmer, you will regularly run into libraries that are not
coded to your standards.

And honestly, even throwing those libraries out, it's sometimes not even a
question of doing a "better" job than the library. The library is designed for
general-use, but my needs aren't always general-use. Sometimes libraries are
_good_ , but they're wasting time on stuff that I don't need.

This is the reason why every Unity game on the web takes 4-5 seconds to load,
and my custom engine takes on the order of milliseconds to load. It's not
because I'm a better programmer, it's not because Unity devs are crap, it's
purely because I know exactly what my engine needs to do, and I don't waste
time on anything that it doesn't need to do.

I can think of tons of examples where I've started out using an existing
library and then realized that getting rid of abstraction made my code faster
and easier to debug. Heck, I can think of tons of times where I've privately
forked libraries and deleted codepaths or rewritten algorithms to make them
more efficient for my use-cases. There are very few JS libraries that I
regularly use where I have not at some point needed to care about their
internals.

And I am definitely not a crazy, insane, masterful programmer. If I'm
occasionally circumventing d3 internals or doing stuff manually, it's not
because I'm special. People have this assumption that if something has a bunch
of stars on Github, there's no way they could possibly code something more
appropriate or efficient, and for a lot of people, that just isn't true.

~~~
Paperweight
"Surely this core cryptography library with 21 contributors and led by a
superstar Node.JS dev with 9+ million weekly downloads and 1000+ dependants
will be a great example to read and learn from..."

 _Opens source code._

 _Closes source code._

"Well, looks like I'm going to be a Rust developer from now on."

~~~
NohatCoder
Just don't ever look into the crates. As long as the lid stays on, the crates
are perfect; no reason to believe otherwise.

------
cogman10
Too many devs don't understand that dependencies add fragility to a system.

The larger and more complex a system you design, the more likely it is that
something breaks because two dependencies are incompatible, or because your
own code and a dependency break when that dependency is updated.

Further, security is rarely considered. The more code you have, the more
likely you are to have a security vulnerability. You have to worry not only
about code you maintain, but also the code of your dependencies.

If you can write the functionality you crave rapidly, do it. You'll save
yourself (and others) headaches in the future.

~~~
jonnycomputer
Or worse, the developer gets bored and stops supporting the project, and it
stagnates until it breaks everything else. VisionEgg I'm looking at you.

~~~
caymanjim
Except that happens far more often when you roll your own code internally. For
anything of even moderate complexity, only the author will know how it works.

~~~
hinkley
If this person is bad at knowledge transfer, they end up being the only
expert and thereby gain status, when objectively this should cost them status
instead.

------
jchook
Once upon a time there was a systems programmer who wanted to use a readymade
3rd party solution to a specific problem.

He evaluated 3 options, then downloaded, compiled, configured and integrated
the best choice into the project over the course of a full day. Its limited
database integration configuration necessitated some odd quirks like
duplicating a db column, but oh well.

When finally done, integration tests revealed that it didn't actually solve
the entire problem. It only solved 90% of the problem. Not only that, it had a
significant but elusive bug that only occurred in 5% of situations. He filed a
bug report, but the project hadn't received any updates in over a year. Damn.

When it became clear neither of the other two readymade libraries would solve
100% of the problem either, the programmer wrote his own solution. It took 3
days of work but he learned a LOT about exactly how it works, and the result
solved 100% of the problem with no major bugs.

All of the quirks disappeared. The solution fit the problem exactly, and
remained flexible and maintainable. It eventually served as a solid foundation
for future developments which had no readymade solution.

~~~
hinkley
There’s a questionnaire out there for anyone writing a library, and the first
few questions are: which ones did you try, why don’t they work, and why can’t
they be fixed? Sounds like your coworker wrote a library the “right” way.

We had our own snowflake wrapper around the saucelabs client. I eventually
figured out how to file a PR to do the same. Now I get to use that solution on
every other project I ever work on. I have similarly used my own Stack
Overflow answer to solve a problem I also had three years prior. And I
recently found my SO answer to another question verbatim in our own codebase.
Ironically, from a guy who often ignores my advice.

Get as many things out of your head and into the group memory as you can.
Things get forgotten otherwise. Speaking of, did you guys ever consider open
sourcing his solution?

------
caymanjim
While the author's main point has some merit and is worth discussing, the
example chosen is wrong, and the reason it's wrong is the reason you should
take a beat before you consider ever rolling your own solution.

So now we've got some simple code to copy files between machines, because we
hired someone who was too academic to figure out how to set up a simple FTP
server. Here's what comes next:

1. We need to open firewall ports so these two machines can communicate. What
are they? Who knows. If it were FTP, the sysadmin would know.

2. We need authentication; we can't just allow any old machines to connect.
If we were using FTP, we'd have that built-in. And it would have been audited,
so we can have at least baseline confidence in its safety and accuracy.

3. We can't go having the disk overflow. We need something to manage quotas,
because even though we trust our users, sometimes software goes rogue. Maybe
the default FTP server can't handle this, but some other product can.

4. We need logging so there's an audit trail, in case something goes wrong
and we need to track it down.

And on and on and hey look you've reinvented FTP and now no one else on your
team knows how it works, it's full of bugs, you've hardcoded a bunch of
inefficiencies, there's no documentation, we have to explain what it is and
how it works to the operations team, and hey here comes the boss asking us to
make sure it works over SSL.

If you think you can do it better, make sure you think about what you're
doing.
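To be fair to both sides: the "simple code to copy files" really is only a dozen lines, which is exactly how the checklist above sneaks up on a team. A hypothetical Python sketch of that starting point, with every numbered item conspicuously absent (no firewall story, no auth, no quotas, no audit log):

```python
import socket

def receive_file(path, port=1234):
    # The whole "file copy" tool: accept one connection and write the
    # bytes to disk. Simple today; every item on the list above gets
    # bolted on later, one incident at a time.
    with socket.socket() as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn, open(path, "wb") as f:
            while chunk := conn.recv(65536):
                f.write(chunk)
```

The sender is a one-liner (netcat, or a few lines of the same socket code), which is precisely the seduction the parent comment is warning about.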

~~~
hinkley
Maybe the two most painful experiences in my career are the customer who
changes his mind every six hours (and once a week that happens twice in one
day) and when the smartest person in the room is also the biggest fool. They
can write the most arcane code that nobody else will want or be able to try to
untangle, and then act surprised when people are frustrated with them.

------
nartz
This is extremely context dependent. For instance, a dev has to consider:

- a guess as to whether this is an isolated case, or whether it will become
core functionality

- a self-assessment of the true difficulty of the problem

- a self-assessment of their own skills and knowledge in the area

- security reasoning

- API access/readability for other developers to use this code

- maintainability of the new code

I have personally seen personal implementations that lead to bug after bug,
bugs that have already been reasoned about and fixed in equivalent libraries.

Often it comes down to the simple fact that, for the other devs who have to
work on this code, the abstraction and readability of a third-party library
are probably greater than those of the 'quick-and-dirty' implementation.

~~~
hinkley
That’s a good list. I’d throw “facility at evaluating existing solutions” on
there, too.

The person who reaches for the “New File” button is often not used to asking
if the thing they need already exists. They can end up duplicating or
triplicating business logic which results in other engineers confidently
declaring that a problem has been fixed when they only fixed the obvious
occurrence.

------
danShumway
> If you have the right tools it's often quicker to implement something from
> scratch than going to all the trouble of downloading compiling and
> installing something that somebody else has written.

Heavily agree. Shameless self plug, this is the philosophy behind
Distilled[0]: provide a good-enough, low-abstraction tool so you can build on
top of it without it getting in your way or forcing you to memorize
configuration options.

Adding unnecessary black-boxes to your code can occasionally come back to bite
you. I try not to be religious about this, but I am a little biased towards
avoiding the abstractions of the 3rd-party libraries unless they come with a
lot of benefits. A big "aha" moment for me was trying to teach interns how to
use Grunt/Gulp, and eventually realizing that just teaching them Bash was
easier. Then I started noticing other stuff -- for example, that the
documentation for our inline documentation library was longer than some of the
documentation pages we were generating with it.

There's a principle here that's true for abstractions in general[1], but I
find it is especially true for some 3rd-party dependencies. I regularly need
to read the source code of 3rd-party dependencies that I use, so I no longer
treat them as free.

Obviously this stuff isn't a hard rule. I still use 3rd-party dependencies.
But there's a balance here. You should be at least a tiny bit cautious about
ready-made solutions.

[0]: [https://distilledjs.com](https://distilledjs.com)

[1]:
[https://peertube.danshumway.com/videos/watch/2daaee22-4d92-4...](https://peertube.danshumway.com/videos/watch/2daaee22-4d92-47cf-a211-9d44a5e3e50a)

------
pvtmert
If any given task is trivial (e.g. can be done within minutes manually), the
author's method is the way I go too.

Instead of downloading a node.js application with numerous dependencies, I
also prefer implementing it in a few lines of code.

Some can claim running a 'proper' application (like nodejs/express with JWT
authentication) is better. But if your business does not solely depend on
your solution, trivial stuff wins in the long run IMHO.

Of course security/access-control is debatable...

~~~
hinkley
I don’t think anyone is suggesting downloading a library is better than
writing a single short function. It’s a false dilemma.

People who are salty about this subject have dealt with NIH situations where
people think they can do better, and are either wrong immediately, or become
so when their attention shifts to some other problem.

I have written and maintained internal libraries. Many times it’s the wrong
choice, sometimes it’s the only choice. But I’m also more responsive to
feedback than the NIH bozos my coworkers typically complain about, which may
in part be why I get to hear about their grievances.

There’s an old half-joke, half-theory that we should elect a President who
doesn’t want the job, instead of people who do.

You should have the individual contributor who doesn’t want snowflake code
write the snowflake code. They’ll keep it no-nonsense.

------
bluedino
Imagine you have a project where you need to draw a fairly complex object. You
have found a file in a standard format that you can use.

However, there are 10 dependencies, some of which are non-trivial. You spend 4
hours changing versions of libraries, downloading a newer compiler than the
one you are using, and never get the thing running.

You could have slapped some code together to read the file, and then draw it
yourself. It would have taken a half hour.

However, later on you are tasked with representing many more objects, which
have corner cases you didn't account for. This eats up hours and hours of
development. Then you're tasked with drawing those objects in many different
ways, and more and more development gets added.

The two takeaways: choose based on what you have to do, and make that choice
knowing 100% what you have to do. If a customer can't make up their mind,
they're hurting development whether they know it or not.

BTW, the objects were GIS shapefiles.
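For what it's worth, the "slap some code together to read the file" half of the story is plausible for shapefiles: the main-file header is a fixed 100 bytes with well-known offsets (per the published ESRI spec). The corner cases the comment describes live in the per-record geometry, not here. A sketch of just the header read:

```python
import struct

def read_shp_header(data):
    # ESRI shapefile main header: 100 bytes. The magic number and file
    # length are big-endian; version, shape type, and bounding box are
    # little-endian. File length is counted in 16-bit words.
    code, = struct.unpack(">i", data[0:4])
    if code != 9994:
        raise ValueError("not a shapefile")
    length_words, = struct.unpack(">i", data[24:28])
    version, shape_type = struct.unpack("<ii", data[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {"version": version, "shape_type": shape_type,
            "length_bytes": length_words * 2,
            "bbox": (xmin, ymin, xmax, ymax)}
```

Parsing the actual geometries (rings, multipart polygons, null shapes) is where a mature library like GDAL starts to pay for itself, which is the story's point.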

~~~
ldng
If there is a domain I would not DIY, it is GIS. It is very often much harder
than it looks. Was one of the dependencies, by any chance, GDAL?

~~~
bluedino
basemap, matplotlib, geos...

The easy solution was to just use Conda.

------
amelius
Sending data between two machines should not be difficult even in 2006, and
this post shows that it isn't. However the author seems to want to imply that
this would hold for _any_ task, which is of course totally false.

------
hirako2000
Writing from scratch costs more, but provides full scope for creativity.
Reusing increases productivity, at the cost of reduced scope for creativity,
along with other hidden costs.

------
hirako2000
I wouldn't call the article biased, but I don't know many people who can
write Erlang, while I do know many who can write Java, C#, or C++.

The code and steps needed to compile/bundle/start/maintain would be far
greater in many other languages than in Erlang or other beauties.

Where are the tests? Even in erlang, unit tests are useful. Why would I trust
my code? What about security? You might not want this server to allow nasty
things to happen if unintended use takes place.

So even in this particular case, I think it is worth evaluating open-source
solutions, their maintenance, and their licenses, and having a proper server
in place that will be far easier to maintain and to trust for robustness
(security and availability).

~~~
yxhuvud
I think it is fair to assume that Joe, the original author of Erlang, was
working on a project that already was using Erlang. Sadly, he is not around
anymore so it is not possible to bring clarity into the particulars.

------
thijsvandien
An important reason to try it from scratch first is that it helps your
understanding of the problem. That way, you can either confirm that it's
better not to take on a dependency, or leverage the experience to find,
evaluate, and appreciate the right dependency when one turns out to be
desirable. Just like people build their own web frameworks before they settle
for an established one. That's a good thing, in my opinion.

------
SnowflakeOnIce
RIP Joe. We miss you.

------
jackcviers3
The biggest reason to introduce dependencies, in my mind, isn't immediate
productivity boost, but:

> With enough eyes, all bugs become shallow.

------
smitty1e
Everything is easy when you know how to do it.

It's that learning curve. . .

~~~
convolvatron
Right. The other side of that learning curve is that if you implement some X
instead of integrating a third-party library, then even if you didn't know X
before, you now know a lot more about it: enough that you might be able to
debug issues with X, or have an informed opinion as to the quality of other
implementations of X.

------
airstrike
[https://xkcd.com/1205/](https://xkcd.com/1205/)

~~~
clarry
That's a nice chart to keep in mind, but kinda tangential. Joe's point is that
with the right tools, it is sometimes faster to write your own solution than
it is to pick one from all the available canned solutions and learn to build &
install & configure & use it (and figure out what to do when it doesn't work
like it's supposed to).

~~~
zimpenfish
> Joe's point

I think his point is more it's faster to solve the problem you have ("I want
to get files from machine X") rather than the problem you imagine you have ("I
need an FTP server") - it's just wrapped up in "I solved it myself using
Erlang".

~~~
clarry
Well both Joe's program and an FTP server & client are possible solutions to
the problem "I want to get files from machine X."

But I very much agree that the speed of the custom solution can in large part
be attributed to the fact that you can reject the cost and complexity of
existing solutions like FTP (which set out to solve more than just Joe's
immediate problem).

Similarly, I often transfer files by piping them to netcat.

EDIT: the point about tools still stands. For example:

    host1$ nc -l 1234 > my.file
    host2$ cat my.file | nc host1 1234

This is very easy to do given that unix is my toolbox. It'd be much more
involved to write from scratch in (say) C.

On the other hand, Joe's tool is much more powerful; it doesn't require him to
manually interact with the shell on two different hosts, and his program can
list files on the remote host and automatically download them under the
correct name whereas I'm stuck manually redirecting nc's output to the right
file (and cursing myself when I accidentally forget to change the name and
overwrite a file I wasn't supposed to fetch). Those features were very easy to
add to his program because Erlang is a powerful tool. On unix shell, it's much
harder.

~~~
gmfawcett
Consider adding 'tar' on each side to avoid naming errors. If you squint just
right, 'tar' is "the pipe fitting that remembers what things used to be
called." :)

