
Show HN: Smallest Node.js Docker images - astefanutti
https://github.com/astefanutti/scratch-node
======
tootie
The new game of Docker Golf. I once spent like a day trying to debug an issue
with pruning dev dependencies from my prod Docker image before I stopped to
realize how much money I was wasting to save $0.0001 of cloud disk space. It
is kinda fun though.

~~~
drinchev
That's something I ask myself every day. Is it worth pushing the big O
notation to the limit and saving a couple of megabytes of RAM, or can I simply
deliver and have a happy boss?

I've always been fond of "premature optimisation is the root of all evil" [1],
but still... wasting resources makes me feel bad. I feel like the guy who
buys 10 plastic bottles of 0.5l water instead of 2x2.5 litres.

The guys building (owning) Slack, cryptocurrencies, and e-commerce websites
are on the other end, though. I can hardly count a day when my MacBook's fan
doesn't spin like crazy because I'm just visiting an HTML page to read the
text.

In other words: yeah, you can spend some precious time optimising that,
because you don't want to be the person who starts the fans on colleagues'
Macs (by having a Docker image with your app running on my Mac). Energy is
what you are saving, and that's priceless in today's polluted world.

1:
[http://wiki.c2.com/?PrematureOptimization](http://wiki.c2.com/?PrematureOptimization)

~~~
hinkley
Make it Work, Make it Right, Make it Fast.

On several projects the management and sales team were at odds with the dev
team about app performance. They wanted it faster and the dev team was full of
premature optimization cargo cult members. By cult member, I mean people who
say that phrase to get out of thinking.

The line between right and fast is blurry. There’s a big chunk of refactorings
that improve readability _and_ performance, and if you work with those you get
better at your job, avoid the cult members, and please the business.

Now that I’ve typed that last paragraph I want to go through Refactoring and
categorize the ones that fall under “right and fast”...

~~~
jay-anderson
Very much agreed. Like all things, balance is needed. Often it's worth thinking
about performance during initial design. There's the idea that fixing bugs
earlier in development is cheaper; the same applies to performance issues.
That's not to say you should spend an inordinate amount of time focusing on
just that, but it needs to be given its due.

------
rubenhak
This is great! Thanks for sharing. I can recommend a small change to get the
most out of Docker's layer caching and cut build times.

In the "builder" stage you only need the package.json and package-lock.json
files to install dependencies. The rest of the sources can be copied in the
"scratch-node" image. This keeps the cache valid up to the last line, where
only the code changes are copied in; code changes much more frequently than
dependencies do.

Your sample could look like this:

    FROM node as builder
    WORKDIR /app
    # copy only the manifests, so the npm install layer stays cached
    # until the dependencies themselves change
    COPY package.json package-lock.json ./
    RUN npm install --prod

    FROM astefanutti/scratch-node
    COPY --from=builder /app/node_modules /node_modules
    # code changes only invalidate the layers from here on
    COPY ./ ./
    ENTRYPOINT ["./node", "index.js"]

------
martin-adams
That's pretty cool. What it appears to do is build Node statically in one
builder container, then start from a scratch container and copy the single
binary and one user config file into it. So what's built is extremely minimal.

~~~
web007
Multi-stage Docker builds are an underutilized pattern.

Go ahead, go crazy and add all of the dev dependencies you need to build your
package. Once you've done that, take the built package and put it into another
container that has _only_ the runtime dependencies.

The ideal use-case for this is compiling Go, since you end up with a 1GB build
container and a 12MB single-binary production container if you compile with
static linking. Just beware when going the FROM scratch route that you get
nothing to go with it: you can't shell into the container or run "ps" or
"lsof" for debugging, because none of those exist.

~~~
weberc2
> Just beware when going the FROM scratch route that you get nothing to go
> with it: you can't shell into the container or run "ps" or "lsof" for
> debugging, because none of those exist.

Your image processes run on the host anyway, so just `ps` or `lsof` from the
host. I've never had to exec into a Go/scratch container.
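
For example (container name hypothetical), you can list a container's
processes with docker top, or resolve its host PID and point the usual tools
at it:

    # list the container's processes from the host
    docker top mycontainer

    # or find the container's PID on the host and inspect it directly
    PID=$(docker inspect --format '{{.State.Pid}}' mycontainer)
    sudo lsof -p "$PID"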

~~~
amazingman
> Your image processes run on the host anyway, so just `ps` or `lsof` from the
> host.

You can’t do this if you don’t have access to and root privileges on the host.

------
tzaman
Honestly, the size doesn't matter.

I was once in the camp of small Docker images, but realized it's simply not
worth the tradeoff, since there's only one upside to them, and that upside is
fast transfer of images.

However, that argument becomes pointless when using a proper CI/CD stack. As a
developer, you don't normally upload images yourself; you push changes to
GitHub, then Jenkins/Travis/whatever takes over, builds the image, and pushes
it into production/staging/whatever. Since the CD tool of choice is usually
also in the cloud, we don't have to worry about image size, nor do any of the
CD vendors charge for data transfer.

I'd rather have bigger images (I base mine off Debian now; they used to be
Alpine) and not have to worry about a lack of ported tools and libraries, than
vice versa.

~~~
moltar
What if you need to release a hotfix and your image is 1GB?

~~~
tzaman
1. I push a hotfix to GitHub.

2. Jenkins (which is on Google Cloud) builds it, and it already has all the
Docker steps cached from previous builds, so it's fast.

3. Jenkins pushes the image to the Google Cloud repo, which is almost
instantaneous.

4. Kubernetes (also on Google Cloud) pulls the image and makes a new
deployment.

No big deal. :)

------
WD-42
This is neat! However, I don't think I've ever seen a node project whose
node_modules wasn't at least 10x the size of one of these images.

~~~
vidarh
Size is only part of the equation. Having fewer binaries to worry about
security updates for is another.

~~~
kpcyrd
I see that argument a lot, but this assumes that vulnerabilities in binaries
that are never executed are magically exploitable from the internet.

It doesn't really matter if a container contains a 5 year old imagemagick
binary if that binary is never used by anything. It's the equivalent of a bug
in unreachable code.

~~~
phamilton
Unless the exploit makes unreachable code reachable.

Security (and privacy) are largely about minimizing surface area.

~~~
kpcyrd
The "surface" in "attack surface" implies reachability.

Your argument doesn't make any sense: how would "the exploit" make an
unreachable vulnerability reachable without being able to execute the
vulnerable code in the first place?

Please don't say "using a different vulnerability that allows us to execute
arbitrary code".

~~~
phamilton
> Please don't say "using a different vulnerability that allows us to execute
> arbitrary code".

Sorry, but I'm going there anyway. Imagine two different exploits. One is a
remote code execution exploit and the other is a privilege escalation exploit.

Let's say your application has an exploit and an attacker manages to obtain a
reverse shell (imagemagick, XML parsing, etc. have all had multiple such
exploits over the years). If you're running things correctly, that reverse
shell is not privileged. It's the apache user or something. Not a good
situation to be in, they can do a lot of damage, but at least some things are
safe. They don't have root.

Now the attacker finds that you have X11 installed. It's an old version that
was installed by default. It happens to have a root privilege escalation
exploit via fonts. Now the attacker has root.

That's what I mean by surface area. Thinking in terms of "have we been
compromised" isn't sufficient. Being able to contain the attack is important,
and dead code lying around factors into how well you can contain it.

------
xaduha
Similar one here: [https://github.com/mhart/alpine-node](https://github.com/mhart/alpine-node), the base-* versions are static.

~~~
styfle
That was my first thought too, but scratch-node is a bit smaller than
alpine-node. I created a ticket to see if alpine-node could get smaller:
[https://github.com/mhart/alpine-node/issues/133](https://github.com/mhart/alpine-node/issues/133)

------
ajuhasz
There’s also Google Cloud's distroless project:
[https://github.com/GoogleContainerTools/distroless](https://github.com/GoogleContainerTools/distroless)

We’ve used the Node.js containers in production and investigated the other
languages, but never deployed them. We did have some issues with devops not
being able to log into the running containers, but we always found a solution
that, I believe, was in the end a better long-term pattern for ops.

------
mikepurvis
A scratch container doesn't even have busybox in it, does it? If not, this
wouldn't be able to run npm, much less install anything which has bindings to
other libraries.

Definitely a cute experiment, but probably of limited real-world use. I wonder
what the smallest _practical_ node container would look like?

~~~
philplckthun
I suppose you could apply the same builder/scratch separation as this
Dockerfile: install using npm/yarn/etc. in the builder and copy over the
result.

It's certainly only practical when every byte matters. At that point it might
also make sense to prebundle the dependencies and copy that over.
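
A rough sketch of that prebundling idea, assuming esbuild as the bundler (any
bundler that targets Node would do):

    FROM node AS builder
    WORKDIR /app
    COPY package.json package-lock.json ./
    RUN npm ci
    COPY . .
    # bundle the app and all of its dependencies into a single file
    RUN npx esbuild index.js --bundle --platform=node --outfile=bundle.js

    FROM astefanutti/scratch-node
    # only the bundle is copied; node_modules stays behind in the builder
    COPY --from=builder /app/bundle.js ./
    ENTRYPOINT ["./node", "bundle.js"]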

It'd be interesting to see whether this becomes relevant if serverless-like
constraints suddenly apply to a Docker cloud service.

~~~
orliesaurus
wait, do you mean adding another layer for npm installing other deps?

~~~
jopsen
No, doing npm install in a different container and copying over the result.
Docker has multi-stage builds, which allow using one container to build and
then copying the result into a different, final container.

I suspect this could be relevant in many places; keeping images small also
hardens security.

~~~
orliesaurus
Oh, what are the advantages of doing that in a different container as opposed
to a different layer? Trying to understand the pros/cons... Isn't it faster to
add extra layers than extra containers?

~~~
heroic
This is the builder pattern in Docker. You do things in one "throw-away"
container, then take the results from that container and copy them into
another. The final container gets only one layer now, instead of the maybe
tens of layers that your "throw-away" container had.

------
bloopernova
Figured I would ask here: have any DevOps or build folks had to deal with
compliance audits regarding their Docker containers?

It's something certain developers I've encountered seem to ignore, even when
creating something that might handle health or financial information.

Did you have to build your docker images from scratch, or did the security
audit folks certify upstream images? What about updates?

~~~
choosegoose
We use Twistlock ([https://www.twistlock.com/](https://www.twistlock.com/)) as
it does the CVE scanning, but you can also set up rules for compliance, binary
monitoring, and a whole plethora of other security/auditing type things. It
also has a Jenkins plugin, so you can fail builds if a certain threshold of
CVEs/compliance failures is introduced by developers (the only way to
actually get the team to care about security).

Our security folks haven't really decided what to do with containers, although
some people are just using RHEL7 base images since it's "enterprise-y". Our
group personally uses Alpine base images. If we have something like a Java
service hosted by Tomcat, we build Alpine, then build Tomcat, and then build
our "service" container. While most people are fine pulling from Docker Hub,
we do work in closed-loop environments and have a private Docker registry
where we host our "chain" of Docker images, which are versioned and updated
regularly.

~~~
bloopernova
Thank you, I will check that out.

------
tlrobinson
A 15MB Node.js Docker image, to which most people will add 200MB of
dependencies from npm :)

------
alexnewman
Good, this way I can't debug my Docker container by jumping into it. Also, I
wouldn't actually want to call any OS features, so this protects me from that.
Just kidding. This is a cool project, but it probably isn't super production-y.

~~~
vidarh
Pretty much everything you might need to do to debug it can either be done
from the outside, achieved by copying binaries in temporarily, or done by
executing the same image with a volume mounted.
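
For instance (names hypothetical, and assuming a statically linked busybox on
hand):

    # temporarily copy a static busybox into the running container
    docker cp ./busybox mycontainer:/busybox
    docker exec -it mycontainer /busybox sh

    # or run the same image with debug tools mounted from the host
    docker run --rm -it -v "$PWD/tools:/tools" --entrypoint /tools/busybox myimage sh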

------
moltar
Does it work with statically linked modules?

------
kowdermeister
What does scratch mean in this context?

~~~
granra
The scratch image in Docker is a reserved base image that contains nothing at
all.

------
nurettin
but will it work on armv7 or later?

------
11235813213455
no npm, no shell, not practical :)

~~~
bloopernova
Why? Containers are supposed to be immutable. They shouldn't need npm, a
shell, or other stuff, really.

(not looking to argue, if there's another reason to keep these tools around,
please let me know, I'm always looking to learn more)

