
Shrinking my static site - herohamp
https://hampton.pw/posts/shrinking-this-sites-docker-image/
======
heroic
You could even use nginx on scratch to remove alpine as well.

[https://github.com/gunjank/docker-nginx-scratch/blob/master/...](https://github.com/gunjank/docker-nginx-scratch/blob/master/Dockerfile)
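
The linked Dockerfile compiles a statically linked nginx and copies only the binary and its runtime files onto `scratch`. A rough sketch of the idea (the build stage is abbreviated and paths are illustrative; see the linked repo for the working build):

```dockerfile
# Build stage: compile nginx as a static binary (configure flags omitted;
# see the linked repo for a working build).
FROM alpine:3.11 AS build
RUN apk add --no-cache build-base pcre-dev zlib-dev
# ... fetch the nginx source, ./configure with static linking, make ...

# Final stage: nothing but nginx and its runtime files on an empty base.
FROM scratch
COPY --from=build /usr/local/nginx /usr/local/nginx
EXPOSE 80
ENTRYPOINT ["/usr/local/nginx/sbin/nginx", "-g", "daemon off;"]
```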

------
mattacular
Sometimes you can get space savings on docker images from seemingly odd
sources. For example, I found that running a chown command on files after
they've been COPY'd in bloats the image size significantly (100s of MB).
However, at some point Docker added a "--chown" flag to the COPY command
which brings it back in line.
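
A minimal illustration of the difference (the user/group names are just an example):

```dockerfile
# Without --chown, changing ownership afterwards duplicates every copied
# file in a new layer, roughly doubling the space those files take:
#   COPY . /app
#   RUN chown -R node:node /app
# With --chown, ownership is set during the copy, in a single layer:
COPY --chown=node:node . /app
```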

------
nikeee
I'd suggest changing this:

    
    
        COPY package.json .
        RUN npm install
    

to this:

    
    
        COPY package.json package-lock.json .
        RUN npm ci
    

`npm ci` installs the exact dependencies specified in the lockfile. This way,
transitive dependencies that were upgraded via `npm audit fix` are guaranteed
to be installed. It therefore forces the image to be rebuilt when a transitive
dependency changes. Copying only the package.json wouldn't do that. It also
errors if the lockfile and the package.json are inconsistent.

[https://docs.npmjs.com/cli/ci.html](https://docs.npmjs.com/cli/ci.html)

~~~
noahtallen
+1. npm ci is typically quite a bit faster too in my experience.

------
saagarjha
Can we put "Docker image" in the title somewhere? Otherwise, it seems like the
article is talking about the site itself (i.e. having less JavaScript,
optimizing the images, …)

------
oefrha
3 minutes to build a static blog (with a grand total of five posts) that
doesn’t look any different from decade-old blogs. Pulling hundreds of MB from
the Internet in the process. Wow.

[https://github.com/herohamp/eleventy-blog/tree/master/posts](https://github.com/herohamp/eleventy-blog/tree/master/posts)

~~~
herohamp
yes, it is sub-optimal, but that is not because building it takes so much
time; it is because of the node_modules. I am looking into migrating to Hugo,
as has been suggested by MANY people

------
superkuh
Back in 2015, when cloud offerings were still marginally new, a lot of big
providers were getting into the game with Docker offerings (e.g., IBM Bluemix)
where the charge was based entirely on RAM*hours.

Naturally this led to me gaming the system and making my Docker images as
small as possible in RAM usage. In the end I even abandoned SSH as too heavy
and switched to shadowsocks (2MB resident) for networking the Docker
instances together.

------
Drdrdrq
Aside: one should not use package.json alone to install dependencies. Use
either package-lock.json (and the command "npm ci") or yarn.lock (and... I
forget). Keep the lock file as part of the repo too, or each build could be
different.

~~~
slezyr
> or each build could be different.

Or not working.

------
discordance
How about binary patching your container? - pretty sure I've seen this done
somewhere but can't find the link to it.

------
globular-toast
Why even have a docker image for a static site? If the site has to be built
like this one then just put the output of the build process behind some
webserver. We were doing this back in the 90s and didn't have to write blog
posts about how _not_ to make your site 400MB.

~~~
IanCal
> just put the output of the build process behind some webserver.

That's what this is doing.

You are skipping over two other parts which are installing and setting up
nginx / similar and creating a process for the build itself. This ties the
three together.

------
philshem
The initial build time is “about 3 minutes” but I’d like to know the build
time of the final image.

------
francislavoie
You can probably shrink it even more. The Caddy alpine image is 14MB
compressed.

[https://hub.docker.com/_/caddy/](https://hub.docker.com/_/caddy/)

You also get automatic TLS certificate management and tons of other goodies
that nginx doesn't offer out of the box.
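
For a single-container static site, that could look something like this (the `_site` directory matches the article's eleventy output; treat this as a sketch):

```dockerfile
# The official Caddy image serves /usr/share/caddy on port 80 by default,
# so plain static hosting needs no Caddyfile at all.
FROM caddy:2-alpine
COPY _site/ /usr/share/caddy/
```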

~~~
herohamp
all of my docker containers are placed behind traefik, which handles TLS
certificates, HTTPS redirect, compression, and routing to the correct container

~~~
francislavoie
This post talks about a single-container setup though, with the static site
bundled with Nginx. That's what I'm replying to, not the use case where you
have many containers you need to proxy to.

------
alpb
If you're using multiple stages anyway, there's no need to use the nginx base
image in every stage.

    
    
        FROM nginx:1.17.10-alpine as npmpackages
        RUN apk add --update nodejs npm
    
    

Just do:

    
    
        FROM node:10
        RUN npm [...]

~~~
herohamp
that is true, I'll switch to that when I get around to it

------
pvtmert
tl;dr author discovers multi-stage build to throw away useless nodejs
dependencies.

Same also applies to even Java. Maven downloads tons of stuff these days. You
may only be using single static string from a dependency.
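
For Maven, the same multi-stage pattern might look like this (image tags and artifact names are illustrative, not from the article):

```dockerfile
# Build stage: the JDK, Maven, and the whole downloaded dependency tree
# stay here and never reach the final image.
FROM maven:3.6-jdk-11 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn -q dependency:go-offline   # cached until pom.xml changes
COPY src/ src/
RUN mvn -q -o package              # build offline from the cached deps

# Final stage: just a JRE and the built jar.
FROM openjdk:11-jre-slim
COPY --from=build /app/target/app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]
```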

------
jt2190
This approach is akin to installing all of the build tooling inside of Docker,
_then_ generating the build artifact. I'd think it'd be even slimmer to
generate the build artifact first, then just copy that into the container.

Is there an advantage to building inside of Docker?

~~~
steve_adams_86
Wouldn't the advantage be consistent build fragments? If you use the build
tools outside of docker, you won't get the same repeatable build artifact
across different machines (or as your own host system changes). Maybe I'm not
understanding you though.

------
miganga
Why are we scared of disk space in 2020?

~~~
watersb
I'm scared of _everything_ in 2020.

------
microcolonel
I'm glad they shrunk the image to make space for the _HUGE_ documentation
link.

------
quezzle
Static site and docker shouldn’t be seen together.

~~~
tailsdog
Could you elaborate on that statement, why should static site and docker not
be seen together?

~~~
alexhornbake
It depends what you’re optimizing for.

One of the beautiful things about a static site is its ability to be served
by object stores like S3 as your origin, and cached by a CDN.

From an operations standpoint, you are not responsible for maintaining much
of anything; the performance is super fast, highly available, and relatively
cheap (no dedicated servers, just paying for bandwidth and storage costs).

Contrast that with a docker container as your origin... it must be running
(and that is your problem to ensure it is).

If you’re optimizing for developer convenience, your traffic is low or not
mission critical, or maybe you have a globally distributed highly available
k8s cluster and that is “the way” your company does all the things... sure why
not

~~~
jrockway
Docker is nice because you end up with the same configuration in development
and production. There are many hidden details that "just let someone else host
your site" or "just rsync your files to a server" gloss over. Who is renewing
your TLS certificate? Where do you configure headers, redirects, mime type
mappings, etc.? Where do access logs go? How do you update the version of the
web server? What effects does that update have?

When you manually do these things, you rely on a bunch of implicit defaults.
Maybe your production server and your workstation happen to have the same
version of nginx, and happen to set the same defaults. So you can test a
change on your workstation and the same change works in production. But more
likely, that is not the case. So you get weird differences between development
and production, and you only notice when you push to production. That is not
ideal. Building an image with your webserver and static files ensures that you
see the same things in both places. There is no need to test anything in
production, as you have a copy of the exact code and data that is going to be
running in production, locally. You can tweak and poke to your heart's
content, confident that you'll have the same effect when you push to
production. There is no need to maintain documentation about how to build your
project and what versions of things you use; you specify them in a machine-
readable format and the machine dutifully builds the project correctly every
single time.
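
Concretely, pinning the web server, its configuration, and the content in one image might look like this (file names here are illustrative):

```dockerfile
# The exact same nginx version, config, and content run on the workstation
# and in production; nothing depends on host defaults.
FROM nginx:1.17.10-alpine
COPY nginx.conf /etc/nginx/nginx.conf   # headers, redirects, mime types, logs
COPY _site/ /usr/share/nginx/html/
```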

(One disadvantage of clean builds, though, is that sometimes you want old
artifacts to exist. Consider a case where you use webpack to generate
javascript. Typically, you'll output a bundle like "main.abc123.js" which is
loaded from "index.html" via a script tag. What happens when the browser loads
index.html from your last build, and then the next request goes to an updated
server, which says to get "main.def456.js" instead? The page silently breaks,
because the server doesn't have a file called "main.def456.js" anymore. "rsync
--delete" has the same problem. And if you never delete anything, you
eventually use an infinite amount of disk space. So there is definitely room
for improvement here, but "it will probably work if I don't think about it and
cross my fingers" is not the improvement we're looking for.)

------
azangru
> This docker image resulted in a 419MB final image and took about 3 minutes
> to build. There are some obvious issues with this. For instance every-time I
> change any file it must go through and reinstall all of my node_modules.

He doesn't say whether there have been any build-time improvements after the
changes to the Dockerfile. Will the builder docker images get cached, and thus
reduce the build and deployment time?

~~~
licebmi__at__
Well, the article mentions an obvious improvement. The node_modules won't be
rebuilt after any change to the code, only after changes specific to the
package.json.

Basically the main step is the COPY . . step (previously COPY . /app), which
will invalidate the cache on every change to the code.

Also, on the builder, the steps beyond npm run are not needed, but they won't
improve the performance or space of the overall process much.
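
The layer ordering that makes this work can be sketched as follows (stage and path names are illustrative):

```dockerfile
FROM node:10 AS builder
WORKDIR /app
# These layers stay cached until package.json or the lockfile changes:
COPY package.json package-lock.json ./
RUN npm ci
# This layer is invalidated on every code change, so edits only re-run the
# site build, not the dependency install:
COPY . .
RUN npm run build
```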

~~~
azangru
> The node_modules won't be rebuilt after any change to the code, only after
> changes specific to the package.json.

But that's only if the builder image gets cached and can be reused, right?

~~~
Drdrdrq
Correct.

------
1337shadow
I have no idea why they need an nginx image in their second FROM; they are
just doing some npm.

~~~
herohamp
yeah, that is a mistake. I just did not fully rethink my code when I moved the
build layers. I will be removing that tonight, along with a few changes
suggested here

~~~
1337shadow
On second thought, I also don't understand why you do npm in two different
images. Why not just copy the webpack bundles from the builder image into the
nginx image?

For me the cause of the big image size was in

    
    
        COPY --from=npmpackages /app /app
    

From your third Dockerfile, it seems replacing the above with the following
would have done the trick without adding an extra stage:

    
    
        COPY --from=npmpackages /app/_site/ /usr/share/nginx/html/

~~~
herohamp
I do npm in two different images so that the node_modules can be cached
between builds. This massively speeds up my build. The npmpackages layer only
installs the npm modules.

~~~
1337shadow
I see, this speeds up when you change a dependency, because at least then your
whole node_modules is not thrown away, is that right?

------
kissgyorgy
YOU DON'T NEED DOCKER FOR A STATIC SITE. That's why it's called "static".
There should be no moving parts in it.

~~~
CJefferson
I have had several problems with static sites being broken by library updates.
I now have a Vagrant disk image which will let me always build my site.

------
jrururufuf666
in my eyes this is all madness, deploying sites via GitHub to some docker
shit.

how about good old FTP and a cheap shared webhost? like it's been done for 30
years.

~~~
epmaybe
To be honest, GitHub sites have been amazing for me. I just have a simple
static website with personal information. HTML and CSS with minimal JS.
Purchasing a shared webhost still means extra cost on top of domain
registration. With GitHub, I just followed their instructions with my domain,
and everything just worked. I'm also not paying anything per month or year
besides domain registration which is really nice when you have to start
budgeting tightly.

I don't know about docker and all of that, though.

~~~
1337shadow
With GitLab Pages it's even easier, just fork one of the examples in
gitlab.com/pages, push a content update in a commit and it's online a minute
later thanks to GitLab-CI and GitLab Pages.

~~~
epmaybe
This workflow seems almost exactly the same as GitHub Pages; I'm not sure it's
worth my time to switch platforms and go through the hassle when everything
works now.

------
rumanator
This post says a lot more about the javascript ecosystem than Docker. Multi-
stage image builds are nothing new or extraordinary, and in fact it's Docker
101. However, being forced to install 500MB worth of tooling and dependencies
just to deploy a measly 30MB static website on nginx is something
unbelievable.

~~~
throwaway8941
How is that any different from build tooling for any other language? On my
system, gcc with a bunch of commonly used dependencies requires just about the
same space (and it's a full Linux system, not a trimmed down container).

~~~
masklinn
> How is that any different from build tooling for any other language?

On the one hand yes, on the other hand it's a large amount of crap just to
build a static site. `npm install @11ty/eleventy` pulls 600 packages and
yields a 100MB node_modules folder.

Even Sphinx (whose scope goes way beyond static site generation) "only" pulls
in about 75MB worth of stuff (a third of that being Babel) over two dozen
dependencies (half a dozen being sphinx's own subcomponents).

~~~
ashishb
The problem with JS ecosystem is less about size and more about the number of
files.

~~~
onion2k
There _are_ real problems with NPM, like the proliferation of packages that
include things they shouldn't (babel included a picture of Guy Fieri for a
while...) but the simplistic "there's a lot of files" argument is nonsense. If
you need a lot of files then you need a lot of files.

~~~
filleduchaos
> babel included a picture of Guy Fieri for a while...

I refuse to believe that there are real people so eager to jump on the JS hate
bandwagon that they unironically believed every word of an obviously satirical
article.

(babel has never included a picture of Guy Fieri.)

~~~
oplav
It might not have been a picture, but there was ASCII art of his face that
got checked in.

[https://github.com/babel/babel/blob/f36d07d30334f86412a9d277...](https://github.com/babel/babel/blob/f36d07d30334f86412a9d2771880cb566a82a9b6/packages/babel-core/src/api/node.js)

~~~
filleduchaos
It was checked in as a humorous response to the very satirical article I'm
talking about, and never actually published to the package registry.

But again, people are so ready to froth at the mouth wherever JS is even
mildly concerned that I really shouldn't be surprised they'd take a meme that
lasted for a handful of days seriously.

------
ig0r0
I use Hugo as my static site generator, single binary, no dependencies,
generating hundreds of pages in milliseconds ... so reading this feels so
wrong, I want to call it JavaScript masochism.

Taking a simple concept like a static site and adding a ton of complex tooling
around it because it is the trend now?

Why would you even need a docker image to run a static website? The best thing
about a static website is you can host it anywhere without requiring any extra
resources, like putting it directly on some CDN as files, etc.

~~~
sireat
Speaking of complex tooling, don't you think Hugo is a bit complex itself?

Took Hugo for a test drive using official quick start.

Half the themes from official docs wouldn't compile.

Started looking into the architecture, but the official docs do not explain
the big picture, just a bunch of instructions.

The official documentation does not explain what is going on behind the scenes
very well.

I mean, I will obviously want some sort of scaffolding and a way of changing
it, but [https://gohugo.io/getting-started/directory-structure/](https://gohugo.io/getting-started/directory-structure/) is not
helping very much.

Since you are not supposed to write Go code, everything has to be done through
.toml config files, as far as I could grok.

I feel I'd rather write my own static generator in a language of my choice
(say, Python) than keep up with the Hugo docs.

~~~
UweSchmidt
For simple websites you need no "scaffolding", you can just edit the html in a
text editor. Really just write the text between the tags. Better to meditate
over simplicity and minimalism than to start another software project just to
publish a blog.

~~~
sireat
That's the thing, I've been writing HTML for 25 years and JS for over 20.

Lately I've been writing a lot of .md when writing Jupyter Notebooks.

Thus I figured that Hugo would offer some advantages to relatives who wanted a
dead simple site to which they could add some posts from time to time.

Static site would be perfect for them but I do not want to manually add their
posts...

Started looking into Forestry.io, and well, it is not quite as simple as
Netlify.

~~~
UweSchmidt
Ah ok, you started by suggesting Hugo might be a bit complex...

I thought I might have found an ally, someone who has been around for a bit
and wary of the Cambrian explosion of software projects. Someone who no longer
believes in all that software that comes out, with new, cool features that add
functionality, but, in my opinion, add subtle difficulty along the way that's
hard to account for, death by a thousand cuts that must eventually lead to
some kind of fatigue. Things that update and break and change and are
deprecated and abandoned and require use of a different language, different
build chain, different mental model, just to "add some posts from time to
time" to a blog.

Please excuse this little rant. I truly believe that anyone can find their
sweet spot with any system. But what is the right fit for your relatives who
want it _dead simple_?

