
A Static Future: The magic of compile-time workflows - joshwcomeau
https://joshwcomeau.com/gatsby/a-static-future/
======
andrewingram
"It's exactly the kind of intensely-dynamic application that seems
inconceivable to build statically"

Statically is to some degree how these things used to work. Before hosting
with databases was widely available, internet forum software (such as UBB)
used to work by regenerating all the affected pages when posting new threads
and replies. They relied on simple files as the data source. They worked
really well to a point, until each update took so long as to be intolerable.
The practice was then to prune older content to fix the performance issues.

The biggest problem area for SSGs is sites with user-specific content where
dynamic "pop-in" after load isn't a tolerable solution, or where there's a
combinatorial explosion of pages. The ecommerce example is interesting because
it's not the 50m product pages that are the problem, it's all the faceted
navigation/search pages where it _may_ be desirable to support search
indexing; the number of unique pages there may as well be considered unlimited.
All of these problems are solvable with SSGs to some degree, but there's
always going to be a tipping point where the amount of engineering contortion
you're having to perform just to avoid using servers isn't worth it.
Especially when we have a little thing called cache headers which work really
well if you set them up correctly.
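
The cache-header approach alluded to here can be sketched as a small policy
function; the paths and lifetimes below are illustrative choices, not anything
from the comment:

```python
def cache_headers(path):
    """Pick Cache-Control values by how often the content changes.
    The path prefixes and max-age values are made-up examples."""
    if path.startswith("/static/"):
        # fingerprinted assets can be cached essentially forever
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if path.startswith("/api/"):
        # user-specific responses must never be shared between users
        return {"Cache-Control": "private, no-store"}
    # HTML pages: let a CDN keep them briefly, then revalidate
    return {"Cache-Control": "public, max-age=60, stale-while-revalidate=600"}
```

With headers like these, a CDN in front of a dynamic server behaves much like
a statically generated site for anonymous traffic.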

~~~
EE84M3i
I think 4chan still does this to generate their site. It's all plaintext files
-- even the "api" is just flat JSON files! It would be interesting to see what
caching instructions they send to the CDN, because you can definitely notice
it sometimes if you're refreshing busy threads.

------
chrismorgan
This is not a new concept by any means—it’s reviving a concept that has been
around since at least the very early days of the web.

It’s known as push-based caching, as distinct from pull-based caching.

Push caching: when something changes, you immediately propagate the change to
every location, to be valid until replaced. That _used_ to be the standard
model. It’s precise, but its eagerness can lead to prohibitive storage
requirements (combinatorial explosion is very easy to achieve, and now you
must store every computed state, not just the wrapping and the data) and cost
of editing if the change has to propagate far (e.g. if you put a list of the
five most recent blog posts in a sidebar of a site, making a new blog post now
requires that _every single page_ be regenerated). Also it’s conceptually more
difficult to get right—doing no caching is far, far easier.

Pull caching: responses are generated on demand and cached for typically a
limited time, meaning that you may serve stale data for up to your cache
lifetime. But the laziness (that it doesn’t generate everything possible)
works in its favour for many situations.

Hybrids are possible. Your cache may have the ability to programmatically
invalidate entries, so that if you can calculate which pages _would_ need to
be regenerated, you can tell the cache, supplanting time-based caching and
potentially yielding the most database-efficient result (no extraneous content
generated, but pages cached exactly as long as they will be valid).
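
A toy sketch of that hybrid, assuming we can map a changed record to the pages
that depend on it (all names here are invented for illustration):

```python
class PullCacheWithInvalidation:
    """Pull caching (render on demand) plus push-style invalidation
    (evict exactly the pages a changed record affects)."""

    def __init__(self, render):
        self.render = render   # function: path -> html
        self.store = {}        # path -> cached html
        self.deps = {}         # record id -> set of dependent paths

    def get(self, path, depends_on=()):
        # pull: generate on demand, remembering what it depended on
        if path not in self.store:
            self.store[path] = self.render(path)
            for record in depends_on:
                self.deps.setdefault(record, set()).add(path)
        return self.store[path]

    def invalidate(self, record):
        # push: on edit, evict only the affected pages
        for path in self.deps.pop(record, ()):
            self.store.pop(path, None)
```

Pages stay cached exactly as long as they are valid, and nothing is generated
that was never requested.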

Build pipelines like Make are similar: you pull by specifying the resource you
require, but its validity is determined by the dependency tree, and the tool
essentially pushes things, materialising the necessary resources, until the
requested resource is complete.

Push caching has an elegance that pull caching lacks (and I much prefer it,
when feasible), but there _are_ reasons why it’s not the standard model of the
web any more. It’s tougher to implement well, and has scaling limits that can
constrain your design.

----

As a practical application of this: in the article itself, it speaks of the
reduced costs and better scaling of the static approach. But it doesn’t take
into account the possibility that you generated a bunch of pages that weren’t
ever requested (e.g. everyone read the blog post about widgets, but no one
opened the list of posts tagged “widgets”, even though you had gone to the
trouble of generating it), and so could easily have made more requests than
you would have if you had instead had regular pull-based caching in place.

~~~
PaulDavisThe1st
There are thousands (millions? billions?) of websites that are not replicated,
do not use a CDN, have no multiple locations.

When you rebuild the site due to some change, you are updating one single
folder hierarchy on one server somewhere. No caching. Just build (and, if
necessary as a separate step, copy/install).

I didn't see anywhere in his writeup a suggestion that you generate pages
based on requests.

~~~
chrismorgan
If you are modifying HTML directly, no cache is involved.

But if you are modifying something that is used to _generate_ the HTML, what
you deploy is a complete cache of the operation of building the site. The fact
that you could throw it away and build a new version is what makes it match
the definition—just because no piece is _colloquially_ known as a cache
doesn’t stop it from being just that. Build artefacts are a cache of the
operation of compiling the source.

To demonstrate this, putting the “requests” part into it: imagine hooking the
static site generator up to the server directly, so that a request to
/widgets.html turned into “run the static site generator, but only ask it for
what it would output for /widgets.html, then respond with that”.
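
That thought experiment can be written down directly; `generate_one` below is
a hypothetical stand-in for whatever the SSG would emit for a single path:

```python
def make_handler(generate_one):
    """Turn a per-page static site generator into a request handler:
    a request for /widgets.html runs the generator for just that path
    and responds with the result, instead of prebuilding the whole site."""
    def handle(path):
        html = generate_one(path)
        return 200, {"Content-Type": "text/html"}, html
    return handle
```

Nothing about the generator changes; only the moment it runs moves from
deploy time to request time.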

The terminology I used is quite correct and widespread in the industry at
large—it’s just that it’s not at all in vogue in the JAMstack world.

------
ealexhudson
To be honest, if you dynamically generate a site and include the relevant
headers, a web cache in front of the site will do all the "compilation"
without any of the headache of a specific ahead-of-time process.

If you get a really large site (in terms of pages), it's often easier to
invalidate the entire cache and allow it to rebuild lazily than to go through
and linearly reprocess every page, and you don't get any examples of pages
having both "old" and "new" content as you browse the site while republishing
is taking place (although you can also accomplish this with content-switching
once the full rebuild is done).

~~~
zozbot234
Wikipedia does this. Logged-out readers get cached versions of the page
(though they can use a manual "purge" action to trigger an update when
needed), while logged-in users get an SSR version rendered by PHP from
replica databases. Edits go to the master/source-of-truth DB.

------
siscia
I am quite deep into this idea of static websites: I have a little experience
building one (now defunct) and I provide tools targeted at static website
developers (
[https://simplesql.redbeardlab.com](https://simplesql.redbeardlab.com) )

The experience of creating one with python and netlify was superb: you just
push out all the pages that you need and call it done. Simple, fast, cheap,
nothing to complain about.

However, I need to nitpick the author: what he calls compile time is "compile
time" only in the world of frontend development (and in most cases is not a
compilation step). Moreover, Gatsby itself calls the step "build", not
"compile" (and for good reason), so I don't see the reason to introduce
another source of confusion. Indeed, I was running a python script that at
_runtime_ created the correct webpages.

SimpleSQL provides an API to interact directly with databases from JS or via
API calls. I believe that works amazingly well for static websites, provided
that people want to write SQL.

One problem that I see is how to do authentication, and I am working on
fixing it. The idea is two different credentials: one only for logging in,
and if you are able to log in, you receive a credential for accessing your
own data. The login credential can be stored in a cookie or wherever makes
sense for the application.

~~~
MereInterest
> However, I need to nitpick the author, what he calls compile time, is
> "compile time" only in the world of frontend development (and in most of
> the case is not a compilation step).

I run into that issue with a lot of web frontend terminology. Sometimes, it is
harmless, like calling it "tree shaking" instead of the more standard "dead
code elimination". Other times, it can cause a large amount of confusion, like
with "client-side rendering" and "server-side rendering". In all other
contexts, "rendering" refers to the generation of an image. Yet somehow in web
development, "rendering" refers to the generation of HTML, and has nothing
whatsoever to do with generating an image to display to the user.

~~~
Tomte
"Tree shaking" is standard terminology in Lisp (and possibly Smalltalk?).

It's also a narrower term, since DCE is used for standard optimizations in
e.g. C that have nothing to do with a heap of objects.

------
arkanciscan
If your definition of "static site" is simply that the server doesn't generate
any HTML at request-time, then any SPA is "static" (just one empty index.html,
a webpacked bundle.js generated at compile-time, and an API call does the
rest). But if that's the case, the static future started about 7-8 years ago!

Personally I think that if your client script is making AJAX requests that
patch the content of the document after the initial HTML is loaded and there's
no other way to see that content then it's not a static site. In other words,
a static site to me is simply one that works with JavaScript turned off.

The author seems to be talking about JAMstack, wherein every change to the
content of the site constitutes a recompile. I have serious reservations about
using that technique on a site with many users making changes, like a social
network or a comment section.

Also, since you're all probably gonna try to fight me; there's nothing wrong
with SPAs, and Gatsby and Next.js both do SSR which ticks my "works without
JS" box.

~~~
MaxBarraclough
Interesting point. Wikipedia's definition of 'static web page' [0] quite
explicitly includes pages that use JavaScript; it just precludes the server
from dynamically generating the pages. To quote: _Dynamic functionality must
be performed on the client side_.

An SPA would be expected to use APIs, which wouldn't be static, but if we
treat the APIs as separate from the static web page, they might still make the
cut.

This random article [1] considers _API-powered static website_ not to be a
contradiction.

[0]
[https://en.wikipedia.org/wiki/Static_web_page](https://en.wikipedia.org/wiki/Static_web_page)

[1] [https://hackernoon.com/quickstart-an-api-powered-static-website-1cc140205df9](https://hackernoon.com/quickstart-an-api-powered-static-website-1cc140205df9)

~~~
arkanciscan
I didn't mean that a static site doesn't use JS at all, just that it _can_
still work when JS is disabled. It's not that we necessarily expect people to
turn off JS, but that the site can still be fully represented using only
static files.

------
kasey_junk
The static movement needs to figure out a clever way to do Authn/Authz before
it will really take off.

If I have to hit a db for that every page I lose a lot of the value and might
as well not bother.

~~~
EE84M3i
Don't a lot of sites offload auth to the CDN layer? I think every popular CDN
has a way you can do it.

------
baybal2
The biggest thing about that is CDNs.

The biggest growth pain for sites passing the 1 million per hour mark is that
the application side does not scale well.

The moment you go for a multi-DC setup, all your code from then on has to be
written with that in mind. Very often that is overkill just to show some
pictures, user profiles, and articles.

Third point: having one or more application servers go down becomes nowhere
near as critical as when your site is served from a single nodejs instance,
which you have to debug under load to see why it crashes.

------
xg15
Seems to me you could achieve similar benefits by generating your HTML
server-side on request (the classic pre-ajax way to do things) while also
allowing your generated pages to be cached.

I guess the challenge would then be cache management, as you'd have to return
Last-Modified or ETag headers for dynamically generated pages - meaning you'd
have to hit the database to populate those headers.

However, seems to me, this could still be less work than statically generating
everything up-front.
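
The validator idea can be cheaper than it sounds if you derive the ETag from
the rendered body; a minimal sketch (a real setup would often hash the
underlying data's version instead, to skip rendering entirely):

```python
import hashlib

def etag_for(body):
    # strong ETag derived from the rendered bytes
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a conditional GET."""
    tag = etag_for(body)
    if if_none_match == tag:
        # client already has this exact representation
        return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag}, body
```

The second request for an unchanged page costs a render plus a hash, but no
bytes on the wire for the body.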

~~~
jacobr
Instead of running build_entire_site when updating content, you can run
invalidate_cache

------
jacobr
> If you change the header component, and that header component is shown on
> every page in the site, you will have to give it some time.

To me this invalidates SSG for any busy site with a somewhat large menu. Any
time you change the order of some options or rename an item, everything needs
to be rebuilt.

If you combine it with something like Edge Side Includes it could work, I
guess, only rebuilding the part of the page that's needed.

------
fxtentacle
I remember some years ago, generating your data files directly into a web-
enabled S3 bucket was all the rage. Basically, that's what is now being called
"incremental builds".

It appears that the sad story of web development frameworks is that a new team
will reinvent the wheel every year so that nothing will ever be fashionable
and production-grade at the same time.

I have some internal C++ servers that have been running for 10+ years now. I
couldn't imagine that with Rails. When we set up our current website,
AngularJS 1.4 was new and considered very fashionable. By now, everyone treats
the entire AngularJS project as archaic.

I predict that in a year there'll be a new hip web framework, and its acolytes
will then have their own eureka moment with static files.

~~~
zozbot234
10+ year old C++ is most likely not good C++. I wouldn't trust such a network
service to be reasonably secure. There are better solutions nowadays, even in
the web development domain, that don't compromise on performance.

------
Arkdy
It's in Clojure, not Gatsby, but I've been working on statically serving
citations to covid-19 data
([https://devpost.com/software/coronavirus-charts-org](https://devpost.com/software/coronavirus-charts-org))

------
thelastbender12
Building a component one time and using a CDN as a cache seems really
efficient. The article also mentions plausibly extending this to support an
application like Spectrum. I'm really curious how that would work, since
Spectrum has real-time chat features.

------
lachlan-sneff
What a beautiful page.

