
Incremental Builds in Gatsby Cloud - Dockson
https://www.gatsbyjs.org/blog/2020-04-22-announcing-incremental-builds/
======
skrebbel
Having never used a static site generator in anger, can someone explain to me
like I'm five what's going on here?

My understanding is that Gatsby is a tool that converts a bunch of markdown
files into a static HTML website. Why are slow builds a problem for any static
site generator? Why does it need a cloud?

In other words, what problem am I supposed to be having that any of this
solves?

Note, I'm trying not to be skeptical here - my company's website is hand-
maintained HTML with a bunch of PHP mixed in so I can totally imagine that
things may be better. But I don't understand the kinds of situations where
using a 3rd party cloud to generate some static HTML solves a problem.

~~~
elviswolcott
Gatsby is a fairly complex static site generator. At the highest level, it
provides an ingest layer that can take any data sources (CMS, markdown, json,
images, or anything that a plugin supports) and bring them into a single
centralized GraphQL data source. Pages (which are built using React) can query
this graph for the data they need to render. Gatsby then renders the React
pages to static HTML and converts the queries to JSON (so there's no actual
GraphQL in production).
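
To make that pipeline concrete, here is a rough sketch in Python (not Gatsby's actual code, which is JavaScript/React). The names `ingest`, `run_query`, and `render_page` and the sample data are all made up for illustration; a plain dict stands in for the GraphQL data layer.

```python
import json
from html import escape

# Hypothetical source data; in Gatsby this would arrive through source plugins.
markdown_posts = [{"id": "post-1", "title": "Hello", "body": "First post"}]
cms_authors = [{"id": "author-1", "name": "Ada"}]


def ingest(*sources):
    """Merge every source into one store keyed by node id (stand-in for the data graph)."""
    store = {}
    for source in sources:
        for node in source:
            store[node["id"]] = node
    return store


def run_query(store, node_ids):
    """Stand-in for a page query: return only the nodes the page asks for."""
    return {node_id: store[node_id] for node_id in node_ids}


def render_page(data):
    """Render the query result to static HTML (React does this job in Gatsby)."""
    post, author = data["post-1"], data["author-1"]
    return (
        f"<h1>{escape(post['title'])}</h1>"
        f"<p>{escape(post['body'])}</p>"
        f"<small>{escape(author['name'])}</small>"
    )


store = ingest(markdown_posts, cms_authors)
page_data = run_query(store, ["post-1", "author-1"])
html_output = render_page(page_data)
# The query result is also saved as plain JSON, so no query engine is needed at runtime.
page_data_json = json.dumps(page_data, sort_keys=True)
```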

This process is fairly fast on small/simple sites. Gatsby is overall very
efficient and can render out thousands of pages drawing from large data
sources rather quickly. The issue is that Gatsby isn't just used for personal
blogs. As you can imagine, a site with thousands of pages of content that is
processing thousands of images for optimization starts taking a long time to
build (and a lot of resources). For example, I'm building a Gatsby site for a
photographer that includes 16000+ photos totaling a few hundred GB. Without
incremental builds, any change (e.g. fixing a typo) means every single page
needs to be rebuilt.

Incremental builds mean you don't have to rebuild everything. Because the
data all comes from the GraphQL layer (which Gatsby pre-processes and converts
to static JSON), it is possible to diff the graphs (i.e. determine what data a
commit has changed) and determine what pages it affects (i.e. which pages
include queries that access that field). From there, Gatsby can rebuild only
the changed pages.
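
As a minimal sketch of that idea (an assumption about the general technique, not Gatsby's real implementation): hash every data node, compare against the hashes from the previous build, and rebuild only the pages whose queries touched a changed node. The `page_dependencies` map and all function names here are hypothetical.

```python
import hashlib
import json


def node_hash(node):
    """Stable content hash of one data node."""
    payload = json.dumps(node, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def snapshot(store):
    """Map node id -> content hash for a whole data store."""
    return {node_id: node_hash(node) for node_id, node in store.items()}


def changed_nodes(old_snapshot, new_snapshot):
    """Ids of nodes that were added, removed, or modified between two builds."""
    all_ids = set(old_snapshot) | set(new_snapshot)
    return {i for i in all_ids if old_snapshot.get(i) != new_snapshot.get(i)}


def pages_to_rebuild(page_dependencies, changed):
    """Pages whose queries read at least one changed node."""
    return sorted(page for page, deps in page_dependencies.items() if deps & changed)


# Hypothetical record of which nodes each page's query touched.
page_dependencies = {
    "/": {"post-1", "post-2"},
    "/posts/1/": {"post-1"},
    "/posts/2/": {"post-2"},
    "/about/": {"author-1"},
}

before = {
    "post-1": {"title": "Hello", "body": "First post"},
    "post-2": {"title": "Second", "body": "A psot with a typo"},
    "author-1": {"name": "Ada"},
}
# Fixing the typo changes exactly one node.
after = {**before, "post-2": {"title": "Second", "body": "A post without the typo"}}

dirty = pages_to_rebuild(page_dependencies, changed_nodes(snapshot(before), snapshot(after)))
print(dirty)  # ['/', '/posts/2/']
```

In this toy example the typo fix rebuilds two of the four pages; `/posts/1/` and `/about/` are left alone.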

This not only means faster build times, it also means that only the changed
pages and assets have to be re-pushed to your CDN. This way, content that
hasn't changed will remain cached and only modified pages will have to be sent
down to your site's users.
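
The deploy side can be sketched the same way. This is only an illustration of the general approach, assuming a manifest of content hashes is kept from the previous deploy; `build_manifest` and `plan_upload` are hypothetical names, not part of any real CDN or Gatsby API.

```python
import hashlib


def build_manifest(files):
    """Map each output path to a hash of its bytes."""
    return {path: hashlib.sha256(content).hexdigest() for path, content in files.items()}


def plan_upload(previous_manifest, current_manifest):
    """Return (paths to upload, paths to delete) relative to the last deploy."""
    upload = sorted(
        path
        for path, digest in current_manifest.items()
        if previous_manifest.get(path) != digest
    )
    delete = sorted(path for path in previous_manifest if path not in current_manifest)
    return upload, delete


# Hypothetical build outputs from two consecutive builds.
previous_files = {
    "index.html": b"<h1>Home</h1>",
    "about.html": b"<h1>About</h1>",
    "old.html": b"<h1>Old</h1>",
}
current_files = {
    "index.html": b"<h1>Home, updated</h1>",
    "about.html": b"<h1>About</h1>",
    "new.html": b"<h1>New</h1>",
}

upload, delete = plan_upload(build_manifest(previous_files), build_manifest(current_files))
print(upload)  # ['index.html', 'new.html']
print(delete)  # ['old.html']
```

`about.html` is byte-identical in both builds, so it is never re-pushed and any cached copy stays valid.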

~~~
earthboundkid
But if you have 16,000 of anything, why are you using a static site? Surely
the access patterns are long tail and you need to build more often than most
pages are even accessed.

~~~
cameronbrown
Cheaper to build a site and dump files to a bucket than running WordPress code
on every request.

~~~
flukus
Surely there's a CPU/disk trade off at some point. Static pages are much
larger (less likely in memory) and would cause disk reads much sooner than the
same files being generated dynamically. Of course WordPress isn't known for
its efficiency so the static page preference is probably quite high.

~~~
robertlagrant
> Surely there's a CPU/disk trade off at some point

At an extreme case, yes. Disk is SO CHEAP.

~~~
flukus
I was thinking more about the time cost. Disk is cheap but slow compared to
memory.

------
kylemathews
Gatsby founder here.

Really appreciate the feedback and support for our launch today! The team
worked super hard to get Incremental Builds live in public beta but are taking
all the feedback (here and all over the web) as we go into full launch. Let us
know what you think. Thanks!

~~~
denster
Kyle,

Just read the post, congrats on the launch!

We've been using Gatsby on:

[https://mintdata.com](https://mintdata.com)

for the past few years, and are _huge_ fans of your work.

I still recall the day when I brought Gatsby into our org, our front-end guys
almost ate me alive :D

They said: a React.render(...) + GraphQL thing, why do we need it? What's the
big deal?

Fast forward a few years, and Gatsby is (in my opinion) the best
way to build a static website based on React.

Keep up the awesome work!

Your true fan, Denis

~~~
rayshan
Wow MintData looks so cool! I was just trying to figure out whether Webflow
can be used to build simple apps, then I saw this, a whole new level. Is there
a way to try it?

------
turadg
This is great! Is there any technical limitation keeping this from being part
of the open source version?

I get that Gatsby company put a lot of effort into this and wants a return on
that investment, and good for them. I assume a third party could offer the
same but why would they compete at the same value prop?

However, an open source version that isn't reliant on any company would be
compelling to many.

~~~
kylemathews
We recently introduced build optimizations for incremental data changes for
self-hosted environments:
[https://www.gatsbyjs.org/docs/page-build-optimizations-for-i...](https://www.gatsbyjs.org/docs/page-build-optimizations-for-incremental-data-changes/)
and are continuing to improve
build speed across platforms.

To reliably provide near real-time deployments, we need tight integration with
the CI/CD environment to optimize and parallelize the work; that's why you’ll
see the fastest builds and deploys through Gatsby Cloud — the platform is
purpose built for Gatsby!

------
seanwilson
How much is the speed issue related to the language used? I know Hugo is an
order of magnitude faster than most static site generators for example - it's
written in Go with e.g. 2 seconds to generate about 10K pages
[https://forestry.io/blog/hugo-vs-jekyll-benchmark/](https://forestry.io/blog/hugo-vs-jekyll-benchmark/).

I would have thought the generation process could be massively parallelised
and a typical blog page would only need a modest amount of computation e.g.
concat header, footer, pull in body text, resolve a few URLs. I can't help but
think about how much work a typical computer game is doing in comparison 60
times per second even without a GPU.
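
A toy illustration of why this parallelises so easily: each page depends only on its own body plus a shared header and footer, so pages can be rendered independently. This is a hedged sketch with made-up data, not how Hugo or Gatsby work internally. It uses threads purely to show the independence; in CPython, threads do not speed up CPU-bound work, so a process pool would be needed for a real gain.

```python
from concurrent.futures import ThreadPoolExecutor

HEADER = "<header>My Blog</header>"
FOOTER = "<footer>(c) me</footer>"


def render(page):
    """Concatenate header, body, and footer for one page; no shared mutable state."""
    slug, body = page
    return slug, f"{HEADER}<main>{body}</main>{FOOTER}"


# Hypothetical input: 1000 pages, each a (slug, body) pair.
pages = [(f"/post/{i}/", f"Body of post {i}") for i in range(1000)]

with ThreadPoolExecutor(max_workers=8) as pool:
    # pool.map preserves input order and yields (slug, html) pairs.
    rendered = dict(pool.map(render, pages))

print(len(rendered))  # 1000
```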

~~~
turnipla
I don’t think it’s a language issue. Even for JavaScript bundlers you have the
slow extensible bundler and the “new super fast bundler” that dies in a month
because it only fits one use case.

How flexible is Hugo? And how many plugins does someone generally use?

~~~
seanwilson
> How flexible is Hugo? And how many plugins does someone generally use?

It processes Markdown, JSON, YAML and SASS, can pull in data files from URLs,
and has custom templates/themes, custom macros/shortcodes, image processing and
live reload. It doesn't have a plugin system as far as I know but nothing
stops you combining Hugo with other tools e.g. run a JS script to pull in and
transform a JSON file before Hugo runs.
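
A pre-build step like that could look roughly like the sketch below (written in Python rather than JS; the field names, the `items.json` file name, and the helper functions are all assumptions for illustration). It filters and reshapes some JSON and writes it into the site's `data` directory, which is where Hugo looks for data files, before the `hugo` command is run.

```python
import json
import tempfile
from pathlib import Path


def transform(raw_items):
    """Keep only published items and derive the fields the templates would need."""
    result = []
    for item in raw_items:
        if item.get("published"):
            title = item["title"].strip()
            result.append({"title": title, "slug": title.lower().replace(" ", "-")})
    return result


def write_data_file(items, site_root):
    """Write the transformed items to <site_root>/data/items.json."""
    out = Path(site_root) / "data" / "items.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(items, indent=2), encoding="utf-8")
    return out


# Hypothetical raw input; in practice this might be fetched from an API or a CMS export.
raw = [
    {"title": " Hello World ", "published": True},
    {"title": "Draft", "published": False},
]

with tempfile.TemporaryDirectory() as site_root:
    path = write_data_file(transform(raw), site_root)
    written = json.loads(path.read_text(encoding="utf-8"))

print(written)  # [{'title': 'Hello World', 'slug': 'hello-world'}]
```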

~~~
turnipla
I think that’s the point. No plugin system. Compare Babel to Bublé or even
Sucrase for example:
[https://github.com/alangpierce/sucrase](https://github.com/alangpierce/sucrase)

Preparing data for external use always takes extra effort.

You can build an efficient self-contained tool in JavaScript too.

~~~
ratww
A counterpoint: Babel's extensibility doesn't matter in practice at all, other
than helping the Babel team organize their code.

Pretty much every new ES6 feature required parser and babel-core changes just
to be able to be used.

Example: a lot of changes that only worked in Babel 7 (which was in beta for
months) were not possible in Babel 6, and so on for previous versions. A
plugin was not enough: you also needed parser/core changes.

Other than for novel non-standard features (like code substitution), plugins
are not exactly that powerful, and even things like that are frowned upon in
most environments, as 99.9% of people just want ES6 features.

------
gnalck
Is the technology behind "incremental builds" being upstreamed into the open
source project?

------
dergachev
Our team's been waiting on this for a year to start moving larger sites to
Gatsby. Can't wait to try it.

------
alexgvozden
so this is a cool release, and no objection to that, but if your pipeline has
automated testing, security scans and more then you are not actually deploying
in 10s

more technical details would be good but I guess either I missed it or they
look at it as IP

~~~
ascorbic
You wouldn't be running automated testing etc on data updates though, surely?
That's what this feature is for, not code updates.

------
WnZ39p0Dgydaz1
JavaScript re-invents "code typing" (TypeScript)

JavaScript re-invents "Promises" because callback hell

JavaScript re-invents "compilers" (Babel)

JavaScript re-invents "build systems" (webpack, etc)

JavaScript re-invents "caching" (incremental builds) - but paid, and in the
cloud

Because why not.

~~~
kaishiro
/s/re-invents/implements/g and suddenly JavaScript is running a successful dev
cycle.

------
arpowers
You still need to use an api for everything. Good apps need a backend; not a
JAMStack fan for anything but the most basic of sites.

~~~
bmelton
The 'backend' here is ... HTML. For a read-only blog, that's likely more than
enough. Otherwise, for dynamic content like contact forms and such, I don't
know if there's a meaningful benefit to building out a whole site in
PHP/Python/Rails or something (and paying commensurately more in hosting) than
to use Formspree or something similar.

Yes, it calls an API. And thankfully with Formspree, it's pretty easy to see
the price breakeven points vs. hosting, but there are benefits to be had.

~~~
dgb23
Content has to come from somewhere. Operational complexity is increased, not
decreased, since you are running a build server in addition to your CMS. The
benefit is easier optimizations in the frontend.

~~~
Cthulhu_
But you can outsource those (build server, CMS) to e.g. Netlify and
Contentful.

But if you don't want that, there's a billion alternatives; the CMS market is
one of the most saturated ones out there.

~~~
subpixel
I actually feel like Gatsby will be developing a CMS to round out their paid
feature-set. I have zero inside knowledge, just a hunch that to get customers
to pay up handsomely, the product needs a deeper "fit" in the publishing
pipeline.

