
Ask HN: How does a static site generator work? - litzer
Short of reading source code, I can't find any reading explaining how they work.
======
patio11
Short version: you maintain some data on your own machine which describes your
content. Most commonly, this is done in flat files formatted with markdown or
another plain-text option.

Your static site generator is built out of code. You execute it on your local
machine, periodically, generally when you have a new blog post to publish or
similar. The generator combines your content plus templates (either ones it
ships with, ones you build, or ones you found from the community around that
generator) to spit out a bunch of compiled HTML/CSS/JavaScript.
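
The "content plus templates" step can be sketched in a few lines of Python. This is only an illustration under assumptions, not how any particular generator works: the `PAGE` template, the `.txt` content files, and the `render_page` and `build` names are all made up for the example.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical page template; a real generator would load templates from files.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<nav><a href="/">Home</a></nav>
<h1>$title</h1>
$body
</body>
</html>
""")


def render_page(title: str, text: str) -> str:
    """Wrap plain-text content in the shared page template."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return PAGE.substitute(title=html.escape(title), body=body)


def build(content_dir: Path, out_dir: Path) -> list[Path]:
    """Render every .txt file in content_dir to an .html file in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(content_dir.glob("*.txt")):
        # Assumption: the page title is derived from the file name.
        title = src.stem.replace("-", " ").title()
        dest = out_dir / f"{src.stem}.html"
        dest.write_text(render_page(title, src.read_text(encoding="utf-8")),
                        encoding="utf-8")
        written.append(dest)
    return written
```

Calling `build(Path("content"), Path("site"))` would leave a `site` folder of plain HTML files that can be uploaded as-is.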

You then upload the entire compiled site to your host of choice. From the
perspective of someone visiting your website, they don't know it was generated
at all -- they just download HTML really, really fast which _appears_ to have
bells-and-whistles from dynamic sites like e.g. navigation which is consistent
across multiple pages, related article links, etc etc.

There exist _numerous_ other ways to achieve this objective, but the above
description of a static site generator is the one most programmers
independently re-invent when they attempt to build one. I had built four in my
younger years before I realized that software like that had a name and was
available on the Internet for free.

------
imakesnowflakes
It takes text/articles in markdown format [1] and converts it to html with
some basic html tags, like headings, anchors and lists. Where these should
occur can be declared using markdown syntax.

This html with simple html tags is injected into a bigger/complex html (which
often contains the main styling/design) to form the final page of the site.
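
A rough sketch of those two steps in Python, assuming a deliberately tiny markdown subset (headings, `-` lists, links, one-line paragraphs). Real generators use a full markdown library; `markdown_to_html`, `inject`, and the `LAYOUT` string are hypothetical names for illustration only.

```python
import html
import re

LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def inline(text: str) -> str:
    """Escape HTML, then turn [label](url) into an anchor tag."""
    return LINK.sub(r'<a href="\2">\1</a>', html.escape(text, quote=False))


def markdown_to_html(source: str) -> str:
    """Convert a tiny markdown subset; each non-blank line is its own block."""
    out, in_list = [], False
    for line in source.splitlines():
        line = line.strip()
        is_item = line.startswith("- ")
        if in_list and not is_item:
            out.append("</ul>")  # the list ends at the first non-item line
            in_list = False
        if not line:
            continue
        heading = re.match(r"(#{1,6}) (.*)", line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        elif is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{inline(line[2:])}</li>")
        else:
            out.append(f"<p>{inline(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


# Hypothetical "bigger" page that carries the site-wide design.
LAYOUT = "<html><body><header>My Site</header>\n{content}\n</body></html>"


def inject(fragment: str) -> str:
    """Place the converted fragment inside the surrounding layout."""
    return LAYOUT.replace("{content}", fragment)
```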

[1] https://daringfireball.net/projects/markdown/syntax

------
tomcam
The simplest way to create a website is just HTML and CSS, right? You can type
that shit up on your desktop, post it to a server, and voilà--website!
Advantages: fastest possible performance and way less surface area for
attacks.

Suppose you decide to add a footer to every page. If you're not a wiz at sed
and command-line scripting you'll have a big chore ahead of you. Plus you
might forget some pages. Plus logging into FileZilla or using SSH in order to
post the files can be dreary and time-consuming.
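
For the footer chore specifically, a small script can do what the sed wizard would do. This is a sketch under assumptions: the `FOOTER` text is invented, and the pages are assumed to have a closing `</body>` tag to insert before.

```python
from pathlib import Path

# Hypothetical footer markup, used only for this example.
FOOTER = '<footer id="site-footer">Copyright Example Co.</footer>'


def add_footer(html_text: str, footer: str = FOOTER) -> str:
    """Insert the footer before the last </body>, unless already present."""
    if footer in html_text:
        return html_text  # running the script twice changes nothing
    head, sep, tail = html_text.rpartition("</body>")
    if not sep:
        # No closing body tag found: append the footer at the end instead.
        return html_text + "\n" + footer + "\n"
    return f"{head}{footer}\n{sep}{tail}"


def add_footer_to_site(root: Path) -> int:
    """Apply add_footer to every .html file under root; return files changed."""
    changed = 0
    for page in sorted(root.rglob("*.html")):
        original = page.read_text(encoding="utf-8")
        updated = add_footer(original)
        if updated != original:
            page.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Because it walks the whole folder, no page gets forgotten, which is the part that is easy to get wrong by hand.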

If you're putting up a catalog or other site derived from a database, you have
a whole other set of problems. Now you really want to automate. You could
write this in Symfony or web2py or Django and put it on your server. If people
go to yoursite.com/catalog/flashlights it's actually running a file called,
say, flashlights.php that dynamically queries the database and creates HTML on
the fly with a list of the flashlights category in your catalog. Advantage:
it's always up to date. Disadvantage: Slower (every visit to
yoursite.com/catalog/flashlights results in a database hit) and much broader
attack surface.

BTW that's a little bit of a simplification. You can cache the results of the
visit to yoursite.com/catalog/flashlights as an HTML file at the cost of added
complexity.
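
One way that caching might look, as a sketch only: `cached_page` is a hypothetical helper, `render` stands in for whatever function queries the database and builds the HTML, and the five-minute lifetime is an arbitrary assumption.

```python
import time
from pathlib import Path
from typing import Callable


def cached_page(cache_file: Path, render: Callable[[], str],
                max_age_seconds: float = 300.0) -> str:
    """Return the cached HTML if it is fresh enough, else re-render and store it."""
    if cache_file.exists():
        age = time.time() - cache_file.stat().st_mtime
        if age < max_age_seconds:
            return cache_file.read_text(encoding="utf-8")
    html_text = render()  # the expensive part, e.g. a database query
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(html_text, encoding="utf-8")
    return html_text
```

The added complexity shows up quickly: you now have to decide when a cached file is stale and what happens when the catalog changes before it expires.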

A dynamic, data-driven site like that can sometimes be compromised via evil
methods such as SQL injection from the browser or CSS injection (probably not
with web2py because it's got great security, but you get my point). Dynamically
generated sites, which include most of the big ones, are particularly vulnerable
because of the database. Database managers are big and complex.

OK, so imagine you have a program written in Python or C or Java or whatever
on your desktop. It can easily be modified to add footers or render HTML files
from a database, then post the pages to your server. Advantage: pre-rendered
HTML files are fast and way less likely to get hacked. Disadvantage: could be
less up to date. The catalog database may be updated and the rendered HTML may
lag.

If you really want to get fancy you can have the program running on the
server. It goes through your database and templates to create each page of the
site, but instead of creating dynamic pages that run PHP or Python in order to
render a web page, it spits out straight HTML (aka "static HTML") with
the contents once a day or whatever. Now a visit to
yoursite.com/catalog/flashlights actually resolves to
yoursite.com/catalog/flashlights.html.
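
A minimal sketch of that database-to-static-HTML step, using SQLite from the Python standard library. The `products(name, price, category)` table and the function names are assumptions made up for the example, and category values are assumed to be safe to use as file names.

```python
import html
import sqlite3
from pathlib import Path


def render_category(conn: sqlite3.Connection, category: str) -> str:
    """Query one category and return a complete static HTML page for it."""
    rows = conn.execute(
        "SELECT name, price FROM products WHERE category = ? ORDER BY name",
        (category,),
    ).fetchall()
    items = "\n".join(
        f"<li>{html.escape(name)}: ${price:.2f}</li>" for name, price in rows
    )
    title = html.escape(category.title())
    return f"<html><body><h1>{title}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"


def build_catalog(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write one <category>.html file per category found in the database."""
    out_dir.mkdir(parents=True, exist_ok=True)
    categories = [row[0] for row in conn.execute(
        "SELECT DISTINCT category FROM products ORDER BY category")]
    written = []
    for category in categories:
        dest = out_dir / f"{category}.html"
        dest.write_text(render_category(conn, category), encoding="utf-8")
        written.append(dest)
    return written
```

Run from a daily scheduled job, this hits the database once per build rather than once per visitor.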

In theory static site generators are easier to understand and get started
with. That's not always true, but it is true that page renders are invariably
faster because it's just pure HTML and no database queries are needed.

~~~
tedmiston
I like this answer because it gives the context around why static site
generators came to be.

Back in the day the simplicity of a pure html/css site was wonderful. Then as
soon as we needed templating or a shared component like a footer, we bumped up
to the most approachable full-blown server side language, PHP, ten years ago.
Today we can use a static site generator as an alternative to fill that use
case. Besides the benefits in security and performance, you can also benefit
from cheaper hosting -- for example, you could host the static front end for
free on GitHub Pages (even with a custom domain) or for pretty cheap on S3.

------
budparr
Some of the formatting is wonky, here, but you can read the post at the link
at the bottom.

CONTENT IN

Most, but not all, static site generators use markdown files to store content,
with information about each file of content specified using YAML formatted
front matter. It looks like this:

    ---
    title: 'The Title of an Entry Here'
    date: 2015-04-15
    category: 'News'
    excerpt: 'A short excerpt here'
    layout: default
    foo: 'Bar'
    ---
    Some markdown content here, which is likely the longest part. This can be
    quite long, in fact.
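
Splitting a file like that into metadata and content might look roughly like the Python sketch below. It is not a real YAML parser: it only understands flat `key: value` lines between the two `---` delimiters, and `parse_front_matter` is a hypothetical name. Real generators hand the front matter to a YAML library.

```python
def parse_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a document into (metadata, body).

    Handles only flat `key: value` lines, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all
    meta: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the content.
            return meta, "\n".join(lines[index + 1:])
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip("'\"")
    return {}, text  # no closing delimiter: treat everything as body
```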

HTML OUT

Now, imagine a folder of these markdown files on your hard drive. We'll call
this your project. The code of our static site generator could say "for each
markdown file in folder x, move it to folder y." But that's not too exciting.

Let's say, instead of just moving that file, the generator reads all the files
in a certain directory or directories and creates an "object" with them: "For
each file in folder x, list its contents in our object." That object is just a
stored list of all the files, including their file names, any meta-data
specified in the front matter, and their content.

That object might look something like this (in Ruby, as output from Jekyll):

    {
      "layout"=>"default",
      "title"=>"The Title of an Entry Here",
      "category"=>"News",
      "excerpt"=>"A short excerpt here",
      "published"=>true,
      "url"=>"/news/2015/04/14/post-title-here/",
      "dir"=>"/news/2015/04/14",
      "date"=>2015-04-14 00:00:00 -0400,
      "id"=>"/news/2015/04/14/post-title-here",
      "categories"=>["news"],
      "next"=>,
      "previous"=>,
      "tags"=>[],
      "path"=>"_posts/2015-04-14-post-title-here.md",
      "content"=>"Some markdown content here, which is likely the longest part. This can be quite long, in fact.",
      "section"=>"post"
    }

You can tell already that in creating the object the static site generator has
extrapolated some information, like a "url" created from our specified date
and category: "For each file in folder x, write to our object and combine its
category, date and file name to create a url in a format like
'/category/yyyy-mm-dd/original-base-filename.html'."
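
As a sketch of that extrapolation step in Python: the `derive_url` and `build_entry` helpers below are hypothetical, and they assume the front matter has already been read into a dict with `category` and an ISO-formatted `date`. The URL follows the '/category/yyyy-mm-dd/original-base-filename.html' pattern described above, not the exact layout any specific generator uses.

```python
from datetime import date
from pathlib import PurePosixPath


def derive_url(category: str, published: date, source_path: str) -> str:
    """Build '/category/yyyy-mm-dd/original-base-filename.html'."""
    base = PurePosixPath(source_path).stem  # name without folder or extension
    return f"/{category.lower()}/{published.isoformat()}/{base}.html"


def build_entry(source_path: str, meta: dict, content: str) -> dict:
    """Combine front matter, content, and derived fields into one record."""
    published = date.fromisoformat(meta["date"])
    entry = dict(meta)  # start from what the author specified
    entry.update(
        path=source_path,
        content=content,
        date=published,
        url=derive_url(meta["category"], published, source_path),
    )
    return entry
```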

Once we've generated our object, the generator can do quite a bit with it. We
can create an HTML page with the url as created above, and that file could be
manipulated further using some of the meta-data we've specified.

More than likely, we don't want to display content written in Markdown, so the
generator will convert our markdown into HTML before outputting it to our new
file: "for each item in the object, convert the value of 'content' from
markdown to HTML."

And, of course, we don't want to just display plain text, we want to style the
page. We've likely created a set of templates in our project and specified in
our content file which template to use for each file.

In our templates we lay out our HTML and also some tags that specify which
parts of our file (the content or some piece of metadata) to display: "For
each item in our object, apply its content to the template specified as
'layout' as we output it as HTML."
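
Picking a template by the entry's 'layout' value could be sketched like this in Python, using `string.Template` from the standard library as a stand-in for a real template language. The `LAYOUTS` dict and `render_entry` are made-up names, and the entry's content is assumed to be HTML already.

```python
from string import Template

# Hypothetical layouts, keyed by the name used in each entry's "layout" field.
LAYOUTS = {
    "default": Template(
        "<html><head><title>$title</title></head>\n"
        "<body>\n<h1>$title</h1>\n$content\n</body></html>\n"
    ),
    "bare": Template("$content\n"),
}


def render_entry(entry: dict, layouts: dict = LAYOUTS) -> str:
    """Fill the template named by the entry's layout with its title and content."""
    layout = layouts[entry.get("layout", "default")]
    # Placeholders a layout does not use (e.g. title in "bare") are ignored.
    return layout.substitute(title=entry["title"], content=entry["content"])
```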

Each template may output other content too. A template might call: "for each
item in our object, loop through the ones that have a 'category' equal to
'news' and display the title."
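
In Python terms, that template loop amounts to a filter over the list of entries. A small sketch, with `titles_in_category` and `render_title_list` as hypothetical helper names and each entry assumed to be a dict with 'title' and 'category' keys:

```python
import html


def titles_in_category(entries: list[dict], category: str) -> list[str]:
    """Return the titles of entries whose category matches, ignoring case."""
    wanted = category.lower()
    return [e["title"] for e in entries if e.get("category", "").lower() == wanted]


def render_title_list(entries: list[dict], category: str) -> str:
    """Render the matching titles as an HTML list for use inside a template."""
    items = "\n".join(
        f"<li>{html.escape(title)}</li>"
        for title in titles_in_category(entries, category)
    )
    return f"<ul>\n{items}\n</ul>"
```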

So, along the way to just creating the HTML file, the generator will read the
templates and output HTML in the format we've dictated. In the end, we have a
new folder with our fully formed HTML files.

Asset files, like CSS and JavaScript, are just moved into the new folder, so
that in the end, that folder contains our fully formed website, which can then
be moved to a web server.
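
The asset step is the simplest of all. A sketch in Python, assuming the project keeps its assets in `css` and `js` folders (those folder names and the `copy_assets` helper are assumptions for the example); it copies rather than moves so the source project stays intact:

```python
import shutil
from pathlib import Path


def copy_assets(project_dir: Path, out_dir: Path,
                asset_dirs: tuple[str, ...] = ("css", "js")) -> list[str]:
    """Copy each asset folder that exists into the output folder unchanged."""
    copied = []
    for name in asset_dirs:
        src = project_dir / name
        if src.is_dir():
            # dirs_exist_ok lets a rebuild overwrite a previous copy (Python 3.8+).
            shutil.copytree(src, out_dir / name, dirs_exist_ok=True)
            copied.append(name)
    return copied
```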

So a static site generator converts one set of files into another using
templates for instructions on how to output them in their final form.

If any of the formatting looks bad here, this is from my site "Static is the
New Dynamic" about static site generators:
http://www.thenewdynamic.org/articles/how-do-static-site-generators-work/

------
ghubbard
How do you think they might work?

