
Hablog – High-availability, distributed, lightweight, static site with comments - mrb
http://blog.zorinaq.com/release-of-hablog-and-new-design/?
======
ytsb
How do you take a server out of rotation if one goes down? For example, if I
hit a server in the round robin that isn't online, I could try force-refreshing
the page to initiate a new lookup, but in that case 1 in every 3 requests to
the site would still fail (assuming only one server is down, and hoping my
client doesn't cache the initial resolution). In your post you mentioned a
browser can handle this transparently - did you mean that if the client sees
that the domain has multiple A records and the connection to the first IP
fails or times out, it will automatically try the next IP in the round robin?
Is this a browser standard, or is this behaviour handled differently between
browsers?

Either way interesting project, thanks for sharing.

EDIT: Also it'd be cool if the code wasn't in a tarball. I'm on a mobile
device (as I'm sure many of your users are too) that doesn't allow me to
save/extract the archive. Would've liked to have had a browse through it!
Maybe consider uploading to a service like GitHub, or having an extracted
version available so we can view the contents directly in a browser? :)

~~~
mrb
I don't have to do anything to take a server out of rotation. Browsers
automatically try all IPs, and then stick with the first IP they find that
works. Even if some servers are down, your HTTP requests continue to all work.
This is a standard browser behavior. For a really extensive outage (2+ days),
I would probably bother to manually update the DNS records.
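
The fallback behavior can be sketched in a few lines of Python (an
illustration of the general mechanism, not the browsers' actual
implementation): resolve all addresses for the host, try each in turn, and
stick with the first one that connects.

```python
import socket

def connect_any(host, port, timeout=3.0):
    """Try each resolved address in turn, like a browser does with
    multiple A records, and return the first socket that connects."""
    last_err = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        s = socket.socket(family, socktype, proto)
        s.settimeout(timeout)
        try:
            s.connect(addr)          # stick with the first address that works
            s.settimeout(None)
            return s
        except OSError as err:       # connection refused, timeout, etc.
            last_err = err
            s.close()
    raise last_err or OSError("no address worked")
```

With 3 A records and one dead server, at worst two connection attempts fail
before the third succeeds, so requests keep working.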

Thanks for the feedback about making the code browsable. Will consider.

~~~
honkhonkpants
That's only true if the origin doesn't respond at all. If one of your IPs
accepts connections and then hangs, the browser will display an error.

~~~
mrb
The browser would time out after a few minutes. However if the user stops
loading the page and hits Reload, the browser will try another IP and the page
will load successfully. I verified this behavior with Chrome. For a personal
blog, that's definitely "good enough" HA.

~~~
fudged71
If a webpage I view hangs on loading, I'm going to bounce and not come back.
The fact that it's just a blog makes me even less inclined to wait it out.

~~~
honkhonkpants
When you think back on it, this was the true genius behind RSS aggregators. I
could still read your blog when your blog was not working, because Google
Reader (or your favorite aggregator) had already downloaded it once. It's too
bad that RSS died.

~~~
fudged71
Yes! Absolutely.

------
jalami
I just use Hugo for site generation, isso for self-hosted comments, rsync for
dumping files on the server, and PostCSS for all the styling. All the scripts
are in a package.json so I can download and run everything with an npm run
build-init. I don't think most people's personal blogs need to worry about
high availability, multiple servers, and the complexity that goes along with
it, but I'm sure some do. I simply don't have that traffic.
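
As a sketch of what that glue looks like (script names, paths, and the host
are made up here, not my actual setup), the whole pipeline fits in a few
package.json scripts:

```json
{
  "scripts": {
    "css": "postcss assets/main.css -o public/css/main.css",
    "build": "hugo --minify && npm run css",
    "deploy": "rsync -avz --delete public/ user@host:/var/www/blog/",
    "build-init": "npm install && npm run build && npm run deploy"
  }
}
```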

The orange line is neat; I opted for a CSS avatar replacement, as isso tags
the generated avatars with an id I can grab. The two columns I'd have to get
used to - maybe I'm just easily distracted.

Thanks for posting this.

------
alfredxing
The HA part looks pretty cool!

I'm not a big fan of the design though. I find the visual hierarchy of the
layout a bit confusing:

- There isn't enough whitespace around the content, making it feel cramped

- Scrolling isn't a bad thing. I usually find a top-down flow (from title to
content) easier to navigate than reading down from the title to the byline and
date, then back up for the content

- Comments and the comment form below the content make sense, since people
usually like to read and post comments after they've read the blog post.
Having them at the top forces users to scroll up again.

~~~
mrb
Thanks, I will act on this feedback.

------
gizmo
After the DNS resolves, the IP will be cached by the client for hours. So if
any server goes down, the blog goes down with it for those users. If you can't
reboot any server without causing downtime, I don't think it can be called
high availability.

Taking a simple Jekyll blog on a single instance and putting Cloudflare in
front of it would work much more reliably, and it doesn't require all sorts of
rsync tricks.

~~~
mrb
_" After the DNS resolves the IP will be cached by the client for hours"_

Not true, because my TTL is low (5 min). Chrome will even try another IP within
tens of seconds if the initial IP becomes unresponsive. I tested this on
Linux. That said, I recognize this is not a production-quality mechanism for
implementing HA (large sites and CDN providers use anycast, load balancers,
etc.). But for a personal blog, this is perfectly fine :)
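
For illustration, the DNS side of such a setup is just a low TTL plus one A
record per server. A hypothetical zone fragment (with RFC 5737 documentation
addresses, not my actual records) would look like:

```
$TTL 300                       ; 5 minutes, so clients re-resolve quickly
@    IN  NS  ns1.example.com.
@    IN  NS  ns2.example.com.
@    IN  A   203.0.113.10      ; server 1
@    IN  A   198.51.100.20     ; server 2
@    IN  A   192.0.2.30        ; server 3
```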

~~~
gizmo
In my experience TTL isn't respected at all. Clients will cache IP addresses
no matter what.

~~~
mrb
I tested this pretty extensively. When I switched IPs, 99.9% of the HTTP
traffic hit the new servers within the TTL time (5min). The only residual
traffic hitting the old IPs I saw was from poorly configured bots & webspiders
who probably don't re-resolve hostnames frequently enough.

I also remember reading a post from Amazon Route 53 engineers who investigated
DNS propagation times in large-scale tests around the world, and my
observations aligned with theirs. They concluded the "DNS doesn't propagate
according to TTL" story is mostly a myth (modulo rare issues here and there).

------
bahjoite
From the deterministic comment IDs section it seems like the design allows a
commenter to repeatedly update the timestamp of a comment they authored, which
might be amusing:

- Remember the "seed" value and submit a comment

- Wait for a reply to the comment and then submit the same comment again with
the same seed

- Now one's comment appears later than its reply!

~~~
mrb
Very good thinking!... but not possible. watch-db only takes into account the
first timestamp (see "del c['edited']").
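
The idea can be sketched like this (field names other than 'edited' are
guesses for illustration, not Hablog's actual schema): when a comment ID
already exists, the stored first timestamp wins and the resubmission's
'edited' field is dropped.

```python
def merge_comment(db, comment_id, new):
    """Merge a (re)submitted comment into the DB, keeping the original
    timestamp so resubmitting with the same seed can't bump a comment
    below its replies."""
    old = db.get(comment_id)
    if old is not None:
        new = dict(new)
        # Edits keep the first-seen timestamp: discard the
        # resubmission's own notion of when it was edited.
        new.pop('edited', None)
        new['timestamp'] = old['timestamp']
    db[comment_id] = new
    return db[comment_id]
```

Resubmitting with the same seed updates the text but never the position in
the thread.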

~~~
bahjoite
ah yes, I see it. well done.

------
bikamonki
Great read, and serving 2500 hits/sec for $15 is awesome! Even better: on
DigitalOcean you could keep a snapshot of your server and, if needed, deploy
another node in minutes.

Off-topic: how does rebuilding the blog's menu/index work with Jekyll? Say
you add one entry under home/foo/bar and you have hundreds of pages. Will it
actually re-create hundreds of HTML pages to update the navigation? Or does it
use some sort of HTML include/JS magic?

~~~
curun1r
> serving 2500 hits/sec for $15 is awesome

That's peak... I doubt he'd still be paying $15/mo if he were sustaining that
kind of traffic. Even as lean as his site is (looks like it's a bit over a
kilobyte being served from his domain... wow!), 2.5 MB/sec is around 6.5 TB/mo,
which would cost another ~$100/mo from DO.
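
The arithmetic behind that estimate (back-of-the-envelope, assuming roughly
1 KB served per hit):

```python
hits_per_sec = 2500
bytes_per_hit = 1_000   # "a bit over a kilobyte" per request, rounded down

mb_per_sec = hits_per_sec * bytes_per_hit / 1e6            # 2.5 MB/sec
tb_per_month = mb_per_sec * 1e6 * 86_400 * 30 / 1e12       # ~6.5 TB/month
```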

However, unless you have some desire to roll your own setup and maintain/admin
the servers, I just don't see the rationale for hosting your own statically-
generated site. GH Pages is free. And if you need more than what it offers,
Netlify's $7.50/mo plan will be fine for more than 90% of these kinds of blogs
and has many advantages: no servers to maintain or worry about going down,
use of Akamai's network for faster content delivery, and automatic site builds
from source code. And all for less than a 2-server hosted setup. There are
also Surge, Forge, and a few other competitors in that space, so it's not like
you'd get locked into a single vendor.

~~~
mrb
The main rationale for hosting my own site is that I do not want to depend on
a single provider, ever. Using GH, or Akamai, or Cloudflare, etc means putting
all your eggs in one basket. And outages do happen at these companies.

I argue that the statistical chances that 3 outages occur at the same moment
at my 3 different providers on 3 different continents are less than the chance
of a single outage at a single CDN. Not for technical reasons, after all they
are extremely redundant, but for process reasons: each is a single company
with company-wide processes, and sometimes human errors happen that cause
company-wide outages.

I do realize it sounds silly to be obsessed as I am with redundancy :) but I
love exploring these sorts of ideas to see how well they work in practice.

~~~
curun1r
> I argue that the statistical chances that 3 outages occur at the same moment
> at my 3 different providers on 3 different continents are less than the
> chance of a single outage at a single CDN.

I'm not sure I'd agree with that. You're probably running a pretty similar
software stack on each box, so there's a chance that a software update could
take down all three or a vulnerability could put all three at risk. I also
have a hard time believing that any one cloud virtual server has many 9s
uptime, so 3 together isn't likely to get you to Akamai's level.

But I prefer to rely on the large players simply because there are advantages
to staying with the herd. If Akamai has an outage, much of the internet will
be down; people will see that your site is down and attribute it to a larger
problem. If your 3 boxes have a problem, or your DNS provider has a problem,
the fact that your site specifically is down will be more apparent to users.

Also, uptime is not just a function of the percentage chance of an outage. The
time to recovery also matters. What kind of monitoring do you have in place?
How long are your outages likely to last vs those of a large player? If it
takes you 24 hours to recognize a failed server, then the chances of all 3
failing in a 24 hour period are probably a lot higher than a big player having
an outage.

There's a lot that goes into high availability that you just don't get by
putting your eggs in three small baskets.

~~~
mrb
_" I also have a hard time believing that any one cloud virtual server has
many 9s uptime, so 3 together isn't likely to get you to Akamai's level."_

Actually even if the servers are only 98% available (7 days of downtime per
year!) then they provide 99.9992% availability, aka "five 9s": 1-((1-.98)^3) =
0.999992
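
Checking the arithmetic (a quick sketch, assuming the three failures are
independent):

```python
per_server = 0.98                  # each server up 98% of the time
all_down = (1 - per_server) ** 3   # probability all 3 are down at once
availability = 1 - all_down        # 0.999992, i.e. "five 9s"
```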

Of course this statistical result assumes downtimes are random and
uncorrelated. As you say, the risk is that a common bug or vulnerability
affects all 3 of them at the same time. Hopefully this risk is mitigated
because my software stack is _extremely_ simple (a web server serving static
files + ~400 lines of Python code) and I will not be performing software
updates on all 3 servers at the same time.

 _" or your DNS provider has a problem"_

I do not use a provider, but run 3 parallel authoritative DNS servers on these
3 servers :)

------
minitech
Looks great at 1230px wide, but 1231 is too cramped. I don't want to write a
new comment before reading the article, and then there's a big blank space
that takes up 40% of my viewport when the article continues past the comments.

I'd suggest rethinking that breakpoint or doing away with it altogether;
centering may feel like a waste of space, but it makes for comfortable reading
and your large images can still take up the full width.

~~~
mrb
I can't have more than 80-90 characters per line:
[http://baymard.com/blog/line-length-readability](http://baymard.com/blog/line-length-readability)
So 40% of your viewport has to be either margins or comments.

Perhaps it would look less strange if I swapped the order of the 2 columns?

PS: I increased the margins around the orange line to make it look less
cramped.

~~~
minitech
Sorry, I meant specifically that the 40% is all on the left and the text is
all on the right.

~~~
mrb
I see. Maybe I'll shift the 2 columns against the left edge of the viewport.

------
nanch
Thanks for sharing your setup with us, I found this very interesting. Great
read and your new blog looks _great_!

