
How to Set Up and Deploy to a 1000-Node Docker Swarm - emmetogrady
http://blog.nimbleci.com/2016/08/17/how-to-set-up-and-deploy-to-a-1000-node-docker-swarm/
======
n0us
This is nice but it's characteristic of most other Docker tutorials which say
"just run this command" and don't bother to go into how or why it works. As a
result, all the reader can do is copy and paste (if the command even works in
the first place and the tutorial isn't outdated). What I think most people
need to know is how they can do this but modify the app to work for their
purposes instead of just "hello world".

------
dockinator
The title should be corrected to read "How to set up and deploy a 1000 node
botnet". There's no mention of securing the hosts or the swarm whatsoever.

~~~
HyperLinear
I liked the sentence: "There are a few other arguments that you’ll need to add
to those commands but if you follow the docs you’ll have no problems at all."

Oh, you mean like 50% of the work?

As for the reverse proxy: Traefik all the way! :D

~~~
sbarre
This is why I love HN.. I'd never heard of Traefik and it looks great!

------
gamedna
This is clearly a doc based on a hypothetical assumption that he can deploy
1000 nodes. If that's the case, why not go for a cool million?

Just a suggestion to the OP: it's not hard to set up and share a 5-node Vagrant
cluster on your laptop. Give concrete examples that people can run locally and
test your assertions themselves. Once that foundation is laid, you can
extrapolate to 10 nodes, 100 nodes, 1000 nodes.

Anyone that has deployed a cluster of that size knows that the article is
missing a bunch of items, including but not limited to the following:

\- Overhead instances (manager, service discovery, logging, etc.)
\- Configuration management
\- Security implications
\- Monitoring
\- Failure mitigation (it's going to happen at that scale)
\- Update strategy at this scale

For those that are interested, one official doc and a good place to start when
learning how to deploy a large Docker 1.12 cluster is this guide by Docker.

[https://docs.docker.com/swarm/plan-for-production/](https://docs.docker.com/swarm/plan-for-production/)

~~~
luka-birsa
Fully agree, this is a typical example of a shitty Swarm/Kubernetes/etc blog
post, which gives no real substance - this stuff is covered in the first 5
minutes.

We are just deploying our first Kubernetes cluster in production and anything
more than basic hello world would be welcome. Like how to configure networking
in production, how to route traffic to containers, how to provide volume
storage (backups, etc..).

I mean, we'll get there of course, but we opted against Swarm as information
on it is even more lacking than for Kubernetes.

~~~
joshdev
I think the main issue is that beyond the very basic examples the new Docker
1.12 swarm features aren't really production ready. At this point there is a
significant amount of tooling required to get a production cluster running
with your applications. Distributed Application Bundle (DAB) files, one
of the features I was most excited about, are still marked as experimental.

Overall I think Docker is heading in the right direction, but for now
Kubernetes, ECS, etc... are better solutions for orchestration. I was hoping
to only use Docker for my current project, but I think I'll have to wait until
the next one rolls around and Docker releases a few more updates.

~~~
gamedna
"Production Ready" is a funny term. Just like any other product, docker swarm
will be production ready once we see a whitepaper or writeup that details the
scaffolding around it to meet some sort of SLA. I would be willing to build a
production cluster of say 20-50 nodes with swarm in order to learn its
intricacies. Building anything larger would have to be done incrementally with
lots of monitoring and transparency.

------
rcarmo
"replacing 3 in wherever you see 1000 in this post is probably a good idea"

...which really means "I have no clue where this will break when scaling".

Cute, but not terribly insightful, and possibly risky in an age where
following recipes off the Internet is too often the first step towards
production :)

~~~
dockinator
What do you mean it won't scale, scaling is as simple as changing the 3s to
1000! Risky??? Just change the 1000s to 2000 and you have even moar scale!

~~~
rcarmo
I see what you did there ;)

------
timthorn
Is the definition of bare-metal changing? It seems clear from the context that
virtual machines are being used, but is that a distinction that those further
up the stack don't worry about now?

~~~
emmetogrady
Definitely possible that I'm misusing the term bare-metal but nope, no VMs, I
meant 1000 (bare-metal) servers with only docker installed

EDIT: I had been misusing the term bare-metal, thanks for picking up on it.
Examples should hold on both bare metal and VMs though.

~~~
felixgallo
You what now

------
geggam
um..... bare metal is a VM now ?

> Buy 1000 bare metal servers

> This one is easy. Pick your favourite cloud provider and buy lots of servers

~~~
rco8786
Yea that's what initially got me too. If a VM from my favorite cloud provider
is now bare metal...what's actual bare metal called?

~~~
pmalynin
Following the logic it should be the hypervisor

~~~
tedunangst
Hyperbare metal.

------
LoSboccacc
basically this is how the howto handles the hardest part:

>Basically you will run docker swarm init on the first node and then docker
swarm join on all the other nodes. There are a few other arguments that you’ll
need to add to those commands but if you follow the docs you’ll have no
problems at all.

the worst part of the setup is how to build the cluster node store in a way which
is redundant and reliable, since provisioning it for HA is largely
undocumented and left as an exercise for the reader

~~~
emmetogrady
Maybe I'm misunderstanding what you mean by "cluster node store" but in Docker
1.12 it's built in, and setting up a swarm is now really easy. No need for consul,
etcd, etc. It's highly available by default.

~~~
LoSboccacc
if you mean using the swarm token, that relies on a third party service which
might or might not be available at any point in the future. and won't work on
a private network iirc

~~~
emmetogrady
Documentation is scarce, but I had understood that Docker 1.12 uses its own
service discovery mechanism.

From here ([https://docs.docker.com/engine/swarm/swarm-mode/#view-the-
jo...](https://docs.docker.com/engine/swarm/swarm-mode/#view-the-join-command-
or-update-a-swarm-join-token)):

"...starts an internal distributed data store for Engines participating in the
swarm to maintain a consistent view of the swarm and all services running on
it"

Here are the docs for "swarm init" in docker 1.12:
[https://docs.docker.com/engine/reference/commandline/swarm_i...](https://docs.docker.com/engine/reference/commandline/swarm_init/)

~~~
LoSboccacc
yeah see that command? --token

that goes to some docker owned server, ask for nodes, and join/create the
swarm as necessary. I wouldn't build the cornerstone of an infrastructure on
this, like, never.

there's an undocumented feature I just discovered to use a local token server,
it seems, but then you're back to square one.

> documentation is scarce

for being a container, documentation is not just scarce but outright
insufficient - especially in regards to its failure and recovery modes

~~~
jlhawn
These swarm tokens (SWMTKN) are not sent to any external service. The tokens
are only sent to an existing swarm manager in your cluster.

Here's how it works:

\- When you run `docker swarm init` it initializes the current node as a
"manager" which has a datastore of cluster configuration and is responsible
for assigning tasks to "worker" nodes (including itself). This init process
also generates a cluster CA and two secrets authorizing manager and worker
joins. The output of this command will be two tokens which you can use to join
new nodes to the cluster as either a "manager" or a "worker". These tokens are
structured like this:

    
    
        SWMTKN-1-<cluster CA hash>-<manager or worker secret>
    
    

\- When you install Docker on another node, you can join it to your cluster by
running `docker swarm join` specifying the token and the address of an
existing manager. This new node is able to authenticate to the existing
manager by asking for the cluster's CA certificate and hashing it to make sure
it matches the value in the token. The TLS connection to the manager is
verified by the new node which then makes the join request with the secret
which indicates to the existing manager whether this new node is joining as a
manager or a worker. These tokens are unique to your cluster and never sent to
any external system.

\- If the new node is joining as a manager then the cluster configuration is
replicated to the new manager. Docker uses the Raft consensus algorithm to
maintain consistency of the configuration data. As long as a majority of your
manager nodes are available then the managers will be able to coordinate and
issue work to the available workers in the cluster.
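
To make the token layout above concrete, here is a minimal Python sketch that splits a join token into the four fields shown in the `SWMTKN-1-<cluster CA hash>-<manager or worker secret>` pattern. It assumes that none of the fields contains a hyphen, which the pattern above suggests but does not guarantee. The `JoinToken` and `parse_join_token` names and the sample token are made up for illustration and are not part of Docker.

```python
from typing import NamedTuple


class JoinToken(NamedTuple):
    prefix: str   # the literal "SWMTKN"
    version: str  # "1" in the pattern quoted above
    ca_hash: str  # hash of the cluster CA certificate
    secret: str   # manager or worker join secret


def parse_join_token(token: str) -> JoinToken:
    """Split a token shaped like SWMTKN-1-<cluster CA hash>-<secret>.

    Assumption: individual fields never contain a hyphen.
    """
    parts = token.strip().split("-")
    if len(parts) != 4 or parts[0] != "SWMTKN" or not all(parts):
        raise ValueError("not a SWMTKN-style join token")
    return JoinToken(*parts)


# Hypothetical token, invented for this example only.
example = parse_join_token("SWMTKN-1-abc123cahash-def456secret")
print(example.ca_hash)
print(example.secret)
```

A joining node would compare `ca_hash` against a hash of the CA certificate the manager presents, and only then send `secret`. The sketch does not attempt that comparison, since the exact hash encoding is not described here.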

I hope this helps you better understand how the cluster is secured.

~~~
LoSboccacc
[https://docs.docker.com/swarm/install-w-machine/](https://docs.docker.com/swarm/install-w-machine/)

"The create argument makes the Swarm container connect to the Docker Hub
discovery service and get a unique Swarm ID, also known as a “discovery
token”. The token appears in the output, it is not saved to a file on the
host"

are there two types of tokens now?

~~~
Schweigi
jlhawn's answer is about Docker Swarm Mode while your link is for Docker Swarm.
Docker Swarm Mode was released with Docker 1.12 and I guess it will replace
the old Docker Swarm. The new Docker Swarm Mode doesn't need any 3rd party to
keep the state because it has a Raft-based consensus feature built in. Thus it
is really easy now to use the new swarm mode.
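
As a rough illustration of what Raft-based consensus means for sizing the manager set: the managers can only make progress while a majority of them is reachable, so the number of manager failures a swarm survives follows from simple arithmetic. The sketch below is plain Python with hypothetical function names; it does not talk to Docker.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a strict majority."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority remains."""
    return managers - quorum(managers)


# Print a small table: managers, quorum size, failures tolerated.
for n in (1, 3, 4, 5, 7):
    print(n, quorum(n), tolerated_failures(n))
```

So 3 managers tolerate 1 failure and 5 tolerate 2, while going from 3 to 4 managers buys nothing, which is why odd manager counts are the usual choice.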

~~~
LoSboccacc
they just HAD TO use the same name, didn't they? XD

------
redwood
This must be an ironic piece?

~~~
discordianfish
I was really thinking the same after reading
[https://news.ycombinator.com/item?id=12303075](https://news.ycombinator.com/item?id=12303075)

So this article talks about how you would deploy a 1000 node cluster without
actually doing it? Why not say this is how to deploy a 100000 node cluster?

~~~
collyw
I want to know _why_ you would do it. What sort of problems would it be suited
to?

------
crypt1d
Meh, the article doesn't really bring anything new to the table.

If you are doing something like this, please keep in mind that this kind of
DNS failover is, at best, unreliable. You have no control of how DNS is being
cached on client side, and whether the client is going to switch to the next
IP in the cycle if the previous one is unavailable. The proper way to do HA would
be to use some kind of VIP + load balancer combination (e.g., keepalived +
HAProxy), which would allow you to fail over the IP instead of just relying on the
hostname. However, if you also have a database backend to think about, then you
will most likely need something like Pacemaker to ensure you don't end up with
data inconsistency (a split-brain scenario).
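
To show what "switching to the next IP" has to look like on the client side, here is a minimal Python sketch that walks every address the resolver returns and uses the first one that accepts a TCP connection. The function name `connect_with_failover` is hypothetical. The sketch assumes the hostname resolves to several records, and it does nothing about resolver or client caching, which is the part you cannot control.

```python
import socket


def connect_with_failover(host: str, port: int, timeout: float = 3.0) -> socket.socket:
    """Try each resolved address in order; return the first connected socket."""
    last_error = None
    addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in addresses:
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock  # this address answered, stop here
        except OSError as exc:
            last_error = exc  # remember the failure and try the next address
            sock.close()
    raise ConnectionError(f"no address for {host} accepted a connection") from last_error
```

Python's own `socket.create_connection` loops over addresses in a similar way, but many clients do not, and each dead address still costs a full timeout before the next one is tried. That is the argument for failing over the IP itself rather than relying on the hostname.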

------
sinneduy
am I the only one that thinks this is pretty clearly satire?

------
drc0
how ironic that now there is this article as first on hn
[https://circleci.com/blog/its-the-future](https://circleci.com/blog/its-the-future) :)

~~~
roddux
"How to Set Up and Deploy to a 1000-Node Docker Swarm -- for your first CRUD
webapp"!

~~~
tinco
Except that it doesn't work like this, as the CRUD web app needs a database to
connect to, and this article doesn't explain how to scale a database over a
1000 node cluster which is not trivial at all, and not explained in any of the
linked tutorials either.

So it's up to the reader to imagine what kind of application would need a
thousand web frontends without any form of persistent storage..

~~~
roddux
Probably a service to left-pad strings :^)

