
Running a modern infrastructure stack - samber
https://blog.barricade.io/running-a-modern-infrastructure-stack/
======
skywhopper
Hmm, this is the second such post I've seen recently that lays out the
company's infrastructure stack and after a few points, it mentions how they've
outsourced all their logging and alerting to Datadog, who solved all their
problems in that area. Datadog seems like a nice product, and I know nothing
about it, but after seeing how ... aggressively they were marketing at
re:Invent, color me skeptical that these stack discussions are entirely
spontaneous.

~~~
duggan
> color me skeptical that these stack discussions are entirely spontaneous.

Author here, I suspect all this means is that Datadog's marketing is as
effective as their product.

It's rare I give any software or service truly glowing praise. Rare enough
that I'm pleased to do so at any opportunity I get.

------
twic
> To me, an orchestration system should control the entire provisioning
> process, turning a plan defined in code into a production system. As a
> result I (perhaps unhelpfully) don’t believe anyone has built an
> orchestration system

Stackbuilder!

[https://github.com/tim-group/stackbuilder](https://github.com/tim-group/stackbuilder)

The docs for Stackbuilder are still horrible, but here's an example stack:

[https://github.com/tim-group/stackbuilder-config-example/blo...](https://github.com/tim-group/stackbuilder-config-example/blob/master/stack.rb)

Assuming you have a compute fabric in place, running a script like that
through the tool will provision and configure VMs for all the parts of a
production system: it knows about Java applications (started with java -jar),
Apache proxy servers, Linux NAT, IPVS load balancing, MySQL databases,
Puppetmasters, and possibly other things. The fabric provides KVM for VMs, and
BIND for DNS, controlled via some custom MCollective plugins. Configuration is
done via Puppet, but Stackbuilder creates the Puppetmaster as part of the
build. Application firewalls and Nagios checks get configured as part of the
build, but I can't remember if it's Stackbuilder itself that does that, or
some of the Puppet code.

BOSH is in the same space:

[https://bosh.io/](https://bosh.io/)

Again, it requires an existing fabric, but that can be plain AWS or OpenStack
(or VMware). It can then build anything you can write a manifest for; it
operates at a lower level of abstraction than Stackbuilder.
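
To make "a plan defined in code" a bit more concrete, here is a minimal sketch
in Python. It is not Stackbuilder's or BOSH's actual format: the `Service`
class, its fields, and the `provisioning_plan` helper are all invented for
illustration. It only shows the general idea of declaring a stack as data and
deriving a build order from the declared dependencies.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Service:
    """One piece of a stack. Fields are illustrative, not a real tool's schema."""
    name: str
    kind: str              # e.g. "mysql", "java-app", "proxy"
    instances: int = 1
    depends_on: tuple = ()


def provisioning_plan(services):
    """Return the machines to build, ordered so dependencies come first."""
    by_name = {s.name: s for s in services}
    for s in services:
        for dep in s.depends_on:
            if dep not in by_name:
                raise ValueError(f"{s.name} depends on unknown service {dep}")
    # graphlib expects a mapping of node -> set of predecessors.
    graph = {s.name: set(s.depends_on) for s in services}
    plan = []
    for name in TopologicalSorter(graph).static_order():
        svc = by_name[name]
        for i in range(1, svc.instances + 1):
            plan.append(f"{svc.name}-{i:03d} ({svc.kind})")
    return plan


# A hypothetical three-tier stack.
STACK = [
    Service("frontend", "proxy", instances=2, depends_on=("app",)),
    Service("app", "java-app", instances=2, depends_on=("db",)),
    Service("db", "mysql"),
]

if __name__ == "__main__":
    for step in provisioning_plan(STACK):
        print(step)
```

A real tool would hand each step to the compute fabric instead of printing it;
the ordering logic is the only part this sketch tries to capture.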

~~~
falcolas
Saltstack, with the cloud module, fits into this role rather well. It's more
of a declarative setup and a discoverable setup, but I've found that to be
acceptable.

------
mwcampbell
Have you looked at Joyent's Triton service? They claim they can run containers
securely on bare metal, since the underlying kernel is Illumos (but it can run
Linux binaries and thus Docker containers). So the trade-off between isolation
and efficient resource usage would disappear.

On the one hand, they don't have all the same higher-level services as AWS,
like ELB, let alone managed database services. On the other hand, I don't
think I'd sleep well running anything based on EBS ever again, since EBS is so
notorious for cascading failures.

~~~
pradeepchhetri
I got the chance to play with Joyent's Triton Elastic Container Service. Yes,
the trade-off between resource isolation and efficient usage would disappear
since they use SmartOS zones rather than Linux namespaces or cgroups to
provide strong isolation between containers. They have forked Mesos[1] and
added capability to run Triton containers as Mesos tasks[2]. Some of the
questions which need to be explored:

\- Whether they provide some kind of built-in service discovery.

\- Whether all existing Mesos frameworks will support Triton based Mesos
deployment since many frameworks make use of different networking modes,
docker storage engine.

[1]: [https://github.com/joyent/mesos](https://github.com/joyent/mesos)

[2]: [https://www.joyent.com/blog/mesos-by-the-pound](https://www.joyent.com/blog/mesos-by-the-pound)

------
flowerpot
Maybe I'm missing the right articles or I'm thinking about this the wrong way,
but I can't seem to find any resources on how people run databases in these
kinds of infrastructure stacks. I mean, I have no problem understanding how I
can deploy my 12-factor application on Kubernetes, for example, and load balance
across its instances, but persistence seems to be missing. Do people just use
Amazon's/Compose's/etc. database offerings and not worry about it themselves?

~~~
duggan
Yeah, I sort of alluded to it in the article but didn't expand on it - we're
doing a combination of things.

1\. Using AWS services where we can (DynamoDB)

2\. Provisioning more "traditionally" to instances, and managing those
independently of the Mesos scheduler

3\. Experimenting with scheduler based solutions[1] (which are still pretty
bleeding edge, but are promising)

As I mentioned, EMC are (to me) doing the most interesting stuff here[2]
because they're leaning on a lot of existing production systems like EBS and
Mesos' own scheduler.

[1]:
[https://github.com/mesos/elasticsearch](https://github.com/mesos/elasticsearch)

[2]: [http://blog.emccode.com/2015/10/08/enabling-external-volume-...](http://blog.emccode.com/2015/10/08/enabling-external-volume-support-for-any-mesos-framework/)

~~~
flowerpot
I would love to see approaches like the Elasticsearch one for all major
databases, highly available out of the box; that would make me sleep a lot
better at night. In the meantime I'm just going to manage my databases
traditionally (through a service or run by myself). It seems like it's really a
problem that needs to be solved; other than that, I'm a huge fan of these
infrastructure approaches.

~~~
twic
Cloud Foundry has done quite a lot of this. Cloud Foundry is built on BOSH:

[https://bosh.io/](https://bosh.io/)

So you can write a BOSH 'release', which is a script for setting up a database
cluster with all the necessary pieces, like this:

[https://github.com/cloudfoundry/cf-mysql-release](https://github.com/cloudfoundry/cf-mysql-release)

However, you have to buy into BOSH and CF to make use of this.

Also, most of the interesting and high-quality services are part of Pivotal's
paid version of CF. Which might well be worth the money - many of the services
are built by colleagues of mine, and I can assure you that they are of the
finest quality! If not, there are a bunch of open-source contributed services.
For example, a metrics service with InfluxDB and Grafana:

[https://github.com/cloudfoundry-community/metrics-boshreleas...](https://github.com/cloudfoundry-community/metrics-boshrelease)

A logging service with Logstash and Elasticsearch:

[https://github.com/cloudfoundry-community/logstash-docker-bo...](https://github.com/cloudfoundry-community/logstash-docker-boshrelease)

PostgreSQL:

[https://github.com/cloudfoundry-community/postgresql-docker-...](https://github.com/cloudfoundry-community/postgresql-docker-boshrelease)

I have no idea how highly-available and resilient these are, though. It's much
harder to write an HA service release than a normal one.

~~~
fidget
Christ, the amount of native advertising in these threads is getting annoying.

~~~
twic
And I'm not even being paid to do this! Sorry that it came across that way - I
absolutely understand how it could.

In my defence, the Cloud Foundry services are the only serious attempt I'm
aware of to package highly available database setups for deployment on your
own infrastructure, and that's something I really want to see become
commonplace. I'd be even happier if some other group came along and did it in
a way not tied to Cloud Foundry or BOSH, say as Ansible playbooks. But
as yet, as far as I know, they haven't.

------
markbnj
Great post, and it aligns with many of the practices we're following now. It
also links to two other great posts, from Segment and Joe Beda, and taken
together the three are quite valuable.

On the subject of Kubernetes and heterogeneous environments, Kubernetes itself
may allow a mix of instance types (node types) in a cluster, but as
implemented in GKE, for example, it does not. I believe ECS has similar
constraints. Our response has been to think in terms of separate clusters for
services, edge routing, persistence, etc.

------
handimon
I think these posts are good not only for seeing what other companies are
thinking about infrastructure and the tools they use, but also for showing how
many different levels of control one can apply to the problem of creating
software in the cloud these days. They also show the amorphous role of devops
and the new challenges and solutions that come up all the time.

I also appreciate the end, where Ross talks about limiting the amount of
innovation and trying to be practical whenever possible. Too many people are
completely reactive to technology, and adopting it only trades one set of
problems for another because it all has to work together.

The design on the blog and main barricade site is really awesome too. Great
post!

------
room271
One question related to AWS is how to, or whether it is worth, segmenting
access to resources for containers. If you are using single instances for each
service you can use instance roles for this. These are great as they are
temporary, tied to the machine, and most of the client libraries will detect
them automatically.

If you are using containers, you presumably need to create proper users, or
else give all the permissions to the instance.
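
As a rough illustration of the per-service option, the sketch below builds one
narrowly scoped IAM policy document per containerized service, rather than one
broad policy for the whole instance. The service names, bucket, table, region,
and account ID are all made up. The code only produces the JSON documents; it
does not call AWS, and how the policies get attached to users or roles is left
out.

```python
import json


def scoped_policy(actions, resources):
    """Build an IAM policy document allowing only these actions on these ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": sorted(resources),
            }
        ],
    }


# Hypothetical services and ARNs, invented for illustration.
SERVICE_ACCESS = {
    "uploader": (
        ["s3:PutObject"],
        ["arn:aws:s3:::example-uploads/*"],
    ),
    "reporter": (
        ["dynamodb:GetItem", "dynamodb:Query"],
        ["arn:aws:dynamodb:eu-west-1:123456789012:table/example-events"],
    ),
}

# One policy per service, so a compromised container only exposes its own grants.
POLICIES = {
    name: scoped_policy(actions, resources)
    for name, (actions, resources) in SERVICE_ACCESS.items()
}

if __name__ == "__main__":
    print(json.dumps(POLICIES, indent=2))
```

The alternative in the paragraph above, giving everything to the instance,
would amount to merging all of these statements into a single instance role.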

Maybe this isn't a problem in practice because people group related services
together into separate orchestration clusters (i.e. a Kubernetes for each
service grouping).

But it would be great to hear some real experiences on this.

------
nodesocket
Great post. In terms of logging, check out Papertrail. They are super simple.
Just set up rsyslog to point to Papertrail, then update your services (nginx,
postgres, mongodb) to use syslog.
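
For your own application code, as opposed to nginx or postgres, the same idea
can be sketched with Python's standard `logging.handlers.SysLogHandler`. The
host and port here are placeholders rather than a real Papertrail endpoint, and
this assumes the receiving side accepts plain UDP syslog.

```python
import logging
import logging.handlers


def syslog_logger(name, host, port):
    """Return a logger that ships records to a syslog endpoint over UDP."""
    # SysLogHandler uses UDP by default when given a (host, port) address.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Placeholder endpoint: substitute the host and port of your syslog receiver.
    log = syslog_logger("myapp", "127.0.0.1", 514)
    log.info("service started")
```

Pointing this at a local rsyslog that forwards onward keeps the application
unaware of which log service sits at the other end.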

------
ftwynn
> [Logging:] I can’t be the only one who thinks this area is lacking its
> Stripe equivalent.

Can I ask you to expand on this a little? There are a bunch of cloud log
management solutions, and I'm not sure what would make any of them a Stripe
equivalent or not.

------
IceyEC
It might be worth checking out Canonical's Juju; we're working on deployable
segregation (like security groups) defined in code!

~~~
zimbatm
Is anyone using Juju in production?

