

Building Microservices the Lean Way, part 2 - tmwatson100
http://blog.hubblehq.com/building-microservices-the-lean-way-2/

======
latch
FWIW, I've found that building a robust and deep "API Gateway" is the key to
making SOA/Microservices work. Otherwise, you end up with duplication and
latency.

Routing and authentication are obvious candidates. It's also a good place to
track stats and tag each request with a unique ID so you can trace it as it
flows through your services.

By "deep", I mean that it should be application-aware. Caching is a good
example. For many applications, url + querystring results in too many
permutations. If the cache is aware, it can often use more meaningful keys.
Additionally, events from the underlying services can be used to purge the
cache, which can result in a wicked cache hit ratio.
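
To make that concrete, here's a rough Go sketch of the kind of thing I mean
(the event shape and all the names are made up for illustration, not from any
particular project):

    // Application-aware cache: keyed by product id rather than url + querystring,
    // and purged by change events from the underlying services instead of a TTL.
    package gateway

    import "sync"

    // ProductEvent is a made-up shape; assume the catalog service publishes one
    // whenever a product changes.
    type ProductEvent struct{ ProductID string }

    type ProductCache struct {
        mu    sync.RWMutex
        items map[string][]byte // rendered product JSON, keyed by product id
    }

    func NewProductCache() *ProductCache {
        return &ProductCache{items: make(map[string][]byte)}
    }

    func (c *ProductCache) Get(id string) ([]byte, bool) {
        c.mu.RLock()
        defer c.mu.RUnlock()
        body, ok := c.items[id]
        return body, ok
    }

    func (c *ProductCache) Set(id string, body []byte) {
        c.mu.Lock()
        c.items[id] = body
        c.mu.Unlock()
    }

    // PurgeOn evicts exactly the keys that changed, which is what keeps the hit
    // ratio high: entries only leave the cache when the data actually changes.
    func (c *ProductCache) PurgeOn(events <-chan ProductEvent) {
        for ev := range events {
            c.mu.Lock()
            delete(c.items, ev.ProductID)
            c.mu.Unlock()
        }
    }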

A more complex example has to do with duplication. Say you're building an
ecomm platform. You have a service for search, one for recommendations, and
one for the main catalog. They all need to return the same representation of a
"product". Do you duplicate the logic? Do you tie them together and pay the
latency and reliability price? No. Have them all just return IDs, and let the
API Gateway hydrate the actual results. It's a form of API-aware server-side
include and it works well.
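
A rough sketch of that hydration step (all the names here are invented for
illustration): the search service returns only ids, and the gateway swaps them
for full product documents before replying:

    // Hydrate expands an id-only upstream response, e.g. {"results": ["434p", "9x2"]},
    // into full product documents, using whatever product store the gateway keeps.
    package gateway

    import "encoding/json"

    // Product is a stand-in for the canonical "product" representation.
    type Product struct {
        ID   string            `json:"id"`
        Name map[string]string `json:"name"`
    }

    // productLookup stands in for the gateway's product store/cache.
    type productLookup interface {
        Get(id string) (Product, bool)
    }

    func Hydrate(store productLookup, upstreamBody []byte) ([]byte, error) {
        var res struct {
            Results []string `json:"results"`
        }
        if err := json.Unmarshal(upstreamBody, &res); err != nil {
            return nil, err
        }
        products := make([]Product, 0, len(res.Results))
        for _, id := range res.Results {
            if p, ok := store.Get(id); ok {
                products = append(products, p)
            }
        }
        return json.Marshal(map[string]any{"results": products})
    }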

~~~
jamiesoncj
This is really interesting. Would you consider writing a more in-depth post on
this? I'd love to read it.

~~~
latch
If you mean the SSI: [http://openmymind.net/Practical-SOA-Hydration-
Part-1/](http://openmymind.net/Practical-SOA-Hydration-Part-1/)
[http://openmymind.net/Practical-SOA-Hydration-
Part-2/](http://openmymind.net/Practical-SOA-Hydration-Part-2/)

I'm playing with it again on a new project, with a twist. Each item can be
heavily personalized. Sticking with the ecomm example, the virgin product
might look like

    
    
        {
          "id": "434p",
          "name": {"en": "..."},
        }
    

But we want to expand that, per user, with stuff like:

    
    
        "liked": true, 
        "bought": "22-1-2015"
    

We don't want to burden the clients with having to make multiple calls. I'm
still working it out, but the services will continue to just return a list of
product ids, and the API Gateway will now hydrate both the product and the
personalized pieces. Something like:

    
    
    res = upstream(req)                      // ask the underlying service for matching ids
    ids = extractIds(res.Body)
    products = getProducts(ids)              // hydrate from the in-memory product store
    personalized = getPersonalized(ids, currentUser)  // per-user bits: liked, bought, ...
    reply(merge(products, personalized))
    
    

Not critical, but worth pointing out that, for me, the API Gateway acts like a
gigantic product cache with _every_ product in-memory. When a product changes,
it gets an event and updates its cache. It isn't really a "cache", since
there's never a miss. Trying to figure out if I can do the same with
personalized data. (Even if you have tens of millions of products, you can
easily store them in memory.)
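
Rough back-of-envelope, assuming something like 1KB per rendered product (my
guess, not a measured number): 10 million products is about 10GB, and at a few
hundred bytes per product you're in the low single-digit GBs, so a single box
with a reasonable amount of RAM holds the whole catalog.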

~~~
jamiesoncj
Awesome! Thanks :)

------
AndrewHampton
So here's a question we've been talking about at my office. When developing a
micro-service on your development machine, do you need to run the whole stack
or just the service you're working on?

For example, let's say I'm working on service A, which depends on services B and
C. Do I need to run all 3 apps and their data stores locally?

We typically point A at the staging B and C. However, we have some
long-running jobs that A initiates on B, and B needs to post back to A when it
finishes. This doesn't work when pointing at staging B.

~~~
lobster_johnson
Can't speak for the author, but I can tell you what we do in our company,
which is also completely microservice-based.

Backstory: We used to have a helper tool that allowed a developer to run any
app locally. It tried to set up the same stack as we were using in production:
HAproxy, Nginx, Ruby, Node, PostgreSQL.

It was problematic, because people had machines that differed slightly:
Different versions of OS X (or Linux), Homebrew (some used MacPorts), Ruby,
Postgres, etc.

We could have spent a lot of time on a script that normalized everything and
verified that the correct versions of everything were installed, but the
problem with developing such a tool is that you won't know what's going to
break until you hire a new developer who needs to bootstrap his box. Or until
the next OS X release, or something like that.

Syncing everything with the production environment was also difficult. The way
we configure our apps, a lot of the environment (list of database servers,
Memcached instances, RabbitMQ hosts, logging, etc.) is injected. So with this
system we'd have to duplicate the injection: once in Puppet (for production),
and a second time on the developer boxes.

So we decided pretty early on to migrate to Vagrant.

\---

We now run the whole stack on a Linux VM using Vagrant, provisioned from the
exact same Puppet configuration that we use for our production and staging
clusters. The Puppet config has a minimal set of variables/declarations that
need to be tweaked per environment; from its point of view, the VM is just
another cluster.

We periodically produce a new Vagrant box with updates whenever there are new
apps or new system services. Updating the box is a matter of booting a new
clean box and packaging it; Puppet takes care of all the setup. We plan on
automating the box builds at some point.

To make the workflow as painless as possible, we have an internal "all-round
monkey wrench" tool for everything a developer needs to interact with both the
VM and our clusters, such as for fetching and installing a new box (we don't
use Vagrant Cloud). One big benefit of using Vagrant is that this internal
tool can treat it as just another cluster. The same commands we use to
interact with prod/staging — to deploy a new app version, for example — are
used to interact with the VM.

One notable configuration change we need for Vagrant is a special DNS server.
Our little tool modifies the local machine (this is super easy on OS X) and
tells it to use the VM's DNS server to resolve ".dev". The VM then runs
dnsmasq, which resolves "*.dev" to its own IP. We also have an external .com
domain that resolves to the internal IP, for things like Google's OAuth, which
requires a public endpoint. All the apps that run on the VM then respond to
various hosts ending with .dev.

Another important configuration change is support for hot code reloading. This
bit of magic has two parts:

\- First, we use Vagrant shared folders to allow the developer to selectively
"mount" a local application on the VM; when you deploy an app this way,
instead of deploying from a Git repo, it simply uses the mounted folder,
allowing you to run the app with your local code that you're editing.

\- Secondly, when apps run on the VM, they have some extra code automatically
injected by the deployment runtime that enables hot code reloading. For Node.js
backends, we use a watch system that simply crashes the app on file changes
(sketched below); for the front-end stuff, we swap out statically-built assets
for dynamic endpoints that have Browserify and SASS build the assets every time
the app asks for them (with incremental rebuilding, of course). For Ruby
backends, we use Sinatra's reloader.
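
The crash-on-change part is only a few lines in any language. Here's a rough Go
sketch of the idea using fsnotify (our actual tool is Node-based, and the
watched path here is made up):

    // Rough sketch of the crash-on-change idea: watch the app's source directory
    // and exit on any write; the process supervisor then restarts the app with the
    // fresh code.
    package main

    import (
        "log"
        "os"

        "github.com/fsnotify/fsnotify"
    )

    func main() {
        watcher, err := fsnotify.NewWatcher()
        if err != nil {
            log.Fatal(err)
        }
        defer watcher.Close()

        // Non-recursive for brevity; a real watcher adds subdirectories too.
        if err := watcher.Add("./src"); err != nil {
            log.Fatal(err)
        }

        for {
            select {
            case event, ok := <-watcher.Events:
                if !ok {
                    return
                }
                if event.Op&fsnotify.Write != 0 {
                    log.Printf("%s changed; exiting so the supervisor restarts us", event.Name)
                    os.Exit(1)
                }
            case err, ok := <-watcher.Errors:
                if !ok {
                    return
                }
                log.Println("watch error:", err)
            }
        }
    }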

\---

Overall, we are very happy with the Vagrant solution. The only major pain
point we have faced is not really technical: It's been hard for developers to
understand exactly how the box works. Every aspect of the stack needs to be
documented so that developers know where to look and what levers to pull
when an app won't deploy properly or a queue isn't being processed correctly.
Without this information, the box seems like black magic to some developers,
especially those with limited experience administering Linux.

We also sometimes struggle with bugs in Vagrant or VirtualBox. For example,
sometimes networking stops working, and DNS lookups fail. Or the VM dies when
your machine resumes from sleep [1]. Or the VirtualBox file system corrupts
files that it reads [2]. Or VirtualBox suddenly sits there consuming 100% CPU
for no particular reason. It happens about once a week, so we're considering
migrating to VMware.

Another possibility is to give people the option of running their VM in the
cloud, such as on DigitalOcean. I haven't investigated how much work this
would be. The downside would obviously be that it requires an Internet
connection. The benefit would be that you could run much larger, faster VMs,
and since they'd have public IPs you could easily share your VM and your
current work with other people. Another benefit: They could automatically
update from Puppet. The boxes we build today are configured once from Puppet,
and then Puppet is excised entirely from the VM. Migrating to a new box
version can be a little painful since you lose any test data you had in your
old box.

As for your question about what services to run: It's a good question. Right
now we only build a single box running everything, even though we have a few
different front-end apps that people work on that all use the same stack.
We'll probably split this into multiple boxes at some point, as memory usage
is starting to get quite heavy. But since all the apps share 90% of the same
backend microservices, the difference between the boxes will mostly be which
front-end apps they run.

[1]
[https://www.virtualbox.org/ticket/13874](https://www.virtualbox.org/ticket/13874)

[2]
[https://www.virtualbox.org/ticket/819](https://www.virtualbox.org/ticket/819)

~~~
curun1r
We do roughly the same thing, but with Docker and Fig. We're already using
Docker to package our services for deployment, so using Fig to spin up a full
environment is pretty painless. And Fig also makes it simple to spin up
environments for running integration tests as part of CI to ensure that the
Docker containers being deployed and used by developers are always in a
working state.

~~~
lobster_johnson
We are indeed planning to migrate to a Docker-based deployment system. I have
looked briefly at Docker Compose (the new name for Fig), and also at somewhat
bigger orchestration systems like Kubernetes.

One goal is to get rid of Puppet (which is, frankly, a buggy, badly-designed
mess) and move to a more dynamic, fluid orchestration system based on
discovery and autoscaling.

------
akbar501
Questions for Tom:

1.) How are you handling auth? Are you using a home-grown solution or OpenID
Connect + OAuth 2.0?

2.) Is the JWT behind the firewall using a pre-shared key?

3.) What does the public token look like and how does the API Gateway perform
auth? Does the token passed into the API Gateway contain only a user id? And
does the API Gateway have to perform a database query to populate the full
user object?

side note: Thanks for writing the article.

------
anton_gogolev
Is it just me, or is this a tech-homeopathy article?

~~~
jamiesoncj
What do you mean by that? Not sure I understand your point

~~~
anton_gogolev
The article is, to put it mildly, not very dense in content. A lot of
hand-waving and shallow thoughts.

~~~
jamiesoncj
Oh right. I didn't get it. I thought tech homeopathy was some new framework /
library / approach I hadn't heard about. Maybe homeopathy.js or similar? On
the plus side, your comment did make me re-watch this:
[https://www.youtube.com/watch?v=HMGIbOGu8q0](https://www.youtube.com/watch?v=HMGIbOGu8q0)

------
codewithcheese
Hi Tom, I too have a Django monolith. But I hesitate to go down the
microservices route, since I reuse a lot of classes in what would become
different services. Can you comment on how your class structure has changed,
and how you have maximized (or not) code reuse?

~~~
tmwatson100
What sort of classes do you mean? Views? Models? Other? For us it hasn't
changed much. Our apps were pretty self-contained, so splitting them into
separate services hasn't been very arduous.

Stuff that is shared between apps is often related to 3rd-party integrations,
which could be moved into a separate (often asynchronous) worker/service. In
reality, most of these design choices are made on a case-by-case basis, based
on time/cost/maintenance.

~~~
codewithcheese
Yeah, my main concern is the models. I can see it helps if you have distinct
Django apps already; in my case I have one main monolithic app. As an example,
I use Elasticsearch, but I post-process the results using models. ES is already
a service; do I really want to isolate some logic and build another service on
top of that?

------
hannes2000
> Finally, how do we deal with our monolith? We decided to treat it as if it
> was a (very large) microservice.

Judging from your team size (3 engineers on the team page), this is probably
still a very normal-sized microservice :)

------
gabrtv
> The services are considered to be in a trusted network and are accessed by a
> private token passed in the ‘Authorization' header plus the user id of the
> requester in an ‘X-USER’ header.

This reads like the user ID is exposed in a header without any sort of
encryption.

~~~
adaml_623
What does 'trusted network' mean to you?

A lean quick service is not going to want to wait on encryption handshaking.

~~~
wtbob
On modern hardware, I believe AES and SHA-384 are very cheap. But yes, in a
private network it's overhead.

