
Data Infrastructure at IFTTT - devinfoley
http://engineering.ifttt.com/data/2015/10/14/data-infrastructure/
======
lemevi
> Lastly, in order to help monitor the behavior of the hundreds of partner
> APIs that IFTTT connects to, we collect information about the API requests
> that our workers make when running Recipes. This includes metrics such as
> response time and HTTP status codes, and it all gets funneled into our Kafka
> cluster.

> This way if you query Elasticsearch to find all API errors in the last hour,
> it can find the answer by looking at a single index, increasing efficiency.

This is a really good way to know if third-party APIs are having problems.
Staying up to date with all those APIs they support must take a significant
amount of engineering effort. Many APIs are just second-class citizens for
their product owners. Bugs are introduced, changes are made without
announcements, and even when there are announcements, with so many different
APIs it's hard work keeping track of them all and updating your app to keep it
running, especially when APIs are turned off or schemas change. This seems to
be the hard problem IFTTT is solving: integrating with APIs.

I'd shy away from starting a project that involves so many other companies'
APIs just because of how hard a problem that is to manage, but IFTTT is doing
a great job here.

~~~
johns
I used to work at IFTTT and this exact set of problems is why I left to start
a company. Pretty much everything I've done in the intervening years has
stemmed from exactly what you describe. It's great to see the IFTTT team
tackle this stuff head on.

------
matsur
To the author (or anyone else with experience): Any insight on why you guys
chose Kafka + Secor over Kinesis?

~~~
jonathankoren
I worked with Anuj at LinkedIn, so I'm thinking the most pedestrian answer is
that Kafka works, and it's a familiar tool.

~~~
goyalanuj
Thank you Jonathan!

As Jonathan mentioned, we made this decision around nine months back, and at
that time Kinesis wasn't as mature and offered less flexibility around things
like the retention period.

Kafka is very reliable (as I had seen it handling billions of events a day at
LinkedIn) and has a huge open-source community around it. At IFTTT, we always
prefer to use and contribute to open source (
[http://engineering.ifttt.com/oss/2015/07/23/open-source/](http://engineering.ifttt.com/oss/2015/07/23/open-source/) ).

~~~
dekayed
I'm assuming that you run Kafka within AWS. Most of the hardware
requirements/suggestions I've seen for Kafka are for non-virtualized
environments. If you can get into it, could you share some details...

- What is the size of your Kafka cluster?

- What instance types do you use?

- Do you use EBS or ephemeral storage?

- How much do you over-provision to deal with instance loss?

- Any other gotchas/considerations?

Thanks!

------
cheetos
Anyone have an idea why they would use MySQL over PostgreSQL?

~~~
lgas
The usual reason is having more experience with MySQL than PostgreSQL.

------
ambicapter
> In order to fully trust your data, it is important to have few automatic
> data verification steps in the flow.

Is this a typo? Did they mean 'a few'?

~~~
goyalanuj
We have fixed it! Thank you!

------
BradRuderman
How do you feel about the data being delayed, sometimes by a day? Why not
stream the data in realtime to the Kafka cluster?

~~~
goyalanuj
Whatever data we need in realtime, we do stream it to the Kafka cluster.

We don't do it for the production database because we don't need it in
realtime.
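As a rough illustration of what streaming such events in realtime might look like. The event fields and the kafka-python calls shown in the comments are assumptions for the sketch, not IFTTT's actual code:

```python
def make_api_event(service, status, response_ms):
    """Hypothetical shape of one API-request metric event
    (response time and HTTP status code, as in the article)."""
    return {"service": service, "status": status, "response_ms": response_ms}

# Producing to Kafka with the kafka-python library would look roughly like:
#   import json
#   from kafka import KafkaProducer
#   producer = KafkaProducer(
#       bootstrap_servers="kafka:9092",
#       value_serializer=lambda v: json.dumps(v).encode("utf-8"),
#   )
#   producer.send("api_events", make_api_event("some_partner_api", 502, 143))
```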

------
SEJeff
Original link: [http://engineering.ifttt.com/data/2015/10/14/data-infrastructure/](http://engineering.ifttt.com/data/2015/10/14/data-infrastructure/)

Can a moderator such as dang fix this please?

~~~
jreed91
He is probably just syndicating this to Medium with the newly introduced APIs.

~~~
danso
To hijack your tongue-in-cheek comment...I genuinely would appreciate it if
someone, at some point, looked at analytics before and after syndicating to
Medium, not just to compare the number of visitors between the original blog
and Medium, but to see whether PageRank for the original blog dropped due to
duplication penalties applied by Google's algorithm.

~~~
minimaxir
It's worth noting that Medium described this syndication workflow as an
expected use case during the announcement.

I don't think they would have done that if there was a known SEO downside. It
would hurt everyone.

~~~
danso
I want to take them at their word...but it's more up to Google, isn't it? And
doesn't Google still rely (in part) on the canonical meta tag to resolve
duplicates? Currently, the posted Medium post has this:

    <link rel="canonical" href="https://medium.com/engineering-at-ifttt/data-infrastructure-at-ifttt-35414841f9b5">
It would be relatively easy for Medium to appropriately set that tag -- I
mean, if "original URL" is captured somewhere in the API, or in the Medium
post-create-admin interface (I don't know, I haven't logged into my own Medium
for a while)...that would explicitly resolve the ambiguity, though at an
obvious cost to their own SEO.
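For illustration, a canonical tag pointing back at the original post (using the URL from this submission) would look like:

```html
<link rel="canonical" href="http://engineering.ifttt.com/data/2015/10/14/data-infrastructure/">
```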

------
ddollar
If you're using ECS to manage your clusters on AWS take a look at Convox.
We're building an open source platform that makes building, versioning, and
deploying code to AWS via ECS incredibly easy.

[https://convox.com](https://convox.com)

