
How CloudFlare extracts a signal from 10 trillion log lines each month [video] - jgrahamc
http://www.thedotpost.com/2015/06/john-graham-cumming-i-got-10-trillion-problems-but-logging-aint-one
======
nodesocket
Good god, 4 million log requests per second. 400 TB a day (compressed) if they
stored the logs.

I recently set up a fun project using a RethinkDB cluster and Node.js Express
middleware[1] that logs all requests to RethinkDB as JSON. I did some load
testing and sustained 1,400 writes per second, and was quite happy, thinking
this would scale up very large. However, not 4-million-writes-per-second
large. :-)

[1] [https://github.com/commando/express-rethinkdb-logger](https://github.com/commando/express-rethinkdb-logger)
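A quick back-of-the-envelope check of these figures. It assumes a steady 4M requests/second, a 30-day month and the 400 TB/day compressed estimate above; all three are simplifying assumptions, not measured values:

```python
# Back-of-the-envelope check of the logging volume discussed above.
# Assumptions: a steady 4 million requests/second, a 30-day month,
# and the 400 TB/day (compressed) estimate quoted in the comment.

REQUESTS_PER_SECOND = 4_000_000
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400
DAYS_PER_MONTH = 30                       # assumption
COMPRESSED_BYTES_PER_DAY = 400 * 10**12   # 400 TB, decimal terabytes

lines_per_day = REQUESTS_PER_SECOND * SECONDS_PER_DAY
lines_per_month = lines_per_day * DAYS_PER_MONTH
bytes_per_line = COMPRESSED_BYTES_PER_DAY / lines_per_day

# How many times more write throughput than the 1,400 writes/s RethinkDB test.
ratio_vs_test = REQUESTS_PER_SECOND / 1_400

print(f"lines/day:   {lines_per_day:,}")               # 345,600,000,000
print(f"lines/month: {lines_per_month:,}")             # 10,368,000,000,000
print(f"compressed bytes/line: {bytes_per_line:.0f}")  # 1157
print(f"ratio vs 1,400 writes/s: {ratio_vs_test:,.0f}x")  # 2,857x
```

At that rate a 30-day month comes to roughly 10.4 trillion lines, consistent with the 10 trillion in the title; the ~1.2 KB per compressed line is only as reliable as the 400 TB/day estimate.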

~~~
deegles
It would be a better comparison if we knew across how many nodes they're
sustaining that. Maybe they just have 4,000 nodes doing 1k TPS each.
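The node-count arithmetic is plain division. This sketch assumes load spreads perfectly evenly and ignores replication, hot spots and coordination overhead, all of which would push the real number higher; `nodes_needed` is a hypothetical helper for illustration:

```python
import math


def nodes_needed(total_rate: float, per_node_rate: float) -> int:
    """Smallest whole number of nodes that can sustain total_rate,
    assuming load is spread perfectly evenly (an idealization)."""
    if per_node_rate <= 0:
        raise ValueError("per_node_rate must be positive")
    return math.ceil(total_rate / per_node_rate)


TOTAL = 4_000_000  # requests/second, from the talk

print(nodes_needed(TOTAL, 1_000))  # 4000: the "4,000 nodes at 1k TPS" scenario
print(nodes_needed(TOTAL, 1_400))  # 2858: at the RethinkDB test's 1,400 writes/s
```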

~~~
teraflop
I don't really think the word "just" belongs in that sentence ;)

------
sinzone
Interesting how NGINX + Lua is becoming more and more widely used in
mission-critical applications with huge amounts of traffic. Since the
introduction of LuaJIT, performance has been outstanding, and many companies
like Netflix, Alibaba, Cloudflare, Kong [0] and Airbnb all run on a customized
nginx with Lua modules, doing amazing things from security to API management.

[0] [http://github.com/mashape/kong](http://github.com/mashape/kong)

~~~
meowface
I predicted about a year ago that nginx/LuaJIT (OpenResty) is the sleeping
giant of web development. I've seen more and more companies start using it,
and I wouldn't be surprised if people start talking about it as a Node.js
alternative in the not-so-distant future.

~~~
codinghorror
HAProxy will be using Lua in future releases for similar reasons. Interesting
trend!

~~~
sinzone
And Lua is taking off for exactly these reasons, such as the ability to extend
Nginx or HAProxy. Plus, it is simple to use, easy to learn and highly
efficient, without the need to touch C/C++.

------
mdaniel
What is the deal with the recent "thedotpost.com" envelope for perfectly valid
YouTube URLs? I want to flag this post just for that.

[https://www.youtube.com/watch?v=LA-gNoxSLCE](https://www.youtube.com/watch?v=LA-gNoxSLCE)

~~~
fweespeech
jgrahamc is the speaker, and the talk was given at a conference run by the
same company as thedotpost.com.

I'm not really seeing the issue. Complaining about how someone chooses to link
to their own content seems silly tbh.

~~~
mdaniel
My apologies, I just noticed a flood of these previously unrecognized domains
which had a lot of "web chrome" around a central YouTube video. I shouldn't
have included the language about flagging it, as that made my comment more
negative than I intended. I am slowly learning to pare down my HN comments
to avoid that editorializing.

~~~
vacri
In this particular case, the website logo is the same logo on the speaker's
podium and on the wall behind the speaker :)

------
sinzone
That is 4M requests per second. To put this in perspective, the established
Akamai runs at 25M/s ([http://www.akamai.com/html/technology/real-time-web-metrics....](http://www.akamai.com/html/technology/real-time-web-metrics.html)),
which means that Cloudflare already handles about 16% (4/25) of Akamai's
request rate and is quickly growing toward 20%.

------
rch
[https://github.com/cloudflare/jgc-talks/blob/master/dotScale...](https://github.com/cloudflare/jgc-talks/blob/master/dotScale/2015/10trillion.pdf)

I like that GitHub renders the slides for me, but it would be even better if
it could extract just the plain text.

------
clebio
The Nginx + LuaJIT approach is new to me. I went looking and found their blog
post about it [1]. That post mentions log aggregation, but from this talk it
sounds like they're doing dynamic routing in the Lua code. Is the idea to
accept requests for entirely separate sites on one or more front-end Nginx
hosts, and then perform a sort of NAT or virtual routing to different client
properties based on the request headers?

At any rate, very interesting stuff, and more to read up on now.

[1]: [https://blog.cloudflare.com/pushing-nginx-to-its-limit-with-...](https://blog.cloudflare.com/pushing-nginx-to-its-limit-with-lua/)
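To make the question concrete, here is a minimal Python sketch of the kind of name-based virtual routing described above: one front end reads the Host header and picks an origin for that customer. The `ORIGINS` table, hostnames and addresses are hypothetical, and this is not Cloudflare's implementation; in an OpenResty setup the equivalent logic would be written in Lua inside nginx.

```python
from typing import Mapping, Optional

# Hypothetical mapping from customer hostname to origin server.
ORIGINS = {
    "example.com": "203.0.113.10:80",
    "www.example.com": "203.0.113.10:80",
    "blog.example.org": "198.51.100.7:8080",
}


def pick_origin(headers: Mapping[str, str],
                origins: Mapping[str, str] = ORIGINS) -> Optional[str]:
    """Return the origin for a request based on its Host header.

    Header names are matched case-insensitively, a ":port" suffix is
    dropped and hostnames are compared in lowercase. IPv6 literal hosts
    such as "[::1]:80" are not handled in this sketch. Returns None for
    unknown hosts (a real proxy would serve an error page instead).
    """
    host = None
    for name, value in headers.items():
        if name.lower() == "host":
            host = value
            break
    if host is None:
        return None
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    return origins.get(hostname)


print(pick_origin({"Host": "WWW.Example.com"}))        # 203.0.113.10:80
print(pick_origin({"host": "blog.example.org:443"}))   # 198.51.100.7:8080
print(pick_origin({"Host": "unknown.test"}))           # None
```

Keeping the table as data rather than code is what makes it possible to add a new customer site without reloading the proxy, which is presumably part of the appeal of doing this in a scripting layer.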

------
mtourne
In the world of lossy counting, this [1] could be worth a look if you wanted
to answer a question like "how many requests per second went to jgc.org at
noon two months ago?"

[1] [https://github.com/dgryski/hokusai](https://github.com/dgryski/hokusai)
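Hokusai builds on Count-Min sketches, which trade exact per-key counts for a fixed-size table and only ever overestimate. Below is a minimal Python sketch of a plain Count-Min sketch kept per time bucket, assuming hourly buckets and illustrative width/depth values. It is not the hokusai package's API, and it leaves out the aggregation Hokusai uses to shrink older sketches so that storage grows much more slowly than the history it covers; `record` and `requests_to` are hypothetical helpers.

```python
import hashlib
from collections import defaultdict


class CountMinSketch:
    """Approximate per-key counter using a fixed depth x width table.

    Estimates never undercount. With probability at least 1 - exp(-depth),
    the overcount is at most (e / width) * (total of all counts added).
    """

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key: str):
        # One hash per row, made distinct by salting blake2b with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode(), digest_size=8, salt=row.to_bytes(16, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key: str) -> int:
        # Collisions only ever add to a cell, so the smallest cell is the best guess.
        return min(self.table[row][col] for row, col in self._cells(key))


# Hypothetical layout: one sketch per hour bucket, keyed by hostname.
sketches = defaultdict(CountMinSketch)


def record(hour_bucket: str, host: str) -> None:
    sketches[hour_bucket].add(host)


def requests_to(hour_bucket: str, host: str) -> int:
    sketch = sketches.get(hour_bucket)
    return sketch.estimate(host) if sketch is not None else 0


# Usage: 500 hits to jgc.org mixed in with 10,000 other hostnames.
for _ in range(500):
    record("2015-04-27T12", "jgc.org")
for i in range(10_000):
    record("2015-04-27T12", f"site{i}.example")
print(requests_to("2015-04-27T12", "jgc.org"))  # at least 500; any excess is collisions
```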

~~~
meowface
Why not HyperLogLog?

~~~
jodrellblank
Presumably because there's no point replying to a video talk about HyperLogLog
with a comment saying "you might want to look into HyperLogLog".
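HyperLogLog answers a different question from per-key counting: roughly how many distinct keys (say, unique client IPs) were seen in a period, not how many requests went to one site. A minimal Python sketch of the standard estimator follows; the register count (p = 12), the blake2b hash and the synthetic client IDs are illustrative choices, not anything taken from the talk or from Cloudflare's code.

```python
import hashlib
import math


class HyperLogLog:
    """Estimate the number of distinct items with 2**p small registers.

    The standard error is about 1.04 / sqrt(2**p), roughly 1.6% for p = 12.
    """

    def __init__(self, p: int = 12) -> None:
        # The alpha constant used below is only valid for m = 2**p >= 128.
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        x = int.from_bytes(digest, "big")
        idx = x >> (64 - self.p)               # top p bits choose a register
        rest = x & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# Usage: 50,000 distinct (hypothetical) client IDs, each seen twice.
hll = HyperLogLog()
for i in range(50_000):
    hll.add(f"client-{i}")
    hll.add(f"client-{i}")
print(round(hll.count()))  # close to 50,000; duplicates do not inflate it
```

The whole structure is 4,096 small integers regardless of how many items pass through, which is why it suits per-customer unique-visitor counts at this scale.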

------
Thaxll
No love for Elasticsearch :/

~~~
mdaniel
Are you complaining about the obvious ES JSON in one of the slides not getting
a mention or just the lack of mention at all?

If the latter, it's most likely due to the problems Jepsen found in ES
(possibly even independently verified), which were revisited in a talk at
that same conference:
[https://news.ycombinator.com/item?id=9778291](https://news.ycombinator.com/item?id=9778291)

