
Pushing Nginx to its limit with Lua - jgrahamc
http://blog.cloudflare.com/pushing-nginx-to-its-limit-with-lua
======
qnk
I'm really glad to see a CloudFlare post at the top of Hacker News; I've been
following them and they have lots of interesting things to say. I switched
from Amazon CloudFront to CloudFlare two months ago and couldn't be happier.
My hosting bill was reduced by 95% and my website speed decreased by only
13%, a trade-off my weekend project can afford if it's to be profitable.

~~~
saurik
What caused the reduction in your hosting bill? Was it Amazon's supposedly-
poor cache hit ratio, or a CloudFlare-specific feature? Have you tried
negotiating with a larger CDN?

~~~
eastdakota
CloudFlare typically reduces bandwidth usage and load by about 70%. That can
translate into cost savings if your host (e.g., AWS) charges you for
bandwidth.

~~~
mattdeboard
> CloudFlare typically reduces bandwidth usage and load by about 70%.

How does it do this? I'm going to be flipping our content-service from IIS to
CloudFront soon so I'm familiar with CloudFront on a theoretical level. Not
clear on what you mean.

~~~
ryankirkman
One big thing they do is cache static assets for you at their edge locations.
The majority of the bandwidth costs imposed by static images, CSS and
JavaScript (assuming appropriate cache settings) should be offloaded onto
CloudFlare once you set it up.

~~~
mattdeboard
Well, yeah, but my point is that CloudFront does this too. The quote I
selected made it sound like CloudFlare reduces bandwidth usage by 70% _in
comparison to CloudFront_. So I'm curious how that works.

~~~
eastdakota
Simple: CloudFlare doesn't charge for bandwidth, CloudFront does.

See: <http://www.cloudflare.com/plans>

~~~
saurik
$3k/mo buys you a lot of bandwidth, though; even at CloudFront's somewhat-
high-for-a-CDN pricing, that's 30TB of bandwidth; to see a 95% reduction in
your hosting costs over CloudFront with $3k/mo unmetered bandwidth you'd have
to be pushing 1.8PB of data. (edit: I originally said 600TB, but I had done
the math wrong for the later discount brackets.)

Even if you were down at the $200/mo plan, that's 45TB/mo before you get to
the "95% less expensive" point; I have tens of millions of users worldwide
downloading megabytes of packages from me (while the Cydia ecosystem has tons
of things much larger, I don't host those: I just have the core package), and
I don't often go above 45TB/mo.

Is the idea here that CloudFlare is seriously giving you ludicrously unlimited
amounts of bandwidth (and will not give you any crap about it) with a high
cache-hit ratio even at their $20/mo plan? If so, I'm going to have to run
some insane experiments with their service ;P. (Part of me isn't certain that
I want them to hate me that much, though ;P.)

(edit:) Ok, I looked into this some, and this argument ("they don't charge for
bandwidth") is just as false as one would expect given that it isn't feasible
for them to price that way ;P. Their terms of service make it very clear that
the service is only designed for HTML, and that "caching of a disproportionate
percentage of pictures, movies, audio files, or other non-HTML content, is
prohibited" <- yes, even "pictures".

With this glaring restriction, there is really no way I can imagine any
reasonably-normal company getting a 95% reduction in hosting costs over
another CDN, even CloudFront: if you are pushing tens of terabytes of mostly-
HTML content a month, you are doing something insanely awesome (and we've
probably all heard of you ;P).

~~~
druiid
We're running some e-commerce sites with 8-20k items through the $20/month
plans and have never heard a complaint from Cloudflare. That said, any sites
we 'care' about are running on their business or enterprise levels which are
much higher than $20/month :P.

~~~
saurik
Right, which is why I started that evaluation at the top-end of the scale. How
much data do you move a month?

~~~
druiid
Probably you won't see this reply, but if you do... we move a decent amount,
but not a crazy amount. In the last 30 days it was around 3TB total (through
Cloudflare... we only saw about 2/3 of that).

------
moonboots
I'm using nginx+lua as the backend to <http://typing.io>, and I've found the
combination to be a fast and robust alternative to more full featured web
stacks like Rails.
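For anyone wondering what "nginx+lua as the backend" looks like, a minimal
handler might be sketched like this (the location and variable names are
illustrative, not from typing.io; this uses the ngx_lua module's
`content_by_lua_block` directive):

```nginx
# Serve a whole endpoint from Lua, no separate app server behind nginx.
location /hello {
    default_type 'text/plain';
    content_by_lua_block {
        -- ngx.var.arg_name picks up the "?name=" query argument
        local name = ngx.var.arg_name or "world"
        ngx.say("hello, ", name)
    }
}
```

The Lua runs inside the nginx worker on its event loop, which is where the
"fast and robust" part comes from: there's no extra process or socket hop
between the web server and the application code.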

------
stcredzero
It would seem that this approach gives all the advantages of async with none
of the drawbacks involved with writing callbacks. If someone added a type
annotation extension to the language and parser, this would be a killer
combination for large code bases. Lua is also a good match for "The Good
Parts" of Javascript, so it should be possible to have all the code in one
language.

~~~
kyrra
Lua was never designed to be used for really large code bases. It's a
lightweight scripting language on top of C: the entire runtime is written in
about 20k lines of C, and it's designed to be easily embedded in C programs.
Nginx is one popular use case, and World of Warcraft is another.

It's really a pretty amazing language. When you embed it, you can choose
exactly what functionality is exposed to the scripts you run, so it's pretty
good for sandboxed code (though imposing memory restrictions is much harder).

~~~
_dps
> Lua was never designed to be used for really large code bases

I'd certainly agree that "maintainable for large code bases" was probably
never on Ierusalimschy & Co's language design requirements document, but you
could say the same of Perl, Python, Ruby, JavaScript, and most Lisps. None of
those languages has any additional mechanisms traditionally associated with
enforcing uniformity and consistency across large projects. This hasn't
stopped people from building large, successful projects in these languages.

As for your comment on the implementation size: I'm not sure which way you
meant it (complimentary or pejorative) but I often find people react to this
in exactly the opposite fashion I would. I see "self-ish/JS-ish/python-ish
semantics in 20kloc of ANSI C? Sign me up!". I find some people see it and
assume it to be a toy. It reminds me of the Bill Gates line about
(paraphrasing) "measuring software's success by lines of code is like
measuring an aircraft's success by its weight".

~~~
moe
Playing devil's advocate here: Lua does indeed have some warts that make it
less pleasant to use in a heterogeneous environment (which pretty much all
large code bases are). Not so much due to deficiencies in Lua itself, but due
to the impedance mismatch versus other languages.

The most obvious issue that everyone stumbles across is the "counting from 1".
It seems like a minor thing, but the context-switch remains a drag when you're
dealing with complex data-structures in two languages and only one of them is
Lua.

The impedance-mismatch becomes even more apparent when the table-abstraction
meets serialization. The lack of distinction between an "array" and a "hash"
is awesome when you're in a pure Lua-environment, but it becomes a real
problem when you need to exchange data with languages that do depend on this
distinction (e.g. if you feed Lua an empty "array" it will later serialize it
back to an empty "hash").
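The ambiguity is easy to demonstrate in a few lines of Lua. An empty "array"
and an empty "hash" are the same value, a table with no entries, so a JSON
serializer has nothing to distinguish them by (lua-cjson, for instance,
defaults to encoding an empty table as an object):

```lua
-- Two tables with different *intent* but identical structure.
local array = {}   -- meant to become JSON []
local hash  = {}   -- meant to become JSON {}

-- Both are just empty tables; no field records which one was meant:
assert(next(array) == nil)
assert(next(hash) == nil)

-- A serializer must guess. lua-cjson, for example, encodes an empty
-- table as "{}" (an object) unless you configure it otherwise, which
-- is exactly the [] -> {} round-trip problem described above.
```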

The final issue I can't resist mentioning here is not a language issue but a
community/mindset one. To this day Lua doesn't have an established package
manager akin to RubyGems, Pip, Maven, Leiningen, etc. (LuaRocks exists but
is... well, I've yet to see someone actually use it.)

This is a deadly sin in terms of mainstream adoption. It makes deployment a
serious pain in the ass.

GoLang shows how a modern language is supposed to handle this
(importing/bundling packages directly from urls). I keep hoping someone will
add something similar to lua-core, but I'm sadly not very optimistic about it.

I think many of the driving people behind Lua just don't care about it
becoming a mainstream language or not. They care about it shining as an
embedded language (and it does!) - it's just a little bitter for those of us
who would love to use it on a broader scope.

~~~
_dps
You raise some good points; I understand you're playing the devil's advocate
but I thought a few would benefit from a friendly counter :)

> ... everyone stumbles across is the "counting from 1" ...

Fair enough :) I find this objection to be largely a matter of taste; it was
never an issue for me [added in edit: even when interoperating with C and JS
code]. People have made similar complaints about Matlab that I never found
persuasive (there are other more persuasive criticisms of Matlab's language
design). I think the core argument I'd make here is that if you're using Lua
tables in a way that requires array-offset semantics for the index variable,
you could probably step up a level of abstraction using ipairs/pairs and save
yourself worrying about 1 vs 0.
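To illustrate the point about stepping up a level of abstraction, here's the
kind of code where 0-vs-1 simply never appears, because `ipairs` and `#`
handle the indices for you:

```lua
local langs = { "C", "Lua", "JavaScript" }

-- ipairs walks the array part in order; you never write an index
-- literal yourself, so the base (1 in Lua, 0 in C/JS) doesn't matter.
for i, name in ipairs(langs) do
  print(i, name)
end

-- The length operator likewise avoids any index arithmetic:
assert(#langs == 3)
```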

> The impedance-mismatch becomes even more apparent when the table-abstraction
> meets serialization.

Lua tables naturally serialize to Lua table syntax (modulo cycles). This is in
fact Lua's origin story (if Lua were a spiderman comic, it would be a story of
a table description language being bitten by a radioactive register-based VM).
At a technical level, how are Lua table literals any less successful a
serialization format than JSON (i.e. JS object literals)? To put it another
way: JSON doesn't map naturally to XML; should we then conclude that it has an
impedance mismatch with respect to serialization?

> doesn't have an established package manager ...

<old guy hat> The idea that a _language_ should have a package manager has
always seemed... confusing... to me. C doesn't have a package manager; people
still seem to be able to get the relevant packages when they need them through
the OS's package manager. That, to me, seems the sane solution. I realize I
may be in the minority. </old guy hat>

Having said that I agree that LuaRocks' comparative weakness relative to
Ruby's gems limits adoption in mainstream programming applications. OTOH, it
is _vastly_ easier to get started embedding Lua in a host program than any of
its competitors (this is in fact what drove me to try it in the first place).
So it's not all friction on the deployment story.

> ... many of the driving people behind Lua just don't care about it becoming
> a mainstream language or not.

Yes, I think this is likely true. I don't think any of the core contributors
care about it being "the next Python/Ruby/Perl". If I had to summarize the
emergent aesthetic, it's that Lua is designed to be a _just a language_ with a
large set of DIY practices around it, rather than a curated software
ecosystem.

~~~
damncabbage
> C doesn't have a package manager; people still seem to be able to get the
> relevant packages when they need them through the OS's package manager.

This is fine with Linux, which has at least a few sane package management
systems between the different distros. This goes out the window with OS X and
Windows.

(This may just be an argument that anyone working on server-side software
should be working inside a VM that matches your production environment. The
Ruby community seems to have shown that people push back very hard on that.)

~~~
mtourne
In the particular case of Nginx and Lua, the OpenResty package[1] is pretty
much self-contained except for libpcre.

Nginx+Lua is only the core part that powers this ecosystem. OpenResty comes
with a lot of libraries for the usual web stuff, except maybe a default
template system; I've been using this rather simple one [2].

In the full example[3], I installed it on Mac OS (and actually found a small
problem with Homebrew that should be fixed soon).

[1] <http://openresty.org/#Download> [2] <http://code.google.com/p/slt/> [3]
<https://github.com/mtourne/nginx_log_by_lua/>

------
zrail
I used OpenResty as the core for an experimental HTTP routing service at my
previous job. The idea was to maintain host:port pairs for HTTP services in
Redis and use a small bit of Lua code to look up the right place to proxy a
virtual host to. It worked well enough, but never got deployed because we were
concerned about the Redis SPOF. Nowadays it wouldn't be that big of a deal.

That said, the nginx/lua combo is wonderful to work with. I got the core of
that thing working in just a few hours.
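The core of a routing service like that really is small. A hypothetical
sketch (Redis key scheme, timeouts, and addresses are all made up here) using
lua-resty-redis, which ships with OpenResty:

```nginx
# Look up a host:port pair for the request's Host header in Redis,
# then proxy to it.
location / {
    set $upstream "";
    access_by_lua_block {
        local redis = require "resty.redis"
        local red = redis:new()
        red:set_timeout(100)  -- ms; fail fast if Redis is down
        local ok, err = red:connect("127.0.0.1", 6379)
        if not ok then
            return ngx.exit(ngx.HTTP_BAD_GATEWAY)
        end
        local target = red:get("route:" .. ngx.var.host)
        if not target or target == ngx.null then
            return ngx.exit(ngx.HTTP_NOT_FOUND)
        end
        ngx.var.upstream = target        -- e.g. "10.0.0.5:8080"
        red:set_keepalive(10000, 100)    -- return connection to the pool
    }
    proxy_pass http://$upstream;
}
```

The Redis lookup happens in the access phase, so by the time nginx reaches
the content phase the `$upstream` variable already holds the target and
`proxy_pass` behaves normally.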

------
jumby
Why doesn't everyone try Apache Traffic Server with its Lua plugin, which has
way better performance and many more capabilities?

~~~
justincormack
(a) No evidence of better performance; Lua/Nginx is very fast too. (b) It's
just a proxy, while Lua/Nginx/OpenResty is much more of a full web development
environment. You can use it as a proxy, but that's not the main use case. (c)
I don't think Traffic Server has more capabilities; as far as I can see it has
fewer, as it's much more focused on being a proxy product, while Nginx is a
full web server.

~~~
hypnotist
I think Lua/nginx and ATS are quite different beasts, solving different
problems. Nice presentation on ATS - "Apache Traffic Server: More Than Just a
Proxy": <https://www.usenix.org/conference/lisa11/apache-traffic-server-more-just-proxy>

------
decktech
I've been using Openresty (with Lua and Redis support) to collect a large
amount of data from GET requests for a video sharing site. Having nginx push
data directly into Redis is super fast, lends easily to real-time metrics, and
makes it easy to batch everything up and push it into MySQL at the end of the
day. Now that Redis supports internal Lua scripts, you can also do custom
atomic functions and other neat things.

If you can get around the Redis SPOF, OpenResty + Redis is great for large-
volume data collection. Thousands of requests/sec on an EC2 Small at < 10%
load.
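The collection pattern described here can be sketched as a single tracking
location (key names, fields, and addresses below are assumptions, not from
the original setup):

```nginx
# A tracking endpoint that writes each hit straight into Redis and
# returns an empty response; a batch job drains the counters into
# MySQL at the end of the day.
location /track {
    content_by_lua_block {
        local redis = require "resty.redis"
        local red = redis:new()
        red:set_timeout(100)
        if red:connect("127.0.0.1", 6379) then
            -- one counter per video per day
            local key = ("views:%s:%s"):format(
                ngx.var.arg_video or "unknown",
                os.date("%Y%m%d"))
            red:incr(key)
            red:set_keepalive(10000, 100)
        end
        ngx.exit(ngx.HTTP_NO_CONTENT)  -- 204, nothing to render
    }
}
```

Because INCR is atomic in Redis, concurrent nginx workers can bump the same
counter without any locking, which is what makes the real-time metrics cheap.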

------
tferris
Sounds amazing and I would love to step in immediately. I have some basic
questions after briefly reading the post and the comments here.

1) What's the difference between OpenResty and the Nginx+Lua module (is the
module the core of OpenResty)?

2) How does it compare to Node regarding ecosystem, performance and every day
usage/maintainability (if it's comparable)?

~~~
jhancock
I've only been playing with openresty for a couple of months. Here's what I've
learned:

1 - openresty is nginx plus many official and unofficial modules. From what
I've asked the experts, there's supposed to be no difference in the source;
openresty is just a single package of many modules with an easy-to-use
./configure and make process.

2 - The nginx module and lua lib ecosystem seems to have most of the basics in
order to roll a high level app framework, but only a half baked one exists
thus far: <https://github.com/appwilldev/moochine> Also:
<https://github.com/antono/valum> <https://github.com/pintsized/lua-resty-rack>

My impression is if you have the time to roll your own sinatra-like framework,
it should be pretty straightforward.
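To give a feel for why rolling your own is straightforward, here is a toy
route-dispatch sketch in pure Lua (in OpenResty you'd match against
`ngx.var.uri` instead of a function argument; everything here is
illustrative):

```lua
local unpack = unpack or table.unpack  -- Lua 5.1 / 5.2+ compatibility

local routes = {}

-- Register a handler for a Lua string pattern.
local function get(pattern, handler)
  routes[#routes + 1] = { pattern = "^" .. pattern .. "$", handler = handler }
end

-- Find the first matching route and call its handler with the captures.
local function dispatch(path)
  for _, r in ipairs(routes) do
    local captures = { path:match(r.pattern) }
    if #captures > 0 then
      return r.handler(unpack(captures))
    end
  end
  return "404"
end

get("/hello/(%w+)", function(name) return "hello, " .. name end)

print(dispatch("/hello/lua"))   -- hello, lua
print(dispatch("/nope"))        -- 404
```

A real framework adds method dispatch, middleware, and error handling on top,
but the matching core stays about this small.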

~~~
tferris
Thanks!

------
mattdeboard
I've written an authorization server for CloudFront using Python/Flask and
have wondered whether just writing the same functionality via Lua could be a
better/faster replacement. I'm just not clear on how to use Lua to make the
calls to our API. This is a good pointer in the right direction though.

~~~
mtourne
Nginx exposes multiple phases for processing an incoming HTTP request, mainly
rewrite, access, content, header_filter, and log.

You can embed Lua at the access phase, and query your API from there. Some of
the Lua libraries of the OpenResty package [1] might help you do that.

And here [2] you can find an example of OAuth support.

[1] <http://openresty.org/#Components> [2]
<http://seatgeek.com/blog/dev/oauth-support-for-nginx-with-lua>
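As a rough sketch of the access-phase approach (the auth URL, upstream names,
and headers below are hypothetical), the classic ngx_lua pattern is a
subrequest to an internal location that proxies to your API:

```nginx
# Internal location that forwards the check to the auth API.
location = /_auth {
    internal;
    proxy_pass http://auth-backend.example.com/check;
    proxy_pass_request_body off;
    proxy_set_header X-Original-URI $request_uri;
}

location /protected/ {
    access_by_lua_block {
        -- Runs before the content phase; a non-200 from the auth
        -- API rejects the request without touching the backend.
        local res = ngx.location.capture("/_auth")
        if res.status ~= ngx.HTTP_OK then
            return ngx.exit(ngx.HTTP_FORBIDDEN)
        end
    }
    proxy_pass http://app_backend;
}
```

`ngx.location.capture` is non-blocking (the worker serves other requests
while the auth call is in flight), which is why this tends to be faster than
doing the same check in a separate Python/Flask process.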

------
smegel
But what happens when people put big numbers in my Fibonacci calculator?

~~~
cheald
Lua implements TCO, so it'll do just fine. :P

~~~
dbaupp
If we are being pedantic, the naive Fibonacci implementation isn't tail
recursive. (And any tail recursive implementation is unlikely to need TCO
before the numbers get too large.)
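For the pedantry record, here's what the distinction looks like in Lua. The
naive fib makes its recursive calls and then still has an addition to do, so
it isn't a tail call; with an accumulator it becomes one, and Lua's
guaranteed tail-call elimination means constant stack space at any depth:

```lua
-- Tail-recursive Fibonacci with accumulators: the recursive call is
-- the very last thing the function does, so Lua reuses the stack frame.
local function fib(n, a, b)
  a, b = a or 0, b or 1
  if n == 0 then return a end
  return fib(n - 1, b, a + b)   -- proper tail call
end

print(fib(10))   -- 55
print(fib(40))   -- 102334155
```

(Which illustrates the parent's point: once it's tail recursive it's also
linear time, so the stack was never going to be the bottleneck anyway.)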

~~~
InclinedPlane
Sorry friend, it's time to be SUPER pedantic.

Silly Computer Science nerds spend a lot of time trying to figure out the most
efficient implementation of a recursive Fibonacci number function. "Ooo, let's
make sure to take advantage of tail-call optimization, oh, and memoization,
aren't we fancy?"

Meanwhile, mathematicians look at the problem and then cock their heads to the
side and say "uh, guys, that's way too much trouble, just use the closed form
solution and calculate any value in constant time using a tiny number of
floating point operations". Because Fib(n) = (phi^n - (1-phi)^n)/sqrt(5),
where phi is the golden ratio.
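The closed form is a one-liner in Lua, with the caveat that double-precision
floats only carry it so far; roughly beyond fib(70) the rounded result starts
to drift from the exact integer value:

```lua
-- Binet's formula: Fib(n) = (phi^n - (1-phi)^n) / sqrt(5).
local phi = (1 + math.sqrt(5)) / 2

local function fib(n)
  -- round to the nearest integer to absorb floating-point error
  return math.floor((phi^n - (1 - phi)^n) / math.sqrt(5) + 0.5)
end

print(fib(10))   -- 55
print(fib(20))   -- 6765
```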

~~~
dbaupp
No, if there are any computations happening, that's engineers and physicists,
_especially_ if there are approximations happening.

A mathematician looks at the recurrence formula and generalises it (e.g. f(n)
= a f(n-1) + b f(n-2) or f(n) = f(n-1)*f(n-2) (solve this one, hint: xkcd) or
...) and investigates its properties. The actual values of the numbers are not
of much interest.

And a^n can only be computed in constant time if one uses exp(n log(a)), and
(I think) the scale of the numbers can have a large impact on the accuracy of
the result, so for large n one needs more operations within both exp and log
to get the same (relative) accuracy.

~~~
pjscott
Last time I checked, numerical analysis was still considered a field of
mathematics. It's _all about_ the approximations.

~~~
dbaupp
Pssh, it's not abstract enough, it can't be _real_ mathematics. ;P

(But seriously: yes, agreed.)

