

Tengine - A variant of Nginx used on #1 Chinese ecommerce site goes open source - devy
http://tengine.taobao.org/
Tengine is a web server developed by the Servers Platform Team at Taobao. It is based on the popular Nginx HTTP server with many convenient features.
======
zhuzhaoyuan
Hi guys,

I'm responsible for the Tengine project. We're very excited to see the news
about Tengine appear on ycombinator.

Just a few clarifications:

Q: Why is Tengine 'forking' Nginx instead of committing the patches to the
official Nginx?

A: First, we are developing our own Nginx version because we have strong
requirements to enhance it. Our website is very busy (ranked #14 on Alexa's
top sites list) and many features we need can't be done by writing modules. We
would love to contribute our work to the official Nginx. We consider it a
great honor to share our achievements with the community. That's why we have
open sourced it. We are also trying our best to contribute back to Nginx.
Actually, we contacted the core members of the Nginx team last December,
including Andrew Alexeev, the people in charge of their Business Development
and their COO/GM, Maxim Konovalov. We asked how to collaborate with them.
They replied as follows:

"It's interesting what you guys do and let's keep in touch. I'm not really
quite sure right away in regards to what can be imported to the main branch,
but hopefully we'll find things to collaborate on. We're a bit busy towards
the end of the year, so probably a good idea to catch up in January."

More than two months have passed. We are still waiting to hear from them. We
are very confused because we don't know which features and bug fixes they
think should be merged into Nginx. Some features, such as syslog and pipe
support, they have explicitly refused to implement. I wanted to send the Nginx
developers a 'bug fix' for the error_page directive, but they thought that
behavior was OK, though many users think it's a bug... Frankly, we are a little
bit frustrated. It's very sad that we haven't done many things together yet.
But the Tengine team is open to hearing ideas from the Nginx guys. And we're
going to knock on their door again.

Q: Do "input filters" in Tengine have anything to do with Chinese requirements
for censoring?

A: No. It's just a mechanism to help implement something similar to Apache's
mod_security. E.g. I have written a module to demonstrate how to fight the
hash collision DoS attack:
<http://blog.zhuzhaoyuan.com/2012/01/a-mechanism-to-help-write-web-application-firewalls-for-nginx/>
BTW, please don't connect everything to censoring so rashly. The idea that
'people in China are willing to do censoring' is also stupid.
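
As a rough illustration of the kind of check an input body filter makes possible (this is not the module from the blog post, just a minimal Python sketch): a common mitigation for hash collision DoS is to cap the number of parameters in a request body before the application parses them into a hash table. The function name `too_many_params` and the default limit of 1000 are assumptions chosen for the example.

```python
def too_many_params(body: bytes, limit: int = 1000) -> bool:
    """Return True if a form-encoded body carries more than `limit` parameters.

    Hypothetical helper: counts '&'-separated pairs without decoding them,
    so the check is cheap and runs before any hash table is built.
    """
    if not body:
        return False
    # N separators means N + 1 parameters.
    return body.count(b"&") + 1 > limit


if __name__ == "__main__":
    attack = b"&".join(b"k%d=1" % i for i in range(5000))
    print(too_many_params(b"user=alice&lang=en"))  # small, legitimate form
    print(too_many_params(attack))                 # oversized body
```

A real filter would also have to handle other content types (multipart, JSON) and bodies that arrive in chunks; the sketch only shows the counting idea.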

~~~
newman314
Would you be willing to put tengine code on github? It might be easier to send
pull requests and patches.

Also, agreed on people not assuming the worst based on features.

~~~
zhuzhaoyuan
Good idea. Thanks. Actually I'm considering it too. Maybe migrate the code
base to github next month :)

------
morganpyne
Never heard of this until now, so was very interested to see the list of
enhancements that they have made to nginx. Some of them look quite useful e.g.
the logging enhancements and Input body filter support.

This feature caught my eye: "Combines multiple CSS or JavaScript requests
into one request to reduce the downloading time"

I wonder how they achieve this at the webserver level? Normally something like
this is done as part of the deployment/compile process. Anybody familiar with
Tengine care to comment?

~~~
zhuzhaoyuan
You can get more information here: <https://github.com/perusio/nginx-http-concat>
Thanks to António Almeida (perusio) for writing this :)

~~~
morganpyne
Thanks for the information. Looks like the app needs to support it too, using
a custom URL syntax.
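
To make the custom URL syntax concrete: going by the module's README, the file list follows a `??` marker and the files are separated by commas. A minimal Python sketch of how an application might build such a URL (the helper name `concat_url` and the example host are made up for illustration):

```python
from urllib.parse import quote


def concat_url(base: str, paths: list[str]) -> str:
    """Build one URL that asks the server to concatenate several static files.

    Hypothetical helper: `base` is the directory URL the files live under,
    `paths` are file paths relative to it.
    """
    if not paths:
        raise ValueError("need at least one file")
    base = base.rstrip("/") + "/"
    # quote() keeps '/' intact but escapes ',' and '?', which would otherwise
    # be mistaken for list separators or a query string.
    return base + "??" + ",".join(quote(p) for p in paths)


if __name__ == "__main__":
    print(concat_url("http://example.com/static", ["a.css", "css/b.css"]))
```

The page template then emits a single `<link>` or `<script>` tag pointing at the combined URL instead of one tag per file, which is why the application has to cooperate.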

------
Gorgonheart
Tengine was not a community project.

It's a commercial project for www.taobao.com, which ranks 13th globally
according to Alexa (higher than ebay, lower than amazon). So when the company
needs some custom feature, it can't wait for the original project to accept
and update.

Tengine is going open source because we got tons of help from the community,
not only from the Nginx project. And it's an honor to contribute.

Other open sourced projects include TFS (file system), Tair (distributed
cluster), and Webx (web framework), all currently running on taobao.com. You
can check them out at <http://code.taobao.org/>. Most docs are in Chinese,
though...

Currently the feature list of Tengine might be limited, but who knows what
will happen... Google forked WebKit into Chrome, and that had nothing to do
with the feature list in the beginning.

~~~
Deinumite
It's really cool to see a website this big contributing back to FOSS, good
stuff.

------
jbyers
The opensource page suggests they're open to merging with 1.1. I'm hopeful
that happens. It would be a shame for these patches to live on a fork.

Unfortunately the more interesting features -- input filters and file
composition -- appear undocumented.

~~~
kijin
I wonder if those "input filters" have anything to do with Chinese
requirements for censoring certain keywords? Or is this more like mod_security
for nginx?

Also, how does combining CSS and JavaScript work? Can this be done at the
webserver level without explicit support from the web application?

~~~
est
I don't think writing a huge list of banned keywords in nginx conf looks like
a good idea. But it can be used in that way.

------
timc3
Why not give back to the root Nginx project rather than fork or have these
features as compilable modules?

~~~
DarkShikari
There's often a serious communication barrier between Asian open source
developers and Western communities; partially a language barrier, but also
other issues. The end result is that a lot of projects end up being de-facto
forked by Asian open source developers -- this is where outreach can be very
important, to help integrate the changes back in as soon as possible and start
that necessary contact.

If anyone cares I can talk a bit more about this (as this was an issue with
x264). There was also a session (with notes online) on the topic at last
year's GSOC Mentor Summit.

~~~
quadhome
I most definitely care and am very interested.

Are these[1] the notes you were referring to? I couldn't find the talk itself.

[1]: <http://titanpad.com/fossinasia>

~~~
DarkShikari
There was no talk; it was an unconference.

I'll try to retell x264's story here...

Japan has long had a community of x264 developers and users, but for a long
time they largely remained insular and didn't make much active effort to push
their patches upstream. This wasn't out of maliciousness; there were many
reasons.

The language barrier was a big problem -- less so that they didn't _know_
English, but more so that they didn't feel confident with it. In reality, we
found the Japanese contributors to have way better English than they thought
they did. People are often embarrassed or scared to use a language they're not
confident with, and they worry about making mistakes and looking bad. _Make
them comfortable_.

Note that this is especially an issue with Japan. In my experience, Japanese
tend to be (typically) much less confident in their English despite equal or
greater skill. This might be in part because the school system typically
doesn't emphasize much 'real-time' conversational English.

This problem is more general than just language; open source IRC channels and
mailing lists can be intimidating, and people (often rightly) suspect that
they'll be mocked for making mistakes, so they just don't bother.

As is common in many non-English-speaking countries, the Japanese have many of
their own tools and methods of communication that are unique to Japan. In this
case, they had an ongoing x264 thread on 2ch. A friend of mine from the x264
user community offered to help. He was multilingual, able to speak near-
perfect English and Japanese, along with many other languages. Using his
translations, I answered questions for a few weeks on the 2ch thread.

Eventually it became somewhat obvious that one of the people posting in the
thread was a Western developer; the Japanese jokingly called it the 'Black
Boat incident', a reference to the arrival of Commodore Matthew Perry's fleet
at Japan in 1853 (see <http://en.wikipedia.org/wiki/Black_Ships>). We
convinced a few to drop by our IRC channel; we invited them, noting that
difficulty with English was not a problem, and we had a Japanese speaker who
they could converse with directly anyways.

A few came, some contributed patches, and a few stuck around; we now even have
a small community of Japanese users who hang out in the main channels. One of
them noted later that his (written) conversational English had improved vastly
just by being on IRC for a few months and was now much more fluent.

Put simply, cultural and language barriers reduce people's confidence in
communicating. They can also result in misunderstandings; open source
developers, for example, tend to have a very blunt style of communication. In
the best cases, this can mean they will ignore their own ego and debate
decisions on technical merits, without pleasantries. In the worst cases, this
can mean rudeness, intolerance, and general dickishness. Especially coming
from a culture more heavily based around politeness, this can be daunting.

If you want to welcome a foreign community of developers, you should try to
(list certainly not complete):

1. Have someone who speaks their language, so they can feel confident that
they have someone to speak to even if they aren't confident in their English
skills.

2. Contact _them_ first; don't rely on them to come to you.

3. Be friendly and welcoming. If you have to, mute That Guy who insists on
being rude and obnoxious. Mocking grammar errors or being needlessly blunt are
quick ways to make people feel incredibly unwelcome. They probably speak your
language much better than you speak theirs!

4. Give them extra help -- don't treat them like Just Another Patch
Contributor. Your goal here is not just to integrate their changes, but to
gain a connection to their developer community. "Patch rejected, I don't like
it" is not a way to gain friends.

5. I really shouldn't have to say this, but apparently (from experience) I
have to: seriously -- don't be a racist asshole. Particularly if you're
inviting people to an IRC or high-traffic mailing list, there are often people
(including devs!) who will make all sorts of insensitive comments. This needs
to not happen. Yes, this means no stupid jokes about "roneriness" or Indian
tech support.

Yes, this also means no stereotyping. Just because they're Japanese doesn't
mean they want to talk about Naruto, and just because they're Chinese doesn't
mean they really like General Tso's Chicken. I know you really love Korean
culture, but just because they're Korean doesn't mean they want to talk about
Girls' Generation and Starcraft. Don't treat someone from a different country
as if they're some specimen under a microscope either. In short, avoid
_othering_. Making people feel as if they are different and not wanted is a
quick way to make them not want to come back.

~~~
ksec
Just a few thoughts: it's not only a language barrier but, more importantly, a
culture barrier. The language barrier is easier to break.

Then there is the obvious: IRC always tends to have lots of slang and jokes,
so that even Brits wouldn't get what the Americans are laughing about. This
puts off a lot of Eastern developers.

P.S. Nice to know DS reads HN.

------
tszming
If you can read Chinese, you can check out the link below. They are working on
a list of projects (some of them are already open sourced) that handle "big
data". In case you don't know, Taobao is the eBay of China, with an Alexa
ranking of 13.

<http://www.tbdata.org/p_d/development>

------
sarcasticdog
This work is also worth checking out

<http://openresty.org/> <https://github.com/agentzh>

This too originated at Taobao, Alibaba Group.

~~~
zhuzhaoyuan
It's great work. Recommend +1.

------
kennywinker
Feature #1: "Input body filter support. It is quite handy to write Web
Application Firewalls by using this mechanism."

Correct me if I'm wrong, but "Web Application Firewalls" sounds like an API
for censoring content to me. As much as I loathe the idea of internet
censoring, I would love to get a peek inside the technical workings of the
famed "Great Firewall of China".

~~~
elliotanderson
Applying the same logic, firewalls used to protect computers and servers are
only there to censor content.

From the description, it sounds like it has nothing to do with content
filtering and everything to do with app security. Web apps in particular have
very different attack vectors (think XSS, SQL Injection etc) that your
standard install of iptables/shorewall isn't going to protect against.

~~~
kennywinker
I only raise the issue because I happen to know that China is involved in
large-scale automated internet filtering and that in order to operate,
companies are required to implement those filters.

And the fact of the matter is, many many firewalls ARE used for content
filtering. For example, many workplaces block common time-waster sites. Home
wifi routers can be easily configured to filter adult content. Universities
block file-sharing. ISPs do bandwidth shaping. etc. etc.

------
reustle
When spoken, it sounds like 10gen.

------
guoxian1
asdf

