Tengine - A variant of Nginx used on #1 Chinese ecommerce site goes open source (taobao.org)
57 points by devy 2062 days ago | 37 comments

Hi guys,

I'm responsible for the Tengine project. We're very excited to see news of Tengine appear on ycombinator.

Just a few clarifications:

Q: Why is Tengine 'forking' Nginx instead of committing the patches to the official Nginx?

A: First, we are developing our own Nginx version because we have strong requirements to enhance it. Our website is very busy (ranked #14 on Alexa's top sites list) and many features we need can't be done by writing modules. We would love to contribute our work to the official Nginx. We consider it a great honor to share our achievements with the community. That's why we have open sourced it. We are also trying our best to contribute back to Nginx. In fact, we contacted the core members of the Nginx team last December, including Andrew Alexeev, the people in charge of their Business Development and their COO/GM, Maxim Konovalov. We asked how to collaborate with them. They replied as follows:

"It's interesting what you guys do and let's keep in touch. I'm not really quite sure right away in regards to what can be imported to the main branch, but hopefully we'll find things to collaborate on. We're a bit busy towards the end of the year, so probably a good idea to catch up in January."

More than two months have passed. We are still waiting for their requests. We are very confused because we don't know which features and bug fixes they think should be merged into Nginx. Some features, such as syslog and pipe support, they have explicitly refused to implement. I also wanted to send a 'bug fix' for the error_page directive to the Nginx developers, but they thought that behavior was OK, though many users think it's a bug... Frankly, we are a little bit frustrated. It's very sad that we haven't done many things together yet. But the Tengine team is open to hearing ideas from the Nginx guys. And we're going to knock on their door again.

Q: Do "input filters" in Tengine have anything to do with Chinese requirements for censoring?

A: No. It's just a mechanism to help implement something similar to Apache's mod_security. E.g. I have written a module to demonstrate how to fight the hash collision DoS attack: http://blog.zhuzhaoyuan.com/2012/01/a-mechanism-to-help-writ... BTW, please don't connect everything to censoring so rashly. The idea that 'people in China are willing to do censoring' is also stupid.
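
For readers wondering what an input body filter looks like in practice, here is a minimal sketch in Python rather than Tengine's C module API. It assumes one common mitigation for the hash collision DoS: capping the number of form parameters while the body streams in. The class name, method names, and the 1000-parameter default are all hypothetical and are not taken from Tengine or the linked post.

```python
class ParamCountFilter:
    """Streaming body filter that counts form parameters.

    Hypothetical illustration only; this is not Tengine's API.
    """

    def __init__(self, max_params=1000):
        self.max_params = max_params  # assumed limit, pick your own
        self.separators = 0           # number of '&' bytes seen so far
        self.seen_data = False        # True once any body byte arrived

    def param_count(self) -> int:
        # N separators delimit N + 1 parameters; an empty body has none.
        return self.separators + 1 if self.seen_data else 0

    def feed(self, chunk: bytes) -> bool:
        """Inspect one body chunk.

        Returns False as soon as the request should be rejected, so the
        server can stop before the application parses the whole body.
        """
        if chunk:
            self.seen_data = True
        # '&' is a single byte, so counting is safe across chunk boundaries.
        self.separators += chunk.count(b"&")
        return self.param_count() <= self.max_params
```

The point of doing this in a filter is that the check runs on each chunk as it arrives, before any hash table of parameters is built by the application behind the server.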

Would you be willing to put tengine code on github? It might be easier to send pull requests and patches.

Also, agreed on people not assuming the worst based on features.

Good idea. Thanks. Actually I'm considering it too. Maybe migrate the code base to github next month :)

Never heard of this until now, so was very interested to see the list of enhancements that they have made to nginx. Some of them look quite useful e.g. the logging enhancements and Input body filter support.

This feature caught my eye: - Combines multiple CSS or JavaScript requests into one request to reduce the downloading time

I wonder how they achieve this at the webserver level? Normally something like this is done as part of the deployment/compile process. Anybody familiar with Tengine care to comment?

You can get more information here: https://github.com/perusio/nginx-http-concat Thanks to António Almeida (perusio) for writing this :)

Thanks for the information. Looks like the app needs to support it too, using a custom URL syntax.
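
To illustrate what such a custom URL syntax might involve, here is a rough Python sketch. It assumes the '??' marker followed by a comma-separated file list (for example /static/??a.css,b/c.css), which is how I understand the concat module's README; the function names, the in-memory file table, and the newline delimiter are hypothetical and not the module's actual implementation.

```python
import posixpath


def parse_concat_url(url: str):
    """Return the list of file paths named by a combined request.

    Returns None if the URL has no '??' marker (an ordinary request).
    Raises ValueError if a file would escape the base directory.
    """
    base, marker, file_list = url.partition("??")
    if not marker or not file_list:
        return None
    # Drop an optional trailing "?version" cache-buster (assumed syntax).
    file_list = file_list.split("?", 1)[0]
    prefix = base.rstrip("/") + "/"
    paths = []
    for name in file_list.split(","):
        if not name:
            continue
        path = posixpath.normpath(posixpath.join(base, name))
        if not path.startswith(prefix):
            raise ValueError("path escapes base directory: " + name)
        paths.append(path)
    return paths


def combine(url: str, files: dict):
    """Concatenate the requested files from an in-memory table.

    `files` maps absolute URL paths to bytes and stands in for the
    document root. Joining with a newline is a choice made here.
    """
    paths = parse_concat_url(url)
    if paths is None:
        return None
    return b"\n".join(files[p] for p in paths)
```

So the page has to emit one link such as /static/??a.css,b/c.css instead of two separate ones, which is why the application needs to cooperate.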

mod_pagespeed for Apache (by Google) also does this. Amongst many other automatic optimisations.

Tengine was not a community project.

It's a commercial project for www.taobao.com, which ranks 13th globally according to Alexa (higher than eBay, lower than Amazon). So when the company needs a custom feature, it can't wait for the original project to accept and release it.

Tengine is going open source because we got tons of help from the community, not only from the Nginx project. And it's an honor to contribute.

Other open sourced projects include TFS (file system), Tair (distributed cluster), and Webx (web framework), all currently running on taobao.com. You can check them out at http://code.taobao.org/ . Most docs are in Chinese, though...

Currently the feature list of Tengine might be limited, but who knows what will happen... Google forked WebKit into Chrome, and that had nothing to do with the feature list in the beginning.

It's really cool to see a website this big contributing back to FOSS, good stuff.

The opensource page suggests they're open to merging with 1.1. I'm hopeful that happens. It would be a shame for these patches to live on a fork.

Unfortunately the more interesting features -- input filters and file composition -- appear undocumented.

I think most people have not read the document: http://blog.zhuzhaoyuan.com/2012/01/a-mechanism-to-help-writ...

I will add a link to this document on http://tengine.taobao.org/, so more people will see it.

I wonder if those "input filters" have anything to do with Chinese requirements for censoring certain keywords? Or is this more like mod_security for nginx?

Also, how does combining CSS and JavaScript work? Can this be done at the webserver level without explicit support from the web application?

I don't think writing a huge list of banned keywords in nginx conf looks like a good idea. But it can be used in that way.

Why not give back to the root Nginx project rather than fork or have these features as compilable modules?

There's often a serious communication barrier between Asian open source developers and Western communities; partially a language barrier, but also other issues. The end result is that a lot of projects end up being de-facto forked by Asian open source developers -- this is where outreach can be very important, to help integrate the changes back in as soon as possible and start that necessary contact.

If anyone cares I can talk a bit more about this (as this was an issue with x264). There was also a session (with notes online) on the topic at last year's GSOC Mentor Summit.

I most definitely care and am very interested.

Are these[1] the notes you were referring to? I couldn't find the talk itself.

[1]: http://titanpad.com/fossinasia

There was no talk; it was an unconference.

I'll try to retell x264's story here...

Japan has long had a community of x264 developers and users, but for a long time they largely remained insular and didn't make much active effort to push their patches upstream. This wasn't out of maliciousness; there were many reasons.

The language barrier was a big problem -- not so much that they didn't know English, but that they didn't feel confident with it. In reality, we found the Japanese contributors to have way better English than they thought they did. People are often embarrassed or scared to use a language they're not confident with, and they worry about making mistakes and looking bad. Make them comfortable.

Note that this is especially an issue with Japan. In my experience, Japanese tend to be (typically) much less confident in their English despite equal or greater skill. This might be in part because the school system typically doesn't emphasize much 'real-time' conversational English.

This problem is more general than just language; open source IRC channels and mailing lists can be intimidating, and people (often rightly) suspect that they'll be mocked for making mistakes, so they just don't bother.

As is common in many non-English-speaking countries, the Japanese have many of their own tools and methods of communication that are unique to Japan. In this case, they had an ongoing x264 thread on 2ch. A friend of mine from the x264 user community offered to help. He was multilingual, able to speak near-perfect English and Japanese, along with many other languages. Using his translations, I answered questions for a few weeks on the 2ch thread.

Eventually it became somewhat obvious that one of the people posting in the thread was a Western developer; the Japanese jokingly called it the 'Black Boat incident', a reference to the arrival of Commodore Matthew Perry's fleet at Japan in 1853 (see http://en.wikipedia.org/wiki/Black_Ships). We convinced a few to drop by our IRC channel; we invited them, noting that difficulty with English was not a problem, and we had a Japanese speaker who they could converse with directly anyways.

A few came, some contributed patches, and a few stuck around; we now even have a small community of Japanese users who hang out in the main channels. One of them noted later that his (written) conversational English had improved vastly just by being on IRC for a few months and was now much more fluent.

Put simply, cultural and language barriers reduce people's confidence in communicating. They can also result in misunderstandings; open source developers, for example, tend to have a very blunt style of communication. In the best cases, this can mean they will ignore their own ego and debate decisions on technical merits, without pleasantries. In the worst cases, this can mean rudeness, intolerance, and general dickishness. Especially coming from a culture more heavily based around politeness, this can be daunting.

If you want to welcome a foreign community of developers, you should try to (list certainly not complete):

1. Have someone who speaks their language, so they can feel confident that they have someone to speak to even if they aren't confident in their English skills.

2. Contact them first; don't rely on them to come to you.

3. Be friendly and welcoming. If you have to, mute That Guy who insists on being rude and obnoxious. Mocking grammar errors or being needlessly blunt are quick ways to make people feel incredibly unwelcome. They probably speak your language much better than you speak theirs!

4. Give them extra help -- don't treat them like Just Another Patch Contributor. Your goal here is not just to integrate their changes, but to gain a connection to their developer community. "Patch rejected, I don't like it" is not a way to gain friends.

5. I really shouldn't have to say this, but apparently (from experience) I have to: seriously -- don't be a racist asshole. Particularly if you're inviting people to an IRC or high-traffic mailing list, there are often people (including devs!) who will make all sorts of insensitive comments. This needs to not happen. Yes, this means no stupid jokes about "roneriness" or Indian tech support.

Yes, this also means no stereotyping. Just because they're Japanese doesn't mean they want to talk about Naruto, and just because they're Chinese doesn't mean they really like General Tso's Chicken. I know you really love Korean culture, but just because they're Korean doesn't mean they want to talk about Girls' Generation and Starcraft. Don't treat someone from a different country as if they're some specimen under a microscope either. In short, avoid othering. Making people feel as if they are different and not wanted is a quick way to make them not want to come back.

Just a few points: it's not only a language barrier but, more importantly, a cultural barrier. The language barrier is the easier one to break.

Then there is the obvious: IRC always tends to have lots of slang and jokes, so that even Brits from the UK wouldn't get what the Americans are laughing about. And this puts off a lot of Eastern developers.

P.S. Nice to know DS reads HN.

Guys, it's time to start learning languages other than your native one...

It's a bit arrogant of English-speaking people to force everyone else on Earth to learn their native language.

I only just read this today. Thank you, very VERY much, for responding with fuller notes and reflections!

Great points! This needs to be its own blog post.

Thank you, thank you! This aspect is pretty important to a lot of us coming from Asia - even more so in India, where English is the politically incorrect, de facto national language. Despite that, there is a large gap in adoption of open source software in one of the largest markets in the world, where it is often a choice between being able to afford a Linux desktop vs not being able to afford a Windows computer at all.

It is a cultural gap that prevents most open source software from being architected to solve problems specific to Asia (from support for complex Indic fonts to assumptions about expensive system hardware). This gap not only prevents developers from working together; the even more dangerous symptom is that certain problems are not identified as being important enough[1]. I was not aware that this was (rightfully) considered important enough for a session.

[1 - very old. apologies for the rant] http://sandeep.wordpress.com/2009/08/23/harfbuzz-graphite-an...

As explained in the FAQ (which is in Chinese, I'll give you that), they created the fork for a few reasons:

- Patches have historically been slow to be accepted into Nginx,

- Some of their patches have been specifically rejected, including the syslog/pipe one (and it's probably a crucial feature for them),

- They needed a place to share their pool of patches and enhancements.

Overall, I don't see anything wrong with that; isn't hacker culture nowadays prone to forking (GitHub, anyone)?

Here we have a big player in the Chinese Web space (Taobao is massive in China; we use it on a daily basis at the office, as do hundreds of millions of other users) committing resources to release and maintain code in the pure spirit of OSS, in a country where that very specific behavior (sharing and openness) is still in its infancy.

I applaud the guys at Taobao and hope this will help foster the OSS movement in China.

http://tengine.taobao.org/faq_cn.html According to the Chinese FAQ, Nginx rejected some of the features, like "syslog/pipe support".

I agree. Nothing in the feature list looks compelling for a fork versus making modules or submitting a patch back.

My guess is that the Chinese open source community is still at an early stage of development, where forking & learning from popular/successful open source projects is still a common scene. When the community grows bigger, sharing/contributing will certainly become more common. It's certainly a good suggestion for them though.

If you can read Chinese, you can check out the link below. They are working on a list of projects (some of them are already open sourced) that handle "big data". In case you don't know, Taobao is the eBay of China, with an Alexa ranking of 13.


This work is also worth checking out

http://openresty.org/ https://github.com/agentzh

This too has originated at Taobao, Alibaba Group.

It's great work. Recommend +1.

Feature #1: "Input body filter support. It is quite handy to write Web Application Firewalls by using this mechanism."

Correct me if I'm wrong, but "Web Application Firewalls" sounds like an API for censoring content to me. As much as I loathe the idea of internet censoring, I would love to get a peek inside the technical workings of the famed "Great Firewall of China".

Applying the same logic, firewalls used to protect computers and servers are only there to censor content.

From the description, it sounds like it has nothing to do with content filtering and everything to do with app security. Web apps in particular have very different attack vectors (think XSS, SQL injection, etc.) that your standard install of iptables/shorewall isn't going to protect against.

I only raise the issue because I happen to know that China is involved in large-scale automated internet filtering and that in order to operate, companies are required to implement those filters.

And the fact of the matter is, many many firewalls ARE used for content filtering. For example, many workplaces block common time-waster sites. Home wifi routers can be easily configured to filter adult content. Universities block file-sharing. ISPs do bandwidth shaping. etc. etc.

> sounds like an API for censoring content to me.

Do all web framework URL handlers or case statements sound like web censoring to you?

Context-dependent, yes. I.e. when it comes after "Input body filter support" and comes from a country where "firewall" is used to describe massive scale automated internet censorship.

If you know that's not the case, cite something.

kennywinker, every single technology can be used for "good" and "bad" purposes. Taobao's Tengine is a fork of Nginx, a great one at that, I might add. It has nothing to do with the Great Firewall. It's open source and the code is available for everybody to see.

The innuendo, knee-jerk, hypocritical, "guilty by association with big/bad Chinese gov" reaction toward a solid open source web server, shared for all to play with and use, is quite disgusting, frankly.

When spoken, it sounds like 10gen.

