Migrating Dropbox from Nginx to Envoy (dropbox.tech)
424 points by SaveTheRbtz 8 days ago | 237 comments

Also note that we’ll cover the open source version of the Nginx, not its commercial version with additional features.

It always kills me when very successful companies don't buy software from other companies.

I remember being at a lunch with a prospective client that really loved our technology. About halfway through, he said he really would love to purchase our software, but the CEO doesn't allow them to use anything but OSS. What do they make? Non-OSS software.

Just blows my mind.

In a business context, I'd definitely consider paid support for an Open Source product. But I'm not interested in a proprietary version that I can't modify or get third-party support for or otherwise work with in a pinch; I'm certainly not going to make a business dependent on it. Push the proprietary version hard enough and I'll reconsider whether I even want to use the Open Source version, or if it might be on more tenuous ground that might get undermined in the future (pushing back on improvements to the Open Source version to maintain differentiation, or worse, deciding to switch to a non-FOSS license in the future).

> Push the proprietary version hard enough and I'll reconsider whether I even want to use the Open Source version

Qt is pushing hard for commercial licensing (which I heard prevents you from using the open-source version), putting L/GPL FUD on their websites, and trying to track users of their installers more.

The model of "copyleft if your project is open, pay us if your project isn't open" is one where I have no problems or concerns, and will happily use the open version and recommend that people building something proprietary purchase a paid license. Nor will I typically worry about the motives or future of the project unless I have some other reason to. And the KDE Free Qt Foundation means I never have to worry about Qt going proprietary.

Does anyone know a good license for that? Maybe the Prosperity license?

EDIT: I've just realized that I want a revenue-limited trial, rather than a time-limited one. I basically want the Prosperity license, but with the ability to say "you have to pay me if your company makes more than $100k in annual revenue". Is there a license like that?

I've emailed the License Zero people, hopefully they'll do something for that.

> Does anyone know a good license for that?

The "that" in question was "copyleft if your project is open, pay us if your project isn't open". For that, try the AGPL or GPL, depending on your use case and customers, and then sell alternate licenses for people who don't want to make their own code open.

> Maybe the Prosperity license?

That isn't an open source license, despite its efforts to be ambiguous on that front. That and the even worse "commons clause" are exactly the kind of license that motivated the latter half of my original comment at https://news.ycombinator.com/item?id=24005833

The Prosperity license seems like a better fit to me, as the GPL/AGPL has a different set of constraints (e.g. customers who want to keep their code closed).

What's wrong with a license that's non-OSI "open" but gets developers paid from large companies to develop otherwise open software?

The comment you originally responded to said:

> The model of "copyleft if your project is open, pay us if your project isn't open" is one where I have no problems or concerns, and will happily use the open version and recommend that people building something proprietary purchase a paid license. Nor will I typically worry about the motives or future of the project unless I have some other reason to. And the KDE Free Qt Foundation means I never have to worry about Qt going proprietary.

Software under a proprietary license is something I can't build other Open Source software atop of that I expect other developers to use and collaborate on. I don't want to have a forked ecosystem of proprietary-with-source-available software, I want to actually collaborate on Open Source software.

With Open Source, I'd feel confident that if we had to manage it ourselves, or fork it and add a patch, or get a third-party to develop a patch, or work with a dozen others with the same needs we have and collaborate on it, we can do so. It's reasonable to build an ecosystem or community or company around. You cannot replicate that with any non-open license; by the time you're done granting all the necessary rights, what you'd have is a non-standard-but-Open-Source license, at which point you'll get more traction if you use an existing well-established Open Source license.

I don't really care about encouraging the development of more proprietary software, whether or not it happens to have source available. There are already well-established models for getting people to pay for proprietary software. If someone is looking for a funding model for Open Source, and what they find is "turn it proprietary and generate FUD that it's as good as open", that's a failure. And when people are looking for Open Source and they find proprietary-with-source-available, it undermines decades of careful work and explanations by the FOSS community, and generates substantial confusion.

It's your software, and ultimately your choice how to license it. Various companies have tried the "source-available but still proprietary" model. Just please don't call it open or contribute to the FUD around proprietary-with-source-available licensing.

Speaking from experience, when encountering software under a proprietary-but-source-available license that tries to act like it's open, the response from companies that actually deal in Open Source is not "Ah, OK, let's pay them and use it", it's "yikes, get this away from us, how did this even end up in consideration, let's either use something Open Source or use something actually proprietary, not something by someone who is either confused about the difference or hopes other people will be". (The set of engineers and lawyers who deal with software licensing professionally, at various companies, tend to talk about it with each other.)

That makes sense. Unfortunately, as someone else in the thread has mentioned, the ways of monetizing OSS misalign the incentives between user and developer, and they haven't really been very successful anyway.

I develop multiple popular libraries that thousands of people use, yet I've never seen a single cent from them, which is fine for me because I don't develop them to make money. However, it's really hard to foster an ecosystem when companies who extract millions of dollars of value from FOSS don't feel like they need to give back.

> Unfortunately, as someone else in the thread has mentioned, the ways of monetizing OSS misalign the incentives between user and developer, and they haven't really been very successful anyway.

Many ways of monetizing it don't misalign incentives. As a user, I don't value support less just because software is more reliable; on the contrary, I trust the software in higher-value contexts because it's reliable, and in those contexts I need the support more. I don't value "host this for me" less just because the software is easy to install and configure (because I still don't want to be the system administrator if I don't have to). And "please develop this feature for me" has great alignment of incentives.

> However, it's really hard to foster an ecosystem when companies who extract millions of dollars of value from FOSS don't feel like they need to give back.

You're a lot less likely to get paid for software that's under an all-permissive license (e.g. MIT or Apache or BSD). It's unfortunate that so much of the ecosystem has settled around permissive licenses; with such licenses, your best strategy for making money may be "use the software to get hired somewhere or to build reputation for consulting". There's a reason companies love permissive licensing, and subtly (or unsubtly) discourage copyleft. Strong copyleft gives you more options to monetize something, either now or in the future.

That said, I also do agree that there need to be more good ways of funding Open Source.

Maybe you're right, I do tend to choose MIT/BSD usually. I think I'll switch to GPL, though I don't entirely agree with your first paragraph (I am less likely to pay for software if I can host it myself, for example, though possibly not by much).

Well, once I paid the fees to license our logo from subcompany A, and the rent on all my servers to subcompany B; we didn't make any money.

Yes, this is an example of a company I wouldn't want to charge for my software, at least until you're into six figure revenues.

Not sure but I think maybe they were mixing up profit vs. revenue and pointing out how a company can ensure it has zero profit. But I guess you'd also have to be careful about which entity or sub-entities are covered by a license and the revenue limits.

Ah, I see, by "subcompany" they meant their own subsidiaries. Yes, that's why I said "revenue" in the original post, thank you for the clarification.

Unreal engine charges a 5% royalty on sales after the first $1M in revenue. And source is all on github, so it's basically zero friction until you get big.
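For what it's worth, the arithmetic of that kind of threshold royalty is simple. A minimal sketch, assuming the royalty applies only to the revenue above the threshold; the function name and the flat-threshold reading are my assumptions, not Epic's actual license terms, which may define revenue and the threshold differently:

```python
from decimal import Decimal

def royalty_due(gross_revenue, threshold=Decimal("1000000"), rate=Decimal("0.05")):
    """Royalty owed on revenue above a threshold (hypothetical helper).

    The defaults mirror the "5% after the first $1M" figure quoted above;
    they are illustrative and not taken from any license text.
    """
    gross_revenue = Decimal(gross_revenue)
    # Only the portion of revenue above the threshold is subject to the royalty.
    royalty_base = max(gross_revenue - threshold, Decimal("0"))
    return royalty_base * rate

print(royalty_due("800000"))   # below the threshold, so nothing is owed
print(royalty_due("3000000"))  # 5% of the $2M above the threshold
```

The same shape would work for the "pay me once your company makes more than $100k" idea upthread: swap the threshold and replace the percentage with whatever fee you want.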


Also have a look at the PolyForm licenses


Ahh, yes! I think I want a cross between the Prosperity and the Small Business license.

Qt is more and more trying to get away with restricting the open source version as much as they can without triggering KDE's escape rights.

And nowadays, most people use Electron instead of Qt.

I’ve been using Qt/C++ for a few months here and I’m really really impressed. It’s really nicely done, and the performance is fantastic. Honestly, I’d have probably considered Electron, but the software runs in a somewhat resource constrained environment and has pretty significant soft real-time processing requirements (the C++ part)

which is deeply unfortunate due to how resource hungry Electron is.

What if it were a single source-available version (that also allowed you to, say, get third-party support/customizations)?

If the licensing model asks for companies to pay for the "Enterprise" features which are still OSS, it would resolve this problem for me at least (not sure about the OP).

In general, it's been my experience that the closed source enterprise only crap that most companies push for is exactly that: crap. I suspect it's because those features are treated as a business expense, and thus built to keep costs low. Almost every time those features are underwhelming and buggy. If it's OSS, at least I can contribute a patch; if the thing is popular, likely someone else has already fixed it.

Enterprise support is a fucking joke; they will delay, delay, and delay. If you push hard enough, they say "it's on the roadmap" without giving any guarantee of when it will be fixed. The only time Enterprise support has really worked well is in my org that got the best support package for GCP. GCP's support for urgent issues and product feature requests has been somewhat reliable and predictable. Much more than literally any other enterprise (I'm looking at you Okta).

> GCP's support for urgent issues and product feature requests has been somewhat reliable and predictable.

Can you give an example of a product feature request that succeeded via that support channel?

- we had requested some way of being able to remove IAM permissions that went stale. GCP introduced a “recommendations” feature that indicates which IAM roles have been unused for a long time and can safely be removed.

- we requested a way to prohibit the provision of ILBs on shared VPC subnets without explicit grants; GCP introduced a role for that

- we had issues with the number of compute instances being too high on our VPC and reaching limits because of the number of VPC peerings. In the past, every GKE cluster created a new peering. We have a bunch of GKE clusters, and as we added more, the max number of instances we could provision was reduced significantly. GCP introduced and fast tracked a feature that enabled all GKE clusters to use a single peering rather than create a peering per GKE cluster.

There’s a bunch more. But on this front I have been a happy customer.

Thanks for the details!

If you're saying "source-available" as opposed to Open Source: complete non-starter. I'm looking for Open Source, not a faux knock-off of it; don't try to give me a subset of Open Source license terms that you think would placate me while not actually being open.

the free software people say the same thing about open source

I think a part of this is that engineers in particular have a preference to use software which can be treated as a transferable skill if/when they move on. They would rather use the OSS version of Nginx, of Envoy, because they know they will have access to it in the future. I think there is some aversion to becoming familiar with the features, functionality, and characteristics of a piece of software that your current employer is paying a non-trivial amount for, when you know that chances are your next employer will refuse. This may not be in the best interest of the current company, but it's a bias that I think impacts a lot of engineers.

It is a generation thing; back when I started, the only free-beer software was my own.

Even for code listings I had, at the very least, to buy the medium they came on.

You can purchase the commercial version of Nginx and not use its specific paid feature subset. Alternatively, "I am familiar with this, and we can do x% of what we need with the OSS version, but we had to pay to get the last part."

I see what you're saying, but this is basically suggesting that Dropbox should have made a donation to F5 (a public company with an $8B market cap).

I think there is a valid point you're making for smaller companies that are providing both open-source and commercial versions of software, but I don't think Nginx is a great example of that.

Why does it blow your mind? Various obvious and sane reasons for this, including cost. I bet many of those companies you have in mind “buy” Windows and macOS, and if they are sufficiently big, most certainly buy Oracle or SAP for their corporate operations, finances, and accounting. It’s usually only the production side of sufficiently large internet-scale companies that is biased towards building. Most of the time you can easily explain it with “it’s cheaper than the contract with supplier”. Often, it is strategic ownership over your fate and not being locked into the vendor that comes into play as well. The vendor positioning in the market and its leverage may also change over time and pose a risk down the road (getting acquired, changing focus, abandoning the product, going out of business, etc.) Many times it is just that their problem is so unique to their scale that the generic solution does not technically work for them or the pricing model is not designed to be a fit.

In the particular case of nginx, I can tell you their reputation is not great in adapting to the users’ needs.

I think the point was a financially successful company not contributing to an open source project even after making a bunch of money just seems un-ethical? Maybe I'm old-school but I still think we should be supporting each other in this type of situation especially if one of us strikes it big? Sure - move away from Nginx but maybe throw some $ their way for the service they provided even if you don't legally have to ...

I did not infer that from the OP’s comment, but in any case, I don’t know the specifics of their arrangement or whether they had been a commercial customer or not. Last time I checked Nginx had been doing fine selling itself for half a billion.

But more abstractly, I don’t actually agree with that sentiment. I see more of a responsibility to give back in the form of patches and collaboration than throwing $ at the problem. I see the nginx approach of open source simply as a business tactic no different from Windows Home/Pro customer segmentation except Home is free for tactical reasons to kill off other competition. It is a calculated business move; if your business model sucks (which it obviously did not in nginx's case), that does not imply others are acting less than ethically or that they should pay you out of pity. (That said it might be strategically important for them to keep your head above water and survive for their own benefit as their vendor, but that’d be a different angle.)

I suppose the difference between free software vs open source is also relevant to this discussion, and I could relate to your sentiment when facing the former much more than the latter.

All fine and dandy, except that those patches and collaboration don't pay bills.

Often big companies employ people directly to work on open source projects that they use heavily. That does pay the bills.

Big companies like Dropbox....

> patches and collaboration don't pay bills.

implying that just because your open source project is being used, that it is entitled to fund the bills of the project maintainers.

I think patches and contributions are a form of bill paying.

Patches and contributions take a non-negligible amount of time and resources to review, test and integrate, as well as adding to the ongoing maintenance burden. They might be welcome, but they are absolutely not cost-free and I wouldn't consider them a "form of bill paying", the benefit (if any) is far too indirect and it doesn't directly help the bottom-line in any way.

Something should pay for the bills and if the oss project creates lots of real value then I would prefer to live in a world where some of that value goes to pay the bills. The alternative world simply discourages oss since devs would have to work other jobs. There is qualitative difference when you have a dedicated core team vs just everyone contributing patches.

When supermarkets and landlords start accepting them, then yes.

Could it be that OSS is simply not a sustainable business model for the long haul and it was simply successful in a period of history when vast money was made quickly by landgrab expansion of technology to consolidate/provide many basic services and the code itself wasn’t the competitive differentiator? I don’t know but that’s a possibility too. I question why one would be concerned in keeping OSS alive, as a business, assuming it cannot survive on its own feet. There’s no inherent reason OSS should somehow forcefully live. It’s already changing its character via AGPL and Mongo license-style things in the face of AWS cloud simply deploying and milking cash.

(The above is assuming the concern that it is funding that’s a problem today; I don’t quite see it that way [for instance, I strongly suspect Nginx to have made more money than DBX so far, so who are we to say who’s been more successful; market cap ain’t everything], but that’s a hypothetical to think about.)

Moreover, supporting a project does not equate to supporting its existing maintainers. It could mean taking some partial ownership including the review side and having some developers on your own payroll. Seems like that’s how the big projects are done most of the time. The Open Core model we are focusing on is a niche and arguably more akin to freemium products than free software as a thing with communal ownership.

> OSS is simply not a sustainable business model

OSS is not a business model, but more closely matches charity and non-profits, and run on donations and altruism of their users.

i find it annoying that people here keep saying that a company _should_ pay for their open source software usage just because they have money to do so. They don't have an obligation. They could donate - and some do - but it is in no way required of them, regardless of how much value they derive from using said OSS.

An open-core project, which has a somewhat useless core and a paid-for 'enterprise' version, is not, in my eyes, a proper OSS project, but instead a way to market a proprietary product.

The serendipity was GPL getting uptake thanks to Linux and GCC.

Linux via the ongoing lawsuit with BSD back then, and GCC because UNIX vendors started charging for their compilers, with GCC being the only alternative available.

However everyone needs to pay their bills, therefore the push for non-copyleft licenses, thus in a couple of years GPL based software will either be gone, or under dual licenses.

You already see this happening with BSD/MIT based alternatives to Linux on the IoT space, NuttX, RTOS, Azure RTOS, Zephyr, Google's Fuchsia, ARM's mbed, Arduino, ...

>> "There’s no inherent reason OSS should somehow forcefully live"

What on earth does this tirade even mean? Every business lives 'forcefully' and fights for survival. Sometimes it comes with values, e.g. we don't use child labour in the DRC to mine thallium, fairtrade, organic, etc. OSS is one of those values.

Is there a business that lives 'effortlessly'?

FOSS doesn't have anything to do with values, or do you refuse FOSS software tainted by corporation's contributions that don't share your values?

Because then it is going to be a very thin selection available.

FOSS software absolutely has value in and of itself, and I will take it even if it comes from satan himself.

As Churchill once said: "If Hitler invaded hell I would make at least a favourable reference to the devil in the House of Commons."

Get that “implying” crap out of here, this isn’t 2008 4chan.

For what it’s worth, many companies have a hard time justifying “unnecessary” expenses to their boards or shareholders. Depending on company structure, their hands may be somewhat tied.

Not all companies, of course; and to be clear, I think such a company structure is a problem itself and agree with you.

I think you'd be hard pressed to find a board or shareholders who think 'support contract for essential component of our infrastructure' is 'unnecessary'.

To clarify, I meant that it can be hard to convince the board to donate money to an OSS project when there is no "need."

Just to be clear: the product I was selling was not OSS and the product they were building was not OSS. That's why it blew my mind.

The subject of monetizing opensource software is a tricky one. Some companies pursue the Open-Core principle, others monetize through the consulting services or cloud infrastructures.

As for investing into opensource, Dropbox is trying to do that when possible, for example we (along with Automattic) did sponsor HTTP/2 development in Nginx.

Personally I think that monetisation of open-source goes against the consumer of the OSS in practically all cases.

- Open-Core::: Features are not added to core, as they want people to upgrade.

- Consulting::: Ease of use is ignored, as if it's too easy people won't need consultants.

- Sponsoring Goals::: Software is almost held at ransom, until goals are reached.

The best way to help open-source software is to donate or contribute code... if you're trying to maximise profits, then just make it proprietary

> - Consulting::: Ease of use is ignored, as if it's too easy people won't need consultants.

Some problems can only be made so easy. Some problems require custom work. Sometimes you need paid support not because the product is low-quality but because you need to know that you can call someone at 3am because your service is down. There are lots of reasons to have consulting.

> - Sponsoring Goals::: Software is almost held at ransom, until goals are reached.

You're assuming the work would get done one way or another. Sometimes people have many other things they could be doing, and they need to justify spending more time on a project than they already do. Or sometimes, people have a fixed amount of time but they're happy to prioritize things people want and will pay for.

(No argument about open-core; that definitely has problems.)

Other great approaches include hosting the software as a service. Depending on the nature of a project, many people may want a service whose primary value proposition is "we'll host this for you so you don't have to maintain and administrate it".

From my (admittedly) limited experience, paid support isn't consulting.

Paid support surely is, as you say, about calling someone at 3am and having them look into an incident.

From my experience, that's not about helping you get the most out of the product, and a hand in tailoring it to your needs - that's the consulting part, and is usually paid for separately (and at much higher rates).

You can just charge companies that make over a certain amount of annual revenue. Then OSS and small companies can use your software fine, but when they get big they have to pay you.

Do you have an example of sponsorship goals actively gating software development? I haven’t seen this one. “I’m not patching this zero day until I get to $1,000,000!”

Isn't that what RHEL is, in different words obviously.

I suppose, but you know what you’re buying when you sign up for Red Hat. I was trying to imagine a scenario where a free OSS project does that, like Kubernetes or React.

Disclaimer: I work for Red Hat but opinions are my own

I totally disagree. Red Hat patches/maintains things regardless of whether people pay for it. Everything is always available open source. There are numerous derivatives of RHEL that get these for example.

The money you pay for Red Hat stuff is for support. There are always free-as-in-speech and free-as-in-beer alternatives of Red Hat products.

It might very well be my misunderstanding, so I apologize, but doesn't RHEL get security updates that are unavailable to the "rest of us" for a bit?

Since when does blender 3d offer consulting?

It's not necessarily about money. An engineer can burn through tens of thousands of dollars per month in cloud spend because they have access to the AWS [or] GCP console, but that same engineer may not have the first idea about how to get the CFO's sign-off to purchase a license that will facilitate a halving of that spend. And that same CFO can institute a policy against using credit cards for recurring payments that prevent that engineer from expensing the purchase through a corporate card. And the software company may not offer a bill-via-invoice option — or they may only offer it for amounts greater than the amount the engineer wants to spend.

So much of what happens in sufficiently large organizations has nothing to do with profit maximization. Think confederacy of dunces, not a conspiracy of greedy evil geniuses.

Exactly my thoughts when I read the article - a hugely successful company not contributing to an open source project which enabled them to succeed in the first place ...

There are different paths companies take. Some buy and it really works for them and their business, since overhead is small and everything just works. The other set of companies have more sophisticated requirements: when they want to have full control on what is going on, understand what the code is doing to better optimize everything else around it, faster shipping cycles and being able to implement what you want without waiting for the next shipping cycle with commercial software, community and knowledge base around it etc.

> when they want to have full control on what is going on, understand what the code is doing to better optimize everything else around it, faster shipping cycles and being able to implement what you want without waiting for the next shipping cycle with commercial software, community and knowledge base around it etc.

I'm a bit confused by this - I work for HAProxy Technologies and we do have an enterprise product. Many of our customers contribute code directly into the community and we backport those features into the latest enterprise stable version. This means they do not have to wait until the next shipping cycle to take advantage of a new feature. There's also a large community & knowledge base around HAProxy.

Your reasoning may be right when dealing with "closed source enterprise software" but it doesn't line up when we start talking about open source/open core.

Shipping cycle is one of the reasons mentioned. And as you can imagine, unfortunately, contributing to Nginx open source is not an easy thing (but they have a great product for sure). If HAProxy is different in terms of contributions, that is great!

What stops them buying commercial licenses for Nginx and then using the OSS version? They're not obligated to, certainly, but I hardly think Nginx would say "you must use only the commercial version".

Just speculation, but there are motives to not use the closed source version beyond a purely profit-driven point of view. One of the prime benefits of OSS is that you have the power to change it whenever necessary. If something is breaking badly, you might not have the luxury of waiting on support to track down and fix your problem. If you don’t need the features of the paid version, then using the paid version is actually limiting your options.

I don't see what's surprising -- companies that earn money selling product can make even more money by cutting costs. And if OSS software gives them an equivalent (or even better) solution, why wouldn't they use it? For any sizable production deployment, the cost of nginx licenses could be applied to hire a number of engineers to help maintain the OSS software.

I don't know what their volume licensing is like, but at $2500/server list price, costs add up quickly.
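To put rough numbers on "add up quickly": a back-of-the-envelope sketch, assuming a flat per-server price with no volume discount and ignoring whatever billing period applies. The fleet sizes are made up for illustration and the helper name is hypothetical.

```python
LIST_PRICE_PER_SERVER = 2500  # USD, the list price quoted above

def fleet_license_cost(servers, price_per_server=LIST_PRICE_PER_SERVER):
    """Total license cost for a fleet at a flat per-server price (hypothetical helper)."""
    if servers < 0:
        raise ValueError("servers must be non-negative")
    return servers * price_per_server

# Hypothetical fleet sizes, only to show how the total scales linearly.
for servers in (10, 100, 1000):
    print(f"{servers:>5} servers: ${fleet_license_cost(servers):,}")
```

Real volume pricing would almost certainly bend that line, which is exactly the part nobody outside the sales conversation knows.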

Isn't this simple economic reasoning?

If you buy something, or worse, you have to pay license fees on a regular basis, your earnings will be smaller.

We live in a world that is driven by economic growth so the ultimate goal is to maximize profit.

Of course this has a moral aspect to it as well, and I see it, but in this case I think it is not outrageous enough to be something on the scale of a scandal.

Many businesses use ideas or products for free to start a successful enterprise that earns a lot of money.

In Germany it's the opposite: no free software in production! Only software with enterprise support!

We Germans are very risk-averse (I hate that sometimes)

That's true in many US companies as well. People like having a vendor they can fire when things go awry, rather than they themselves getting fired.

Reminds me of private companies who profit from public resources.

Like selling tap water in bottles.

Is tap water not sold for commercial use at market rates? The public resource steward is leaving money on the table if they aren't.

This also rubbed me the wrong way. As an individual I think that shows selfish and opportunistic behaviour and it raises a red flag about that organisation in my mind.

However, for-profit companies are not here to do what’s “correct”; they’re here to make money for their investors. If I had decision-making abilities at Nginx I’d be conducting a comprehensive review of the free OSS offering and redacting the features and overall value with extreme prejudice.

Dropbox never paid because it COULD not pay. If you have an enterprise, paid version of your OSS product it has to be impossible for an enterprise to use it for free.

> If you have an enterprise, paid version of your OSS product it has to be impossible for an enterprise to use it for free.

Why? Most enterprises, especially ones that aren't tech firms, are going to shell out for enterprise support even if there are no additional features. Crippling the community version doesn't necessarily help enterprise sales; it can reduce overall mindshare, reducing enterprise traction or, worse yet, mean that a third-party downstream edition with richer open-source features becomes dominant and its creator gets “your” enterprise support contracts.

> shell out for enterprise support even if there are no additional features.

I don't feel this to be true.

Also, an enterprise that's large would want some features that are irrelevant to a small shop. For example, single-sign-on integration with various providers.

I do though ... Imagine a manager being dependent on a system he ~~bought~~ installed for free, which he didn't buy support for, and is now malfunctioning. It's his fault. And this is what I've seen in practice as well.

Just think about the commercial success of SUSE Linux?

> However, for-profit companies are not here to do what’s “correct”; they’re here to make money for their investors.

While partially true, this is overly reductive. Companies can and often do take actions that serve goals beyond "increase upcoming quarterly profits".

You can't redact the features that are already open sourced.

And besides, if that were to happen people would just go behind some other open source web server, and push that.

This ain't mind-blowing by any means IMO.

If said company has an unknown track record, then doing business with them is risky.

What if the company goes out of business in the near future? Or gets acquired (actually, I think a lot of infra companies' end goal is to get acquired)? What if they raise the price all of a sudden? How extensible/customizable is their solution?

Trust is the key here. If I am in a position to buy software from somewhere and cost isn't the primary concern, the money would go to a known/stable figure in the industry.

In the case of buying from a small company this can make sense. If they fold it is good to know that the software will still be around.

Business is motivated to avoid anything which is a tax. Said another way, they are motivated to avoid or escape from anything that grows in line with earnings. If their infrastructure grows, their bill from nginx will grow, modulo the skills and efficiency of their infrastructure teams and the speed of whatever servers they are buying.

I’m increasingly concerned about being screwed by non-OSS vendors. Imagine a use case like Slack. Say you have an employee that goes to visit a family member in Venezuela & connects to the company Slack. Slack has been given a mandate to terminate accounts for people in Venezuela by the Trump administration, and now your key employee is cut off from communication, or perhaps your Slack account gets flagged.

HR/the company should be providing advice on going to "at risk" areas.

Also, if you're going to China, take a disposable phone and a clean laptop that can be wiped on return.

That's not a non-OSS issue. That's a SaaS issue.

Even if your SaaS was OSS, they could still deny you access, as you're inhabiting their server, not your own.

Fair point.

In my company thousands of CentOS servers were running, we still had the support license though.

You have a good problem. What sucks is when you sell a FOSS solution and the customer wants paid support and an SLA, but the FOSS maker does not want free money in the form of closing out issues/bugs/features they might work on anyway without getting paid for it.

I fully agree. One other good point about paid software is that it is more long-term. It will be supported as long as there is money involved.

Just look at the JS ecosystem. Everything is for free. But also shitloads of crap. A lot of libraries left unmaintained.

Not sure what nginx is like, but in my experience, the developer/operator experience of commercial software tends to be subpar. For instance, when I worked at a shop that used a ton of Red Hat software (millions of $$ per year in licensing), the commercially-supported versions often were a pain, with requirements like phone-home (that didn't play well with the mandatory corporate proxy), documentation behind a paywall and hard to discover (yes, we had login accounts, but Google couldn't index it), and other disadvantages. The OSS equivalents were easier to access, had better (or at least better-indexed) documentation, and we didn't need to worry about per-seat licensing (again, we were paying for it, but we still had to track it).

If you're going to sell software that has an OSS variant, make sure the commercial experience actually outshines the free one.

I agree, we (at Red Hat) try so hard to make awesome documentation but then put it in hard-to-reach places. I really wish we didn't do that. I'd like to see us publish it all widely.

That said, you'd be amazed at how many of the man pages are written by Red Hat but aren't attributed, so nearly everybody on every distro benefits from our documentation without realizing it.

Makes sense actually. Your motives are conflicted so you can't see it.

Also if I can ask, is your product also closed source (in any nature at all), but made with open source components?

I feel so old now. There was a time when I used to argue with senior engineers @ Yahoo! for using Nginx over Apache. Nginx was the hot thing, popularizing C10k [1]. Now, on my current team, I have junior devs pushing for Envoy over an HAProxy/Nginx setup.

Is this trend happening primarily because devs are pushing for gRPC over REST? What benefits does Envoy offer over Nginx if you're still a REST-based service? I am not fully convinced of the operational overhead that Nginx brings.

[1] https://en.wikipedia.org/wiki/C10k_problem

The sibling comments point towards the difference in configuration if you take the "out of the box" product. But there is also a vast difference in how code is organized, in case you ever have to touch it.

From my point of view Nginx feels "old". It's a C codebase without a great amount of abstractions and interfaces, and instead having a bunch of #ifdefs here and there. Unit-tests and comments are not to be found. Build via autotools.

Envoy looks as modern as it gets for a C++ codebase - apart from maybe the lack of using coroutines which could be interesting for that use-case. It uses C++14, seems to be extremely structured and documented, has unit-tests, uses Bazel for builds, etc.

So I think the experience for anyone being interested in working on the code will be very different, and people that prefer the style of project A will have a very hard time with the style of project B and the other way around.

I looked around at the code in Envoy.

"As modern as it gets"? Very, very far from it. Everywhere I looked it was all-over public virtual functions. It looked, more than anything, like Java, which is essentially, more or less, C++92 with some bells on.

The code might be OK, but, as with typical Java code, everywhere I looked was boilerplate, hardly any of it doing any actual work. I would hate for somebody to look at Envoy and think that was what good, modern C++ code looks like.

Virtual functions are a good answer to certain problems that come up, once in a while--in C, for such a problem, you would use function pointers. Inheritance is a pretty good answer to certain problems that come up a little more often.

But neither is a good answer to any organizational need, and a big project that reaches for virtual functions and inheritance as first resort makes me shiver.

> uses Bazel for builds

Is this unanimously good? I've heard both praise and horror, never used it myself.

One of the senior engineers once said to me that "Bazel is like a sewer: you get back what you put in."

Bazel requires a lot of upfront effort but the power of (a programmatically accessible/modifiable) dependency graph and a common build/test system across all the languages is very hard to underestimate.

> very hard to underestimate

Are you sure?

It's good for Dropbox, since they use Bazel.

The operational overhead shifts to more API stuff, so people can write 100 lines of code instead of modifying 1 line of config, it feels like.

This is never going to end as more things shift towards being core APIs that allow you to write code instead of configure things. It's not even configuration-as-code, it's just code managing configuration files.

edit: I think my comment comes across maybe kinda rude. My beef with Envoy is that the documentation is _extremely_ complex. I've repeatedly asked 'How do I get started with xDS?' and been pointed to the spec, which took some time to read through, and when I asked others how to set up LDS/RDS/CDS/SDS I was met with something like 'what are these things...? just use xDS,' which led to a lot of frustration. This has been my experience each time I've tried to approach Envoy and xDS.

I think the problem with xDS is that their example go-control-plane repository is completely useless. It's overly complicated with frightening-sounding details that don't matter to someone experimenting ("you MUST MUST MUST CACHE THIS how to do so is an exercise left to the reader").

I ended up reading the specs and found them very clear, and wrote my own xDS implementation: https://github.com/jrockway/ekglue/blob/master/pkg/xds/xds.g... I did this after reading the source code for the most popular xDS implementations and finding myself horrified (you know the popular xDS implementation I'm talking about). Now I have a framework for writing whatever xDS server I desire, and it can be as simple or as complex as I want it. For example, for my use cases, I'm perfectly happy with a static route table. It is very clear what it does, so I have that. What annoyed me was having to configure the backends from Kubernetes for every little service I wanted to expose to the outside world. So I wrote ekglue, which turns Kubernetes services and endpoints into Envoy clusters and Envoy cluster load assignments. This means that I never have to touch the tedious per-cluster configs, and still get features like zone aware load balancing. And I don't have to take on complexity I don't want -- the woefully under-specified Kubernetes Ingress standard, service meshes, etc. (I also plan to use ekglue for service-to-service traffic because xDS is built into gRPC now... just haven't needed it yet. It's great to use the same piece of software for two use cases, without having to maintain and read about features I don't need.)

TL;DR: take a look at the spec. It's really well thought out and easy to implement. Just don't cut-n-paste from Istio because they got it really wrong.

On that note, the gRPC spec specifically calls for load balancing that doesn't actually do the proxying, but instead hands out assignments, with the server passing its current load back to the load balancer service. It sounds like in this case the gRPC client is using some array of xDS, but the server is using xDS along with...?

I feel like xDS is a relatively new addition to gRPC. I think there is another parallel implementation inside gRPC of external load balancing, which may convey server load information back to the gRPC client.

I looked up the current state of the xDS code, and there's a lot more of it than I remember. The EndpointDiscoveryService based gRPC balancer is here: https://github.com/grpc/grpc-go/blob/master/xds/internal/bal.... It appears to balance similarly to Envoy; locality-aware with priorities.

(That doesn't surprise me because I don't remember any field in the ClusterLoadAssignment proto that sends load information back to the client. Health, yes; load, no. But I could easily not remember it being there because it hasn't been something I've tried to implement.)

But yeah, the way to look at endpoint discovery is like DNS. DNS can return multiple hosts, and clients will spread the load by picking one at random (sometimes, if you're lucky). EDS is similar to this, but is a streaming RPC protocol instead of connectionless UDP, so it's theoretically easier to operate and monitor.

The other xDSes do more things -- CDS lets you discover services (that EDS then gives you endpoints for). RDS lets you make routing decisions (send 10% of traffic to the canary backend). SDS distributes TLS keys (and other secrets). ADS aggregates all of these xDSes into one RPC, so that you can atomically change the various config versions (whereas requesting each type of stream would only be "eventually consistent"; doing it atomically is good where route table changes add new clusters, the client is guaranteed to see the new cluster at the same time the route to that cluster becomes available).
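For what it's worth, the RDS canary example above (send 10% of traffic to the canary backend) boils down to a weighted pick per request. Here is a minimal Python sketch of the idea, not Envoy's implementation; the cluster names, the weights, and the `pick_cluster` helper are all made up for illustration:

```python
import random

# Hypothetical route: cluster names and weights are illustrative only.
ROUTE = [("stable", 90), ("canary", 10)]


def pick_cluster(weighted_clusters, rng=random):
    """Return one cluster name, chosen with probability proportional to its weight."""
    names = [name for name, _ in weighted_clusters]
    weights = [weight for _, weight in weighted_clusters]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the demo is repeatable
    picks = [pick_cluster(ROUTE, rng) for _ in range(10_000)]
    # Prints the observed canary share, which should be near the configured weight.
    print(picks.count("canary") / len(picks))
```

Shifting more traffic to the canary is then just a change to the weights, which is the kind of update a control plane would push over RDS.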

It is all somewhat complicated but very well designed. This reminds me that I want to look more deeply into gRPC's support of xDS and add some integration tests between gRPC and ekglue.

Yep, gRPC is the new toy for distributed computing, after everyone realised that DCOM, CORBA, RMI, Remoting actually made sense instead of parsing XML and JSON text formats all the time.

I had to chuckle as well when I read that article and the part about gRPC. Seems like the pendulum is swinging into the other direction again - back to where we've already been ten or twenty years ago. New name of course, but same concepts.

One really starts to feel old at such occasions.

It certainly does look like that. I do think though that we've learned a number of central lessons in the process:

- treat messaging as a first class concept, not something to hide & abstract away.

- do not attempt to implement polymorphism in a messaging protocol. Do not bind your messaging protocol to a programming language's type system (they serve different purposes).

- bake fundamental monitoring & maintainability concepts into the protocol (e.g. intermediaries must be able to understand what responses are errors).

- have a well understood, simple backwards and forwards compatibility story.

- etc.

All of this is stuff we didn't understand in RMI or CORBA or SOAP etc. REST was a great wakeup call, both in simplicity and some of the messaging protocol concepts (such as error modelling). It is missing the application level binding - there's just no good reason why you wouldn't have a statically checkable method/request/response type binding.

I am a bit wary about whether gRPC will go overboard again in complexity. We'll see.

Sure we did understand it; that is why they used an IDL, independent of any programming language's type system.

Apparently DCE IDL now comes in proto files.

What newcomers did not bother to understand is why we were using those formats in the first place.

Rest assured, maybe in 20 years we will be introducing this cool RPC protocol, based on YAML or something. Thankfully by then I should be retired.

I think IDLs are an important point among many others (I attempted to list some above). I think we might be missing substantial improvements if we'd just say "oh so we're back to a static IDL description, all old is new again".

And even within IDLs, we've made major progress. Compare the mess of SOAP's data type system, various attempts at inheritance and polymorphism in SOAP and CORBA, pointers in CORBA etc.

What's the beef with polymorphism?

I read that part as:

Protocol Buffers are good enough to make us forget the traumas caused by CORBA.

Totally get it! The team (@veshji and @euroelessar) struggled a bit in convincing me that the new Envoy way is a simpler one. I do not regret giving in.

Operationally, there are many differences (esp. around Observability) but if I were to distill it down to one thing it is a clean separation between data- and control-plane. This basically means that it was designed to be automated and the automation layer (xDS) itself runs just like any other normal service in production.

This is just the software industry. Maybe it’s because we’re so young. Maybe it’s because software is relatively easy to change and experiment with.

Who knows. All I know is, it’s exhausting, and ultimately it’s terrible for the end user. We have no idea what we’re doing when we pull in a new dependency like this. There’s tiny corner cases we don’t think about, and those get passed on to the user.

Innovating is fun, but exhausting in aggregate.

Envoy is a lot more configurable and rivals nginx on performance (especially throughput). The codebase is a lot more manageable (but that’s my personal preference). It runs circles around nginx on observability features.

Really great post. I'm glad the post in particular mentioned community, because I think in the end this is the huge advantage Envoy has over NGINX. NGINX, could, in theory, resolve all technical issues raised in the post. But the fundamental tension between the open source and commercial versions cannot be resolved.

(Disclosure: We use Envoy as part of Ambassador, and so of course we're big fans!)

I know some people might find it a little controversial, but I’m super excited about our load balancing future and that we probably have the biggest Envoy deployment in the world now. When we moved most of Dropbox traffic to Envoy, we had to seamlessly migrate a system that already handles tens of millions of open connections, millions of requests per second, and terabits of bandwidth. This effectively made us into one of the biggest Envoy users.

Well, a single server doesn't really need to do more than 10Gbps or 100k connections. Going above is a "simple" matter of managing horizontal scaling.

What I wonder about is how do you distribute the traffic on the higher level? I imagine there are separate clusters of envoys to serve different configurations/applications/locations? How many datacenters does dropbox have?

I was running a comparable setup in a large company, all based on HAProxy, there was a significant amount of complexity in routing requests to applications that might ultimately be in any of 30 datacenters.

We had a large rundown of our Traffic Infrastructure some time ago[1]. TL;DR is:

* The first level of load balancing is DNS[2]. Here we try to map a user to the closest PoP based on metrics from our clients.

* The user-to-PoP path after that mostly depends on our BGP peering with other ISPs (we have an open peering policy[3], please peer with us!)

* Within the PoP we use BGP ECMP and a set of L4 load balancers (previously IPVS, now Katran[4]) that encapsulate traffic and DSR it to L7 balancers (previously nginx, now mostly Envoy).

Overall, we have ~25 PoPs and 4 datacenters.

[1] https://dropbox.tech/infrastructure/dropbox-traffic-infrastr... [2] https://dropbox.tech/infrastructure/intelligent-dns-based-lo...

[3] https://www.dropbox.com/peering [4] https://github.com/facebookincubator/katran
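The ECMP + L4 step in the list above relies on every packet of a connection landing on the same L7 box. Here is a rough Python sketch of per-flow hashing, assuming a plain hash-modulo pick. Real L4 balancers typically use a consistent-hashing scheme so that adding or removing a backend moves fewer flows. The `pick_backend` helper, the backend names, and the addresses are hypothetical:

```python
import hashlib


def pick_backend(flow, backends):
    """Map a flow 5-tuple to one backend, deterministically.

    `flow` is (src_ip, src_port, dst_ip, dst_port, protocol). Hashing the
    whole tuple keeps all packets of one connection on the same backend
    while spreading different connections across the pool.
    """
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    l7_pool = ["l7-a", "l7-b", "l7-c", "l7-d"]  # made-up L7 balancer names
    flow = ("198.51.100.7", 40312, "203.0.113.10", 443, "tcp")
    print(pick_backend(flow, l7_pool))
```

The weakness of plain modulo is that changing the pool size reshuffles most flows, which is why production balancers tend to pick something more stable.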

Katran - nice! Any issues with it at all? Do you use it with xdp capable hardware or just normal driver offload?

It works beautifully. We use driver offload (i40e on the Edge.)

Cool to see someone using Katran in production. Really interesting stack you have there.

Actually, all the props for that go to Katran's author himself. When we hired Nikita V. Shirokov (tehnerd), the first thing he did was replace IPVS with XDP/eBPF-based Katran, which improved our Edge servers' throughput by 10x, from ~2 Mpps to ~20 Mpps.

He also contributed a lot to the Envoy migration, migrating our desktop client to it and adding perf-related things like TLS session tickets' lifetime to SDS.

Great. Exactly what I was looking for =)


"we have an open peering policy"

That's a bit of a lie given you have a minimum 50Mbps requirement before you even consider a peering request.

I would call that Selective, not Open !

It's interesting almost no web server provides an easy way to deal with multi-tenant multi-domain architectures in a good way that includes automatic SSL.

Caddy is the closest, but still not near enough.

There is this small segment of the market that we operate in that requires thousands of TLS connected domains to be hosted behind a dynamic backend. It's services like Tumblr, Wordpress.com, or any other hosting service where you can get a "custom domain" to point to your own blog or site.


Apache - Nope.

Caddy - Can do (but need lots of workarounds)

Envoy - Nope.

Everyone focuses on a few hand-coded domains and no automatic TLS. Maybe this part of the market is too small anyway. Sigh.

Several companies use Caddy for exactly this purpose. Fathom Analytics for example uses it for their custom domains feature. Caddy can even reactively provision certs during TLS handshakes. It's a native feature. Why does it require lots of workarounds?

Yeah I'm not sure what they're getting at, I've used Caddy as well for similar "custom domain" features, it was super easy. Thanks for creating it!

Yes. Caddy is what we use, since not much else can do it as easily as Caddy can. And it's our go-to tool for several projects that require custom domains. And we really, really, appreciate it!

I'm just saying that it's not something that is documented well or purpose built for that scenario.

Is there any mature integration to achieve this with Kubernetes?

You can definitely “lazy load” TLS certs into Envoy.

The SDS (Secrets Discovery Service) supports this, and is touched on in TFA: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overv...

You provide a gRPC service that can return the keypair needed for any host, with host config also being dynamic.


We are using OpenResty with lua-auto-ssl for exactly this purpose, and it works like a charm.

Lighttpd seems to have solutions for that. Did you have a look at it?


> A traditional problem with SSL in combination with name based virtual hosting has been that the SSL connection setup happens before the HTTP request. So at the moment lighttpd needs to send its certificate to the client, it does not know yet which domain the client will be requesting. This means it can only supply the default certificate (and use the corresponding key for encryption) and effectively, SSL can only be enabled for that default domain. There are a number of solutions to this problem, with varying levels of support by clients.

Then, the best approach seems to be the following:

> Server Name Indication (SNI) is a TLS extension to the TLS handshake that allows the client to send the name of the host it wants to contact. The server can then use this information to select the correct certificate.
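To make the SNI approach concrete: once the client sends the host name in the handshake, certificate selection is just a lookup. Here is a small Python sketch, assuming an in-memory map from host names to certificate identifiers. The `select_certificate` helper and every name in it are hypothetical, and this is not how lighttpd implements it:

```python
def select_certificate(server_name, certificates, default):
    """Pick a certificate for the host name the client sent via SNI.

    Falls back to `default` when the client sent no name or nothing matches.
    Supports exact names and single-label wildcards such as "*.example.org".
    """
    if not server_name:
        return default
    name = server_name.lower().rstrip(".")
    if name in certificates:
        return certificates[name]
    # Wildcards cover exactly one label, so swap the leftmost label for "*".
    _, _, parent = name.partition(".")
    if parent:
        return certificates.get("*." + parent, default)
    return default


if __name__ == "__main__":
    certs = {"blog.example.com": "cert-blog", "*.example.org": "cert-wildcard"}
    for host in ("blog.example.com", "shop.example.org", None):
        print(host, "->", select_certificate(host, certs, "cert-default"))
```

In Python's own `ssl` module, a lookup like this would typically sit behind `SSLContext.sni_callback`; for thousands of custom domains the map would presumably be backed by a store rather than a dict.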

Traefik makes it fairly easy (inasmuch as it makes anything easy). But it's just a proxy, not a web server.

Traefik can work yes - documentation is really terrible though for anything approaching that use case. We had to really futz around with it and eventually went back to Caddy. Our use case is several thousand client domains that just proxy to some backends.

Did you move to caddy 2?

Our initial use-case was ingress for docker swarm - after a foray into k8s with the "traditional" nginx ingress with its rather hackish let's encrypt contraption.

I briefly looked at caddy 2 - but wasn't able to find any out-of-the box tricks for listening to docker messages and dynamically configure sites in a sane way.

Do you use custom code and configure caddy via the api?

> I briefly looked at caddy 2 - but wasn't able to find any out-of-the box tricks for listening to docker messages and dynamically configure sites in a sane way.

Like this? (Am not a Docker user, but I know this is an insanely popular solution) https://github.com/lucaslorentz/caddy-docker-proxy

There's also a WIP ingress controller: https://github.com/caddyserver/ingress/

I would think the way to do this would be to run a separate TLS daemon that handles the certificates (including acme challenges, presumably) and then pass the socket to your http server, either by proxying it (preferably to a unix socket), or like actually pass the FD with the session keys.

I don't think hitch (formerly stud) supports acme challenges, but that's where I'd start.

Apache can do automatic TLS with mod_md.

> C++14 is not much different from using Golang or, with a stretch, one may even say Python.

That's... definitely a stretch.

One of my friends brought this post to my attention this morning. The post is awesome and inspirational (it sparked a discussion in our chat group), though I can't agree with some minor points.

> Nginx performance without stats collections is on par with Envoy, but our Lua stats collection slowed Nginx on the high-RPS test by a factor of 3. This was expected given our reliance on lua_shared_dict, which is synchronized across workers with a mutex.

The `a factor of 3` seems quite large to me. Maybe you put all your stats in lua_shared_dict? You don't need to synchronize the stats every time. Since collection typically happens at per-minute frequency, you can keep the stats in a Lua table and synchronize them once every 5 or 10 seconds.
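A rough sketch of that idea, in Python rather than Lua to keep it self-contained: each worker counts into a private table and only takes the shared lock when it flushes. `SharedStats` stands in for lua_shared_dict and `WorkerStats` for per-worker state; both names are made up, and in OpenResty the flush would more likely be driven by a per-worker timer than checked on every increment.

```python
import threading
import time


class SharedStats:
    """Stand-in for a shared dict: one table guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}
        self.merges = 0  # how many times a worker took the lock to write

    def merge(self, deltas):
        with self._lock:
            self.merges += 1
            for key, value in deltas.items():
                self._counters[key] = self._counters.get(key, 0) + value

    def snapshot(self):
        with self._lock:
            return dict(self._counters)


class WorkerStats:
    """Per-worker counters that are flushed to SharedStats in batches."""

    def __init__(self, shared, flush_interval=5.0, clock=time.monotonic):
        self._shared = shared
        self._interval = flush_interval
        self._clock = clock
        self._local = {}
        self._last_flush = clock()

    def incr(self, key, value=1):
        # Hot path: a plain local update, no lock involved.
        self._local[key] = self._local.get(key, 0) + value
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self):
        # Cold path: one lock acquisition for the whole batch.
        if self._local:
            self._shared.merge(self._local)
            self._local = {}
        self._last_flush = self._clock()
```

The trade-off is staleness: the shared view lags by up to one flush interval, which is fine if the stats are only scraped once a minute anyway.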

It looks like the Nginx being compared was configured by a system that has survived for years and is not up to date. The company I worked for used a single virtual server to hold all traffic and routed it dynamically with Lua code. The upstream was chosen by Lua code too. There was no need to reload Nginx when a new route/upstream was added. We even implemented an 'Access Log Service'-like feature so that each user could have her favorite access log (by modifying the Nginx core, of course).

However, I don't think this post is incorrect. Where Envoy surpasses Nginx is in its more thriving developer community. More features have been added to Envoy than to Nginx in recent years. Not only that, open discussion of Nginx development is rare.

Nginx is an old, slow giant.

We've made a note about how inefficient our solution was and what the plan to fix it was. Sadly, to get proper stats in nginx we needed two things:

* A C interface for stats, so we would have access to them from C code.

* Instrumentation of all `ngx_log_error` calls, so we would have access not only to per-request stats but also to various internal error conditions (w/o parsing logs).

That said, we could indeed just improve our current stat collection in the short term (e.g. like you suggested, with a per-worker collection and periodic lua_shared_dict sync). But that would not solve the long-term problem of lacking internal stats. We could even go further and pour all the resources that were used for the Envoy migration into nginx customizations, but that would be a road with no clear destination because we would be unlikely to succeed in upstreaming any of that work.

> The `a factor of 3` seems quite large to me. Maybe you put all your stats in lua_shared_dict? You don't need to synchronize the stats every time. Since collection typically happens at per-minute frequency, you can keep the stats in a Lua table and synchronize them once every 5 or 10 seconds.

Any pointers on how to achieve this for someone just starting out with lua and openresty? I have the exact same thing (lua_shared_dict) for stats collection, would love to learn a better way.

nginx had a cold (by American standards) and conservative community to begin with; the commercial version and F5 ownership likely "closed" it even more

it's a pity that community never evolved with nginx growth and success

If anyone was wondering, this is solely for proxying, not for oldschool web server functionalities, eg. static file serving.

We've actually started experimenting with converting our static file serving to "proxying to S3 + caching." This is simpler from deployment and development perspectives (for companies that do not have a distributed filesystem, like Google with its GFS):

* for deployment we do not need to maintain a pool of stateful boxes with files on them and keep these files in sync.

* for development, engineers now have a programmatic interface for managing their static assets.

I'm positively surprised that Dropbox (at least from what I understood from the post) didn't require lots of changes or patches on top of the upstream codebase of Envoy to migrate their traffic!

We did require some of them[1]. Esp. painful were Transfer-Encoding quirks, and some dances around old HTTP/1.0 backends and request buffering.

Compared to NGINX though, it was relatively easy to push these fixes upstream. Community is very welcoming to outside contributions.

[1] https://dropbox.tech/infrastructure/how-we-migrated-dropbox-...

We do have some local patches as well (mostly for integration with our own infrastructure - stats collection, some RPC specific stuff). As SaveTheRbtz mentioned, we encountered some issues with non-RFC clients, corner cases which were not exposed when Envoy is used in a "trusted" environment, etc., but all our fixes are now upstream, so the next migrations will be way easier both for us and for other Envoy users.

I did not quite get how they configure envoy? Did they write their own control plane? Use ambassador/Istio/Gloo?

We have a mix of static and dynamic configuration. We started with almost everything defined in the configuration and implemented our control plane only for the endpoint discovery service. Over time we implemented more and more features there (certificates, TLS tickets, route and vhost configuration, etc). We decided to write our own implementation of the control plane - actually the core part is pretty simple and easily expandable.

We have built our own control plane in golang tightly integrated with an existing infrastructure (service discovery, secrets/certificates management, configs delivery, feature gating, and so on).

Did you consider using commercial nginx? If so, what made you decide against it?

Sadly, it would probably be as hard to maintain as an opensource version. We really want to have access to the code to make sure we can fix, troubleshoot it, understand it fast...

Things that may've helped:

-- Configuration definition (e.g. protobufs.)

-- More focus on observability: error metrics (instead of logs), tracing, etc.

-- gRPC control plane.

-- C++ module development SDK.

-- (ideally) bazel.

Some dataplane features like gRPC JSON transcoding, gRPC-Web, and http/2 to backends.

Don't any of the major commercial open source vendors offer custom terms to give access to the commercial source? I'd imagine they'd contemplate it for big deals. Seems like one of the only ways to keep some of these sophisticated customers onboard.

The open source argument is valid -- most enterprise software vendors do provide source code access (under NDA). The rest of the arguments stand though: as it is right now, it is way more developer/operator friendly to use Envoy in our production.

The price is really insane for Nginx commercial.

As an Enterprise software vendor myself, I can assure you: everything is negotiable at Dropbox’s scale including very deep discounts.

About $2000 per host.

How does Envoy compare to Caddy 2 ? https://caddyserver.com

To tell you the truth, we didn't consider it. From what I can get from the architecture docs[1], it can be a decent platform for apps, but might not be the best choice for a general purpose ingress/egress proxy (at least for now.)

[1] https://caddyserver.com/docs/architecture

It is a great choice for a general purpose proxy. (That's kind of the point.)

But they mentioned that they wanted to use C++ instead of go to get even that extra performance out.

I use Caddy a lot and it's perfectly fine for my scale, but at dropbox's scale, maybe go wouldn't be enough for the ingress part?

I just lament the increasing deployments of programs written in memory-unsafe languages to the edge, in general.

I am more curious what makes the author think Caddy "might not be the best choice for a general purpose ingress/egress proxy" (there were no other qualifications to that statement, but no evidence to support it either).

Yeah, to its credit, the article brought it up but then kinda hand waved away "envoy had many more security issues than nginx". Having a huge load of C library dependencies in a user-facing service seems like a bug these days.

Part of reducing dependencies in my own software was a conscious decision to minimize future CVE exposure.

> C++ instead of go to get even that extra performance out

Might as well use C then (with some hand-written asm sprinkled in where the compiler gets confused and doesn't see an obvious optimization) for that. And I'm not even being sarcastic here (I wish I was though)

Sensing a bit of a trend here. Didn't another major player recently make the same switch?

I think the best slice of who's migrating to Envoy can be observed via EnvoyCon talks[1][2]:

* Lyft (of course)

* Spotify

* Stripe

* Square

* eBay

* Yelp

* Pinterest

Plus the support from major cloud providers: Google, Microsoft, and Amazon.

[1] https://envoyconna18.sched.com/ [2] https://envoycon2019.sched.com/

They must all be GRPC users. Developers are pushing GRPC and protobuf pretty hard in companies. The next step down the road is to move to envoy as the load balancer. Otherwise these protocols don't work well over traditional HTTP infrastructure.

So, seems like nginx is fine until your company reaches the "we are worth billions now" scale?

nginx is never fine for load balancing, they put basic features like metrics behind the paid edition. It's not sane to operate in production.


I work for a billions company. Nginx is still fine. You'll need to be prepared to pay for better operations, management, visibility, and protocol support. You can either pay them or build it in house, but you will want to pay.

Discord is also on that list - although we have not spoken much about it yet!

It may actually become a trend. For well known reasons:

- Community

- Nginx served us well for almost a decade. But it didn’t adapt to current development best-practices

- Operationally Nginx was quite expensive to maintain

- C++

- Observability and monitoring


I'd add another reason: so many people only use nginx as a reverse proxy, and the proxy configuration feels duct-taped on sometimes. Envoy being written as a proxy first makes it a better interface IMHO.

Is C++ generally considered to be "better"?

I've always looked at it (esp. with STL) as kind of a "Swiss-Army-Chainsaw" and you were going to shoot your eye out. Maybe that view is old and things are better - but I learned a while back that sending a young gun into a C++ application's code-base would lead to a world of pain.

Maybe that learning is no longer accurate? What do you think?

When we are comparing C, Lua (Nginx) and C++ (Envoy). Yes C++ is better :).

Honest question: In what way?

Platform wars are over-ish. We have the same compile targets. What they call "Undefined Behavior" is relegated to ... well ... platforms we are not supporting.

C is fast - simple - easy(ish) to learn, and easy to "fuzz" in testing.

I can't speak to LUA - but C++ looks like a mine-field (to me).

Why do you declare that C++ is "better"? (Seriously interested - I don't even know enough these days to have a debate. I just gave up on the C++ hell-hole years (decades?) ago, and maybe should have kept up)

There are many reasons which lead to a cleaner codebase, some of them are: RAII, smart pointers, constant types, reusable containers, standard algorithms library, cleaner way to define interfaces, etc.

Overall our experience is that C++ code is smaller, simpler to write/read, and has a smaller chance of mistakes than equivalent logic written in C.

Of course many of these points are relevant only for relatively modern C++ (C++11/C++14 or newer); before that the cost/benefit ratio was much less clear.

Fair enough. Thank you.

In my case - C (as a language) had a smaller footprint, and if the targets were limited, it was easier to learn, to lint and to code-inspect.

Admittedly, this was mostly before C++14. I guess this might be a case of "once bitten, twice shy".

Thank you.

I'm not declaring that C++ is better for everything. In our case it is better because it makes this part of the infrastructure more sustainable: there are more engineers who can code well in C++ vs C in our company and industry overall. Also it is easier to code in C++ as it is a general purpose programming language with a lot of libraries available, open source projects, community around it etc.

Maybe we program in different verticals. I have not found it to be so. (I have still upvoted your comment).

I value your opinion.

Well, that is very gracious - esp. here.

I value yours.

You have different experiences than I do - so our conclusions will differ.

That said - I "feel" as if C++ is a dangerous serpent of a language. Maybe I need to spend 6 months re-acquainting myself in complex environments with more developers than just me, and re-evaluate that presumption on a medium-size project.

Thank you.

Your happiness is important to me.

I think what they're trying to say is that developing plugins for nginx in Lua is not great. It's not a rant about languages.

This has to do with nginx not having the required features (features blocked behind the paid edition, or gRPC support being non-existent), forcing you to develop plugins in Lua to compensate (the only supported language), and Lua is too slow for this sort of stuff (processing Gbps of traffic in real time is no joke).

That makes sense. Thanks. (I still prefer C - but I am admittedly getting old and C++ sucked in the mid-90s or so)

EDIT: BTW -- I am not going to argue with LUA throughput. I'm still not sure what the thinking was there (maybe time-to-prototype?) - but C plugins run faster than Apache's do. By, like, a lot. (And I like Apache! ...Having used it since 1996)

It depends on the team. C++ can be better. It can also be worse.

It's a drop in the water compared to Nginx usage.

That is indeed true. But, I remember the time when we were rolling out nginx back in 2000's and exactly the same thing was said about Apache.

Not if the people switching is the cool crowd. Which is exactly what I think is happening here

HAProxy is pretty popular too.

Because most nginx usage is different?

Of course, you can serve static assets using Envoy, and maybe even connect a FastCGI app without very much hassle. But it's quite a bit less straightforward.

Slack announced they were going to switch.


This is also a good web server. Configuration is done in YAML. Also, it claims to be very fast.

A shame they picked nginx in the first place, it has all the stats and critical features behind the paid edition. HAProxy is always a better choice for load balancing.

Besides that, it looks like the move was significantly driven by GRPC and protobuf. No surprise here, GRPC really doesn't work well over HTTP. Once a company starts using the Google stack, they have to move to more of the Google stack to make it usable.

Our technology stack is very gRPC friendly, so developer experience is actually better with it, than without (though this is very subjective.)

As for the middleboxes, using gRPC-Web[1] allowed us to switch Desktop Client App to gRPC even behind firewalls/IDSes that do not speak HTTP/2 yet.

As for HAProxy, Dropbox used to use it (circa 2013) specifically for loadbalancing, but we eventually replaced it with our Golang proxy. That said, recent HAProxy improvements (v2.0+) make it quite an awesome dataplane and an excellent loadbalancer!

[1] https://github.com/grpc/grpc-web
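
For anyone wondering why gRPC-Web gets through middleboxes that plain gRPC does not: gRPC reports its status in HTTP/2 trailers, while gRPC-Web, as I read its protocol notes, moves the trailers into the response body as one more length-prefixed frame, so the exchange also fits in plain HTTP/1.1. A minimal Python sketch of that framing follows. The function names are made up for illustration, compression is ignored, and none of this is Dropbox's code:

```python
import struct

# In gRPC-Web the most significant bit of the first frame byte marks a
# trailers frame; a plain data frame has it cleared.
TRAILER_FLAG = 0x80


def encode_frame(payload: bytes, trailers: bool = False) -> bytes:
    """Prefix a payload with the 5-byte header: 1 flag byte + 4-byte big-endian length."""
    flags = TRAILER_FLAG if trailers else 0x00
    return struct.pack(">BI", flags, len(payload)) + payload


def encode_trailers(trailers: dict) -> bytes:
    """Serialize trailers as an HTTP/1-style header block and wrap them in a frame."""
    block = "".join(f"{k}: {v}\r\n" for k, v in trailers.items()).encode("ascii")
    return encode_frame(block, trailers=True)


def decode_frames(body: bytes):
    """Yield (is_trailers, payload) for each frame found in a response body."""
    offset = 0
    while offset < len(body):
        flags, length = struct.unpack_from(">BI", body, offset)
        offset += 5
        yield bool(flags & TRAILER_FLAG), body[offset:offset + length]
        offset += length


# One opaque serialized message followed by the status trailers.
body = encode_frame(b"\x0a\x02hi") + encode_trailers({"grpc-status": "0"})
frames = list(decode_frames(body))
```

Since the status travels inside the body, a proxy or IDS that only understands HTTP/1.1 request/response semantics has nothing unusual to forward.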

Thank you for the thorough comparison. Could anyone chip in on whether a recent HAProxy version would be a better choice than nginx and/or Envoy in a similar case?

Can somebody speak to why dynamic upstreams included in a file paired with `sudo service nginx reload` for prod deploys stopped scaling?

Nginx configuration is bound to its workers. When you reload nginx, new workers are created (and start responding to new connections) and the old ones are drained. The draining finishes when the last connection is finished or a timeout is reached. In OSS nginx every upstream change requires a configuration reload. If you have lots of upstream changes and don't want to terminate connections prematurely, this can quickly require lots of RAM as you have many workers. A stock nginx worker is around 150MB, but issues with OpenResty integration (they mention Lua usage) can bloat this to > 1GB.

It is easy enough for simple cases (and we used it for quite a while, until we moved to using Lua for that.) For more complex scenarios you will have new `server` blocks, certificates, TLS tickets, log files / syslog endpoints, so the automation will end up interacting not with just a single dynamic upstream file but with a rather large number of system interfaces. Control-plane ends up being distributed between config generation, filesystem state, service configuration (e.g. syslog.)

On a more practical note, each nginx `reload` will double the number of workers, almost doubling memory consumption and significantly increasing CPU usage (need to re-establish all TCP connections, re-do TLS handshake, etc.) So there are only so many reloads that you can do in an hour.
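
To put rough numbers on that, here is a back-of-the-envelope Python sketch of the worst case, where every old worker stays alive until the drain timeout (for example whatever `worker_shutdown_timeout` is set to). The model and the function names are my own simplification, and the 16 workers and the timings are made-up inputs; only the ~150 MB per worker figure comes from this thread:

```python
import math


def peak_generations(reload_interval_s: float, drain_timeout_s: float) -> int:
    """Worst-case number of worker generations alive at the same time.

    Assumes reloads happen at a fixed interval and each replaced
    generation lingers for the full drain timeout before exiting.
    """
    return 1 + math.ceil(drain_timeout_s / reload_interval_s)


def peak_memory_mb(workers: int, worker_mb: float,
                   reload_interval_s: float, drain_timeout_s: float) -> float:
    """Worst-case resident memory across all live generations, in MB."""
    generations = peak_generations(reload_interval_s, drain_timeout_s)
    return generations * workers * worker_mb


# A single reload with a 10 minute drain: two generations, i.e. the
# "doubling" described above.
single = peak_generations(reload_interval_s=3600, drain_timeout_s=600)

# Reloading every 2 minutes with the same drain: six generations overlap.
busy = peak_memory_mb(workers=16, worker_mb=150,
                      reload_interval_s=120, drain_timeout_s=600)
```

With those inputs `single` is 2 and `busy` is 14400 MB, so the reload frequency, not the config size, is what sets the memory ceiling.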

nginx is not well suited for constantly reconfiguring your infrastructure on very hot servers. This is a problem when you expose such infrastructure configurations to users (think cloudflare), but otherwise you can just mitigate this problem by having a sane deployment strategy.

One thing nice about OpenResty (nginx) and their Lua support is that it plugs in at TLS negotiation. Does Envoy?

Can you describe your use-case?

If you are talking about the ability to select a certificate on the fly via `ssl_certificate_by_lua_block`[1], we are not aware of such functionality. If you are missing something, I would highly encourage you to discuss it with the community on GitHub!

From Oleg Guba, Traffic Team TL, co-author, and person driving the deployment:

* ListenerFilters + NetworkFilters are flexible enough, that some of the custom logic could be just moved to the config.

From Ruslan Nigmatullin, our head Envoy developer:

If you are talking more about custom verification code, there are already a couple of ways to do that:

* Client TLS auth Network Filter: https://www.envoyproxy.io/docs/envoy/latest/configuration/li...

* Alternatively, if you are writing C++ extension you can use Network::ReadFilter, Network::ConnectionCallbacks.

[1] https://github.com/openresty/lua-nginx-module#ssl_certificat... [2] https://github.com/openresty/lua-resty-core/blob/master/lib/...

WordPress and others use this to load certs on the fly. When you are a multidomain host this matters a lot.

You don’t just load up a million certs as files and restart the server (though I do know a company that does something like this, but man, quite brittle).
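
To make the idea concrete, here is a rough Python sketch of picking a certificate during the handshake based on SNI, which is the kind of thing `ssl_certificate_by_lua_block` enables in OpenResty. It uses the standard library's `ssl.SSLContext.sni_callback` hook. `LazyCertStore` and `demo_loader` are made-up names and the loader is a stub, so treat it as an illustration, not as what any real multi-domain host runs:

```python
import ssl
from typing import Callable, Dict, Optional


class LazyCertStore:
    """Loads a per-hostname SSLContext on first use and caches it."""

    def __init__(self, loader: Callable[[str], Optional[ssl.SSLContext]]):
        # Hypothetical loader: fetches cert and key from disk, a database, etc.
        self._loader = loader
        self._cache: Dict[str, ssl.SSLContext] = {}

    def lookup(self, hostname: str) -> Optional[ssl.SSLContext]:
        """Return the cached context for a hostname, loading it if needed."""
        if hostname not in self._cache:
            ctx = self._loader(hostname)
            if ctx is None:
                return None  # unknown host: caller keeps the default context
            self._cache[hostname] = ctx
        return self._cache[hostname]

    def sni_callback(self, ssl_obj, server_name, default_ctx):
        """Matches the (socket, server_name, context) signature of sni_callback."""
        if server_name is not None:
            ctx = self.lookup(server_name)
            if ctx is not None:
                # Swap in this host's context (and thus its certificate)
                # while the handshake is still in progress.
                ssl_obj.context = ctx
        return None  # None tells the ssl module to continue the handshake


def demo_loader(hostname: str) -> Optional[ssl.SSLContext]:
    # Stub: a real loader would call ctx.load_cert_chain(certfile, keyfile).
    if not hostname.endswith(".example.com"):
        return None
    return ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)


store = LazyCertStore(demo_loader)
default_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
default_ctx.sni_callback = store.sni_callback
```

Nothing is loaded until a client actually asks for a hostname, so adding a domain means adding a row wherever the loader reads from, with no config reload.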

Is Condoleezza Rice still working for Dropbox?

She’s on the board of directors: https://www.dropbox.com/about.

She has never worked for Dropbox.

Dropbox works for her :D

And finally we got nginx as legacy now, lul.

Seems like they could have switched to openresty instead and saved quite a lot of effort in their migration, but oh well, they probably just couldn't handle the 1-indexing /s

Seems like they were already using openresty. Having used openresty professionally, I appreciate that it provides ways to write code to solve a lot of the problems outlined in TFA, but solving the problems out of the box is significantly better.

Who does Dropbox compete with these days? They have pretty much the highest prices for the least amount of value. The only reason I see them mentioned here frequently is their connection with Y Combinator.

I noticed on social media a lot of negativity toward Dropbox and sometimes even on HN. The negative sentiment appears to come from tech circles who feel One Drive offers a better price point or iCloud works great for them, so Dropbox shouldn't exist.

Personally, I prefer Dropbox. I found problems with One Drive. Google Drive client was always hit and miss and I could not rely on it. iCloud is not cross platform (afaik). Dropbox has worked wherever I needed it.

Dropbox is more expensive but I prefer to have my files in Dropbox (as a separation of concerns) rather than have a single tech company control every aspect of my life.

My experience with the 'average Joe' is that Dropbox is easy and it works. Yes, they might save a couple dollars switching to OneDrive but Dropbox still offers a good product. Will Dropbox survive long term? I certainly hope so. I have no affiliation, aside from being a customer.

Insync solves a lot of Google Drive issues (not all! - the fundamental organization, and the ability to search - at Google! - is horrible). Box.com is not bad for auditability and observability, but One Drive keeps trying to suck you in (You think you are out! But you are not!).

For the extra buck-or-two per user per month - I just like the fact that it "just works" for most people and little tech support. (Although I do miss the RSS feed on events that they removed that helped me keep track of all of the "stuff" "the people" were doing with "all the files". I'm sure there was a reason - but that was actually the only feature that made me think that they and Box.com might be comparable in that area)

I don't use Google Drive per se, since I run Linux. I primarily use rclone and previously had issues with Dropbox throttling uploads as well. Currently I pay $12 a month and get unlimited storage with G Suite. Counting all the other G Suite features, Dropbox doesn't offer anything close in terms of price or features.

I've admired Drew and the early Dropbox team for getting things done and shipped even when compiled Python GUIs were as edgy as the initial Rails version of Twitter was. But they shipped and validated the market. Now adding all that fancy and cool tech mentioned in the blog post will increase the complexity by a lot, but it's not clear what the real benefits are. Does a decreased number of machines running really justify the migration and addition of complexity? Maybe they have some new products in the pipeline that build upon the new stack. Or they waste their time. We will see.

> Does a decreased number of machines running really justify the migration and addition of complexity?

Of course it does. I am not sure why you think it doesn't.

servers are a commodity and cheap compared to runtime complexity

This attitude gives us ever less efficient software that endlessly gobbles up resources for no good reason.

Yes, but it's a lot of servers.

How many? 5000 servers? As a public company it should not really matter at that size. Developers are way more expensive.

Effect of being VC funded. The engineers hired later on only care about engineering problems, plus the company nametag and money, not user-level problems. You would think Dropbox would be concerned with providing the best value for the buck, but nah. Plenty of other examples: FB, Google. Google software quality sucks for a place that employs 100s of engineers.

There are a lot of things they do that I'm not happy with, but it is still very valuable for the price point. Nowhere else can you get unlimited storage that is fast for so cheap, and they are the only cloud provider that doesn't charge egress to those services.

With that move it is actually decreasing the complexity and improving manageability and sustainability.

If you change all the tires of a running car, you are increasing complexity and risk. The blog post describes the fallacy/anti-pattern of starting from scratch. It's usually a people problem when people don't want to understand the source of the current state and also don't want to fix it. A green field looks so much nicer than dirty plumbing. But it's wrong.

Another sign of this anti-pattern is the hyper focus on the green field solution and not thinking about simpler solutions (better DevOps tooling, hiring C++ developers to rewrite slower Lua code, etc.)

Google and Apple, to start.

They don't compete though. Google offers 17+GB free with email, office suite, unlimited photos, drive and Voice with a free number for unlimited calling and texting. Dropbox has much stricter bandwidth limits as well.

For $9.99 you get all that plus 2TB storage with Google One. Dropbox has a minimum of 3 users for their business plan, but with 1 user on G Suite for $12/mo I get unlimited storage and all the goodies I mentioned before.

I'm learning to believe that there are various sub-sects of the HN crowd. The people who would rather pay $3/month to host a slow-ass VPS, and those for whom a $10/month delta is a "win" as long as their work-flow is optimized.

In fact, I think that drives a lot of these "what is better" debates in threads here. Some people go "Google Drive is better, because I get 2TB/month for a flat-fee that bundles the other services" (I do that plan too - I just subscribe to everything - to match the client's work flow. Where it really sucks is that I commercially subscribe to 3 (4?) commercial video conferencing systems).

I am not going to choose to save $5 when it stands in the way of me making $100. I find that thinking impoverishing, and time-wasting, and frankly stupid.

Even if I have to pay a designer $1000 to remake my slides after the content is settled for a client trying to pay me $15K, so that they can raise $2.5M, I'll gladly pay it! That doesn't seem to be the mindset here? (Or maybe I'm just coming across the people that shill for Vultr over Digital Ocean (or God-forbid - AWS!) - instead of focused on velocity of earnings. Maybe I just come across the wrong posts)

But, that's me - and I suspect I am not the majority here.

I think the real split is between 'actually making money using these tools' (you) and 'personal projects but want to make money' (lots of others) (discounting the 'not in the USA' here, which is, for sure also an issue - $10 in USA for a dev is nothing, but might be a huge deal for a dev in another country).

When you cost your business hundreds of $k/yr, who cares about $5-100? That's less than the cost of the free coffee and snacks!

> $10 in USA for a dev is nothing, but might be a huge deal for a dev in another country

Not to mention that, USD 10 is a variable amount of money. Today, USD 10 is 25% more expensive than it was last year (and a couple of months ago, it was almost 50% more expensive than it was last year). And there's also the hassle of paying in a foreign currency (and not having common payment methods like boleto bancário available), and the annoying tendency of transactions from another country being blocked as suspicious by the credit card provider.

I am with you, and I also feel that HN has kind of turned into the Slashdot of the early days.

On a website that started to discuss business ideas and how to turn them into profitable endeavours, it is a bit surreal that every time someone brings commercial products out, one gets endless posts about free beer alternatives and how such software is doing a disservice to the community.
