“Lambda and serverless is one of the worst forms of proprietary lock-in” (2017) (theregister.co.uk)
746 points by peter_d_sherman on Feb 4, 2019 | 395 comments

I have worked with database code that was meant to only work with the database it was running on, and database code that was meant to be agnostic to what database you used. I always thought that the costs of the second were underappreciated relative to their benefits. And unless you're actively maintaining running on more than one database (as in, shipping a product where your users have more than one database) you tend to miss all the implicit ways you come to depend on the implementation you're on -- yes, the syntax may be the same across databases, but performance impacts are different, and so you tend to optimize based on the performance of the database you're on.

I suspect the same is true for cloud. Real portability has real costs, and if you aren't incurring all of them up front and validating that you're doing the right things to make it work, then incurring part of them up front is probably just a form of premature optimization. At the end of the day, all else being equal, it's easier to port a smaller codebase to new dependencies than a larger one, and attempting to be platform-agnostic tends to result in more code as you have to write a lot of code that your platform would otherwise provide you.

It’s not just portability that’s an issue with lambda. It’s also churn.

Running on Lambda, one day you’ll get an email saying that Node version x.x is being deprecated, so be sure to upgrade your app by June 27th, when they pull the plug. Now you have to pull the team back together and make a bunch of changes to an old, working app just to meet some 3rd party’s arbitrary timeframe.

If you’re running node x.x on your own backend, you can choose to simply keep doing so for as long as you want, regardless of what version the cool kids are using these days.

That’s the issue I find myself up against more often when relying on Other People’s Infrastructure.

It's not about using what the cool kids use these days. I can't stress enough that unmaintained software should not run in production.

This way you have a good argument to take to management, and if you do it regularly, or even plan for it ahead of time, it's usually not much work.

During a product planning meeting: "Dear manager, for the next weeks/sprint the team needs X days to upgrade the software to version x.x.x otherwise it will stop working"

I guess we have different philosophies then. My take is that software in production should not require maintenance to remain in production.

Imagine a world where you didn't need to spend a whole week every year, per project, just keeping your existing software alive. Imagine not having to put off development of the stuff you want to build to accommodate technical debt introduced by 3rd parties.

That's the reality in Windows-land, at least. And I seem to remember it being like that in the past on the Unix side too.

Your vision is only workable for software for which there are no security concerns. This might improve to the extent that the industry slowly moves away from utterly irresponsible technologies like memory-unsafe languages, brain-damaged parsing and templating approaches, and more or less the whole web stack. I wouldn't hold my breath, though. And even software that's not cavalierly insecure will have security flaws, albeit at a lower rate.

Keep in mind that you're arguing against an existence disproof. The Microsoft stack, for example, is a pretty big target for attack, and has seen its share of security issues over the years.

But developers don't need to make any code changes or redeploy anything to mitigate those security issues. It all happens through patches on the server, 99% of which happen automatically via windows update.

Yes, Microsoft is good at backward compatibility.

So many open source hackers do not know the basic techniques for backwards compatibility (e.g. don't rename a function, just introduce a new one, leaving the old one available).

I'm spending very significant effort maintaining an OpenSSL wrapper because OpenSSL constantly removes or renames functions. I hoped to branch based on the version number, but they even changed the name of the function that returns the version number.

And that's only one example; lots of people make such mistakes, costing their users huge effort.

And then there's the popular semantic versioning myth: that you just need to bump the major version number when you change the API incompatibly, and that this saves your clients from trouble.
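A wrapper in this situation ends up needing a fallback: try the current name, then the legacy one. The sketch below shows that logic under loud assumptions: to stay self-contained it scans a simulated export table instead of calling dlsym() on a dlopen() handle, and the two "builds" and their version numbers are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long (*version_fn)(void);

/* Simulated export table: symbol name -> function. A real FFI wrapper
   would call dlsym() on a dlopen() handle instead of scanning this. */
struct export_entry { const char *name; version_fn fn; };

/* Made-up version numbers for two simulated library builds. */
static unsigned long old_build_version(void) { return 0x1000200fUL; }
static unsigned long new_build_version(void) { return 0x1010100fUL; }

/* The old build exports only the old name, the new build only the new one. */
static const struct export_entry old_build[] = {
    { "SSLeay", old_build_version }, { NULL, NULL } };
static const struct export_entry new_build[] = {
    { "OpenSSL_version_num", new_build_version }, { NULL, NULL } };

/* Scan a NULL-terminated export table for one symbol name. */
static version_fn lookup(const struct export_entry *table, const char *name) {
    assert(table != NULL && name != NULL);
    for (; table->name != NULL; ++table)
        if (strcmp(table->name, name) == 0)
            return table->fn;
    return NULL;
}

/* Try the current name first, then fall back to the legacy one. */
static version_fn resolve_version_fn(const struct export_entry *table) {
    static const char *const candidates[] = { "OpenSSL_version_num", "SSLeay" };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        version_fn fn = lookup(table, candidates[i]);
        if (fn != NULL)
            return fn;
    }
    return NULL; /* neither name present: unsupported build */
}
```

Every rename adds another entry to a list like `candidates`, and the wrapper has to carry that list for as long as it supports old builds.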

> So many open source hackers do not know the basic techniques for backwards compatibility (e.g. don't rename a function, just introduce a new one, leaving the old one available).

I'd dispute this, or at least I think this doesn't capture the whole picture. Microsoft makes money with backwards compatibility and can afford to spend significant effort on the ever-growing burden of remaining backwards-compatible indefinitely. Open source volunteers are working with much more limited resources, and I think it comes down much more to intentional tradeoffs between ease of maintenance and maintaining backwards compatibility.

If you have a low single-digit number of long-term contributors, maybe the biggest priority to keep your project moving at all is to avoid scaring off new contributors or burning out old contributors, and that might require making frequent breaking changes to get rid of unnecessary complexity asap. Characterizing that as "they don't know that you can just introduce a new function" doesn't seem like it yields instructive insights.

Yes, this is exactly the wrong reply I often hear when complaining about backwards compatibility.

The mistake here is that in 99% of cases backwards compatibility costs nothing: no effort, no complexity.

Given two equally costly choices, the people breaking backwards compatibility simply make the wrong one.

> maybe the biggest priority to keep your project moving at all

When you rename function SSLeay to OpenSSL_version_num, where are you moving? What does it give to your project?

Ok, if you like the new name so much, what prevents you from keeping the old symbol available?

        /* keep the old symbol as a thin wrapper around the new one */
        unsigned long SSLeay(void) { return OpenSSL_version_num(); }
(Sorry for naming OpenSSL here, it's just one of many examples)
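The same idea can go one step further: keep the old name as a real exported function that forwards to the new one, and mark it deprecated so that code still using it gets a compiler warning rather than a build break. A minimal sketch with a hypothetical library (`lib_version` renamed to `lib_version_num`, made-up version value); the `deprecated` attribute is a GCC/Clang extension, so the macro expands to nothing on other compilers.

```c
#include <assert.h>

/* Hypothetical library in which lib_version() was renamed lib_version_num(). */

/* GCC and Clang support a deprecation attribute; elsewhere this expands to
   nothing and the old name simply keeps working. */
#if defined(__GNUC__)
#define LIB_DEPRECATED(msg) __attribute__((deprecated(msg)))
#else
#define LIB_DEPRECATED(msg)
#endif

/* New, preferred name (made-up version value). */
unsigned long lib_version_num(void) { return 0x0200UL; }

/* Old name kept as an exported function that forwards to the new one, so
   existing callers and FFI bindings that resolve it by name still work.
   Source that still calls it gets a warning instead of a link error. */
LIB_DEPRECATED("use lib_version_num() instead")
unsigned long lib_version(void);

unsigned long lib_version(void) { return lib_version_num(); }
```

The cost is two extra declarations; the old symbol can be dropped much later, once callers have had time to move.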

When developers do such things, they break other open source libraries, which in turn break others. It's a huge destructive effect on the ecosystem. It will take many man-days of work for the dependent systems to recover. And it may take years for the maintainers to find those free days to spend on recovery, and some projects will never recover (e.g. no active maintainer).

With a lift of a finger you can save humanity significant pain and effort. If you have decided to spend your effort on open source, keeping backwards compatibility by making the right choice in a trivial situation will make your contribution an order of magnitude bigger and more efficient.

So, I believe people don't know what they are doing when they introduce breaking changes.

I have seen developers introduce breaking changes, then find the projects depending on them and submit patches. So they really have good intentions, and spend more of their volunteer open source energy than necessary. And when the other project cannot review and merge their patch (no maintainers), they get disappointed.

So please, just keep the old function name. It will be cheaper for you and for everyone.

An unmaintained duplicate way of doing things is a mistake waiting to happen.

I was just thinking this, but I guess we're really just talking about API changes. Everything under the API can still get rewritten, no?

> Microsoft makes money with backwards compatibility

That's a good way of putting it, and it gets to a key difference between open source and proprietary software.

In the open source world where a million eyes make all bugs shallow, developer hours are thought of as free. So if you change something it's no big deal because all the developers using your thing can simply change their code to accommodate it. It doesn't matter how many devs or how many hours, since the total cost all works out to zero.

In the proprietary world, devs value their time in dollars. The reason they're using your thing is because it's saving them time. They paid good money because that's what your thing does. Save time. Get them shipped. As a vendor, you're smart enough to realize that if you introduce a change that stops saving your customers time or, worse, costs them time or, god forbid, un-ships their product, they'll do their own mental math and drop you for somebody who understands what they're selling.

In the end, all we're talking about here is the end product of this disconnect in mindset.

Microsoft also isn't your average developer that imports libraries from strangers.

Every time I run an audit (which is monthly) I see at least a dozen vulnerabilities in NPM packages we use. Sure, some of them don't apply to our usage, and others can't really impact us, but occasionally there is one we should be concerned about.

We server admins can push buttons to upgrade, but that doesn't mean developer code will keep working.

Many developers live in a world where they think server admins will protect their app... But we're more likely to break things by forcing upgrades of your neglected packages.

> Keep in mind that you're arguing against an existence disproof. The Microsoft stack, for example, is a pretty big target for attack, and has seen its share of security issues over the years.

> But developers don't need to make any code changes or redeploy anything to mitigate those security issues.

I don't believe it. Most security issues are not just an implementation issue in the framework but an API that is fundamentally insecure and cannot be used safely. Most likely those developers' programs are rife with security issues that will never be fixed.

Which is fine, as long as they are understood and mitigated against. If your security policy consists entirely of "keep software up to date", you don't have a security policy.

In practice trying to "understand and mitigate against" vulnerabilities inherent in older APIs is likely to be more costly and less effective than keeping software up to date.

If there is a problem in an older API, it's probably time to update. That's understanding and mitigation.

The discussion is about the difference between updates made for a valid reason and updates imposed by cloud providers; nobody advocates sticking with old software versions.

> nobody advocates sticking with old software versions.

In my experience that's what any policy that doesn't include staying up to date actually boils down to in practice. Auditing old versions is never going to be a priority for anyone, and any reason not to upgrade today is an even better reason not to upgrade tomorrow, so "understanding and mitigation" tends to actually become "leave it alone and hope it doesn't break".

In practice you don't mitigate against specific vulnerabilities at all, you mitigate against the very concept of a vulnerability. It would be foolish to assume that any given piece of software is free from vulnerabilities just because it is up to date, so you ask yourself "what if this is compromised?" and work from the premise that it can and will be.

Sounds clever, but what does it actually translate to in practice? And does it work?

Let's say I have a firewall. If we assume someone can compromise the firewall, what does that mean for us? Can we detect that kind of activity? What additional barriers can we put between someone with that access and other things we care about? What kind of information can they gather from that foothold? Can we make that information less useful? etc.

You think about these things in layers. If X, then Y, and if Y, then Z, and if X, Y, and Z do we just accept that some problems are more expensive than they're worth or get some kind of insurance?

I've found that kind of approach to be low security in practice, because it means you don't have a clear "security boundary". So the firewall is porous but that's considered ok because our applications are probably secure, and the applications have security holes but that's considered ok because the firewall is probably secure, and actually it turns out nothing is secure and everyone thought it was someone else's responsibility.

I think you're projecting. The whole point is reminding yourself that your firewall probably isn't as secure as you think it is, just like everything else in your network. This practice doesn't mean ignoring the simple things, it just means thinking about security holistically, and more importantly: in the context of actually getting crap done. Regardless, anyone who thinks keeping their stuff up to date is some kind of panacea is a fool.

Personal attacks are for those who know they've lost the argument.

Keeping stuff up to date is exactly the kind of "simple thing" that no amount of sophistry will replace; in practice it has a better cost/benefit ratio than any amount of "thinking holistically". Those who only keep their things up to date and do nothing else may be foolish, but those who don't keep their things up to date are even more foolish.

> But developers don't need to make any code changes or redeploy anything to mitigate those security issues

Right, so all deployed ActiveX-based software magically became both secure and continued working as before after everyone installed the latest Windows patches?

The trivial patching only works for security issues caused by implementation defects, not design defects. If you have a design defect, your choice is typically either breaking working apps or usage patterns, or breaking your users' security. Microsoft has done both (e.g. ActiveX blocking vs. the continued availability of CSV injection) and both have negatively affected millions.

... because it is maintained?

There are no changes needed on application code side.

What definition of maintained are you using?

If they're doing security patches and bug fixes it's a maintained codebase.

We're using the definition a few notches upthread: "Dear manager, for the next weeks/sprint the team needs X days to upgrade the software to version x.x.x otherwise it will stop working"

As opposed to:

2011: deploy website, turn on windows update

2011-2019: lead life as normal

2019: website is up and running, serving webpages, and not part of a botnet.

That's reality today, and if it helps to refer to it as "maintained", that's fine. The point is that it's preferable to the alternative.

I think that the parent commenter is referencing node 4.3 being past EOL and being unmaintained software and therefore unfit for prod, unlike the ms stack which is receiving patches

node, not .net

I was referring to comments that MS is good at backwards compatibility and that “if you write an application, it will run forever”, and I pointed out that MS also breaks backward compatibility where languages are concerned.

Installing security patches for a Ruby stack takes a full-coverage test suite, days of planning, and even more time to update code for breaking changes.

Installing security patches for a Microsoft stack requires turning on Windows Update.

There's a BIG difference. Once you write your msft stack app, it's done. Microsoft apps written decades ago still work today with no code changes.

That's not true. Try running anything built with Visual FoxPro. There are tons of programs that ran on XP and 7 and don't on 10.

What if the new Node version fixes a bug / issue / CVE that doesn't concern the software?

Is it reasonable to postpone the upgrade until later?

Example: the software uses Python requests. A new version fixes CVE-2018-18074, about the Authorization header, but you definitely don't use this header. Is it reasonable to upgrade a little later?

It depends on how mature your security team/process is. Can you spend time tracking individually announced bugs and make a case-by-case decision for each CVE? How much would you trust that review? Do you review dependencies which may trigger the same issue?

Or is it going to take less time/effort to upgrade each time?

Or is the code so trivial you can immediately make the decision to skip that patch?

There's no perfect answer - you have to decide what's reasonable for your teams.

The cool thing about serverless infrastructure is that it does not really concern you. As long as you are on a maintained version of the underlying platform your provider will take care of the updates.

If your software runs on an unmaintained platform there won't be any security fixes, and that's why Amazon forces you to upgrade at some point.

AWS, at least, didn't make any promises for updates for serverless Lambda that I can see in their docs.

Right, because it's not relevant to you. You don't care about the underlying infrastructure in terms of security, amazon does that for you.

Security wise you should of course be taking patches, however those patches should not be breaking functionality.

You are looking to save yourself a week of time a year and then 3 years later for some reason or another you will HAVE to upgrade and good luck making that change when the world has moved past you.

You're describing traditional sysadmin vs devops. Devops means repeating the stress points so that they are no longer stressful, and automating as much as possible. I like it way better than the classic "don't touch this, it's working, and the last guy who knew how to fix it is gone."

You don't need maintenance to remain in production; you need maintenance to reduce the tech debt in the infrastructure you decided to use (code, frameworks, third-party libraries, security issues). Even vanilla languages get upgraded every X months/years. Not maintaining the code is just a bad gift you are giving to your (or someone else's) future. I have been through upgrades of perfectly working software written in an older (almost 4) version of Java that were needed to add new features; it took a hell of a time and I never saw it working in the end. I don't think it's a safe choice to "let it be" when it comes to software.

BSD still loves you long time

Imagine a world where new exploits and hacks didn't come along every day and compromise the systems your app sits on because you didn't keep up with patches and upgrades...

That also describes my (very small in scope) PHP and JavaScript things. They all still work, and I love that to bits. Admittedly, the price of that is probably keeping it simple, but if I needed to update it all the time just to keep it from sinking under its own weight or the ground shifting beneath it, that would be no fun for me.

I completely agree with this. Starting a new position and coming into infra running 5-year-old software is not fun: it's generally neglected and full of deprecated features/code that were improved in later versions. Not to mention the security risk that running old software can often create.

Isn't that always the case though? I mean, outside of e.g. lambdas or other platforms / runtimes as a service?

I mean a few years ago there was a huge security flaw in Apache Struts, whose impact was big because it had been used in a lot of older applications - meaning a LOT of people had to be summoned to work on old codebases to fix this issue.

The problem isn't a changing runtime - even if you self-host it you should make sure to keep that updated regularly.

> if you self-host it you should make sure to keep that updated regularly.

Mild disagreement: My philosophy is that you should choose technologies that aren't likely to introduce breaking changes in the future.

As an example, I have sites that were built using ASP.NET version 1.1 that have survived to this day with nothing more than Windows Update on the server and the occasional version bump in the project config when adding a feature that needed the latest and greatest.

Compare that to the poor soul who decided to build on top of React when it first came out, and has been rewarded by getting to rewrite his entire application four times in as many years.

To return to the point, rather than rewriting around breaking changes from Node x.x to Node y.a, I'd be shopping around for the LTS version of Node x that I could keep the thing running on without intervention from my team.

> As an example, I have sites that were built using ASP.NET version 1.1 that have survived to this day with nothing more than Windows Update

You are right, but in my experience those ASP applications also had security holes (CSRF etc.) that were never patched. They ultimately either became part of botnets or faded away when the corp simply faded away.

A business that can't afford to pay for cleaning up its business applications is likely to be unable to pay for general upkeep as well. It is simply past the point of being a viable business and is either in limbo or in the grave!

See "maintenance free" approach to software as canary in the coal-mine and run away as fast as possible.

That's not the reality I know. I have apps written and compiled on windows xp that still work to this day.

If you work on any non-msft stack I know of, you're constantly updating code for any, sometimes even minor version upgrade.

The reality I know is that the people who wrote their apps on Windows XP left the company some years ago, but the apps live on: perfectly functioning, but unable to make requests via anything more secure than TLS1.0 and forcing other people to run servers that continue to accept TLS1.0 years after it was deprecated by everyone else.


(That's the PCI announcement in 2015 that despite everyone knowing about problems with TLS1.0, they would continue to allow it through 2018 because of all the companies who deferred their technical debt in the manner you seem to be advocating.)

That happened to me. All I had to do is patch windows server. I didn't have to change or even recompile my code.

I know it's hard to believe it's that simple, but it is.

So all you had to do was:

1. Know what to do.

2. Have approval to do it.

3. Do it.

... which is to say, maintenance. The fact that maintenance is simple and/or easy doesn't mean it happens by itself.

> The fact that maintenance is simple and/or easy doesn't mean it happens by itself.

Yes, there will always be _some_ maintenance. The point is it should be as simple and easy as possible.

I'm telling you that the entire infrastructure of the world is held back by companies who don't do simple maintenance, and your response is to tell me it should be easy to do maintenance.

Someone isn't getting the point, and I don't think I can make it any clearer.

Or is your point that we're held back by people who don't do simple maintenance, and that trying to make maintenance simpler, while it might help, won't solve 100% of the problem?

Are you saying it's better if maintenance requires a lot of work, to encourage people to do more maintenance?

If we made things harder to upgrade would that encourage more people to upgrade?

Since late last year, you can use old versions if you want to. [0] The provider doesn't enforce the runtime anymore. But I don't think it's the provider issue in the first place. At some point "node x.x" will be EOL and you won't get an email. You'll just stop getting maintenance patches.

[0] https://aws.amazon.com/blogs/aws/new-for-aws-lambda-use-any-...

Good point, but no. If you have architected your apps properly, decommissioning or sunsetting services or individual components should already have been designed and planned.

I know, I know... It's almost never the case.

You really just don't have to think like this on the msft stack. I'm so glad I chose msft ASP 20 years ago, instead of php, or RoR or python, or node or any of the myriad other stacks that have come and gone since.

Do you ever need to update your servers to a newer version like, say, 2016 or now 2019? There are definitely issues with using, say, old VB6 libraries when you need to upgrade your servers from 2008 to 2019. Not to mention that if you keep using those old technologies and do have a new feature or change, you end up with an unmaintainable mess. I am an MSFT stack programmer, but to claim there are no issues and you just need to patch a server is flat out wrong.

They didn’t pull the plug. You can’t create or update new lambdas with older versions of Node, but if you have existing code, it won’t just stop working.

I have written more than a few applications on lambda for several years, and I have gotten this email exactly once, for one function.

The issue you have isn't lambda, it's using an *aaS that someone else is hosting.

> If you’re running node x.x on your own backend, you can choose to simply keep doing so for as long as you want, regardless of what version the cool kids are using these days.

What do you mean by your own backend? Your own physical hardware, your own rented hardware, your own EC2 box, your own Fargate container?

What if you get an office fire, a hardware failure, a network outage, a required security update, your third party company goes out of business, etc.?

There's no such thing as code that doesn't need to be maintained. Lambdas (and competitors) probably require the least maintenance of the lot.

It’s not the quantity of maintenance that’s at issue. It’s the lack of ability to schedule that maintenance.

GP is talking about unplanned maintenance, which is a huge problem in many industries, like air travel (any transportation, really), or software.

> If you’re running node x.x on your own backend, you can choose to simply keep doing so for as long as you want,

Until you get audited, and that raises a flag, and now you have to deal with it.

Not a problem. I'll tell TypeScript to output code matching the ES version for that version of Node.

Sure it's a problem. It is still extra effort, whether or not your stack makes it easy. It still has to be done.

That's just laziness on your part though.

Using an old version of Node is just going to leave you with worse performance and potentially security holes.

Given the nature of Lambdas specifically:

- What would the security issue be with outdated Lambda code?

- Wouldn’t the performance of the code be equal to its performance when you first deployed?

The unifying lesson I've been appreciating after reaching my 30s is that in order to make good decisions, an individual needs to have a very clear understanding of how something works. If you know how something works, then you can foresee problems. You can explain the correct use cases for a particular tool. You know the downsides just as well as the upsides for a particular choice.

In software development, a lot of the new stuff that is supposed to solve our problems just shifts those problems somewhere else -- at best. At worst, it hides the problems or amplifies them. This isn't unique to software development, but it seems to be particularly pervasive because so much is invisible at the onset of use.

Best advice, be very suspicious when someone can't explain what the bad parts are.

There is also a terrible cost to running on just one database: the dependency on a single provider.

In business you should never depend on just one provider. This provider could be the best in the world right now, because it has amazing management. But people die, or are hit by a bus, or go crazy after a divorce, or just retire, and they get replaced by their antithesis.

In my personal experience, real portability has great benefits of its own. The design is easier to understand, and you constantly find bugs thanks to running your software on different architectures, compilers and so on.

At the end of the day, it is the quality of your team. If your team is bad, your software will be bad and vice versa.


Funny thing is, the fact alone that you create a provider-agnostic system can give you enough performance or cost penalties that you want to change providers in the first place.

You can't use provider specific optimizations, so the provider seems bad and you don't wanna use it.

The big difference with something like PL/SQL is that it’s a proprietary language, whereas Lambda and the other FaaS options are based on open languages. That makes portability somewhat easier to achieve.

For sure; the external interface of a lambda is trivial, that is, a single entrypoint function with an 'event'. It's relatively easy to create a simple wrapper around that to make it either provider agnostic or self-hosted.
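A minimal sketch of such a wrapper, under stated assumptions: the logic only sees a provider-neutral request/response type, and each platform gets a small adapter. Both input shapes here (`faas_event`, `http_request`) are hypothetical stand-ins, not the real Lambda event format or any server's API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Provider-neutral request/response types (hypothetical). */
struct request  { const char *path; const char *body; };
struct response { int status; char body[128]; };

/* The business logic only ever sees the neutral types. */
static void handle(const struct request *req, struct response *res) {
    if (strcmp(req->path, "/hello") == 0) {
        res->status = 200;
        snprintf(res->body, sizeof res->body, "hello, %s", req->body);
    } else {
        res->status = 404;
        snprintf(res->body, sizeof res->body, "not found");
    }
}

/* Hypothetical shape of an event delivered by a FaaS provider. */
struct faas_event { const char *raw_path; const char *payload; };

/* Hypothetical shape of a request parsed by a self-hosted HTTP server. */
struct http_request { const char *method; const char *uri; const char *content; };

/* Thin adapters: each maps its platform's input onto the neutral request. */
static void faas_entrypoint(const struct faas_event *ev, struct response *res) {
    struct request req = { ev->raw_path, ev->payload };
    handle(&req, res);
}

static void http_entrypoint(const struct http_request *hr, struct response *res) {
    struct request req = { hr->uri, hr->content };
    handle(&req, res);
}
```

Switching hosts then means writing another adapter of a few lines; `handle` stays untouched.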

I haven't used lambda but it sounds remarkably similar to a Django function.

In Django urls point to a function. Your request matches that url, that function gets run.

Lambda must have some similar sort of mapping of urls to functions, so what exactly are you saving with it? Ok, Django includes an ORM, but if you are using any sort of persistence you will need a database layer as well.
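In both cases the core idea is a table from a route to a function. In Django that table is `urlpatterns` in your code. With API Gateway and Lambda, the equivalent mapping is configured in the gateway rather than written in application code. A minimal sketch with hypothetical handlers; routing on the method as well as the path is a simplification for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handlers; each returns an HTTP-style status code. */
typedef int (*handler_fn)(const char *body);

static int list_items(const char *body)  { (void)body; return 200; }
static int create_item(const char *body) { return (body && *body) ? 201 : 400; }

/* A routing table in the spirit of urlpatterns or a gateway's
   route-to-function mapping: each (method, path) names one function. */
struct route { const char *method; const char *path; handler_fn fn; };

static const struct route routes[] = {
    { "GET",  "/items", list_items  },
    { "POST", "/items", create_item },
};

/* Find the matching route and run its function; 404 if nothing matches. */
static int dispatch(const char *method, const char *path, const char *body) {
    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; ++i) {
        if (strcmp(routes[i].method, method) == 0 &&
            strcmp(routes[i].path, path) == 0)
            return routes[i].fn(body);
    }
    return 404;
}
```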

Can someone explain what all the fuss is about or what I am missing?

If you're dealing purely with web requests, then yeah, API Gateway + Lambda sounds pretty similar to a Django function. But having used both, it's a lot easier and faster to setup API GW and Lambda than it is a Django app.

And if you're not dealing with purely web requests, then they're very different. Most of my lambdas trigger off of Kinesis, SNS, and SQS events. Work gets sent to these queues/notification endpoints, and then the Lambda function does work based off the data received there, and scales to handle the amount of data automatically.

Good points, but it remains to be seen whether the portability of large serverless apps is sufficient to prevent the providers from squeezing their customers once they are deeply established.

The fact that there’s three major cloud providers (who will at the very least compete on customer acquisition) is a point in favor of this, but it’s definitely an experiment. In my mind, this is as much about negotiating power as it is about technical tradeoffs.

Well, there is a cost to being portable and a cost to adapting an app that isn't portable.

Whether it's DB independence or cloud provider independence, from my experience it's cheaper to pay the cost of the migration when you know you want to migrate (even if it involves rewriting some parts) rather than paying it everyday by writing portable code.

Most of the time the portable code you write becomes obsolete before you want to port it.

> it's cheaper to pay the cost of the migration when you know you want to migrate

Agreed, on principle and based on experience. This does require you have reasonable abstractions in place as otherwise you’ll end up refactoring before you can migrate. But that’s a good thing in any case.

This is not true.

For example:

Using this template.


I’ve been able to deploy the same code as a regular Node/Express app and a lambda with no code changes just by changing my CI/CD Pipeline slightly.

You can do the same with any supported language.

With all of the AWS services we depend on, our APIs are the easiest to transition.

And despite the dreams of techies, more than likely after a while you aren’t going to change your underlying infrastructure.

You are always locked into your infrastructure choices.

You're only thinking about the _input_. Technically, yes, I can host an express app on lambda just like I could by other means, but the problem is that it can't really _do_ anything. Unless you're performing a larger job or something you probably need to read/write data from somewhere and connecting to a normal database is too slow for most use-cases.

Connecting to AWS managed services (S3, Kinesis, DynamoDB, SNS) doesn't have this overhead, so you can actually perform some task that involves reading/writing data.

Lambda is basically just glue code to connect AWS services together. It's not a general purpose platform. Think "IFTTT for AWS"

We have been connecting to MongoDB from without Lambda for the past year. Sure, you don't get single-digit latency, but reading/writing data takes under 30ms in most cases. We even use Parameter Store to pull all secrets and it's still within that time frame.

Where are you connecting to Mongo from?

You can run your Lambda function within the same network as your other servers. It just appears as a new IP address inside your network and can call whatever you permit it to.

Our MongoDB is running on mLab.

But you mention you're connecting without Lambda, so where are you connecting from?

He probably meant 'within' instead of 'without'. I myself use aws-serverless-express and connect my Lambda hosted in us-east-virginia to an mLab (now MongoDB Atlas) Mongo database hosted in the same us-east-virginia Amazon region.

The API I just deployed using this connects to a regular old Aurora MySQL instance using the standard MySQL driver.

It’s a standard CRUD REST API used by our website.

The only slow part is the infamous cold start time when running within a VPC, because Lambda has to create an ENI.

AWS pinky promised they were going to fix this soon.

But that is the whole point of using cloud services that are tightly integrated with each other. The fact that I cannot do it myself as efficiently as Amazon cannot be called "proprietary lock-in".

Said efficiencies are not due to Amazon, just that the services are colocated in the same facility.

If I put the Node service and a database on the same box, I'd get the same performance, and probably better, since Amazon would still have them on separate physical hardware.

It’s not about performance; it’s that I would then have to support all of that infrastructure myself.

The infrastructure, or rather the interfaces, is where the lock-in comes in. Each non-portable interface adds another binding. Once you've been absorbed into the ecosystem of non-portable interfaces, it's not as easy as swapping out the provider, as the OP pointed out. You have to abstract each service out to be able to swap out providers.

If you use open source interfaces, or even proprietary interfaces that are portable, it's easier to take your app with you to the new hosting provider.

The non-portable interfaces are the crux of the matter. If you could run Lambda on Google, Azure or your own metal, folks wouldn't feel so locked-in.

As I said. I can run the Node/Express lambda anywhere without changing code.

But I could still take advantage of hosted Aurora (MySQL/Postgres), DocumentDB (Mongo), ElastiCache (Memcached/Redis) or even Redshift (Postgres-compatible interface) without any of the dreaded “lock-in”.

It sounds like you have a preference for choosing portable interfaces when it comes to storage. And you've abstracted out the non-portable lambda interface.

My position isn't don't use AWS as a hosting provider, it's that you ought to avoid being locked into a proprietary non-portable interface when possible.

Not really. My company has plenty of business risks. Out of those, a dependency on AWS is the least of them.

Vendor lock-in isn't really a problem initially. It's something that creeps up on you over time.

Over time, we will have an “exit strategy” that makes it “someone else’s problem” and then we will be well enough capitalized to migrate if needed.

Or the Twitter model: have a very bad architecture that always crashes, find “product-market fit”, and then get funding to fix the issues.

Or the company goes out of business, I put X years of AWS experience on my resume and make out like a bandit as an overpriced consultant.

I don’t see the downside....

The downside could be going for a new round and not getting that valuation because projected costs prevent scaling.

I don't really see cloud-provider competition lessening or hardware getting more expensive and less efficient or the VMs getting worse at micro-slicing in the next 5 years. So why would I be worried about rising costs?

I think spending one of the newly-raised millions over a year or so can help there, including hiring senior engineers talented enough to fix the shitty architecture that got you to product-market-fit. This isn’t an inherently bad thing, it just makes certain business strategies incompatible with certain engineering strategies. Luckily for startups, most intermediate engineers can get you to PMF if you keep them from employing too much abstraction.

Isn’t employing too many abstractions just what many here are advocating, putting a layer of abstraction over the SDK’s abstraction of the API? I would much rather come into a code base that just uses Python + Boto3 (AWS’s Python SDK) than one that uses Python + “SQSManager”/“S3Manager” + Boto3.

That is indeed what many here are advocating. There are only so many possible interfaces or implementations, and usually abstracting over one or the other is an effort in reinventing the wheel, or the saw, or the water clock, and not doing the job as well as some standard parts glued together until quite far into the endeavor.

Stop scare-quoting "lock-in". Lock-in means development effort to get out of a system, regardless of how trivial you think it is.

If writing code to be able to move to a different cloud isn't considered lock-in, then nothing is since anyone can write code to do anything themselves.

Lock-in is an economic concept; it’s not just about code but about “switching costs”. Ecosystem benefits, data gravity, etc. all come into play.

There are two kinds of lock-in. The first is high switching cost because no competitor does as good a job. This is good lock-in, and trying to avoid it just means you’re not building the thing optimally in the first place.

The second is high switching cost because of unique interface and implementation requirements that don’t add any value over a more interoperable standard. This is the kind that’s worth avoiding if you can.

I'm talking about his statement:

"Connecting to AWS managed services (S3, Kinesis, DynamoDB, SNS) doesn't have this overhead, so you can actually perform some task that involves reading/writing data."

That is due to network and colocation efficiencies. The overhead of managing such services yourself is another matter.

Not just the network overhead, the maintenance and setup overhead. I can spin up an entire full stack in multiple accounts just by creating a CloudFormation template.

I’ve done stress testing by spinning up and tearing down multiple VMs, playing with different-size databases and autoscaling read replicas for performance. I’ve also run a Spot Fleet, etc.

When you need things now you don’t have time to requisition hardware and get it sent to your colo.

As far as spinning up and down goes, a lot of this is solved with Docker, which is also relatively platform independent.

So Docker allows me to scale up MySQL read replicas instantaneously? And I still have to manage infrastructure.

Well, you can use a container service or use EC2 still.

And then you still have more stuff to manage now, based on the slim chance that one day, years down the road, you might move your entire multi-AZ redundant infrastructure, your databases, etc., with all of the read replicas, to another provider....

And this doesn’t count all of the third party hosted services.

Aurora (MySQL) redundantly writes your data to six different storage devices across multiple availability zones. The read replicas read from the same disks. As soon as you bring up a read replica, the data is already there. You can’t do that with a standard MySQL read replica.

OK. So you connect to Postgres on RDS - cloud agnostic.

You connect to S3, and:

a) You can build an abstraction service if you care about vendor lock-in so much

b) It has an API that plenty of open source projects are compatible with (I believe Google's storage is compatible as well)

Maybe you use something like SQS or SNS. Bummer, those are gonna "lock you in". But I've personally migrated between queueing solutions before and it shouldn't be a big deal to do so.

It's really easy to avoid lock-in; Lambda really doesn't make it any harder than EC2 at all.
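Option (a) might look like the following minimal sketch. Application code depends on a narrow `BlobStore` interface, and each provider gets a small implementation. All class and function names are hypothetical. `S3BlobStore` assumes a client with boto3's S3 `put_object`/`get_object` calls, e.g. `boto3.client("s3")`. In the spirit of option (b), the same class can often target an S3-compatible service by creating the client with a different `endpoint_url`. The catch is that the wrapper is one more layer to maintain, and it only covers the features you choose to expose.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """The narrow interface the application codes against."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class S3BlobStore:
    """Backed by any client exposing boto3's S3 put_object/get_object calls."""

    def __init__(self, client, bucket: str):
        self._client = client
        self._bucket = bucket

    def put(self, key: str, data: bytes) -> None:
        self._client.put_object(Bucket=self._bucket, Key=key, Body=data)

    def get(self, key: str) -> bytes:
        response = self._client.get_object(Bucket=self._bucket, Key=key)
        return response["Body"].read()  # Body is a readable stream


class InMemoryBlobStore:
    """Local/test implementation; another provider's version would look similar."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._data[key] = data

    def get(self, key: str) -> bytes:
        return self._data[key]


def save_report(store: BlobStore, report_id: str, text: str) -> str:
    """Application code: knows nothing about which provider is underneath."""
    key = f"reports/{report_id}.txt"
    store.put(key, text.encode("utf-8"))
    return key
```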

As long as you write your own wrappers around the SDKs, changing cloud providers is definitely doable. We started with a full AWS stack with Lambda but have now been slowly refactoring our way in a more cloud-provider-agnostic direction. It's definitely not an existential threat level of lock-in. Serverless technology is only starting out still and I'm pretty sure 5 years from now Lambda won't be the go-to platform anyway. Plus honestly we've learned so much from the first big project on Lambda that writing the next one with all of that in mind will be pretty great (and agnostic).

I don't believe that writing wrappers is particularly important, though I think that anyone who uses SQS is likely to build an abstraction over it at some point (as with all lower level communication protocols, at some point you build a "client library" that's more specific).
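Here is a sketch of that kind of more specific client library over SQS. Callers publish and consume order events and never see queue URLs or message envelopes. Migrating to another queueing system then means rewriting one class rather than every caller. `OrderEventsQueue` is hypothetical. It assumes a client exposing boto3's SQS `send_message`, `receive_message` and `delete_message` calls, e.g. `boto3.client("sqs")`.

```python
import json
from typing import Callable, Dict, List


class OrderEventsQueue:
    """Domain-specific client: callers see order events, not SQS details."""

    def __init__(self, client, queue_url: str):
        self._client = client
        self._queue_url = queue_url

    def publish_order_created(self, order_id: str, total_cents: int) -> None:
        body = json.dumps({"type": "order_created", "order_id": order_id,
                           "total_cents": total_cents})
        self._client.send_message(QueueUrl=self._queue_url, MessageBody=body)

    def consume(self, handler: Callable[[Dict], None], max_messages: int = 10) -> int:
        """Process one batch; delete each message only after the handler succeeds."""
        response = self._client.receive_message(
            QueueUrl=self._queue_url,
            MaxNumberOfMessages=max_messages,  # SQS allows at most 10 per call
            WaitTimeSeconds=1,
        )
        messages: List[Dict] = response.get("Messages", [])  # key absent when empty
        for message in messages:
            handler(json.loads(message["Body"]))
            self._client.delete_message(QueueUrl=self._queue_url,
                                        ReceiptHandle=message["ReceiptHandle"])
        return len(messages)
```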

As I said, at least in the cases of your database and your storage, being cloud-agnostic is trivial. Managed postgres is easy to migrate from, S3 shouldn't be hard to migrate from either.

Certainly lambda doesn't impact this too much.

> Serverless technology is only starting out still and I'm pretty sure 5 years from now Lambda won't be the go-to platform anyway. Plus honestly we've learned so much from the first big project on Lambda that writing the next one with all of that in mind will be pretty great (and agnostic).

I realize it isn't entirely on-topic, but could you elaborate? I'm curious to hear more about your opinion on this, I'm not sure what the future of Serverless is.

And that goes back to developers using the repository pattern because one day the CTO might decide that they want to get rid of their 6-7 figure Oracle installation and migrate to Postgres. There is a lot more to migrating infrastructure at scale than writing a few facades.

Heck, consultants get paid lots of money just to do a lift and shift and migrate a bunch of VMware images from on prem to AWS.

a) You can build an abstraction service if you care about vendor lock-in so much ... It's really easy to avoid lock-in; Lambda really doesn't make it any harder than EC2 at all.

Yes, you can build an abstraction layer. And maintain it. And hope that you don't get feature divergence underneath it.

That's really, really expensive.

I don't see how you could have missed (b).

I don't see why you think I have.

Have you ever asked the business folks or your investors whether they care about your “levels of abstraction”? What looks better on your review: “I created a facade over our messaging system” or “I implemented this feature that brought in revenue/increased customer retention/got us closer to the upper right of Gartner’s Magic Quadrant”?

Why should they care, or even be in the loop for such a decision? You don’t ask your real estate agent for advice on fixing your electrical system, I guess?

Of course your business folks care whether you are spending time adding business value and helping them make money.

I’ve had to explain to a CTO before why I had my team spending time on a CI/CD pipeline. Even now that I have a CTO whose idea of “writing requirements” is throwing together a Python proof-of-concept script and playing with Athena (writing SQL against a large CSV file stored in S3), I still had better be able to articulate business value for any technological tangents I go on.

Sure. Agree totally, maybe I misread your previous comment a bit. What I meant is that run-of-the-mill business folks do not necessarily know how business value is created in terms of code and architecture.

I don't know of any business where they wouldn't be involved. Not in the "Let's talk directly about implementation details" way, but in the "Driving product development and roadmap" and "ascertaining value to our customers" way.

Any time spent on work that doesn't directly create value for customers is work that the business should be weighing in on. I'm not saying that you should never spend any time doing anything else. But these are trade-offs that the product manager should be involved in, and one of their primary jobs is to weigh the technical and business realities and figure out where resources should be going.

Both. It's why larger companies have infrastructure teams.

I'm not sure I see your point. What is it you think I'm advocating for?

My only point is that vendor lock-in is not a significant issue on AWS, and that it requires virtually no effort to avoid it.

> and that it requires virtually no effort to avoid it

Of course it requires effort. A lot of effort, not to mention headcount. The entire value of cloud-managed services is what they save you vs. the trade-offs, and it's disingenuous to pretend that's not the case.

Sorry, I don't agree, and I feel like I provided evidence why in my first post. To summarize, choosing services like Postgres and S3 doesn't lock you in. SQS and SNS might, but I think it's an exaggerated cost, and that has nothing to do with Lambdas (EC2 instances are just as likely to use SQS or SNS, more so given that SQS wasn't supported for Lambdas until recently).

There are tradeoffs, of course. Cost at scale is the really big one: at some point it's cheaper to bring ops/hardware in-house.

I just don't agree that lock-in is a huge issue, and I really disagree with the idea that lambdas make lock-in harder.

There's a big difference between AWS RDS and self-managed. Huge difference.

- DBAs & DevOps

- Procurement management & spare parts

- Colocation w/multihoming

- Leasing agreements

- Paying for power usage

- Disaster recovery plan

- CapEx & depreciation

- Uncomfortable meetings with my CFO explaining why things are expensive

- Hardware failure

- Scaling up/out

Not even worth going on, because the point is obvious. Going "all in" reduces cost and allows more time to be focused on revenue-generating work. The "migration" boogeyman is just that, something we tell other programmers to scare them around the campfire. You're going to be hard-pressed to find horror stories of companies in "cloud lock-in" that don't come from a consultant trying to sell you something.

> at some point it's cheaper to bring ops/hardware in-house.

It depends. It's not always a scale issue, and as with all things, it starts with a model and collaboration with your finance team.

Well, what scale would that be? Larger than Netflix?

While I could probably answer that, I don't think it's relevant to my central point - that lock-in is not as big of a deal as it's portrayed as, and that lambdas do not make the problem considerably worse.

The vast majority of Netflix's traffic is video, and its video is not served by Amazon.

Using a company that bypasses Amazon for 99.999% of its traffic isn't exactly an Amazon success story.

That’s an incredibly ignorant and misleading statement. It’s sort of like saying a database isn’t valuable because 99.999% of requests hit the cache, and not the disk.

Everything was built on Amazon and video is largely hosted on S3. Yes, there’s a large CDN in the mix too. That doesn’t take away from the achievement.

Well, what do you think Netflix is doing to be AWS’s largest customer? Have you seen any of their presentations on YouTube from AWS re:Invent? Where do you think they encode the videos? Handle sign-ins, etc.?

No. At the scale of Netflix 7 years ago:


That’s just the CDN. Netflix is still by far AWS’s biggest customer and its compute is still on AWS. I don’t think most companies are going to be setting up colos at ISPs around the world.

Our Lambda deployments handle REST API Gateway calls, SQS events, and Step functions. Basically the entire middleware of a classic 3-tier stack.

Except for some light, proprietary AWS proxy code, the bulk of the Lambdas delegate to pre-existing Java POJO classes.

The cold start issues and VPC configuration were a painful learning curve, but nothing I would consider proprietary to AWS. Those are universal deployment tasks.

> Unless you're performing a larger job or something you probably need to read/write data from somewhere and connecting to a normal database is too slow for most use-cases.

This is false. I've seen entire Lambda APIs backed by MySQL on large, consumer-facing apps and websites. As another poster pointed out, the cold-start-in-a-VPC is a major PITA, but it can (mostly) be worked around.

And there is always DynamoDB, where you aren’t in a VPC, and Aurora Serverless, where you don’t have to worry about the typical database connections and can use the HTTP-based Data API.

How is the Aurora Serverless Data API now? On preview release it was a bit sketchy: horrible latencies (pretty much ruining any benefit you could get from avoiding the in-VPC cold start) and a dangerous sql-query-as-a-string API (no prepared statements or placeholders for query params that would get automatically escaped IIRC).
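For reference, here is roughly what a parameterized Data API call looks like. It assumes the boto3 `rds-data` client's `execute_statement` with its `parameters` argument. The SQL uses named `:placeholders`, and each value is wrapped in a typed field such as `stringValue` or `longValue`. Check the current Data API documentation for what is supported; the preview described above may not have offered this. `find_user`, the `users` table and the `app` database are hypothetical.

```python
def to_sql_parameters(values):
    """Convert a plain dict into the Data API's list of typed parameters."""
    params = []
    for name, value in values.items():
        if value is None:
            field = {"isNull": True}
        elif isinstance(value, bool):  # check bool before int: bool subclasses int
            field = {"booleanValue": value}
        elif isinstance(value, int):
            field = {"longValue": value}
        elif isinstance(value, float):
            field = {"doubleValue": value}
        else:
            field = {"stringValue": str(value)}
        params.append({"name": name, "value": field})
    return params


def find_user(client, resource_arn, secret_arn, email):
    # The email travels as a typed parameter referenced as :email,
    # instead of being interpolated into the SQL string.
    return client.execute_statement(
        resourceArn=resource_arn,
        secretArn=secret_arn,
        database="app",  # hypothetical database name
        sql="SELECT id, email FROM users WHERE email = :email",
        parameters=to_sql_parameters({"email": email}),
    )
```

With boto3 installed, the client would come from `boto3.client("rds-data")`.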

Unfortunately, we require the ability to load/unload directly from S3, and Aurora Serverless doesn’t support that. We haven’t been able to do any more than a POC.

Dynamo is really the hardest lock-in in the ecosystem for me. Compared to Dynamo, Serverless Aurora is still easy to kill with too much concurrency/bad connection management.

That’s theoretically where the Data API comes in.

VPC cold start times are the bane of my existence. I do hope they deliver the super-fast cold starts they promised this year.

Can you explain this more? Is this when your lambdas haven’t been hit in a while and AWS scales things down in the background?

Or is it when deploying new code?

When lambdas haven't been hit for 15 minutes, the first hit after that has a noticeably longer start time. It's due to deprovisioning/reprovisioning underlying resources like network interfaces. Some people do crazy stuff like a scheduled task to hit their own service to combat this, so AWS promised to solve it.
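Here is a sketch of the kind of "keep warm" handler described here. It assumes the scheduled ping comes from a CloudWatch Events/EventBridge schedule rule, whose events carry `"source": "aws.events"`. Module-level code runs once per container, so a global flag shows whether the current invocation was a cold start. This only keeps already-running containers warm; it does nothing for new containers created when Lambda scales out.

```python
import time

# Runs once per container: everything at module level is cold-start work.
_CONTAINER_STARTED = time.time()
_is_cold = True


def lambda_handler(event, context):
    global _is_cold
    cold, _is_cold = _is_cold, False  # only the first call in a container is cold

    # Scheduled warm-up pings carry "source": "aws.events"; return early
    # so they do no real work.
    if event.get("source") == "aws.events":
        return {"warmup": True, "cold_start": cold}

    # ... real request handling would go here ...
    return {"warmup": False, "cold_start": cold,
            "container_age_s": round(time.time() - _CONTAINER_STARTED, 3)}
```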

Even if you invoke your lambda function to warm it up in anticipation of traffic, you'll still hit cold starts if the lambda needs to scale out; the new machines are exposed to inputs "cold." Those patterns for warming the lambda up are really crazy if you think about it, because no one using them is really aware of the underlying process(es) involved.

"Why are you throwing rocks at that machine?"

"It makes it respond more quickly to actual client requests. Sometimes."


"Well, most the time."

"Why's that? What's causing the initial latency?"

"Cold starts."

"Yeah, but what's that mean?"

"The machine is booting or caching runtime data or, you know, warming up and serverless. Anyway, don't think about it too much, just trust me on this rock thing. Speaking of which, I've got to get back to bastardizing our core logic. Eddie had great results with his new exponential backoff rock toss routine and I'm thinking of combining that with a GraphQL random parameter generator library that Ted said just went alpha this afternoon."

Exactly - this blog post I've always thought was a great overview: https://hackernoon.com/im-afraid-you-re-thinking-about-aws-l...

yo can I get a link to that gql random parameter generator library?

If initial startup time - and long periods of non usage - is an issue, wouldn't a permanently running VPC for the initial load be a better solution?

In addition to what the sibling reply said, there is also the issue of your choice of runtime. Java has the slowest startup time and Go the fastest. C# is faster than Java but still slower than Python and Node.

Clever uses of dependency injection without reflection (think Dagger, not Spring) and reducing code size as much as possible can give you fairly fast cold start times with a Java execution environment. Bloated classpaths and reflection-filled code will destroy your initialization time.

And by "…Go the fastest." I assume you mean compiled languages.

Until last year's re:Invent, you could only use one of five (?) supported languages. Go was the only compiled language available.

I think it probably isn't purely the use of Lambda/serverless but the creep of other things that make it more difficult to leave, such as Cognito, SNS or other services. Once you start using AWS, each individual choice in isolation makes it look cheaper and easier to just use something AWS provides. Once you start utilizing 7+ different AWS services, it becomes very expensive to transition to another platform.

Also, this conversation is very different if you are an early stage company racing to market as opposed to an established organization where P&L often takes precedence over R&D.

At my previous place I saw this in reverse. Over the previous years we had invested in building up our own email infrastructure.

Because so much had been invested (sunk cost fallacy), no-one could really get their heads around a shift to SES, even though it would have been a slam dunk in improved reliability and developer productivity.

Whereas if we were on, say, Mailgun, and then someone wanted a shift to SES, that debate could probably have been a more rational one.

I just point this out to say that investing in your own infrastructure can be a very real form of lock-in itself.

You’d be surprised how much you can get on AWS by getting rid of one or two of your infrastructure folks....

Other way around, you'd be surprised at how much infrastructure you can buy by forgoing AWS's offerings (for large work scales.)

For small companies, you may not be able to afford infrastructure people, and moving fast makes way more sense. There's little point in paying for an ops person when you have very little infrastructure.

At a certain scale, though, AWS stops being cost-effective. You begin to have room in your budget for ops people, you get room to afford datacenter costs, and you can start paying for a cloud architect to fill out internal or hybrid cloud offerings using OpenShift or OpenStack.

It's all about the right tool for the right job.

Netflix (AWS) and Twitter (GCP) seem to have different opinions.

I know for a fact GE is moving a lot of their workload to AWS.

>Netflix (AWS) and Twitter (GCP) seem to have different opinions.

Yeah, Netflix's opinion is to use Amazon as little as possible. Their critical infrastructure (the CDN) is not anywhere near the slimy grip of AWS.

>Yeah, Netflix's opinion is to use Amazon as little as possible.

This is simply untrue. Everything but their CDN uses AWS.

>Their critical infrastructure (the CDN) is not anywhere near the slimy grip of AWS.

The streaming website and app aren't critical infrastructure? Databases containing all of their business and customer details aren't critical infrastructure? Encoding content so it can be delivered by the CDN isn't critical infrastructure?

That's like saying I don't trust Ford because I buy Michelin tires while I drive a Fusion.

The CDN is in the ISP's data center. You can’t get much lower latency than that. But if Netflix is AWS’s largest customer, I doubt they are using it as little as possible.

Netflix does presentations every year at re:Invent about how they use AWS, and they have a ton of open source tooling they wrote specifically tied to AWS.

This is the same model of buying things on amazon too -- once you've bought once, it's easier to buy again. Why spend time going to another shop when you can just use amazon to buy multiple things.

This ease-of-use philosophy goes way back to the one-click patent. If I want DNS, why wouldn't I go to Amazon, which has all my finance details and a decent interface (and even an API), rather than choosing another DNS provider, setting up my credit card, and having to maintain an account with them? So I choose DNS via AWS. Then I want a VPS, but why go to Linode and have the overhead of another account when I could use Lightsail instead?

And then I can use AWS Certificate Manager, create a certificate, attach it to my load balancer and CloudFront distribution, and never have to worry about expiring certificates. And they are free.

At that point it seems a lot more about cloud services in general and not merely "lambda and serverless".

Isn't that exactly the point of AWS?

If you use provider agnostic solutions, things get expensive quick.

Stuff like SNS and Cognito is much cheaper in terms of TCO.

If you don't use them you can switch providers more easily, but if you use them you wouldn't need to switch providers.

You could always deploy your own FAAS cluster. https://www.openfaas.com/ https://openwhisk.apache.org/ and others.
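Moving between FaaS platforms is easier when the logic lives in a plain function and each platform gets a one-line adapter. The sketch below assumes Apache OpenWhisk's Python action convention (`main(args)` takes and returns a dict) and the OpenFaaS `python3` template convention (`handle(req)` receives the request body as a string). `greet` is a hypothetical stand-in for real business logic. In practice each adapter would live in the file layout its platform expects.

```python
import json


def greet(name: str) -> dict:
    """Platform-neutral core logic."""
    return {"greeting": f"Hello, {name}!"}


def lambda_handler(event, context):
    """AWS Lambda, invoked directly with a JSON event."""
    return greet(event.get("name", "world"))


def main(args):
    """Apache OpenWhisk Python action: takes and returns a dict."""
    return greet(args.get("name", "world"))


def handle(req):
    """OpenFaaS python3 template: gets the raw request body as a string."""
    payload = json.loads(req) if req else {}
    return json.dumps(greet(payload.get("name", "world")))
```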

And then I have something else to support. Kind of the whole point of using AWS not to have to support infrastructure unnecessarily.

On top of that, I lose the “easy button” of depending on our AWS business support contract if something gets wonky.

You don't necessarily have to host this yourself. With this, there's a relatively straightforward (but obviously not necessarily easy) way for Joe Entrepreneur to set up a hosted service to compete with AWS Lambda, thus helping the community avoid vendor lock-in.

So can Joe Entrepreneur also host my databases? my storage? My CDN around the world? My ElasticSearch cluster? My Redis cluster? All of my objects? Can he provide 24 hour support? Can he duplicate my data center for my developers in India so they don’t have lag? What about my identity provider? My Active Directory?

Can he do all that across multiple geographically dispersed redundant data centers?

I think you're missing the point - you may need to move to your own infrastructure for security, privacy, regulatory or accountability reasons. You may encounter new needs which AWS may no longer meet, cheap bandwidth for example (AWS bandwidth is both way overpriced and on some rather poor networks). Or Amazon may decide that the price of all their services is now going up by a factor of 5, and if you have your attitude, well, you're stuck paying for it.

Having relatively easy-to-spin-up alternatives is a great thing. I can run my application entirely on a local Kubernetes cluster or on one at Amazon, DigitalOcean or Google's cloud services. That sort of flexibility is excellent and has allowed us to scale into situations where we otherwise couldn't have affordably done so (being able to buy some bandwidth from Joe Entrepreneur has its benefits sometimes).

> I think you're missing the point - you may need to move to your own infrastructure for security, privacy, regulatory or accountability reasons.

Which compliance regulation requires you not to use a cloud provider? At most, they may require you not to share a server with another company (that can be done with a drop-down) or that the data be hosted locally (again, that can be done by selecting a region).

>Which compliance regulation requires you not to use a cloud provider?

The policies that say not to use a company controlled by the US government. Or the ones that say under no circumstances should the data be sent over the Internet to a third party "because OPs are hard".

Which regulation or policy? Which certification? Name names. It’s not any of the financial, legal, or health care compliance regulations that I’m aware of.

In short, most German laws make it incredibly risky (but not forbidden) to use any American company for any kind of data that can be resolved to the underlying person. (E.g., a lot of companies got their warning shot when "Safe Harbor" exploded; if the same happens with https://www.privacyshield.gov/welcome, a world of shit awaits.)

That says “transferring data”, not “American ownership”. If it is required that your data doesn’t leave the country, just use a region in the EU.

Yeah, I can see explaining to our business customers, who grill us about the reliability of our infrastructure, that we host it on Digital Ocean....

You've just offered the best example of vendor lock-in on this entire thread.

You realize the databases I’m referring to are hosted versions of MySql/Postgres, the ElasticSearch cluster is just that standard ElasticSearch, and Redis is just that - Redis and you can setup a CDN anywhere?

Even if you chose to use AWS’s OLAP database, Redshift, it uses standard Postgres drivers to connect. You could move it to a standard Postgres installation. You wouldn’t get the performance characteristics, of course.

If you don’t want to be “locked in” to DynamoDB, there is always the just-announced, Mongo-compatible DocumentDB. Of course, ADFS is used everywhere.

Why in the world would I want to manage a colo with all of those services myself and still not have any geographic redundancy - let alone any place near our outsourced developers?

Maybe not, but Sundar Google or Satya Microsoft or Joe Rackspace can make their best offer.

No, but if your use case/business model doesn't require every single one of those things they can do it a lot cheaper.

None of my list is esoteric - a website with a caching layer, a database, a CDN, a search index, and a user pool that can survive a data center outage and some place to store assets.

You can’t (truly) outsource accountability.

> And despite the dreams of techies, more than likely, after a while you aren’t going to change your underlying infrastructure.

sorry -

isn't the _entire context_ of this article (and your response, experience, etc) 'implications of changing an underlying infrastructure' ?

yes, this isn't something one does on a whim, but it does happen, as your own post suggests.

> And despite the dreams of techies, more than likely, after a while you aren’t going to change your underlying infrastructure.

Yeah, nowadays it's done by dissolving the company or being bought out, then shut down.

Still, it's the same thing. Use of inadequate infrastructure being, let's say, terminated.

In all the time Y Combinator has existed, only one company has ever gone public; the rest are still humming along unprofitably, have been acquired, or have gone out of business. So the chances of most businesses having to worry about their largest business risk being an over-reliance on AWS seem slim.

AWS Lambda probably has the least vendor lock-in of any of their offerings.

If it's triggered by api gateway, then maybe. But with S3 or kinesis or especially dynamo there's a good bit of lock in.

What do you consider the lock in when using RDS? S3? EC2?

It’s easy enough to separate your event sources from your event destinations. Anything that can trigger lambda can also trigger SNS message that can call an API endpoint.

If I did want to move from AWS that’s the first thing I would do. Put an API end point in front of my business logic, change the event from triggering lambda to an SNS message and move my API off of AWS. Then slowly migrate everything else.
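Here is a sketch of that decoupling step, assuming the standard shapes of SNS deliveries. A Lambda subscriber receives `{"Records": [{"Sns": {"Message": ...}}]}`. An HTTP(S) subscriber receives a JSON POST body whose `Type` is `Notification`, or `SubscriptionConfirmation`, whose `SubscribeURL` must be visited once to confirm. `process_order` is hypothetical business logic. A production HTTP endpoint should also verify the SNS message signature, which this sketch omits.

```python
import json


def process_order(message: dict) -> str:
    """Hypothetical business logic, independent of how the message arrived."""
    return f"processed order {message['order_id']}"


def lambda_handler(event, context):
    """While still on Lambda: SNS invokes the function with a batch of records."""
    return [process_order(json.loads(record["Sns"]["Message"]))
            for record in event["Records"]]


def http_endpoint(raw_body: str):
    """After the move: the same topic delivers to an HTTPS endpoint instead.

    Returns a (kind, value) tuple; a web framework route would wrap this.
    """
    envelope = json.loads(raw_body)
    if envelope["Type"] == "SubscriptionConfirmation":
        return ("confirm", envelope["SubscribeURL"])  # visit once to confirm
    if envelope["Type"] == "Notification":
        return ("result", process_order(json.loads(envelope["Message"])))
    return ("ignored", envelope["Type"])
```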

That's kind of a lot

All infrastructure migrations are “kind of a lot”.

People aren’t using the same frameworks and infrastructure they used 10 years ago.

Does this assumption hold for large, production-ready, applications?

That is a production ready API.

The cool kids don’t do large apps, we do microservices. (Said ironically I’m in my mid 40s - far from a kid.)

In most cases, very few companies have products that need to scale to extreme load day 1 or even year 1. IMO, instead of reaching for the latest shiny cloud product, try building initially with traditional databases, load balancing, and caching first. You can actually go very far on unsexy old stuff. Overall, this approach will make migration easier in the cloud and you can always evolve parts of your stack based on your actual needs later. Justify switching out to proprietary products like lambdas, etc once your system actually requires it and then weigh your options carefully. Everyone jumping on the bandwagon these days needs to realize: a LOT of huge systems are still rocking PHP and MySQL and chasing new cloud products is a never ending process.

In my case, switching to an AWS API Gateway + Lambda stack means I have zero-downtime canary deployments that take less than 5 minutes to deploy from version control. API Gateway is configured using the same Swagger doc that autogenerates my request/response models (go-swagger), and it (almost) completely removes routing, request logging, throttling and authentication concerns from the request handlers. Combined with a statically hosted front end and SNS/SQS+Lambda pub-sub for out-of-process workers, I never have to worry about auto-scaling, load balancing or cache invalidation, and we only pay for what we use. It may not suit every use case, but in our case we have bursty, relatively low-volume traffic. The hosting bill for our public-facing site/service, which brings in most of the main business revenue, is about the size of a rounding error on our BI service bill.
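The out-of-process worker side might look like this minimal sketch, written in Python rather than the Go mentioned in the thread. It assumes the standard SQS event shape (`{"Records": [{"messageId": ..., "body": ...}]}`), and `do_work` is a hypothetical task. It also handles SNS fanning out to SQS without raw message delivery enabled, in which case the SQS body is the SNS JSON envelope.

```python
import json


def do_work(job: dict) -> None:
    """Hypothetical out-of-process task; raises on bad input."""
    if "user_id" not in job:
        raise ValueError("job is missing user_id")


def unwrap(body: str) -> dict:
    """Return the job payload from an SQS message body.

    Without raw message delivery, SNS->SQS bodies are the SNS envelope and the
    original payload sits in its "Message" field.
    """
    data = json.loads(body)
    if isinstance(data, dict) and data.get("Type") == "Notification" and "Message" in data:
        return json.loads(data["Message"])
    return data


def lambda_handler(event, context):
    """Worker for an SQS event source mapping: one job per record."""
    processed = 0
    for record in event["Records"]:
        # An exception here fails the whole invocation, and the batch's messages
        # become visible on the queue again, so do_work should be idempotent.
        do_work(unwrap(record["body"]))
        processed += 1
    return {"processed": processed}
```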

How do you deploy in less than 5 minutes? Our serverless deployments are running into 20 minutes now, and it’s becoming a bit unfunny.

We use Go Lambdas; binaries are built in our CI pipeline. The build stage takes ~10 seconds and tests (integration + unit) take ~30 seconds. We use AWS SAM to generate our CFN templates, and we package and deploy using the AWS CloudFormation CLI, which takes the remaining 3-4 minutes.

I didn't include post-deployment end-to-end tests in the 5-minute figure, but technically speaking, we do deploy that quickly.

How does it take 20 minutes? Do you deploy everything at once? Most of our APIs take a few minutes (depending on what kind of a deploy it is).

We have a fair number of endpoints, so due to CloudFormation's 200-resource limit per stack, we end up creating about 10 different stacks that frankenstein themselves onto a main API Gateway stack.
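A rough sketch of the bookkeeping behind splitting an API across stacks: greedily pack endpoints into groups that stay under the per-stack resource limit without splitting one endpoint's resources across stacks. The function name, the per-endpoint resource counts, and the example of 300 endpoints at 6 resources each are hypothetical; wiring the stacks together (nested stacks or exports) is not shown.

```python
from typing import Dict, List

# The per-stack limit cited in the comment; pass a different value if the quota differs.
CFN_RESOURCE_LIMIT = 200


def pack_endpoints(endpoint_costs: Dict[str, int], limit: int = CFN_RESOURCE_LIMIT) -> List[List[str]]:
    """Greedily group endpoints into stacks without splitting an endpoint's resources.

    endpoint_costs maps an endpoint name to the number of CloudFormation
    resources it needs (API Gateway resource and method, function, role,
    permission, ...). Input order is preserved, so related endpoints stay together.
    """
    stacks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, cost in endpoint_costs.items():
        if cost > limit:
            raise ValueError(f"{name} needs {cost} resources, more than one stack allows")
        if used + cost > limit:
            # Current stack is full: close it and start a new one.
            stacks.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        stacks.append(current)
    return stacks
```

With 300 hypothetical endpoints at 6 resources each, 33 endpoints fit per stack, which works out to 10 stacks.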

Obviously not using CloudFront... it might be the slowest service to deploy I’ve ever seen.

Try deploying changes to Google Cloud load balancers. They're updated within a few seconds, but the changes take several minutes to be applied. The first time, I was scratching my head over why my changes didn't work as expected...

This (pay for what you use, so many fewer scalability issues) is so big it, by itself, can give you a competitive advantage against anyone who isn't doing this, which is almost everyone.

Maybe I'm overstating it, but I don't think I am...

I think you're overstating it. Why do people care so much about scalability issues, anyway? Given that (a) a stateless server plus an SQLite instance is much, much easier to set up than the proprietary, poorly documented mess that is Lambda, (b) that server can easily be horizontally scaled, and the SQLite instance can be swapped out with any other SQL database with some effort, and (c) a single server with SQLite will easily handle up to 100K connections, it doesn't seem like scaling was ever an issue for most websites.
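For comparison, a minimal sketch of the "stateless server plus SQLite" setup using only the Python standard library (WSGI via `wsgiref`, plus `sqlite3`). The hit-counter table and `make_app` are hypothetical. The process keeps no state between requests; to run several copies behind a load balancer you would, as the comment says, swap SQLite for a networked SQL database.

```python
import json
import sqlite3
from contextlib import closing


def make_app(db_path: str):
    """Build a stateless WSGI app; all state lives in the SQLite file at db_path."""
    with closing(sqlite3.connect(db_path)) as conn, conn:
        conn.execute("CREATE TABLE IF NOT EXISTS hits (path TEXT NOT NULL)")

    def app(environ, start_response):
        path = environ.get("PATH_INFO") or "/"
        # One short-lived connection per request: nothing is kept in process memory.
        # closing() closes the connection; the inner "conn" context commits the transaction.
        with closing(sqlite3.connect(db_path)) as conn, conn:
            conn.execute("INSERT INTO hits (path) VALUES (?)", (path,))
            (count,) = conn.execute("SELECT COUNT(*) FROM hits WHERE path = ?", (path,)).fetchone()
        body = json.dumps({"path": path, "hits": count}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


# To serve it locally (blocks forever):
# from wsgiref.simple_server import make_server
# make_server("127.0.0.1", 8000, make_app("app.db")).serve_forever()
```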

I don't think you've used the tools that can work with lambda or had to actually scale something in production, based on your response...

It's all a lot harder than you make it out to be, but at least with lambda (and something like Zappa) you don't have to figure anything out beyond how you get your first environment up. There's just no second step, and that's huge.

Really? Lambda is a poorly documented mess?

Using a scripting language like Python or Node it literally is adding one function that takes in a JSON event and a context object as your entry point.
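For illustration, a minimal Python handler of that shape. The Python runtime passes the JSON event in as a dict; this sketch assumes an API Gateway proxy integration, which expects a `statusCode`/`headers`/`body` response. The function name and the `name` query parameter are illustrative.

```python
import json
from typing import Any, Dict


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Entry point: Lambda passes the decoded JSON event as a dict plus a context object."""
    # API Gateway sends null when there is no query string, hence the "or {}".
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```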

Serverless is also easier to develop for.

With Google Firebase Functions I was able to start writing REST APIs in minutes.

Compare that to setting up a VM somewhere, getting a domain name + certs + Express setup + deployment scripts, and then handling login credentials for all of the above.

I had never done any of that (eventually I grew until I had to), so serverless let me get up and running really quickly.

Now I prefer my own express instance, since deployment is much faster and debugging is much easier. But even for the debugging scenario, expecting everyone who wants to Just Write Code to get the horrid mess of JS stuff up and running in order to debug, ugh.

(If it weren't for HTTPS, Firebase's function emulator would be fine for debugging; as it is, a few nice solutions exist anyway.)

But, to be clear, on day 1 my options for writing a JS REST endpoint were:

1. Follow a 5-10 minute tutorial on setting up Firebase Functions.

Or:

1. Pick a VM host (DigitalOcean rocks) and set up an account

2. Learn how to provision a VM

3. Get a domain

4. Point the domain at my host

5. SSH into the machine as root and set up non-root accounts with the needed permissions

6. Set up certbot

7. Learn how to set up an Express server

8. Set up an nginx reverse proxy to get HTTPS working on my Express server

9. Write deployment scripts (OK, SCP) to copy my working code over to my machine

10. Set up PM2 to watch for script changes

11. Start writing code!

(12. Keep track, in a secure fashion, of all the credentials I just created for the above steps!)

I am experienced in a lot of things, and thankfully I had some experience messing around with VMs and setting up my own servers before, but despite what everyone on HN may think, not every dev in the world also wants to run a bunch of VMs and manage their setup/configuration just to write a few REST endpoints!

So yeah, instead I can type 'firebase deploy' in a folder that has some JS functions exported in an index.js file and a minute later out pops some HTTPS URLs.

If you don't want to learn DevOps why not use a PaaS like Heroku? That way when you want to learn DevOps, you can move your application without rewriting large swathes of it.

It's funny but when I learned to code basically all ISPs provided you with free hosting and a database, and you just needed to drag and drop a PHP file to make it live. It's like we have gone backwards not just in terms of openness but also in terms of complexity.

The last time I had done server side dev, yeah, it was all PHP and FTP drag and drop a file over.

I was a bit shocked at how asinine things had gotten.

This seems a bit exaggerated. You definitely don't need to start with certbot or even with a domain name. Why can't you just start with a regular server on localhost, which can be set up in way less than ten minutes, and publish it only when you're ready? Not to mention that Firebase Functions is way less beginner-friendly than Express.

Mobile app development requires an HTTPS connection.

Every React Native tutorial out there has a section on setting up user auth with Firebase, and then putting a few REST endpoints up.

It is simple enough that a beginner "never touched mobile or web" development tutorial can go through it in under an hour.

Firebase is incredibly simple to get started with.

Another solution is to use one of the HTTPS port forwarding services that takes a localhost server and gives it a public HTTPS endpoint, but that is more work to explain than

    firebase deploy
so the tutorial authors go with Firebase. Auth being super simple is icing on top.

You should check out Next.js and Now deployment; it's an even faster, easier setup for JS stuff.

I'm generally a proponent of DIY, but this doesn't make sense. When an org needs to scale, it makes sense to cultivate skill in the underlying infrastructure. On day 1, serverless makes a lot of sense because it encourages development patterns that will eventually scale nicely.

Placing stateless web servers on simple VPS instances with load balancers in front doesn't scale nicely? Or perhaps I'm misunderstanding the parent.

Serverless does force you to design for higher concurrency than bigger server instances would, which is worth something.

And while we spend all of our energy and money doing that, we aren’t actually creating the product that is supposed to be bringing us revenue....

This is the correct answer. AFAIC, Intro to Economics should be the first CS course any software engineer takes.

I have mixed feelings about this, and I don't have enough experience to label one method better than the other.

Lambdas basically require zero maintenance. SQS requires zero maintenance. The EC2 load balancer is zero maintenance. And the setup is trivial too, and there's no migration time down the line. If you start off with native cloud for everything, you can keep your maintenance and setup costs down drastically.

However, a lot can be done with the old school unsexy tech.

So I'm mixed.

Zero maintenance? What about standing up multiple environments for staging and prod? Sharing secrets, keys, and env vars? Deployments? Logging? No migration time down the line? I'm pretty sure GCP and Azure don't have Lambda, SQS, or EC2 load balancers, so you absolutely will have migration time if you have to retool your implementation to switch cloud providers or products.

You make really good points about that. The classic things do migrate providers the easiest of all.

I've just found, in my experience, that maintaining a web server or a database server, keeping security in mind, upgrades, scaling, etc. is a lot more work than simply spinning up RDS and a Lambda with API Gateway. Or even hosting static sites on S3 or Netlify.

Like I said I don't know enough to say one is better.

Also, is your name Jason?

I think we may have worked together in the past. Lol

Honestly, I've developed the last few projects with the Serverless Framework and deployed to AWS Lambda. My biggest project is a custom Python application monitoring solution with 4 separate Lambda functions/handlers. It populates an RDS PostgreSQL database (running in a VPC) with monitoring data, then reads from that database using complex joins across multiple application metric tables to send time-series data to CloudWatch metrics.

Then I configured CloudWatch alarms with thresholds on certain metrics that send a webhook to PagerDuty.
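The "send time-series data to CloudWatch" step can be sketched with boto3's `put_metric_data`. This is a minimal sketch, not the commenter's code: the row shape, the `Custom/AppMonitoring` namespace, the `Application` dimension, and the batch size of 20 (a conservative choice; check the current PutMetricData limits) are all assumptions. The alarm-to-PagerDuty wiring is configured in CloudWatch and isn't shown.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# (application, metric name, value, CloudWatch unit, timestamp), e.g. one row
# of a hypothetical SQL join over the monitoring tables.
Row = Tuple[str, str, float, str, datetime]


def build_metric_data(rows: Iterable[Row]) -> List[Dict[str, Any]]:
    """Convert query rows into the MetricData shape PutMetricData expects."""
    return [
        {
            "MetricName": metric,
            "Dimensions": [{"Name": "Application", "Value": app}],
            "Timestamp": ts,
            "Value": float(value),
            "Unit": unit,
        }
        for app, metric, value, unit, ts in rows
    ]


def chunks(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Split a list into consecutive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def publish(rows: Iterable[Row], namespace: str = "Custom/AppMonitoring", batch_size: int = 20) -> None:
    """Send metrics to CloudWatch. Needs boto3 and AWS credentials at run time."""
    import boto3  # imported here so the pure functions above work without it

    client = boto3.client("cloudwatch")
    for batch in chunks(build_metric_data(rows), batch_size):
        client.put_metric_data(Namespace=namespace, MetricData=batch)
```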

The benefit is that my monitoring system never goes down, never needs to be patched manually (AWS even patches my Postgres database, and fails over to the warm standby during patching), and never needs any system administration.

Have you ever worked at a company where you had a serious outage that you didn't detect quickly enough because a monitoring system was down? Having a Serverless monitoring system means this has happened 0 times despite our app running in production for almost a year now.

> In most cases, very few companies have products that need to scale to extreme load day 1 or even year 1.

That wouldn't be a great reason to choose serverless indeed. However, that doesn't mean serverless isn't still the right choice.

We've tried both the traditional approach you describe and serverless, and from experience the latter is 10x less infrastructure code than the former (we compared both Terraform implementations).

If serverless fits your use case, saving time and effort is a very good reason to go for it IMHO.

Of all the AWS features to criticize for lock-in, Lambda seems like the weakest choice.

You don't have to write much code to implement a lambda handler's boilerplate, and that boilerplate is at the uppermost or outermost layer of your code. You could turn most libraries or applications into lambda functions by writing one class or one method.
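A sketch of that layering, assuming a hypothetical `summarize` function as the library code: the core stays plain Python, and the Lambda-specific part is one small adapter that could be swapped for a CLI (as below) or a web framework route when moving off Lambda.

```python
import json
import sys
from typing import Any, Dict, Iterable, List


# Portable core: plain Python with no AWS imports (hypothetical library code).
def summarize(words: Iterable[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


# Lambda adapter: the only AWS-specific code, expecting a JSON list in the request body.
def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    words = json.loads(event.get("body") or "[]")
    return {"statusCode": 200, "body": json.dumps(summarize(words))}


# CLI adapter: the same core, run anywhere Python runs.
def main(argv: List[str]) -> None:
    print(json.dumps(summarize(argv)))


if __name__ == "__main__":
    main(sys.argv[1:])
```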

A lambda's zip distribution is not proprietary and is easy to implement in any build tool.
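As an example of how little is involved, a Python sketch that builds a deployment zip with the standard `zipfile` module. It assumes the handler module sits at the root of the source directory and that third-party dependencies have already been vendored into it; `build_lambda_zip` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def build_lambda_zip(src_dir: str, out_path: str) -> str:
    """Zip every file under src_dir, with archive paths relative to src_dir.

    out_path should live outside src_dir so the archive doesn't include itself.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Forward-slash relative paths, so e.g. app.py ends up at the archive root.
                zf.write(str(path), arcname=path.relative_to(src).as_posix())
    return out_path
```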

I'd include the triggers as part of that analysis, like being able to invoke a function every time something is pushed to an S3 bucket for example. Just being able to run arbitrary functions without caring about the OS is the core product, but the true value is that you can tie that into innumerable other services that are so helpfully provided.
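A minimal sketch of an S3-triggered handler, reading the bucket and key from the standard S3 event notification records. Object keys arrive URL-encoded, hence `unquote_plus`. What you do with each object is left as a hypothetical placeholder.

```python
from typing import Any, Dict, Iterator, List, Tuple
from urllib.parse import unquote_plus


def s3_objects(event: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys are URL-encoded in the notification (a space arrives as "+").
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[str]]:
    seen: List[str] = []
    for bucket, key in s3_objects(event):
        # Hypothetical placeholder: real code would fetch and process the object here.
        seen.append(f"s3://{bucket}/{key}")
    return {"processed": seen}
```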

Basically, AWS has so much damn stuff under their belt now, and it all integrates so nicely, every time they add a new feature it lifts up all the other features as a matter of course.

I tend to agree, although with most things I think it depends on how heavily invested you are. Migrating a handful of mission-critical Lambdas is no biggie, but if you've really bought the bait and implemented your entire web services architecture on AWS API Gateway and Lambda -- for some reason -- you've got a much tougher job untangling yourself. Perhaps it suffices to say it's worth keeping an eye on how much friction you're building for yourself as you go.


Vendor lock-in is a thing, but Lambda is lower on the ladder than other things.

It's the hyper-specific things that imply heavier lock-in, especially those that bleed into other systems.

"I'm scared of vendor lock-in, so I'm going to build something that's completely provider agnostic" means you're buying optionality, and paying for it with feature velocity.

There are business reasons to go multi-cloud for a few workloads, but understand that you're going to lose time to market as a result. My best practice advice is to pick a vendor (I don't care which one) and go all-in.

And you'll forgive my skepticism around "go multi-cloud!" coming from a vendor who'll have precious little to sell me if I don't.

    Pick a vendor and go all in.
That sounds like the perspective of someone who's picked open source vendors most of the time, or has been spoiled by the ease of migrating Java, Node, or Go projects to other systems and architectures. Having worked at large enterprises and banks who went all in with, say, IBM, I have seen just how expensive true vendor lock-in can get.

Don't expect a vendor to always stay competitively priced, especially once they realize a) their old model is failing, and b) everybody on their old model is quite stuck.

I am incredulous that people wouldn't be worried about vendor lock-in when the valley already has a 900lb gorilla in the room (Oracle).

Ask anybody about Oracle features, and they'll tell you for days how great the feature velocity and feature set are. But then ask them how much their company has been absolutely rinsed over time and how the costs increase annually.

Oracle succeed by being only slightly cheaper than migrating your entire codebase. To offset this practice, keep your transition costs low.


Personal note: I'm currently experiencing this with Microsoft. All cloud providers charge an exorbitant premium for running Windows on their VMs, but Azure is, obviously, priced very well (in an attempt to entice you to their platform). Our software was built over a long period of time by people who were forced to run Windows at work, so they built it on Windows.

Now we have a 30% operational overhead charged by Microsoft through our cloud provider. But hey... at least our cloud provider honours fsync().

I think perhaps not all vendor lock-in is created equal. I too shudder at the thought of walking into another Oracle like trap, but it's also an error in cognition to make the assumption that all vendors will lock you in to the same degree and in the same way.

I guess those of us cautioning ourselves and others are aware of the pitfalls, but others also have valid points about going all in.

There is a matrix of different scenarios let's say.

  You can go all in on a vendor and get Oracled.
  You can go all in on an abstraction that lets you be vendor agnostic and lose some velocity while gaining flexibility.
  You can go for a vendor and perhaps it turns out that no terrible outcome results because of that. 
  You can go all in on vendor agnostic and have that be the death of the company.
  You can go all in on vendor agnostic and have that be the reason the company was able to dodge death.
Nobody can read the future and even "best practices" have a possibility of resulting in the worst outcomes. The only thing for it is to do your homework, decide what risks are acceptable to you, make your decision, take responsibility for it.

Vendors have 2 core requirements to continue operating: get new customers and keep the existing ones. Getting new customers requires constant innovation, marketing spend, providing value, etc. Keeping existing customers only requires making the pain of leaving greater than the pain of staying.

Sure. And even from that, you still can't infer what outcome will materialize. If you made the technically correct decision and your business went under because of it, that is still gonna hurt no matter which way you look at it. Hence the advice: do your homework, figure out which risks are acceptable to you, make your choice and take responsibility. There is no magic bullet for picking the right option, only picking the option you can live with, because that's what you're going to have to do regardless of the outcome.

You might know all the theory on aviation and be a really experienced pilot and one day a sudden wind shear might still fuck you.

> all cloud providers have an exorbitant premium when it comes to running Windows on their VMs

Speaking from first-hand running-a-cloud-platform experience, it's because running Windows on a cloud platform is not easy, and it comes with licensing costs that have to be paid to Microsoft for each running instance (plus a bunch of infrastructure to support it). It's not even a per-instance-per-time-interval cost; there's all sorts of stuff wrapped up in it that impacts the effective cost. It requires a bunch of administrative work and specific coding to try to optimise the costs to the cloud provider.

In addition, where Linux will happily transfer between different hardware configurations, you'll often have to have separate Windows images for each hardware configuration, so that means even more overhead on staffing both producing and supporting. So e.g. SR-IOV images, PV images, bare metal images (for each flavour of bare metal hardware), etc. While a bunch of this work can be fully automated, it's still not a trivial task, and producing that first image for a new hardware configuration can take a whole bunch of work, even where you'd think it would be trivial.

> Oracle succeed by being only slightly cheaper than migrating your entire codebase

Amdocs, ESRI, Microsoft too... Their commercial strategy is a finely tuned parasitic equilibrium.

Sales training that emphasizes knowing one's customer is all about that: if the salesperson understands the exit costs better than the customer does, the customer is going to be milked well and truly!

I'm thinking that the sales side may benefit from hiring people with experience in the customer's business to game the technical options in actual study... I guess they do it already - I'm not experienced on the sales side.

I know a whole bunch of people who complain about Oracle lock-in but are fine with moving everything to DynamoDB/Lambda.

>keep your transition costs low

For me this is key.

At least if you went with IBM and followed the old adage “no one ever got fired for buying IBM” in the 80s, you can still buy new supported hardware to run your old code on. If you went with their competitors, not so much.

You certainly can, but the prices haven’t decreased (and probably increased) from their 80s values, even though the thing is now probably a dinosaur.

Which of their competitors can you still buy new faster hardware from? Does anyone sell Stratus VOS or DEC VAX VMS compatible systems?

Yeah I know I am showing my age....

The people who designed their systems such that they could be easily transitioned off of IBM have done so long ago. Those systems now run exponentially cheaper and have access to more resources.

Vendor lock-in was just as much of a problem then as now.

Yes because people in the 80s writing COBOL were writing AbstractFactorySingletonBeans to create a facade to make their transitions to newer platforms easier....

> Does anyone sell... DEC VAX VMS compatible systems?


Someone should check if it can run OpenGenera.

Well, our next iSeries is a whole lot cheaper than our current iSeries and quite a bit more powerful. The Power 9 is not something I would call a dinosaur.

IBM's prices haven't changed at all since the 80s. They're still four times as expensive as they should be.

Sounds like you've described IBM being priced exactly right, given their tenacious longevity.

> That sounds like the perspective of someone who's picked open source vendors most of the time

More than that. They picked open source vendors that didn't (a) vanish in a puff of smoke, (b) get bought out by a company that slowly killed the open source version, or (c) who produced products that they were capable of supporting without the vendor (or capable of faking support for).

Vendor lock-in can be expensive, but spreading yourself across vendors can also be expensive. There are lots of things that are free if you stay in the walled garden, but the second you start pulling gigs & gigs of data from AWS S3 into GCP (or whatever), that can get pricey real fast.

In general I agree with you. In practice, the more practical approach may be to focus on making more money & not fussing too much w/ vendor costs & whatever strategy you choose to use. It's easy to pay a big vendor bill when there's a bigger pile of cash from sales.

IBM is awful on the way out. So is any firm bought by Oracle or CA.

That's terrible best-practice advice.

My best-practice advice is to do the math: what is the margin on infrastructure vs. the acquisition and management costs of the engineers necessary to operate the infrastructure?

Serverless doesn't scale very well along the axis of cost. At some point that's going to become an issue. If one has gone "all-in" on vendor lock-in, then that vendor is going to spend as much time as possible enjoying those margins while the attempt to retool to something else is underway.
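One way to "do the math" on the cost axis is a break-even estimate: how many requests a month before Lambda costs more than a fixed-price server. This sketch assumes the x86 list prices of $0.20 per million requests and $0.0000166667 per GB-second (check current pricing), ignores the free tier, API Gateway charges, and duration rounding, and leaves out the engineering cost of running the server, which is the other side of the margin question above. The $40/month server is a made-up example.

```python
# Assumed Lambda list prices (x86); verify against current AWS pricing.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667


def lambda_monthly_cost(requests: float, avg_ms: float, memory_mb: int) -> float:
    """Request charge plus compute charge; no free tier, no duration rounding."""
    gb_seconds = requests * (avg_ms / 1000) * (memory_mb / 1024)
    return requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS + gb_seconds * PRICE_PER_GB_SECOND


def break_even_requests(fixed_monthly: float, avg_ms: float, memory_mb: int) -> float:
    """Monthly request volume at which Lambda costs as much as a fixed-price server."""
    return fixed_monthly / lambda_monthly_cost(1, avg_ms, memory_mb)


print(f"{break_even_requests(40, 100, 128):,.0f} requests/month")
```

With a 100 ms, 128 MB function, the hypothetical $40/month server breaks even at roughly 98 million requests a month under these assumptions; heavier or longer-running functions cross over much sooner.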

Best practice, generally speaking, is to engineer for extensibility at some point, fairly early on.

Self hosting doesn't scale. There is very little reason for a good sysadmin to work for International Widgets Conglomerated when they can work for a cloud provider instead, building larger scale solutions more efficiently for higher pay. I'd rather buy an off the shelf service used by the rest of the F500 than roll my own. Successful companies outsource their non core competencies

Self-hosting doesn't scale? Have you done the math? If so, I'd be curious to see it.

There are a number of reasons that self-hosting doesn't make sense which have very little to do with scale and more to do with the lack of scale. For very little investment, one can get highly available compute, storage, networking, load-balancing, etc. from any of the major cloud providers. Want to make that geographically distributed? In your average cloud provider that's easy-peasy for little added cost.

Last time I had to ballpark such a thing, which is to say, what is the minimum possible deployment I'd be willing to support for a broad set of services, I settled on three 10kW cabinets in any given retail colo space with twenty-five servers per cabinet, each consuming an average of 300W. Those servers were around $10k and were hypervisor-class machines, i.e. lots of cores and memory for whatever time that was. Some switches, a couple of routers, and 4xGigE transit links.

Of course I'd want three sets of that spread in regions of interest. If I were US focused: east coast, west coast and Chicago or thereabouts. All the servers and network gear come to around $1.5m CapEx per site. OpEx is $200/kW for the power and space and around $1/Mbps for the transit. Note that outside the US, the price per kW can be much, much higher.

So, $6k MRC for the power and $4k MRC for the intertubes. $10k OpEx on top of ~$42k/month in depreciation ($1.5m/36) on your CapEx multiplied by three gives you $156k/month.

Let's assume my middle-of-the-road hypervisor-class machine has all the memory it needs and two 16-core processors with hyperthreading, so 64 vCPU each, or 14400 vCPU across your three data centers, all for only around $2m/yr with nearly $5m of that up front.
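The self-hosted arithmetic can be reproduced in a few lines of Python. All inputs are the figures quoted above; the $1.5m CapEx is taken as per site, which is how the monthly total is computed. Without rounding depreciation up to $42k, the total comes to about $155k/month.

```python
# Inputs as quoted in the comment (per data center unless noted).
SITES = 3
CABINETS_PER_SITE = 3
KW_PER_CABINET = 10
SERVERS_PER_CABINET = 25
WATTS_PER_SERVER = 300
VCPU_PER_SERVER = 2 * 16 * 2       # two 16-core CPUs with hyperthreading
CAPEX_PER_SITE = 1_500_000         # servers plus network gear
DEPRECIATION_MONTHS = 36
POWER_AND_SPACE_PER_KW = 200       # $ per kW per month
TRANSIT_MBPS = 4 * 1000            # four GigE transit links
TRANSIT_PER_MBPS = 1               # $ per Mbps per month


def cabinet_draw_kw() -> float:
    """Average draw per cabinet; should stay under the cabinet's rating."""
    return SERVERS_PER_CABINET * WATTS_PER_SERVER / 1000


def monthly_cost_per_site() -> float:
    power = CABINETS_PER_SITE * KW_PER_CABINET * POWER_AND_SPACE_PER_KW  # $6,000
    transit = TRANSIT_MBPS * TRANSIT_PER_MBPS                             # $4,000
    depreciation = CAPEX_PER_SITE / DEPRECIATION_MONTHS                   # ~$41,667
    return power + transit + depreciation


def total_vcpu() -> int:
    return SITES * CABINETS_PER_SITE * SERVERS_PER_CABINET * VCPU_PER_SERVER


monthly = SITES * monthly_cost_per_site()
print(f"cabinet draw: {cabinet_draw_kw():.1f} kW of {KW_PER_CABINET} kW")
print(f"vCPU: {total_vcpu():,}")
print(f"monthly: ${monthly:,.0f}  yearly: ${12 * monthly:,.0f}  up front: ${SITES * CAPEX_PER_SITE:,}")
```

This prints a 7.5 kW average draw per 10 kW cabinet, 14,400 vCPU, about $155,000 a month (about $1.86m a year), and $4.5m up front.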

That's a boat load of risk no startup or small enterprise is going to take on. You still have to staff for that and good luck finding folks that aren't asshats that can actually build it successfully. They're few and far between. That said, it does scale. It scales like hell, especially if you can manage to utilize that infrastructure effectively. I wager that if you were to look at what it would cost to hold down that much CPU and associated memory continuously in AWS then you'd be paying roughly 6x as much.


14400 vCPU of R4 for 3yr reserved, monthly is $300k MRC. I'm guessing you'd run Ceph or Rook on your bare metal and have ~8 1TB SSDs per server, so 75 servers * 8 SSDs / 3 (for replication) is 200TB with decent performance per data center, times three data centers for 600TB usable. Compared to EBS gp2 at $.10/GB, that comes to roughly $60k MRC.

Less any network charges that's $360k vs. $156k self-hosted. Guess I'm wrong. It's only twice as much.
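Likewise for the cloud side, using the comment's inputs: 3-year reserved R4 at $300k/month and EBS gp2 at $0.10/GB-month for the same 600TB of usable storage (decimal TB to GB). Network charges are excluded, as in the comment, and the self-hosted figure is the comment's $156k.

```python
# Inputs as quoted in the comment.
R4_RESERVED_MRC = 300_000      # 14,400 vCPU of R4, 3-year reserved, paid monthly
SERVERS_PER_SITE = 75
SSDS_PER_SERVER = 8            # 1 TB each
REPLICATION = 3                # e.g. Ceph keeping three copies
SITES = 3
GP2_PER_GB_MONTH = 0.10        # EBS gp2 price used in the comment
SELF_HOSTED_MRC = 156_000      # the comment's self-hosted monthly figure


def usable_tb() -> float:
    per_site = SERVERS_PER_SITE * SSDS_PER_SERVER / REPLICATION  # 200 TB
    return per_site * SITES                                         # 600 TB


def cloud_mrc() -> float:
    storage = usable_tb() * 1000 * GP2_PER_GB_MONTH                 # ~$60,000
    return R4_RESERVED_MRC + storage


ratio = cloud_mrc() / SELF_HOSTED_MRC
print(f"usable storage: {usable_tb():.0f} TB")
print(f"cloud: ${cloud_mrc():,.0f}/month vs self-hosted ${SELF_HOSTED_MRC:,}/month ({ratio:.1f}x)")
```

This gives roughly $360k/month against $156k, a ratio of about 2.3x.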

> Self hosting doesn't scale

That's just not true. There are plenty of companies that self-host highly scaling infrastructure. Twitter being just one of those companies. They've only recently started thinking about using the cloud, opted for Google, and that's only to run some of their analytics infrastructure.

> There is very little reason for a good sysadmin to work for International Widgets Conglomerated when they can work for a cloud provider instead, building larger scale solutions more efficiently for higher pay

That's not true either (speaking from personal experience working for companies of all sizes, from a couple dozen employees on up to and including AWS and Oracle on their cloud platforms). For one thing, sysadmin is far too broad a job role to make such sweeping statements.

A whole bunch of what I do as a systems engineer for cloud platforms is a whole world of difference from general sysadmin work, even ignoring that sysadmin is a very broad definition that covers everything from running exchange servers on-prem, to building private clouds or beyond.

These days I'm not sure I've even got the particular skills to easily drop back in to typical sysadmin work. Cloud platform syseng work requires a much more precise set of skills and competencies.

All that aside, I can point you in the direction of plenty of sysadmins who wouldn't work for major cloud providers for all the money in the world, either for moral or ethical reasons; or they're just not interested in that kind of problem; or even just that they don't want to deal with the frequently toxic burn-out environments that you hear about there.

> I'd rather buy an off the shelf service used by the rest of the F500 than roll my own.

Nowhere near as much of the F500 workload is on the cloud as you apparently believe. It's a big market that isn't well tapped. Amazon and Azure have been picking up some of the work, but a lot of the F500 don't like the pay-as-you-go financial model. That plays sweet merry havoc with budgeting, for starters. It's one reason why Oracle introduced a pay model with Oracle Cloud Infrastructure that allows CIOs to set fixed IT expenditure budgets. Many F500 companies have only really started talking about actually moving into the cloud in the last few years (when OCI launched at Oracle OpenWorld, a startling number of CIOs from large, well-known companies came up to the sales team and said "So... what is this cloud thing, anyway?").

> Successful companies outsource their non core competencies

Yes.. and no. Successful companies outsource their non-core competencies where there is no value having them on-site. That's very different.

Honestly, I think Twitter is a counterpoint to your argument.

Twitter was founded in 2006, the same year AWS was launched, so in the early days Twitter didn't have a choice - the cloud wasn't yet a viable option to run a company.

And, if you remember in the early days, Twitter's scalability was absolutely atrocious - the "Fail Whale" was an emblem of Twitter's engineering headaches. Of course, through lots of blood, sweat and tears (and millions/billions of dollars) Twitter has been able to build a robust infrastructure, but I think a new company, or a company who wasn't absolutely expert-level when dealing with heavy load, would be crazy to try to roll their own at this point unless they wanted to be a cloud provider themselves.

> And, if you remember in the early days, Twitter's scalability was absolutely atrocious - the "Fail Whale" was an emblem of Twitter's engineering headaches.

That's because Twitter was:

1) A monolith

2) Written in Ruby

They started splitting components up into specialised, made-for-purpose components using Scala atop the JVM, and scaling ceased being a big issue. The problems they ran into couldn't be solved by horizontal scaling. There wasn't any service that AWS offers even today that would have helped with those engineering challenges.

It had more to do with the consistency model of relational databases, though Ruby definitely didn't help.

You honestly think that the main performance problem with a 2yo app is that it's a monolith?

Yes, based on their own detailed analysis and extensive technical blogs on the subject. Ruby was doing a lot of processing of the tweets' contents, etc., and at the time Ruby was even worse for performance than it is today. (Ruby may be many things, but until the more recent JIT work, fast was not one of them.)

> That's just not true. There are plenty of companies that self-host highly scaling infrastructure. Twitter being just one of those companies. They've only recently started thinking about using the cloud, opted for Google, and that's only to run some of their analytics infrastructure.

Twitter is both unusually big and unusually unprofitable. You're unlikely to be as big as them, and even if you were I wouldn't assume they've made the best decisions.

> All that aside, I can point you in the direction of plenty of sysadmins who wouldn't work for major cloud providers for all the money in the world, either for moral or ethical reasons; or they're just not interested in that kind of problem; or even just that they don't want to deal with the frequently toxic burn-out environments that you hear about there.

It's best to work somewhere you're appreciated (both financially and for job-satisfaction reasons), and it's harder to be appreciated in an organization where you're a cost center than one where you're part of the primary business. There are good and bad companies in every field, and good and bad departments in every big company, but the odds are more in your favour when you go into a company that does what you do.

It doesn't have to be that way: for a client, I recently set up GitLab Auto DevOps on Google Kubernetes Engine. Feature-velocity-wise it is nearly as painless as Heroku (which to me is the pinnacle of ease of deployment), but because every layer of the stack is open source, we could switch providers at the flick of a wrist.

Of course, we won't switch providers, because they're offering great value right now.

I feel this vendor lock-in business is a phase that will pass. We were vendor locked when we paid for Unix or Windows servers, then we got Linux and BSD. Then we got vendor locked by platform providers like AWS and such, and now that grip is loosened by open source infrastructure stacks like Kubernetes.

> We were vendor locked when we paid for Unix or Windows servers, then we got Linux and BSD.

NT was released in '93 and FreeBSD 2.0 (free of AT&T code) in '94. GNU/Linux also saw professional server use in the mid/late '90s. People still lock themselves in, though.

If a company is going from zero to something, then you are absolutely right. In that case, dealing with vendor lock in later means success!

If an established company is moving to the cloud, the equation is not as simple. The established company presumably has the money and time to make its systems vendor agnostic. Is the vendor lock-in risk worth spending more on now? How large is the risk? What are the benefits (using all of the AWS services is pretty nice)?

I have to ask, is this

>"benefits (using all of the AWS services is pretty nice)"

your personal experience? Or are you simply assuming that interop / efficiencies obtain when going all-in on AWS? I ask, because I've had multiple client conversations in which these presumed benefits fail to materialize to a degree that offsets the concern about lock-in.

I can't claim to have used all of the AWS services, but whenever I need something done I check if it's offered in AWS first. SQS, SNS, ECS, ALB/ELBs, SFNs, Lambdas, media encoding pipelines, Aurora RDS, etc. have all made my job easier.

If your time horizon is short (months to a couple of years) going all-in with a vendor can work quite well. Over longer time horizons (several years or more), it's often not so great. Forgetting about the costs involved (i.e. they know they've got you) and the fact that if that one vendor ever becomes truly dominant that improvements will slow to a crawl (i.e. they know they've got everyone), there is a very real risk of whatever technologies you are depending on disappearing as their business needs will always outweigh yours. Feature velocity tends to evaporate in the face of a forced overhaul or even rewrite.

Pricing changes are a real issue. Google Maps spiked in cost, and those who were using Leaflet just had to change a few settings to switch to another provider. Those who built on Google's Maps JS API are locked in.
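As a rough illustration of why switching could be "just a few settings": tile-based map libraries like Leaflet typically fetch raster tiles from a {z}/{x}/{y} URL template, so the provider is largely a string. Below is a minimal Python sketch (not Leaflet's own code) of the standard Web Mercator "slippy map" tile math. The TILE_PROVIDERS table and the tiles.example.com host are hypothetical placeholders; real providers also come with their own API keys, usage terms and attribution requirements.

```python
import math

# Hypothetical provider table: only the URL template differs between providers.
# The OpenStreetMap template is the public one; "example" is a made-up placeholder.
TILE_PROVIDERS = {
    "osm": "https://tile.openstreetmap.org/{z}/{x}/{y}.png",
    "example": "https://tiles.example.com/{z}/{x}/{y}.png",
}


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Convert WGS84 lon/lat to slippy-map (Web Mercator) tile indices."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon=180 or extreme latitudes still map to a valid tile index.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(provider: str, lon: float, lat: float, zoom: int) -> str:
    """Build the tile URL for a point; swapping providers only changes the template."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return TILE_PROVIDERS[provider].format(z=zoom, x=x, y=y)
```

Because the tile coordinates are provider-independent, moving from one tile source to another is a change to the template (plus credentials and attribution), not to the application code.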

Yeah... We had a fixed yearly price for a Google Maps API Premium account; when they switched us to PAYG, our costs increased 8x. We spent 2 weeks of unplanned, urgent work switching to Mapbox, at 3x the original cost...
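Taking those multiples at face value, with C as the original fixed yearly price, the switch works out as follows:

```latex
\begin{aligned}
\text{PAYG cost} &= 8C, \qquad \text{Mapbox cost} = 3C,\\
\frac{3C}{8C} &= 0.375 \;\Rightarrow\; \text{62.5\% cheaper than staying on PAYG},\\
\frac{3C - C}{C} &= 2 \;\Rightarrow\; \text{still 200\% more than the original price.}
\end{aligned}
```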

The problem with that is that one thing, some product manager somewhere, sits on the light switch of your company's or project's success or failure. They could even end you unintentionally with a price change.

AWS supports everything...forever.

For instance, putting your EC2 instances in a VPC has been the preferred way of operating since 2009. But, if you have an account old enough, you can still create an EC2 instance outside of a VPC.

You can still use the same SQS API from 2006 as far as I know.

> AWS supports everything...forever.

Maybe "AWS and cloud infrastructure" will be to modern companies what COBOL and mainframes were to the big companies of 50 years ago.

No doubt somebody will be happy to charge you to support it for a long time...

They even still offer "reduced redundancy storage" even though it's been made obsolete (and is more expensive than the regular S3 storage).

And we see this here regularly, as with the Google Maps API price hike.

No, you just see that with Google... a company not exactly known for its customer relations.

And Oracle, and IBM, and every company that isn't pouring every dollar of profit back into growth marketing while doubling down with investor money.

AWS drives most of Amazon’s profits these days. It isn’t running at a loss.

In the immortal words of @vgill, "You own your availability."

At first this seems reasonable from a 'technical debt' perspective. Building in vendor-agnosticism takes extra resources (true) that you could spend on getting features to market (true), and you can always spend those extra resources later if you succeed and need them... sort of true, but not really. Because the longer you go, the _more expensive_ it gets to switch, _and_ the more disruptive to your ongoing operations, until eventually it's barely feasible.

Still, I don't know what the solution is. Increasingly it seems to me that building high-quality software is literally not feasible, we can't afford it, all we can afford is crap.

GitLab.com switched from Azure to GCP, so it is realistic.

We had to migrate ~100 instances from Azure to GCP. It took us one month. At the end of the month we changed the dns entries and flipped the switch.

It's true, though, that we never wanted to work with managed services, so there was literally no need to redo any of the tooling.

There's a middle ground here. You can decide on a case by case basis how much lock-in you are willing to tolerate for a particular aspect of your system. You can also strategically design your system to minimize lock-in while still leveraging provider specific capabilities or "punting" on versatility when you want to.

In other words you can decide to not bother with worrying about lock-in when it costs too much.

This will make your code base easier to port to multi-cloud in the future if you should ever want to.

Obviously, there's a huge cost associated with the learning curve, but this is part of the reason Kubernetes is so attractive: it abstracts away the underlying infrastructure, which is nice.

At any kind of scale, though, one is still at least loosely coupled to the cloud provider for things like persistent disks, identity management, etc.

Or the old “I use the Repository Pattern to abstract our database, so sure, we can move our six-figure Oracle installation to Postgres”.

And then watch the CTO throw you out of his office...

He didn’t want to hear it? Why anyone would still build on top of Oracle eludes me.

Atlassian is bad, but Oracle is on a whole different level.
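To make the Repository Pattern joke above concrete: a repository hides where the calls go, not what the SQL says. Here's a minimal Python sketch using the standard-library sqlite3 module; the Customer, CustomerRepository and SqliteCustomerRepository names are hypothetical. Note the ? placeholders: that is sqlite3's qmark paramstyle, while psycopg2 (Postgres) expects %s and cx_Oracle uses :name binds, so even this trivial backend needs edits per database, before you get to vendor-specific SQL, stored procedures, or performance tuning.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Customer:
    id: int
    name: str


class CustomerRepository(Protocol):
    """The interface callers see; nothing here names a specific database."""

    def add(self, customer: Customer) -> None: ...

    def get(self, customer_id: int) -> Optional[Customer]: ...


class SqliteCustomerRepository:
    """One concrete backend. The SQL text and '?' placeholders are
    sqlite3-specific, so a Postgres or Oracle backend must rewrite them."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, customer: Customer) -> None:
        self.conn.execute(
            "INSERT INTO customers (id, name) VALUES (?, ?)",
            (customer.id, customer.name),
        )
        self.conn.commit()

    def get(self, customer_id: int) -> Optional[Customer]:
        row = self.conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None
```

Callers typed against CustomerRepository don't change when the backend does, but the backend itself is exactly where the Oracle-to-Postgres cost lives.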

This. And if you really, really don't want vendor lock-in, then instead of inventing your own infra APIs, find the API exposed by the cloud vendor you chose, replicate it, make sure it works, and then keep using the vendor, with the comfort of knowing you can spin up your own infra if the provider no longer meets your needs.
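A minimal sketch of what "replicate the vendor's API" might look like for a queue. Everything here is hypothetical (LocalQueue and its send/receive/delete methods); it only copies the general shape of SQS-style semantics, where receiving a message hides it for a visibility timeout instead of removing it, and the consumer deletes it with a receipt handle once processed. A real replica would have to match the vendor's actual API, limits and failure modes, which is where most of the work is.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class _Message:
    body: str
    visible_at: float = 0.0        # hidden from receivers until this time
    receipt: Optional[str] = None  # handle issued by the most recent receive


class LocalQueue:
    """Hypothetical in-process stand-in with SQS-like send/receive/delete."""

    def __init__(self, visibility_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.visibility_timeout = visibility_timeout
        self.clock = clock
        self._messages: dict[str, _Message] = {}

    def send(self, body: str) -> str:
        message_id = str(uuid.uuid4())
        self._messages[message_id] = _Message(body)
        return message_id

    def receive(self) -> Optional[tuple[str, str]]:
        """Return (receipt, body) for one visible message, hiding it for a while."""
        now = self.clock()
        for msg in self._messages.values():
            if msg.visible_at <= now:
                msg.visible_at = now + self.visibility_timeout
                msg.receipt = str(uuid.uuid4())
                return msg.receipt, msg.body
        return None

    def delete(self, receipt: str) -> bool:
        """Delete by receipt; in this sketch only the latest receipt is honored."""
        match = next((mid for mid, m in self._messages.items()
                      if m.receipt == receipt), None)
        if match is None:
            return False
        del self._messages[match]
        return True
```

The injectable clock lets you test the timeout behavior without sleeping; a message that isn't deleted in time becomes visible again and is redelivered with a new receipt.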

Or you don't bother replicating the API even if you don't want vendor lock-in, because you realize that if a cloud provider evaporates, a lot of other people will be in the same boat as you, and open-source, off-the-shelf solutions will surely pop up immediately.

Agree that time to market shouldn't be impeded by any unnecessary engineering.

But "pick a vendor and go all-in" can work for Netflix-sized companies, or for ones with static assets in the cloud. All cloud providers have their own rough edges, and if you get stuck on one you might lose your business edge. Case in point (I won't name the provider, since we are partners with them): we found a provisioning bug where a VM based on a custom Windows image took 15 minutes to provision, and exporting a custom image across regions has rigid size restrictions. The provider acknowledged the bugs but won't address them this quarter; if we were Netflix-sized, maybe they would have addressed them sooner.

We have automated our cluster deployment so we can bring our clusters up on most major cloud providers. We are careful to avoid vendor lock-in as much as possible, because our business edge cannot be compromised by these big cloud providers, who only heed your cry if you are big and care not at all about your business impediments. When the cloud resources you need aren't going to be static, you need flexibility, so the above recommendation doesn't suit everyone.

I'll echo this sentiment, and while it may sound like a philosophical position, it really is pragmatic experience: it is nearly impossible to realize a gain from proactively implementing a multi-vendor abstraction. I've found this to hold for databases, for third-party services like payments and email, and very much so for cloud services. I instead recommend using services and APIs as they were designed, and as directly (i.e. with as little abstraction) as possible. Only when and if you decide to add or switch to another vendor is it time to either add an abstraction layer or port your code. I've never seen a case where the cost of implementing two direct integrations was significantly more than the cost of implementing an abstraction, and I've seen many cases where an abstraction was implemented but only one vendor was ever used.

I'll note that I have no objection to abstractions per se, especially in cases where a community solution exists, e.g. Python's SQLAlchemy is good enough that I'd seldom recommend directly using a database driver, Node's nodemailer is in many cases easier to use than lower-level clients, etc.

It very much depends on the nature of the services and the abstractions.

I'm currently working on a system that has several multi-vendor abstractions: for file storage across AWS, GCP, NFS and various other things; for message queues across Kafka, GCP Pub/Sub, and direct database storage; for basic database storage (more key-value style than anything else) across a range of systems; for deployment on VMs, in containers, and on bare metal; and various other things.

All of these things are necessary because it's a complex system that needs to be deployable in different clouds as well as on-prem for big enterprise customers with an on-prem requirement.

None of the code involved is particularly complex, and it has required almost zero maintenance over time.
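For a sense of why this kind of abstraction can stay small when multi-vendor support is a genuine requirement, here's a minimal Python sketch (assuming Python 3.9+) of a blob-storage interface with an in-memory backend and a local-disk/NFS backend. The BlobStore, MemoryBlobStore, FileBlobStore and archive_report names are hypothetical, not from the system described; S3 or GCS adapters (e.g. wrapping boto3 or google-cloud-storage) would follow the same pattern but are omitted here.

```python
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Minimal storage interface; each backend is a thin adapter."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...

    def exists(self, key: str) -> bool: ...


class MemoryBlobStore:
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]  # raises KeyError if missing

    def exists(self, key: str) -> bool:
        return key in self._blobs


class FileBlobStore:
    """Backend for a local disk or NFS mount: keys map to paths under root."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Refuse keys like "../secret" that would escape the store root.
        if not path.is_relative_to(self.root):
            raise ValueError(f"key escapes store root: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        try:
            return self._path(key).read_bytes()
        except FileNotFoundError:
            raise KeyError(key) from None  # same error type as MemoryBlobStore

    def exists(self, key: str) -> bool:
        return self._path(key).is_file()


def archive_report(store: BlobStore, report_id: str, body: bytes) -> str:
    """Application code depends only on the BlobStore interface."""
    key = f"reports/{report_id}.txt"
    if not store.exists(key):
        store.put(key, body)
    return key
```

Most of the effort goes into agreeing on small, shared semantics (here: missing keys raise KeyError) rather than into the adapters themselves.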

That would less be the case if you were trying to roll your own, say, JDBC or DB-agnostic ORM equivalent, but there are generally off the shelf solutions for that kind of thing.

I would never argue against doing it in your case, but implementing an abstraction because multi-vendor support is an actual requirement is quite different from implementing an abstraction on top of a single vendor because you are trying to avoid "vendor lock-in".

I agree with this, but would note that building an abstraction layer is not the only way to approach this issue. Just building the thing with half an eye on how you would port it to a different platform if you needed to can make the difference between a straightforward like-for-like conversion and having to rearchitect the entire app...

I've been at places where they were so vendor locked to a technology that there was a penalty clause for leaving that was in the tens of millions. It obviously wasn't cloud but the point still stands. If you don't have options you pay what they tell you or go out of business.

Yeah, I can't help but wonder about offerings like AppSync; on one level it seems cool, but I recoil at the thought of introducing a critical dependency on AWS for a core piece of the application layer.

Is there a middleground?

Perhaps standardizing on something like Terraform allows you to reduce the risk of going all-in on one vendor.

Similarly with Kubernetes: if you go all in on k8s, do you care where it's hosted, or can you maneuver quickly enough to the best provider?

This has been my company's approach. There's always going to be some provider-specific stuff you have to deal with; networking has been a major difference between clouds, I've noticed. But I'm guessing that in most cases our Helm charts would deploy unchanged to a different provider.

Most systems out there are not in cloud (and multi cloud is even more far fetched).

There is however a large number of painfully learned lessons of vendor locked in systems... no one got fired for buying IBM, right?
