Cloud Programming Simplified: A Berkeley View on Serverless Computing (berkeley.edu)
166 points by adjohn 3 months ago | 42 comments

It’s also great for vendor lock-in and rent extraction. At least in its current iterations. I’m fairly certain Amazon isn’t going to make it easy to run your “serverless” applications on anything but their infrastructure.

What would be more interesting would be languages and runtimes for the next network, one that is distributed, peer-to-peer, and able to be trusted, and that are easier to program for than today's serverless platforms.

Edit: which is where I thought the BOOM group was going back in 2012 or so.

The interface between Lambda and your business logic is usually very thin, so in the case of simple REST APIs, migrating from one platform to another (as long as the target platform has a compatible language runtime), or into a web app framework, is rather easy.

Serverless applications' reliance on other proprietary services like Cognito, DynamoDB etc is another thing however...

> Serverless applications' reliance on other proprietary services like Cognito, DynamoDB etc is another thing however...

I feel like even this lock-in is overstated. Those other services are billed based on usage and are completely separate from each other. Your migration path away would be one service at a time.

That's in no way comparable to traditional lock-in, which is like a decades-old Oracle database with high licensing fees paid annually and depended upon by dozens of complex business processes interacting directly and indirectly with each other. You don't really have a migration path in that case because there's not enough time / resources to decouple everything and migrate to something else, so you're stuck.

And that doesn't touch on the reality that if you're not relying on AWS for Lambda, DynamoDB, Kinesis, etc., then you're maintaining something yourself, which itself is not free from lock-in. Building a system that heavily utilizes Cassandra or Kafka may not have an annual licensing cost, but absolutely imposes a cost in terms of maintenance, team expertise, and (of course) makes moving away from them look not much different than moving from DynamoDB or Kinesis.

The problem is also that Lambda works really poorly with anything that is NOT Cognito and DynamoDB. For example, we were using a Couchbase cluster we deployed ourselves - however, that meant the Lambda now needed a network interface in a private subnet, and start times skyrocketed, especially for concurrent access. When we also started doing our own authentication with yet another Lambda, request times for cold lambdas (which also means every new concurrent connection, remember) almost doubled.

There's also the fact that most of the code isn't really gone, it's just transformed from web server logic to lambda deployment/configuration logic.

The biggest win all told seems to be security - at least with Lambda you don't need to worry about staying on top of OS updates and the likes. I'm not at all convinced that we gained much of anything else.

Oh, and one thing that's rarely discussed is how much development & QA costs skyrocket when you have a larger team developing a fully serverless application, each with their own setup and running a few performance tests every few months...

We use lambdas all the time with Elasticsearch, Aurora (MySQL), etc.

AWS is solving cold start times for VPC Lambda; they announced it at re:Invent.

Cognito is just a standard authorization service that supports OAuth 2.0, SAML 2.0, and OpenID Connect.

It is a drop-in replacement for dozens of other authentication providers.

Migrating away from Cognito is hard and involves end user participation, as you cannot export password hashes.

It’s just the opposite of migrating users to Cognito.

If you are storing the passwords in Cognito and not using federated login, you should be able to insert a lambda trigger that captures the user’s password then authenticates the user with Cognito via code. Once the user is authenticated, store the password in your new store.
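A rough sketch of the shape such a capture trigger could take. Everything here is illustrative: the event fields are modeled loosely on Cognito's Lambda trigger payloads (check the actual trigger documentation before relying on them), and `saveToNewStore` is a stand-in for your real migration code.

```javascript
// Hypothetical capture-on-login trigger: each time a user signs in,
// copy the password they just presented into the new user store, so
// the replacement store accumulates credentials over time.
const migrated = {};

async function saveToNewStore(userName, password) {
  // Stand-in: really you'd hash the password and upsert it into the
  // replacement user database.
  migrated[userName] = true;
}

const captureHandler = async (event) => {
  const { userName, password } = event.request;
  await saveToNewStore(userName, password);
  return event; // hand the event back so the normal Cognito flow continues
};

exports.handler = captureHandler;
```

As the parent notes, this only covers users who actually log in during the migration window, which is why it has to run as a slow background migration.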

Yes that would be a slow process but it isn’t like you couldn’t move everything else off of AWS first and let that be a slow migration over time.

How is a .js file with a basic implementation of

  exports.handler = async (event) => ({
    statusCode: 200,
    body: "Hello world: " + event.body
  });
vendor lock-in? You can hang this in an Express server yourself at any time. You can spin up your own three billion Node.js servers yourself with this. The only thing Amazon does is hardware / instance management, load balancing and of course billing, which I don't see as vendor lock-in.

But, the thing is so much code is removed in serverless. I'm not saying cloud providers aren't fighting to lock you in. It is just so much easier to switch if you have 1% of the code you used to.

I’m asking this from a place of genuine curiosity. Since the whole serverless thing has been gaining momentum, I’ve been mostly working on embedded stuff and haven’t been back to see what the new hotness is over in web land.

What code gets eliminated in serverless? I have historically done most of my http backends using things like Flask or Sinatra or Elixir or Go (with net/http), and I’ve never felt like there’s a whole lot of code to begin with. What am I missing (if you don’t mind elaborating)?

I'm thinking of Firebase. For example, with Firebase, my web clients (web or Android, for example) don't ever talk to my backend directly. I have a serverless firebase function that accepts a trigger when there is an update to the firestore (cloud database) or storage (like AWS S3). I get a JSON object (which is declarative) that I can use to send the file or data on to a new process. This decoupling makes my firebase function really small: it only parses JSON and sends an event, perhaps to another Google service.

Compare this to a large application where you are marshalling objects from the HTTP POST request into temp objects inside your application language. Making that work with all clients, within all server contexts, is a lot of code which I hated to write.

With serverless, I'm never connecting my clients or backends directly. I'm thinking in terms of moving high level objects (JSON, files, events) between systems, and Google provides the glue to wire those together. It takes a mind shift, but when you realize how much glue code you are writing and that you can forget, it makes your application layer really simple.
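The glue function described above is roughly this shape, stripped to pure JS (in reality Firebase's SDK wraps it via something like `functions.firestore.document(...).onUpdate(...)`, and `publishEvent` is a stand-in for whatever downstream service you forward to):

```javascript
// Stand-in for the downstream call (Pub/Sub, another Google service, ...).
const published = [];
function publishEvent(topic, payload) {
  published.push({ topic, payload });
}

// The entire "glue" function: receive the declarative JSON snapshot the
// trigger hands you, pick out what changed, and forward it as an event.
function onDocumentUpdate(change) {
  if (change.before.status !== change.after.status) {
    publishEvent("status-changed", {
      id: change.id,
      status: change.after.status,
    });
  }
}
```

There is no request parsing, no marshalling layer, no server context: just JSON in, event out.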

I've only done a little bit of this stuff, but so far my impression is your code will get more complicated in AWS if you glue in Flask/Sinatra/etc. for the API Gateway, but can still get a lot less complicated if you "think serverless" and run fairly tight functions in AWS Lambda. The system then takes care of auth, request parsing, response formatting, and most of the other stuff you'd do in Flask, plus of course it eliminates the actual web-server layer.

The downside is that you then need to call them via Lambda instead of using some kind of REST API -- however at least in my case, this turns out to be the much lesser evil as I'm only using them internally, and have my own code that makes the calls. My "lock in" is that I would have to change that code, but it's trivial.

API Gateway becomes your HTTP server, your routing layer defined in Flask/Sinatra/Express & co is implemented in API GW integration configuration and handling routed requests is implemented as Lambdas that process events from API GW.

With AWS Lambda I try to avoid using API Gateway features, as that seems to me to be exactly where the lock-in occurs. Instead I have one catch-all route on API GW and I keep my routing layer in the app. This also makes it easier to run the app on dev without recreating the entire AWS environment.

If you work for any size corporation, they aren’t going to just switch their infrastructure because a developer said that they pinky promise the migration will be easy.

You’re always for all practical purposes locked into your infrastructure once your business grows.

You can absolutely do that. Then you just need to do more custom work to enable route level observability and metrics.


With no code changes, you can deploy a standard Node/Express app as either a lambda service or a standard Express app.

There are frameworks like this from AWS for all supported languages. It’s well documented how to use the API Gateway Lambda proxy feature.

For non-API lambdas, the only thing you have to do is add an entry point method that has two arguments: a JSON event object and a Lambda context.

I have another app that can be deployed as a Windows .Net Core service or lambda based on the CI/CD pipeline.

Have you seen Kelsey Hightower's demo where he used an S3 event as a trigger for a Google Cloud Function? You can already use serverless frameworks that abstract your function from the provider, but of course in that case you're locked to that specific vendor.

Also, no license stops you from migrating from Lambda at any moment. Other companies make you pay enormous licensing fees and lock you in for a one-year contract or even longer.

It's in the paper as the second fallacy. They recommend something like a POSIX standard for Backend-as-a-Service components.

For the paper being introduced, the direct link is: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-...

I agree with a lot of it, though I think they overegg the possibilities for data performance at the Function-as-a-Service level.

Some more specific reactions:

> Put precisely, there are three critical distinctions between serverless and serverful computing ... 3. Paying in proportion to resources used instead of for resources allocated.

I've taken to calling this "buying capacity vs buying consumption". You need to think about which you need (they touch on this in the fallacies section).

An ambulance is idle almost constantly. When I was a kid they were built on F150s and vapourised ten litres every time you glanced at them. But when I need one I don't care about the idleness and resource costs, I want it to come to my aid ASAP. What I don't want is the paramedics converging from random locations on electric scooters rented on-the-fly.

> In a critical departure, [AWS Lambda] charged the customer for the time their code was actually executing, not for the resources reserved to execute their program. This distinction ensured the cloud provider had “skin in the game” on autoscaling, and consequently provided incentives to ensure efficient resource allocation.

I'd say yes and no. Yes, it provides an incentive to the platform provider. But there is a much, much bigger incentive to achieve platform lockin through data services. Keeping the function alive after the first invocation is a subsidy. I would bet folding money that Amazon are tracking this cost and weighing it against their market strategy.

> By allowing users to bring their own libraries, serverless computing can support a much broader range of applications than PaaS services which are tied closely to particular use cases.

I'm not sure if the authors are familiar with buildpacks, despite citing Heroku. They don't mention buildpacks anywhere.

Disclosure: I work for Pivotal, we do a bunch of stuff in this area, including Buildpacks and Knative. But nothing is forward looking, personal opinion, consult your dr etc etc

>We expect serverless computing to become simpler to program securely than serverful computing, benefiting from the high level of programming abstraction and the fine-grained isolation of cloud functions

In my view, that's the really key bit. Securing systems is an endless task, even (or especially) if they are cloud hosted.

Right now serverless is definitely not simpler than a monolith. It's "simple" if you squint but any real workloads have you dealing with high concurrency for almost all tasks that a single monolith could handle with a single instance.

You are right, and I'm glad you're mentioning it, along with the general skepticism about serverless; it's not a golden hammer, its applications are limited, and not every problem should be (or can be?) solved with a serverless architecture. The same goes for microservices, another golden hammer that people picked up and started applying to a lot of problems. I've seen a few projects where microservices was the initial architecture, while all of them would've been better off with a monolith for the first two years of operation (before any significant load) to figure out the actual problem to be solved.

I'm surprised a Berkeley survey would neglect to mention PiCloud, which may be defunct but whose cloudpickle library is still used heavily today (e.g. in Spark). The paper also appears to neglect the issue of code deployment, which can be a major undertaking and hidden cost of any web-based application, especially if the app has particular system dependencies.

There are (private) solutions out there for moving parts of running JVM programs across machines. Wouldn't one expect a forward-looking view of Serverless to encompass serialization and transport of the compute environment?

moving parts of running programs across machines

Why is this desirable and what is the magnitude of the benefit?

One application is fault tolerance. If you need to take a machine down, you can move the node (JVM program) to some other machine. So similar to pausing a long-running Lambda function, moving it elsewhere, and resuming it.

Does anyone else not feel so good about a future of computing where everything but the application layer is rented from Jeff Bezos?

I feel fine; one of the main selling points of cloud computing (as will be imprinted on you if you ever do an AWS course, sigh) is that before, if you wanted to launch a product, you had to invest in a huge amount of hardware first - just look at the first few years of Twitter, where they struggled to scale up. Yes, you could (and can) get managed hosting or rent servers from a lot of providers, often for cheaper than at Amazon, but scaling those out is not easy at all.

Nowadays? If Twitter launched today they would have had no trouble scaling up from one user to a hundred million within weeks. AWS is a major (MAJOR) factor in the startup boom, and allows for crazy levels of scaling for new startups that attract a lot of users. If you set up your stuff right before launch, getting slashdotted (or the HN variant) is a thing of the past. The only limit to scaling now is your credit limit.

I agree with you on this; it is a moral dilemma I think about every day. Programming was one of the few professions where the practitioner owned/had access to their tools in their own spare time. Across history and professions this is a rare property, and I fear that we might lose this in the next 10-20 years.

Disclaimer: Views expressed are my own and do not reflect any positions held by my employer.

What I have access to with AWS just using their always free tier and their cheap offerings is more than I ever could have dreamed of 5 years ago, let alone as a hobbyist in my bedroom doing 65C02 assembly in the 80s.

It doesn't have to be Jeff Bezos. It could be anyone--like your IT team in your own company. Kubernetes already supports multiple serverless variants.

The significance is where the line of abstraction is drawn, as has been the case every time programming moves up an abstraction layer.

Developers are now in the same position system administrators were a decade ago when all of the infrastructure was moved to the cloud.

Cloud native has been the buzzword for a long time now, I don't see much difference with serverless in that regard.

>[serverless computing] closely parallels past advances in programmer productivity, such as the transition from assembly language to high-level programming languages

Is this serious? The authors seem to imply that we can just abstract away the entire internet in the same way we abstracted away copper wires.

But we did, didn't we? How is modern serverless different from shared LAMP hostings of the past?

Shared LAMP hosting didn’t give you the security, scalability, or isolation that Lambda gives you.

The implementation was lacking, but the concept is nearly identical.

Great paper, I've only gone through about 10% of it, but it offers a lot that the vendor docs and industry talks don't.

Btw, this is David Patterson, one of the inventors of RISC. Great podcast with him here (mostly about the new TensorFlow chip and history): https://softwareengineeringdaily.com/2018/11/07/computer-arc...

Worth noting that he's at Google, but it looks like he had AWS reviewers.

The paper is a good read for those fairly new to Cloud Computing, even though it was published about 10 years ago!

This is not that paper; it is a new one, on serverless.
