Apache OpenWhisk – A serverless, open-source cloud platform (apache.org)
332 points by based2 on Feb 18, 2018 | 137 comments

Nowadays when someone claims to be "serverless", there can be two interpretations:

1. It's a server actually. It's just that the server serves up functions, without state. So a more correct terminology would be "stateless servers"

2. Truly P2P (Like blockchain), therefore there truly are no "servers".

Used to think Amazon did a great job coming up with the jargon that's great for marketing, but lately as we see more and more option 2s which are TRULY "serverless", the option 1s increasingly sound like imposters.

After all in option 1, there ARE servers. To be able to claim you're "serverless", you really shouldn't have servers.

Like "cloud computing", the name "serverless" doesn't make much sense when used literally. It doesn't mean that there are no servers involved, just that there are none to be directly managed by the developer.

P2P also doesn't mean "no servers", but that every node is as much server as client.

Amazon popularized but didn't invent the term "serverless", which appears to have been first used in 2012.[1]

[1] http://readwrite.com/2012/10/15/why-the-future-of-software-a...

> It doesn't mean that there are no servers involved, just that there are none to be directly managed by the developer.

They already had stuff like that. S3. RDS. Elasticache. At least the cloud gives you the nebulous feeling of servers off in the ether somewhere.

But serverless is neither specific nor accurate. It doesn't work. It's nothing more than lazy marketing-speak.

> S3. RDS. Elasticache.

A requirement for serverless would be having access to a general programming language which is why those examples don't really apply.

Functions-as-a-service seems clearer to me

Me too! It also works nicely when thinking about the IaaS ⟷ PaaS ⟷ FaaS continuum of abstraction.

Eventually we'll get to LaaS, where instead of having to find, test, and integrate libraries, you type in some natural language like "send an email", and pop down a guaranteed-compatible-for-X-years, 100% uptime, edge-case-handled function that automatically bills per execution directly to your FaaS account, no registration or invoicing required.

The name for "just give us your code and we'll run it" has been PaaS for a while.

They actually mean this - https://en.wikipedia.org/wiki/Common_Gateway_Interface.

1993 is the new 2018.

edit: And the life of the Apache Web Server (the first AWS) started with that interface.

Clearly the next big development in this space will be some form of server-side scripting language which can be embedded in HTML and is tightly coupled with the web server, some sort of Hypertext Preprocessor.

I bet if AWS acquired ColdFusion from Adobe and rebranded it Elastic Markup Platform it'd be the hot new thing

I was just telling my wife this morning that this "Serverless" business was just CGI reinvented. I don't get it.

I remember CGI; curious if you're able to elaborate how you see the analogy.

Pure CGI applications (before the mod_x / long-running workers strategy took over) are "stateless" / "function as a service" insofar as the application's internal state can only live for one request lifecycle. However, they generally ran with the external state (ex. local filesystem) preserved between requests, so the approach was still often a bit different than it is for modern FaaS applications, where state can only live off-server (ex. in a remote datastore or another service).
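To make the CGI model described above concrete, here is a minimal sketch of a CGI-style program in Python. The web server would start a fresh process for each request, pass request metadata in environment variables (`REQUEST_METHOD` and `QUERY_STRING` are standard CGI variable names), and read the response from stdout. The function name `handle_request` and the `name` query parameter are made up for illustration.

```python
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a full CGI response (headers, blank line, body) from the environment."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical query parameter used only for this example.
    name = query.get("name", ["world"])[0]
    body = f"{method}: hello, {name}\n"
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    # Under CGI the server runs this once per request; nothing held in memory
    # survives to the next request because the process exits here.
    sys.stdout.write(handle_request(os.environ))
```

Anything that must outlive a single request has to be written somewhere outside the process, which is the property the comparison with FaaS rests on.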

If you take a CGI-based implementation, constrain it to use only external managed services, and use Lambda to serve it up, you won't need to manage servers explicitly.

The real serverless apps are pure clients that use 3rd party services for everything - i.e. the developer doesn't write any backend code that runs on their own infra.

Is something like mod_php what you'd consider a CGI application? I always think of CGIs from before the day when webservers made it easy to integrate a runtime. For me, CGIs were usually binary executables that the webserver handed data to. I always thought it was about the data exchange, instead of about application state, because http requests are by default generically stateless.

To me it's about the application lifecycle, so no, I wouldn't consider mod_php CGI (which is why I specifically called it out in my parent comment).

CGI does the same thing Lambda does: it binds the application lifecycle to the request lifecycle. HTTP requests to an application server that runs across multiple requests are not stateless, they're inherently bound to the server's internal process state. If the server is well-written this shouldn't matter, but the entire reason an application has to be refactored at all to become a Lambda is that this is not usually the case. Most modern application-server style web apps are loaded with mutable global state that lives through many requests - from thread / connection pools to global caches to updatable configuration stores to even dynamically loaded code/modules.
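A rough illustration of that difference, as a sketch only: the first handler below relies on module-level state that survives between requests in a long-running process, while the second receives everything it needs from outside. All names here are hypothetical, and the `store` dict merely stands in for an external datastore.

```python
# Module-level state: in a long-running application server this dict is
# shared by every request the process handles.
_cache = {}


def app_server_handler(user_id):
    """Long-running style: the result depends on what earlier requests left behind."""
    _cache[user_id] = _cache.get(user_id, 0) + 1
    return _cache[user_id]


def request_scoped_handler(user_id, store):
    """CGI/FaaS style: nothing is kept in the process between calls.

    `store` stands in for an external datastore (here just a dict).
    """
    visits = store.get(user_id, 0) + 1  # read state from outside
    store[user_id] = visits             # write it back outside
    return visits
```

Moving from the first shape to the second is the kind of refactoring the comment has in mind when it says an application has to be reworked to become a Lambda.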

Serverless means you don't have to explicitly manage hosts yourself. They are transparent to you. In that sense Lambda is no different from blockchain: either way, AWS or some server farm runs the actual code for you.

Which, in turn, makes this project weird, because in order to use it you still need to host a group of servers yourself. If someone else hosts it for you, you don't really care what framework they are using.

> Serverless means you don't have to explicitly manage hosts yourself

So they say but it still sounds daft.

I have a ticket to fly from London to NYC but all I have to do is turn up and sit in a seat; someone else manages the infrastructure. Aeroplaneless architecture!

I haven’t seen option 2 things referred to as serverless. Their descriptions often involve the word blockchain from what I gather.

Serverless is a catchy name, and a better one perhaps would have been function oriented or function service or some such. But it really does mean serverless in that (a) there isn’t one server and (b) there is zero configuration and maintenance of the server for you to perform.

> I haven’t seen option 2 things referred to as serverless.

People use "blockchain" to describe it to non-technical people, but they do need a way to describe their architecture. Among blockchain/crypto developers the word "blockchain" has become so watered down that it really doesn't mean anything. So when they need to explain their architecture they do need a way to describe how their app doesn't have a server.

What I've seen is people say "my app is 'TRULY' serverless", or "My app does not require a centralized server". This was actually my point, because it looks like people who are actually building truly "serverless" apps find themselves in a situation where the word has been hijacked by server vendors, so they have to use qualifiers like "truly", or explain it out in more detail to actually express that they indeed don't have a server. Not saying all the FaaS guys don't have a point, just saying it's kind of a funny situation not so different from when "hoverboard" was viral a few years ago. Everyone knew it didn't "hover" but it was too late because the brand was already taken.

> But it really does mean serverless in that (a) there isn’t one server

Normally when you say "X-less", it means "there is NO X", not "there are more than one X".

Would you say that if you use Zipcar and don’t own your own vehicle, you live a carless lifestyle?

And are you saying that people in the crypto world have blockchain as such an implicit given that they don’t need to describe it, but a distributed nature of its operation is something that explicitly needs to be stated every time? That seems really silly. “As a public ledger” is better than serverless in that case. Or “smart contracts”. But serverless really is a misnomer buzzword in that case.

> Would you say that if you use Zipcar and don’t own your own vehicle, you live a carless lifestyle?

This is all wordplay and we can go on forever like this. Using your measure, we can also say any SaaS is "serverless"--heroku is serverless, aws is serverless, digitalocean is serverless--because nobody owns a physical server anymore.

> And are you saying that people in the crypto world have blockchain as such an implicit given that they don’t need to describe it, but a distributed nature of its operation is something that explicitly needs to be stated every time? That seems really silly. “As a public ledger” is better than serverless in that case. Or “smart contracts”. But serverless really is a misnomer buzzword in that case.

Again, this is really a difference in interpretation of expressions. When you use Ethereum to write a smart contract, you basically store your application state on Ethereum. Compared to ordinary apps where you host your app on a server you own, this sounds pretty much "serverless". If you have an AWS instance you have to pay a monthly fee. If you deploy a contract to Ethereum, you just pay once when you deploy, but once deployed you don't pay for hosting.

"Public Ledger" is an ambiguous terminology which doesn't fully describe what blockchains do either. You could have a public ledger without it being decentralized nor transparent. A central authority could just make their own database public, but they could always manipulate the data if they own the database. The whole point of blockchain is that no one party should be able to control the truth.

"smart contract" is also generally considered a ridiculous terminology by those who actually do work on building "smart contracts". They are mostly "dumb" contracts. In fact, when you write these contracts, the dumber the contract is the better it is. If you try to be too clever, you end up with stupid mistakes like the parity hack. Therefore most "smart contracts" are ideally nothing more than transparent auto-updating distributed databases which can send messages to one another.

I'm not saying you're wrong and I'm right. I'm not even saying blockchains should be called "serverless" instead of FaaS owning the brand. I'm just stating what I've observed: Most FaaS services are hosted on a server that the owner needs to pay for every month. Blockchains don't have an owner and there is no "server". I think blockchains are closer to "serverless" than FaaS because of this, but by no means do I consider this a perfect description of the technology.

Language construct trip - Less Sugar vs Sugarless.

On a separate track - Is it the sign of my age, that I am either concerned about developers not knowing the impact of their code on the hardware, or just whimsical and miss the days when one opened strace or kdb to see how efficiently the code executes? Does someone else feel this way?

One of the less desirable side effects of hardware becoming cheaper is that there are many places for a lousy programmer to hide.

I just interpret 'serverless' to mean 'an abstraction layer above the server'. The term is poorly named, but just like git commands you gotta go with what takes off. shrug unicode

I thought there were nodes in blockchain.

I think that the best name is "server for dummies".

As far as I know, all the projects that promote themselves as "serverless" are stateless functions and not designed to store states, which means if you want a database you will need to store it elsewhere and utilize it from the "serverless" container. So no DB for you out of the box if you want to use serverless architecture.

So I don't think it's for "dummies". Dummies want an all-in-one server.

> As far as I know, all the projects that promote themselves as "serverless" are stateless functions…

Any normal PHP app is just a collection of stateless functions, but that doesn't automatically make all PHP apps "serverless" apps. What makes something "serverless" is the FaaS[1] execution model.

[1] https://martinfowler.com/articles/serverless.html

Or, you know, it's better to reference the source: https://tools.ietf.org/html/draft-robinson-www-interface-00

Most PHP apps are run by DBs. They are not stateless.

Most "FaaS" apps also access a remote datastore as well. The fundamental value of most web software applications revolves around state transformation in some way or another. FaaS just "cleans up" the application state insofar as it forces the state boundary to a comparatively slow edge and doesn't let you, for example, store things in in-memory shared datastores.

in the past year or so i've seen serverless extended to anything where the underlying compute is abstracted away. so ex. bigquery and dynamodb, two very stateful services, are also described as serverless.

Probably a good time to mention the CNCF Serverless Workgroup and the Serverless landscape. It lists all the Kubernetes-native Serverless frameworks as well as the proprietary ones.

OpenWhisk was one of the first projects in this space and is written entirely in Scala. Good to see a Kubernetes integration emerging.


Other notable projects: Oracle's Fn, Bitnami's Kubeless, Platform 9's Fission, Iguazio's Nuclio and OpenFaaS (community project). (there are more)

Red Hat is participating as well:


They're really invested in Kubernetes with their open source OpenShift PaaS.

You can expect more commitment to Kube from OpenWhisk.

For JS developers looking for a Backend-as-a-Service, Hoodie(http://hood.ie/) deserves a mention: <Hoodie is an Offline First, noBackend architecture. Its Dreamcode API gives you user signup and administration, data storage, loading, synchronisation and shares, emails and payments and can be extended with plugins. Hoodie is written in JavaScript and Node.JS and relies on CouchDB.>

Can someone explain to me WHERE the servers exist for this "serverless" platform? Is this code I run on Lambda and Lambda competitors? Is Apache hosting it? Do I have to set up my own servers to run the code?

All the intro docs and FAQs seem to assume that "serverless" is literally no servers, just magical cloud fairies or something.

I get that it uses docker containers, but where can i / am i expected to run those?

This is open source, so "someone" has to run it.

This someone could be e.g. IBM or RedHat, it could also be someone like Adobe that wants to offer a FaaS integrated into their main product.

It could also be your own company, e.g. you could have an infrastructure team that provides it for all other teams.

I find mentions of “deployments”, but unlike a platform like Kubernetes I can’t seem to find any mention of how to set up the “serverless servers” anywhere. And until then it’s just a really neat local CLI

It looks like OpenWhisk right now deploys onto JVM application servers you provision yourself, using runc+Docker and Akka clustering under the hood to orchestrate everything. I think they have plans to target Kubernetes as well. FaaS/"serverless" (a terrible name) is just a layer over an orchestration/deployment system (like Kube) that spins up instances in response to triggers / events and handles routing and scheduling of the instances in a way that's supposed to be transparent to the end-user. https://github.com/apache/incubator-openwhisk/blob/master/do...

yeah that was my take too. I have zero clue what's involved with the hosting of this and as it's an open source serverless product that's the most important bit. How easy or hard it is to use is irrelevant if i don't know _where_ to point things or what to do to make there be something to point things at.

Think of it as of some sort of a cluster.

On a single machine, you can run processes ("invoke functions") locally. On a cluster, you can run a process, and it will start somewhere, on one of the less-loaded boxes.

With "serverless", the cluster is the cloud (a datacenter). You do not have to maintain it, but you can specify the code to run, and it will run, as many copies of it simultaneously as needed (thousands or zero), and you are only billed for the resources you consumed. It's ultimate elasticity for non-interactive load.

I appreciate the explanation, but you misunderstand me.

I get "serverless". I totally get Lambda and some of the other open source clones of it. What I don't get about "Whisk" is whether this is something that builds on top of existing serverless infrastructure and makes it better, or whether it's its own serverless infrastructure that you have to install on your own servers, or ... what. They're completely opaque about the actual server-side infrastructure of this "serverless" platform.

It's essentially 'set up your own servers and run a lambda-like service yourself.'

Here's some documentation for OpenWhisk on Kubernetes (running locally or on your own servers or cloud providers' Kubernetes cluster meeting the requirements specified). https://github.com/apache/incubator-openwhisk-deploy-kube/bl...

There are other ways OpenWhisk could be deployed as well (see https://github.com/apache/incubator-openwhisk/blob/master/an... ), but it appears you will need server(s) or a local computer to run OpenWhisk on.

I do completely agree the Apache site for the project seems lacking in terms of the deployment docs. Which is a shame, because as you can see from the links I posted, deployment can be quite involved. It is still incubating though. Maybe they need volunteers to step up and help with the documentation on the Apache site?

This is all just from what I could find in docs/searching for info, I've never run OpenWhisk.

I get the impression that Apache are running the servers themselves, but it’s not clear from skimming the documentation.

Edit: I could be wrong, looks like there is an OpenWhisk offering from IBM: https://console.bluemix.net/openwhisk/

The main page is incredibly ambiguous and confusing. Full of market speak and totally unclear.

The goal of the project is to free an operations team from managing the platform and software stacks in use, and to give developers the luxury of not worrying about how to scale their application so long as they follow some basic constraints.

It's actually serverless for the user. You run a server but the user of your service (apache openwhisk) only needs to define code and triggers on your service, that runs on your servers.

So serverless is essentially a return to the PHP/CGI execution model... What's changed to turn a bad idea into a good idea again?

Was it ever a bad idea, or did it just go out of fashion? I'm not a PHP fan, but I miss the simplicity of low-end deployments on a LAMP stack by literally dragging files over FTP. It seems like something has been lost with the total embrace of more modern web frameworks.

It's just a difference in scale. I mean, you can still set up an FTP server and drag and drop your files to deploy, if you want to. But companies with many developers deploying multiple changes per day need something that works better and faster. Not saying that serverless is for them, but that's why things get more complex sometimes. But you're still able to choose "older" things.

Definitely, I guess I just miss having a platform (LAMP) that scaled down as well as up. I think it's really cool that some of the most trafficked websites run WordPress and MediaWiki which can also be installed in a few clicks on a $5/month shared host. I recently wrote a Python web app that uses Dynamodb and Lambda and even though it's open source it feels way less portable.

Yeah. PHP may be out of fashion, but it's still powering a massive portion of the internet because this deployment model is so lightweight. Much different than the now-popular model of "spin up an app server and reverse proxy to it".

It's less like CGI (some projects only support that). Think of it like FastCGI, where most of the issues with CGI are fixed (i.e. throughput isn't an issue), and then combine that with containerisation features from Kubernetes etc. to make packaging easier (again, not all projects use a Docker image format).

If you're wondering whether Serverless is catching on then see also: AWS Lambda.

But isn't "serverless" fundamentally a restart-the-world model? That would make it more like CGI than FastCGI, which is an application-server-type model that isn't fundamentally different from putting a reverse proxy in front of an HTTP-speaking application server.

No. In AWS Lambda at least, the container your application runs in lives through multiple requests until some period of inactivity passes and it is shut down. It isn’t “running” in the sense that your code is in control, but long-lived things like database connections do not need to be re-established on every request.
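The usual way to benefit from this is to keep expensive resources at module level so that later invocations in the same warm container reuse them. The sketch below assumes a handler with the `(event, context)` shape used by Lambda's Python handlers; `get_connection`, `stats`, and the in-memory SQLite database (standing in for a real remote database) are illustrative choices, not anything the platform provides.

```python
import sqlite3

# Kept at module level, so it survives for as long as the platform keeps
# this process (the "warm" container) alive.
_connection = None
stats = {"connections_opened": 0}  # counter for demonstration only


def get_connection():
    """Open the connection on first use and reuse it afterwards."""
    global _connection
    if _connection is None:
        # An in-memory SQLite database stands in for a real remote database.
        _connection = sqlite3.connect(":memory:")
        stats["connections_opened"] += 1
    return _connection


def handler(event, context=None):
    """Hypothetical entry point, called once per invocation."""
    conn = get_connection()
    (value,) = conn.execute("SELECT ? + 1", (event["n"],)).fetchone()
    return {"result": value}
```

Calling `handler` repeatedly in one process opens the connection only once; after a cold start the module is loaded again and the connection is re-established.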

That analogy only really applies if you stick your serverless function behind an API gateway and invoke it through an HTTP request; honestly, AWS lambda behind an API gateway has always felt like a pretty poor way to build a microservice. Serverless functions seem to come into their own when they're connected to a reliable event triggering system - that looks to be the really interesting part of OpenWhisk, that it adds extensible triggers, much like the way AWS lambda can respond to events within the AWS ecosystem - but platform independent. In a high volume event-based world, small stateless processes become a very appealing idea again.
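The event-driven shape can be sketched in a few lines: triggers are named event sources, and small stateless functions are bound to them. This is an in-process toy and not OpenWhisk's or AWS's actual API; `on`, `fire`, the trigger name, and both functions are invented for the example.

```python
from collections import defaultdict

# trigger name -> functions to invoke; a stand-in for a platform's rule table.
_rules = defaultdict(list)


def on(trigger):
    """Decorator that registers a function to run whenever `trigger` fires."""
    def register(func):
        _rules[trigger].append(func)
        return func
    return register


def fire(trigger, event):
    """Invoke every function bound to the trigger and collect the results."""
    return [func(event) for func in _rules[trigger]]


@on("upload.created")
def make_thumbnail(event):
    return f"thumbnail for {event['key']}"


@on("upload.created")
def index_document(event):
    return f"indexed {event['key']}"
```

For instance, `fire("upload.created", {"key": "a.png"})` runs both functions, and neither needs to know about the other or about the event source.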

- Scaling infrastructure up and down is hard to get right.

- Paying for idle servers can get costly if you have many of them.

- Idling servers are inefficient for cloud providers as well.

A complete machine instance is too coarse a grain sometimes. So the progression is: "A physical box" → "A VM with many things running" → "A half-OS and a couple of related processes in a container" → "A single OS-process-like function invoked on demand".

The upside is that you invoke the "functions" not as processes on one box you have to maintain (*CGI), but on many boxes maintained automatically. Your limiting resource is not the limitations of the box, but only your budget. Also, stuff like security, software updates, load balancing, etc is taken care of by the provider.

CGI didn't scale because forking is literally part of the API. It was a good idea other than that.

I have been interested in running a serverless setup for a while now, so I’ve been looking at different open source alternatives to AWS Lambda. I considered OpenWhisk but felt IronFunctions would be quicker to set up and easier to run. Does anyone have experience deploying/using both? Would anyone who is running OpenWhisk comment on the process to get it running?

OpenWhisk can be started using Docker Compose locally with a single command (make quickstart) using this repo: https://github.com/apache/incubator-openwhisk-devtools/tree/...

It can also be deployed on k8s with minimal effort: https://github.com/apache/incubator-openwhisk-deploy-kube

There are also instructions for using VMs as the infrastructure layer: https://github.com/apache/incubator-openwhisk#quick-start

If you have any issues, open an issue or join us on Slack (http://openwhisk.incubator.apache.org/slack.html).

(I'm a committer on the project).

In my previous quick search I did not find the incubator-openwhisk-devtools repo, that left me with the impression that I had to run k8s which for my purposes is too heavy handed. A docker setup is more in line with my needs (run on vpc for quick side projects). thank you!

It would be great if they explained that anywhere in their documentation.

You might be interested in [Oracle's] Fn Project, it is open source and created by the team behind IronFunctions. I think it started life as a fork of IronFunctions. It is written in Go.


Indeed, the original founders of Iron.io are at Oracle. They are up to some good stuff. See https://medium.com/fnproject/8-reasons-why-we-built-the-fn-p...

Thank you, will check it out. It did not come up during my (rather quick) search for alternatives.

Self plug but https://github.com/1backend/1backend literally needs just 4 docker containers to run.

They are in the ReadMe, let me know what you think of it.

It contains a bunch of unique features, so it's not your typical serverless platform, but the core is the same if you don't want to go crazy with the web framework like features.

It's great to see so many functions platforms - in some ways it shows there's increasing interest, awareness, and adoption. The field is still quite young and there will be a lot of experimentation and exploration.

There's also a difference between deploying rapidly for local development (less is better) and actually running a functions platform at scale. AWS lambda reportedly handles over 2B lambdas a day (this was a year ago).

Not every platform needs that kind of scale, but it does put things in perspective.

Hi there!

Thanks for the encouragement. To be fair we at 1backend don't consider the field that new at all. It's just the old PaaS concept with some twists and reworded marketing!

I personally spent years building microservices platforms at various places and after years of not implementing my insights I decided now the time is right and people might adopt it given the current hype cycle =)


I'd agree, old ideas, new contexts. The twists matter.

Take a look at the other options I posted. IronFunctions runs one container per request so you'd need to evaluate if that's fast enough for your purposes. They have a new "hot" mode too but I'm not sure how far they are with supporting that.. maybe Chad could comment if he's around.

If anyone else was wondering, this seems to originate from IBM in collaboration with Adobe. More information on the architecture here: https://thenewstack.io/behind-scenes-apache-openwhisk-server...

OpenWhisk started entirely from IBM Research. It was open sourced in Feb 2016. Adobe came on board as part of the induction to Apache Incubator.

OpenWhisk is what runs IBM Cloud Functions: https://console.bluemix.net/openwhisk/

To anyone who has a good feel for this space. Where do you think all this is heading in a 3-5 year time period? What is the medium-term vision like? I'm just curious.

I'm currently migrating to Kubernetes. Hard to tell where things go in 3yrs (the pace of change is so high, a year seems like forever)...BUT...a couple of things that drove me to the migration:

1 - Less stickiness to AWS since the serverless architecture would abstract away the provider and i could theoretically even grab the cheapest spot instances across providers (AWS, Google Cloud, SoftLayer, Azure)

2 - Lower compute bills (due to better usage and being able to more tightly pack stuff into nodes)

3 - Lower HR bills due to a lot of stuff just scaling up without effort rather than dozens of deployment/babysitting staff

Hard to know for sure, but these are the three things that drove me to Kubernetes.

But I find some type of persistence of data usually figures into my serverless calls. And those tend to lock you into a vendor. Especially when you want to start analysing the data you have.

Even if you could separate them it would add latency. Now if it's JUST compute... and I can see that as a possibility for some CV applications.

Consolidation. There are way too many frameworks right now and they offer nothing different, especially for something so commodity as packaging and running a single function. The major cloud vendors are also rapidly evolving with Azure Functions being the most integrated and useful with AWS second and GCP far behind.

The next few years will see 1-2 open-source frameworks along with all the cloud vendors getting to parity of running almost any language in response to almost any event source. Templated data pipelines and other processing should also become common and as easy to run as a docker container is today.

>The major cloud vendors are also rapidly evolving with Azure Functions being the most integrated and useful with AWS second and GCP far behind.

Can you expand on this, please? I've got a very basic understanding of the first two, would like to know more.

Azure has the best "serverless" functions platform out of the major cloud providers right now. Easier deployment process and triggers to activate a function outnumber the other vendors. The new Event Grid service allows connecting them to pretty much anything, especially when used with the Logic Apps service.

AWS Lambda is decent and has features like Lambda@Edge which runs on CDN nodes and can do processing on each request.

Google Cloud Functions are basic and only support javascript with http, pub/sub and storage triggers without any integrations beyond that or into the rest of the platform.

I am somewhat confused about this product. I think the Function as a Service approach sometimes has real advantages over more classic approaches, but if I want to go down that path, what is the advantage of using an open source product and maintaining the server myself, with all the maintenance that requires (security updates, platform updates, handling scale, etc...)?

Wouldn't an out-of-the-box solution such as AWS Lambda, Google Cloud Functions, etc... be a better and more straightforward option in this space?

I am just wondering if I am the only one with this opinion, what is your experience on this topic?

Freedom to deploy anywhere is quite important to some people. One can truly appreciate it only after getting burned by one of these "cloud" providers (i.e. Google appengine/cloud).

IBM offers that open-source platform as a managed service, IBM Cloud Functions (https://console.bluemix.net/openwhisk/).

"Um, how is this SERVERLESS? Clearly it's running on a SERVER. I'm so sick of this marketing jargon."

- HackerNews, every time.


I'm new to serverless, but I've wondered what happens if:

- The function gets stuck and runs forever?

- It causes a request to loop?

And you have hundreds or thousands of these "functions" deployed?

Couldn't that cause sky-high costs if it goes undetected or even unwanted DDOS against other's infrastructure?

The first one is easy, you just have a hard limit on runtime per invocation.
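As a sketch of what a hard per-invocation limit could look like, the helper below runs a piece of Python source in a separate process and kills it when the time budget is exceeded. It uses only the standard library; `invoke_with_limit` and its return shape are hypothetical, and a real platform would enforce the limit at the container or sandbox level instead.

```python
import subprocess
import sys


def invoke_with_limit(source, timeout_seconds):
    """Run `source` in a separate Python process, enforcing a hard time limit."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return {"status": "timeout", "output": ""}
    if done.returncode != 0:
        return {"status": "error", "output": done.stderr.strip()}
    return {"status": "ok", "output": done.stdout.strip()}
```

With this, a function that loops forever, such as `invoke_with_limit("while True: pass", 0.5)`, comes back with a `timeout` status instead of running indefinitely.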

The second is harder. Last I checked, AWS Lambda for example would allow a loop between two functions to run forever. You have to build loop detection into your software.
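One common way to build loop detection into your own software is to carry a hop counter in the event and refuse to continue past a limit. The sketch below is one such approach under assumed names (`MAX_HOPS`, `guarded`, `next_event`, and the `hops` field are all invented here); it is not a feature of any particular platform.

```python
MAX_HOPS = 5  # arbitrary limit chosen for this example


class LoopDetected(Exception):
    pass


def next_event(event, payload):
    """Build the event for a downstream call, carrying the hop count forward."""
    return {"hops": event.get("hops", 0) + 1, "payload": payload}


def guarded(func):
    """Decorator that refuses to run once an event has travelled too many hops."""
    def wrapper(event):
        if event.get("hops", 0) >= MAX_HOPS:
            raise LoopDetected(f"stopped after {event['hops']} hops")
        return func(event)
    return wrapper


@guarded
def ping(event):
    # Deliberately calls pong, which calls ping again: a two-function loop.
    return pong(next_event(event, "ping"))


@guarded
def pong(event):
    return ping(next_event(event, "pong"))
```

Here the two functions call each other directly for simplicity; on a real platform the counter would have to travel inside whatever message or event links the two invocations.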

Systems like this must have an extensive quota-and-limits system. It will disallow overlong runtimes or excessive memory use, and if you keep doing that will run out of quota (which is usually attached to billing.)

For now, when adopting FaaS, you must be vigilant to adopt rigorous monitoring and alerting for how your system is behaving so that you can detect situations and correct them before they become significant problems. Of course, the complexity introduced by monitoring flies somewhat in the face of the simplicity of deploying to FaaS, which is friction that your developers must overcome. - Designing Distributed Systems by Brendan Burns

Lambda recently added customizable concurrency limits per function, which was partly to address the "function cycle" issue, i.e. at least your costs won't blow out too quickly.

The classic example of this is a function subscribed to S3 events which in turn creates an s3 object itself :P
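A simple guard against that particular cycle is to write results under a dedicated prefix and skip any incoming object that already lives there. The sketch below only inspects the event, which is assumed to follow the S3 notification layout (`Records[].s3.object.key`); `OUTPUT_PREFIX` and both function names are hypothetical, and no AWS calls are made.

```python
OUTPUT_PREFIX = "processed/"  # hypothetical prefix where this function writes results


def keys_to_process(event):
    """Return object keys from an S3-style event, skipping the function's own output."""
    keys = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        if key.startswith(OUTPUT_PREFIX):
            # Written by this function itself: handling it would re-trigger us forever.
            continue
        keys.append(key)
    return keys


def output_key(key):
    """Where the result for `key` would be written."""
    return OUTPUT_PREFIX + key
```

Restricting the trigger itself to a different prefix or bucket achieves the same thing without relying on the function's own check.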

Excellent, more competition in serverless means increased adoption. Even better if it is open source, in contrast to GOOG/AMZN/MSFT offerings! Seems like an undercutting attempt on the IBM side?

Could someone with production-level serverless experience share his insights on OpenWhisk capabilities and roadmap? What's missing from the G/A/M stacks?

Apache OpenWhisk has a number of differences to other serverless platforms (as well as being open-source) including...

- Excellent "out of the box" runtime support including Node.js, Java, PHP, Swift, Python, any binary runtime (Go, Rust, etc..).

- Custom runtime support using Docker images. Allowing custom images as the execution environment makes it easy to use (almost) anything on the platform without needing this to come built-in. Custom images can be used to add large libraries or other resources.

- Integrated API Gateway. Makes it simple to expose custom routes to functions.

- Multiple built-in event sources including HTTP events, CouchDB, Kafka and more. Platform supports registering custom event providers to add any external data feed or event source.

- Higher-order function composition patterns including sequences.
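For readers who have not seen one, a Python function on such a platform is typically just a function that takes a dictionary of parameters and returns a dictionary. The sketch below follows that shape and uses a `main` entry point, which as far as I know is the default name OpenWhisk's Python runtime looks for. `run_sequence` is only a local stand-in to show the idea behind sequences (each function's result becomes the next one's input); it is not part of OpenWhisk, and the two example functions are invented.

```python
def split_words(args):
    """First step: takes {"text": ...} and returns {"words": [...]}."""
    return {"words": args.get("text", "").split()}


def count_words(args):
    """Second step: takes {"words": [...]} and returns {"count": n}."""
    return {"count": len(args["words"])}


def run_sequence(actions, args):
    """Local stand-in for a sequence: each result is the next function's input."""
    for action in actions:
        args = action(args)
    return args


def main(args):
    # Entry point: dictionary in, dictionary out.
    return run_sequence([split_words, count_words], args)
```

On a real deployment the two steps would be deployed as separate functions and chained by the platform, so each could be reused in other compositions.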

There are numerous open-source serverless platforms but OpenWhisk is the most mature and one of the only open-source platforms powering commercial serverless offerings, being used by IBM Cloud Functions, Adobe and Red Hat. It has been used by IBM's offering with customers since early 2016.

Development is all public with the project being in the Apache incubation phase. Upcoming features being worked on include Swift 4 support, on-going k8s integration and better higher-order functional composition patterns. Check out the Github PRs and issues for details.

(Disclosure: OpenWhisk committer and IBMer).

I'd point out that supporting languages "out of the box" and having extensibility via docker images is more or less table stakes at this point.

A lot of the other things you listed are present in other FaaSes too in varying degrees.

No questioning that OpenWhisk was the first major opensource project in this space, though.

Docker extensibility is common in the open-source faas projects but not in any(?) of the commercial serverless offerings (Google, Amazon or Microsoft).

Runtime support (upload code, not a container) is more common in the commercial offerings than in most of the Docker-based faas frameworks, which use a container as the packaging.

OpenWhisk is more flexible than most of the platforms with respect to packaging IMO (Wanna deploy code? Great. Prefer containers? Sure. Wanna use code and a custom image? No problem!)

Event support is more common in the commercial offerings (as they have cloud services for you to use), whereas the "faas" frameworks seem more bare bones. You'll have to configure event integration manually. Same goes for API gateways.

OpenWhisk supports numerous open-source event providers with an extensible API for integrating new providers. This is more complete than lots of the other projects in this respect.

Over time these features will become default in all projects but there does seem to be a split in focus between the commercial offerings and "faas" frameworks at the moment. I'm biased :) but I think OpenWhisk has a comparable feature set to Lambda (having been around for over two years) whilst still benefiting from the open-source docker story.

> Even better if it is open source, in contrast to GOOG/AMZN/MSFT offerings!

Most of the Azure Functions codebase is actually open source under the MIT License: https://github.com/Azure/Azure-Functions

That's great but it would be good to know what's missing? Can I take that repo, deploy it and start selling FaaS?

I don't know the answer to that myself unfortunately. I know the entirety of the runtime itself is OSS and when you do local run/debug of a Function it's the same runtime that Azure uses.

It's not an apples to apples comparison though since OpenWhisk is intended to be self deployed.

I mentioned Functions here just because many folks don't know that most Azure SDKs, runtimes, tooling, etc are OSS by default.

That's the problem though. It doesn't matter much that it's open source if it is not usable, in the sense that you can't easily deploy it yourself or repurpose it without committing to (and paying for) the rest of the Azure stack.

Does anyone have information about the scalability of FaaS models?

Aside from the monitoring part, which must be quite hard, I wonder about network saturation for services that get millions of requests per minute, given that in this model function interactions go through the network.

If you run k8s on your infrastructure there's a pretty slick alternative called OpenFaaS: https://github.com/openfaas/faas

There are a number of alternatives targeting Kubernetes. I work on one - Project Riff. Others I've peeked at are Kubeless, Fission and Fn. Some are pure k8s systems (like Riff) and others like OpenFaaS aim to be portable across orchestrators.

I believe OpenWhisk is being steered towards running on Kubernetes at some point, but I am not qualified to say. When we looked at it last year as part of the research which led to Riff, we were struck mostly by how many moving parts are involved.

Disclosure: As I noted, I work on one of these things for Pivotal.

OpenWhisk supports Kubernetes as a "first-class" deployment platform. Red Hat have been doing lots of work on this since adopting the project (https://developers.redhat.com/blog/2017/06/07/red-hat-and-ap...).

See here for more details: https://github.com/apache/incubator-openwhisk-deploy-kube

(Disclosure: I'm a committer on Apache OpenWhisk).

To be fair to other solid "on-top-of-K8S" frameworks:
- Fission by Platform9 https://platform9.com/fission
- Kubeless http://kubeless.io/
- Nuclio https://github.com/nuclio/nuclio

Not to mention a few newer ones (there are over 30 serverless frameworks for any taste and size).

Thanks for the mention of OpenFaaS

"an open source cloud platform" is probably an overstatement.

It's what OpenStack has been - "opensource cloud platform". But yes, I agree, overstatement for OpenWhisk.

Is anyone actually using this (and other open source solutions that do similar things) for anything?

The cloud providers offer "serverless" billing for AWS Lambda/GCP Functions/Azure Functions in the sense that you pay just for the compute time and don't manage the underlying infrastructure.

Outside of vendor lock in, what advantage does running a service like this really offer? To make a system like this usable you'll probably need at least a couple of boxes available 24/7

Run on top of k8s, it and similar projects like Riff allow you to run Lambda-style FaaS outside of clouds / on bare metal. That's the main benefit I see currently, besides the fact that many companies want to abstract their code to be as cloud agnostic as possible.

Also, if you're running lots of function iterations, it can be cheaper to run this way than Lambda etc.

IBM offers that open-source platform as a managed service, IBM Cloud Functions (https://console.bluemix.net/openwhisk/).

People hear server-less, and go straight to thinking about hardware … in my mind, server-less apps are missing the "application server" software, or it's abstracted away so far as to appear to be the operating system from the server-less function's point of view.

I am excited about this project, especially as it's built using Akka, which I am a big fan of.

Is there something to bridge the gap between serverful and serverless? Is there a good enough framework that lets me write Rails-like code and have it compile into a bunch of serverless jobs?

I don't know about rails, but there's a good Flask -> Lambda project called Zappa https://github.com/Miserlou/Zappa

I found Zappa to be a bit peculiar - so I’ve been toying with `serverless`, which isn’t without its faults either.

However, with Zappa, you begin by downloading a bunch of wheels (pre-compiled python packages) as dependencies for your project - I guess this helps with the packaging / deployment problem, but it’s also bloat that you have to think about - and quite frankly a bizarre introduction to a framework that made me feel less than confident that it’s not duct tape all the way through.

With serverless, the wsgi plugin, and the python packaging plugin, you’re compiling wheels in a Docker container, which means your requirements remain “yours”, you compile only and exactly what you need, and it looks a bit more like “normal Python”.

The downside of `serverless` is that you don’t get the “task” integration built in - so you pretty much need to wrap boto3 with your own api that mimics celery/rq/etc.

Also, as with any solution I’ve found, they’re peculiar to work with if you write your python code as a package - i.e. with a setup.py. You end up in a strange place with “-e .” not behaving well in a requirements.txt or a Pipfile that is used to package for Zappa.

Close enough -- I'll look into it, thanks!

AWS Chalice is similar to Zappa as well.

Similar, but if you try the two side-by-side, you'll likely like Zappa much more... unless you work for AWS :)

So, serverless is (FastCGI + container) these days? At least the FastCGI website has been down for a long time, and it seems abandoned already, along with CGI.

Yes, FastCGI would work, as long as you don't mind wasting money.

But what happens when you have billions of scripts, each executed infrequently? You need some way of fetching the scripts dynamically (they are changing constantly), and a way to time-multiplex their execution so you don't waste RAM on inactive scripts. Oh, and load-balancing suddenly becomes non-trivial because you prefer to route to one script spun up on one box (as long as the usage doesn't justify multiple boxes), instead of accidentally spinning up that script on every box.

What's new is the layer that is managing fetching, starting, and stopping the scripts. For developers, it's 'kinda the same'. But for operators, it's a massive change in how things work that enables a different paradigm.
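A rough sketch of what that layer does, under heavy simplifying assumptions: scripts are fetched on demand, each box keeps only a bounded number loaded and evicts the least recently used one, and a stable hash sends a given script to the same box every time. Box, pick_box and fetch_script are hypothetical names, and the modulo routing is the simplest possible choice (it reshuffles scripts whenever the number of boxes changes, and it never spreads a hot script over several boxes).

```python
import hashlib
from collections import OrderedDict


class Box:
    """One worker machine holding a bounded number of loaded scripts."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.loaded = OrderedDict()  # script_id -> callable, least recently used first
        self.cold_starts = 0

    def run(self, script_id, fetch):
        if script_id in self.loaded:
            self.loaded.move_to_end(script_id)  # mark as recently used
        else:
            self.cold_starts += 1
            self.loaded[script_id] = fetch(script_id)  # fetch on demand
            if len(self.loaded) > self.capacity:
                self.loaded.popitem(last=False)  # evict the idlest script
        return self.loaded[script_id]()


def pick_box(script_id, boxes):
    # Stable hash so the same script always lands on the same box.
    digest = hashlib.sha256(script_id.encode("utf-8")).digest()
    return boxes[int.from_bytes(digest[:8], "big") % len(boxes)]


def fetch_script(script_id):
    # Hypothetical script store: returns a callable for the script.
    return lambda: "ran " + script_id


boxes = [Box("box-%d" % i, capacity=2) for i in range(3)]
for _ in range(5):
    pick_box("resize-image", boxes).run("resize-image", fetch_script)

total_cold_starts = sum(box.cold_starts for box in boxes)
```

Five requests for the same script cost one cold start on one box, rather than one per box.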

Any way to compute in HPC clusters using FaaS?

Not where you can bring your own image as of yet. Although Thomasj above mentions you can do this in OpenWhisk.

The closest to scientific HPC I have used personally is pywren https://github.com/pywren/pywren

Which is attempting to add more cloud backends. It's out of UC Berkeley.

Hi! Pywren project lead here, we're working on more and more fun HPC-style projects! We're also interested in expanding to more and more backends, so please feel free to reach out if there's something you'd like to see.

I meant more like the opposite, where you can do parallel computing in a cluster by using a simple serverless FaaS paradigm. The idea is to remove the headache of manually setting up an MPI cluster, worrying about MPI communications, etc.
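The programming model being asked for looks roughly like a plain map over independent inputs. The sketch below uses local threads from the standard library purely as a stand-in for remote function invocations; faas_map and simulate are invented names, and this is not the pywren API.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    # Placeholder for one independent unit of scientific work.
    total = 0
    for i in range(1000):
        total += (seed * i) % 7
    return total


def faas_map(func, inputs, max_workers=4):
    # Stand-in for "invoke one function per input"; a real backend would
    # ship func to remote workers instead of local threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))


results = faas_map(simulate, range(8))
```

This shape only fits work that splits into independent pieces; anything that needs MPI-style communication between workers does not map onto it directly.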

Since it's serverless, I assume it runs on the client. How does it scale then? And is the client really the right place to run these functions, in view of performance, and perhaps battery use?

"Serverless" is a marketing term invented by Amazon for what might in retrospect have been better described as mini-PaaS.

It doesn't literally mean "without servers". It means that the developer doesn't care about them. AWS have energetically pushed the line that in The One True and Holy Definition of Serverless operators don't care either.

Which, gee whiz, what a coincidence, distinguishes Lambda from opensource projects like this.

Disclosure: I work on a FaaS, Project Riff, for Pivotal. So I guess we're kinda competing with AWS? -ish?

> "Serverless" is a marketing term invented by Amazon for what might in retrospect have been better described as mini-PaaS.

Popularized by, but not invented by: http://readwrite.com/2012/10/15/why-the-future-of-software-a...

> "Serverless" is a marketing term invented by Amazon for what might in retrospect have been better described as mini-PaaS.

Almost. Imho "serverless" describes more the architecture of applications, in contrast to "mini-PaaS" which would then describe the platform on which such applications are deployed.

What about the architecture is different, in your view?

Warning: handwaving ahead. Serverless just means each and every request from the client spins up a process on a cloud platform to handle the request. It is "serverless" in the sense that the actual hardware and OS running the process is completely abstracted away from the developer. You just tell the platform which functions should respond to which routes and the platform handles the load balancing, deployment, and (usually) spins up a container for each request on demand.
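In code, the developer-facing side of that description might look like the sketch below. ToyPlatform is a made-up in-process stand-in: it counts a "container" per request to mirror the description above, whereas real platforms commonly reuse warm containers between requests.

```python
# Hypothetical mini "platform": the developer only registers functions for
# routes; everything else is the platform's problem.

class ToyPlatform:
    def __init__(self):
        self.routes = {}
        self.containers_started = 0

    def route(self, path):
        def register(func):
            self.routes[path] = func
            return func
        return register

    def handle(self, path, params):
        func = self.routes.get(path)
        if func is None:
            return {"status": 404}
        self.containers_started += 1  # pretend a fresh container per request
        return {"status": 200, "body": func(params)}


platform = ToyPlatform()


@platform.route("/hello")
def hello(params):
    return "Hello, " + params.get("name", "stranger")


response = platform.handle("/hello", {"name": "HN"})
missing = platform.handle("/nope", {})
```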

Serverless means that the server is abstracted away from you: you just have scripts that run, and if they need to grow to more servers, the platform does that by itself.

An emerging market for functions is also in IoT and edge use cases. See https://aws.amazon.com/greengrass/ if interested.

I'm curious to know whether what you describe actually exists.


Serverless is an annoying buzzword term that actively confuses those who don't know the architecture. What is it, really? (Someone else tried to explain above.) What does it offer? What is an example use case or business/research function deployed on "serverless" services?
