Here is an example of a program built in Python that uses Kappa, and here is a video tutorial on how I deploy that program with Kappa.
Obviously I disagree with the premise. It's true that it is more difficult to use than other technologies and you'll certainly pay the pioneer tax, having to develop your own tooling, but it's ready for production traffic.
Error handling is OK but could be better (it takes a while for the CloudWatch logs to show up).
The real big problem is testing. It's really hard to test if you have more than one function because there is no mocking framework (yet). It's fairly easy to deploy and test with a test account, but local testing still needs to be solved.
The really hard part is integration testing. You can Chaos-Monkey your Lambda functions, but you can't Chaos-Monkey DynamoDB. We're looking at ways of building tooling to do that.
For deployment, we wanted to use Serverless, but as they started to move away from CloudFormation, that didn't work for our more enterprise needs, so we've been rolling our own, based on ideas from Kappa.
Have you guys figured out a way to do offline testing of one Lambda calling another? I'd love to see it!
I bet you could achieve this effect by either temporarily decreasing provisioned throughput, or inserting clientside middleware that occasionally refuses to do a DynamoDB op.
But it means you have to write the middleware, which becomes its own source of bugs; that's still better than nothing, but it's another risk.
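The middleware idea above could be as small as a wrapper that makes a configurable fraction of DynamoDB calls fail. A minimal sketch (the class name, failure rate, and injected `rng` hook are assumptions; the real boto3 client would be passed in as `client`):

```python
import random

class ChaosDynamoDB:
    """Wrap a DynamoDB-like client so that a fraction of operations fail,
    letting you test how the rest of the system copes with throttling."""

    def __init__(self, client, failure_rate=0.1, rng=random.random):
        self.client = client
        self.failure_rate = failure_rate
        self.rng = rng  # injectable for deterministic tests

    def __getattr__(self, name):
        # Proxy every client method, occasionally raising instead of calling it.
        op = getattr(self.client, name)

        def chaotic_op(*args, **kwargs):
            if self.rng() < self.failure_rate:
                raise RuntimeError(
                    "chaos: simulated ProvisionedThroughputExceededException"
                )
            return op(*args, **kwargs)

        return chaotic_op
```

Injecting the random source makes the chaos deterministic under test, which partly answers the "middleware as its own source of bugs" worry.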
Anyone else heard the same?
Someone might hate python just because.
During re:Invent 2015 there was talk about supporting a Windows runtime environment.
Arbitrary Executables = Arbitrary Linux Executables.
Since Python 3 runs on Linux, you can invoke Python 3 scripts.
Although actually, I see what you mean now. D'oh. Windows was so far out of my mind I wasn't even thinking of it as a distinction to be made.
But yes in 2016 everyone needs day 1 Python 3 support. Python 2 support is optional.
Of the top 360 libraries, only 332 support Python 3 (but almost all of them still support Python 2).
This analysis was done in March and projected roughly even support for Python 3 and Python 2 around now:
So maybe next year it will be optional, but not yet.
So yes, from the standpoint of a hosting company like Amazon, it seems to me that Python 2 should be the optional one: not guaranteed, and maybe not even priced the same as Python 3 going forward. Python 3 is the present of the language; you are asking them to support its past. Perhaps you should be willing to pay for them to support the past of the language.
(Of course, this particular example is moot because it sounds like Amazon themselves have engineers in the "Python 2 or Die" clade, but it's the example at hand.)
The more people that say "no python 2" the sooner it will die.
I don't know, the whole Python 2 vs 3 thing is weird. I bet someone has a good book on it.
I can't imagine rolling out a new Python hosting product in 2016 and not even covering Python 3?!
I'm not sure how this applies to Lambda, but we've found a workable solution for Service Fabric: the services and actors are almost nothing more than a thin host for the actual implementation (or are DI'd into the implementation) - which is defined in a completely separate package/assembly.
There are very few times when you are offline and want to run a test suite, and it's much easier to handle those than to try to fake the service AWS provides.
When the events eventually came through and fired, the logs reflected the time, hours earlier, when they should have fired.
Sadly, I don't have a support contract so I couldn't get any help. The forums just assumed I was doing something wrong, until the outage which was linked to dynamo IIRC.
We moved on from lambda as well.
To what, might I ask?
I love Lambda, and if I had a budget and a contract, I'm sure Amazon would have solved the issue. However, I think it was really an underlying problem: I believe Lambda uses DynamoDB streams under the hood.
We've been running APIs on it for six months with no issues, and are now in the process of moving the entire backend from heroku to Lambda. So far, no major issues.
Documentation is there, just not in the most sensible places, and the whole pipeline is optimised for Java processing (e.g. Velocity VTL for API Gateway transformations lets people do everything they need, as long as they know the Java collections API executing underneath).
Another thing is that although API Gateway can behave like a web server for most things, some things are more restricted than with a fully programmable server. For example, you need to declare all HTTP response codes upfront so that the pipeline can be configured, and there can be only one success code (so, e.g., returning 204 when there is no content and 200 with content from the same endpoint isn't trivial).
The third thing, which several commenters mentioned on this page, is that if you want to use API Gateway, processing binary data requires S3 or something else for storage. We currently let people convert files by using API Gateway + Lambda to get a signed URL for S3, then POST to S3; another Lambda converts the file into a PDF and saves it back to S3, and the client polls S3 to pick up the result. It's fantastically scalable with that design, much better than posting a file to heroku and getting a synchronous response, but it takes a bit of rearranging the code to work.
To top it off there's no way to report non-UI bugs unless you are paying $$$.
API Gateway seems to have been pressed into the role of proxying Lambda without necessarily being fit for purpose. If I understand correctly, API Gateway is a third-party acquisition (which is why its API behaves quite differently from most AWS APIs) originally intended to translate and proxy between JSON and XML APIs.
The API Gateway has the concept of "custom authorizers" in which another lambda function gets called to authorize the request before it gets passed to the real lambda function (WHY!!!).
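For anyone who hasn't hit this: the authorizer function receives the request's token and has to answer with an IAM policy document in AWS's documented response shape. A rough sketch (the token check and principal are toy assumptions):

```python
# Sketch of an API Gateway custom authorizer Lambda: it runs before the real
# function and must return a policy allowing or denying execute-api:Invoke.
def authorizer_handler(event, context):
    token = event.get("authorizationToken", "")
    # Toy validation; a real authorizer would verify a JWT, look up a key, etc.
    effect = "Allow" if token == "expected-secret-token" else "Deny"
    return {
        "principalId": "user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }
```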
The API Gateway feels more like "enterprise" software than anything else I've used in AWS (not in a good way).
I hope the AWS team and jeffbarr are listening.
That's not necessarily true. The AWS forums are pretty active, and I was actually able to get some help/feedback from engineers themselves including promises to look at improvements. This is also a good example of how limiting API Gateway is - https://forums.aws.amazon.com/thread.jspa?threadID=228067
I wish that it started with the verbs rather than the resources. Set up basic behavior for GET, POST, PUT, &c. and apply it to all resources.
Lambda is unpredictable, which is probably its biggest downfall. You can get super fast deployment and execution one second; the next, you're getting random execution failures out of your control.
Lambda often feels like it's unsupported by AWS. It took them over a year to support the latest version of Node.
Java perf is terrible and support should either be dropped or fixed. Go really should be supported out of the box.
Responses from lambda are not flexible.
Deployment of Lambda is frustrating, and the inability to trigger a Lambda from SQS is even more frustrating.
Could go on.
But my personal experience says Lambda is ready for prime time. We use it in production: ~15 million API calls per day, mostly user-facing HTTP requests. Even with the rough edges I would prefer to use it for any new web development project. It feels like at least an order of magnitude reduction in the risk/complexity of scaling and deploying code. It's not "zero", but that is huge for me and my team. We spend more time shipping.
We wrote one: https://github.com/bustlelabs/shep but there are plenty of others mentioned in the comments.
Isn't that horrifically expensive?
I have one hard example I can share. We had a node service that was running on ec2 and cost ~$2500/mo. Moved the code directly over to lambda. Now ~$400/mo.
Quantifying other costs is a bit harder but do you have a DevOps person on your team? Or multiple people? How much do they get paid?
$3.50 (cost per million calls) × 15 = $52.50
Lambda is pretty cheap.
For example, a 256 MB function running for 300 ms and called 15,000,000 times would cost about $21.77.
All together that's $74 a day for just lambda and API gateway without any extras (cache, bandwidth pricing etc).
Maybe more expensive than raw infrastructure, but it's a pretty inconsequential amount of money per day for close to no ops.
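The arithmetic behind the ~$74/day figure above works out like this (using 2016-era list prices; treat the rates as assumptions that change over time):

```python
# Daily cost sketch for 15M API calls through API Gateway + Lambda.
calls = 15_000_000

# API Gateway: $3.50 per million requests.
apigw = 3.50 * calls / 1_000_000                 # $52.50

# Lambda compute: 300 ms at 256 MB per call, billed per GB-second.
gb_seconds = calls * 0.300 * (256 / 1024)        # 1,125,000 GB-s
lambda_compute = gb_seconds * 0.00001667         # ~$18.75

# Lambda requests: $0.20 per million invocations.
lambda_requests = 0.20 * calls / 1_000_000       # $3.00

total = apigw + lambda_compute + lambda_requests  # ~$74/day
```

That excludes caching, bandwidth, and any other AWS services, matching the "without any extras" caveat above.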
Practically any smaller instance type (e.g. m3.medium) can handle this small a load all by itself without breaking a sweat, and instead of paying $74 per day, it would cost less than $74 per month.
In fact, ELB + an ASG of three t2.micro's running continuously would cost around $49 per month, not per day, and possibly around the same amount of effort (or less) to create/maintain/manage.
It's somewhat apples and oranges, but there's no doubt that lambda is expensive compared to plain old EC2, and that cost disparity increases linearly with scale.
But you have API management to sort out and versioning to solve, which API Gateway handles fairly easily.
API Gateway is connected to CloudFront for low latency.
You can simply add a cache for your API.
You have analytics already set up and ready to go.
There are also the other things API Gateway provides, like API keys, auth, and Cognito integration.
You can deploy and maintain tens of Lambda functions fairly easily; to get something similar you would have to use a container service like ECS or Kubernetes and figure those out, compared to just deploying your code with one of the frameworks out there for Lambda.
I'm not looking to put down Lambda, although it could maybe be a bit cheaper; we use EC2/ELB/ASG extensively with Userify but we might use Lambda for eventing-based services in the future. Evaluating each on its own merits will probably give you the best picture of what's right for your project and team.
This approach lets you avoid a lot of the Amazon API Gateway hassle mentioned here.
RAML (http://raml.org/) support also seems to be around the corner (https://github.com/awslabs/aws-apigateway-importer).
"API First" is in general quite promising.
But we spent the better part of 4 weeks figuring all of this out and automating it. Once automated, it's pretty brilliant.
It's not just the automation of getting a lambda function and api gateway working together (though that's a royal pain). It's building the tools to develop and test locally (which we've also done).
The service we created to automate everything is called Joule. It's not ready for prime time, but you can kick the tires at https://joule.run (it supports Node; Python would be easy if we ever get around to it). Docs are here: https://joule.run/docs/quickstart
Anyways, the point is that it's possible and pretty amazing once you start deploying microservices using Lambda.
DDNS using Lambda and Route53 and Joule - https://medium.com/@jmathai/create-a-serverless-dynamic-dns-...
Group Text Message Channel using Lambda and Twilio and Joule - https://medium.com/@jmathai/create-a-group-text-channel-in-u...
Sources for the above Joules are on GitHub.
Edit: here's a Joule that takes an area code, looks up the city name by parsing Google search results, and uses that to get a Creative Commons photo from 500px.
Source (ugly but functional) - https://github.com/jmathai/area-code-500px/blob/master/src/i...
1) Documentation is bad, but not insurmountable. There's enough usage of these platforms now that you'll get pretty far searching for and adapting open source code.
2) Error handling is fine once your code is running, but getting execution there (and the response out again) can be painful.
3) Once sufficiently automated, all these woes go away.
This automation could be done with a framework, however I was skeptical of giving something like Apex or serverless access to my AWS account. Instead I've hand-written terraform for all of my deployment. The documentation isn't great, but there are enough examples out there now to make it possible to glue something working together. I started with this project and wrote a bunch of bash and terraform templates to make it extensible.
Put another way, AWS Lambda is really good for coordinating/dispatching tasks based on events happening in S3, SNS, DynamoDB, etc.
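The coordinating/dispatching pattern is usually just a handler that routes on the incoming event. A minimal sketch using the S3 event shape (the routing rules and action names are made up):

```python
# Dispatch on S3 object-created events: route each changed key to a task.
def handler(event, context):
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if key.endswith(".csv"):
            results.append(("ingest", bucket, key))   # e.g. kick off an import
        else:
            results.append(("archive", bucket, key))  # e.g. copy to cold storage
    return results
```

The business logic lives elsewhere; Lambda just decides what happens next, which is the sweet spot the comment above describes.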
Overall, after working with Lambda for a while, I'm not a big fan of using it to write services that hold business logic.
I've used lambda for supporting a static website's need for storing and e-mailing forms, and collecting analytics events. It is very limited compared to what this framework provides, but I already see how helpful it could be.
The first point revolves around AWS not providing tools; they provide building blocks. To me this is true of basically all of AWS, not just Lambda. If you come into AWS without understanding that the services are building blocks meant to be used together (if you want EC2 to be a virtual hosting service, for example), you will experience pain and suffering. You are expected to use RDS if you want to store data, not put it on an EC2 instance. You are better off using their blocks together than trying to roll your own.
The second point, about the documentation, seemed like a non-starter... I took him up on the challenge of finding out how to pass arguments to a Python Lambda, so I typed "aws lambda python" into Google, which suggested adding "example". The first hit showed "If you are passing in this JSON, your function will look like this".
Obviously this guy had a hard time deploying a Python Lambda, and I haven't tried it myself, but the complaint felt a bit off base to me.
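For reference, the pattern those docs show is about as small as it gets: whatever JSON you pass at invocation time arrives as the `event` dict argument (the field names here are illustrative, not from any particular doc page):

```python
# Invoked with the payload {"first_name": "Ada", "last_name": "Lovelace"},
# the JSON is deserialized into the `event` dict before the handler runs.
def lambda_handler(event, context):
    return "Hello %s %s" % (event["first_name"], event["last_name"])
```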
Heaps of people run their own datastores on AWS, this doesn't feel like a great example.
The "Programming Model (Python)" section addresses the OPs complaints about documentation.
It makes it trivial to tail, filter and range all the logs in a log group.
Add some structured application logging that is Lambda-function and request aware and includes backtraces, and you can make sense of what's going on.
Here's the setup I'm hoping to leverage Lambda for: light workers. I have kue.js/redis for submitting jobs and creating workers. My subscribed worker listeners will simply trigger lambda.invoke with JSON payloads (no need to call or support HTTP endpoints, and no need for API Gateway either).
I'm starting with apex.run as a deployment tool and writing/running all tests locally. I assume local testing is doable by calling the same exported functions with mocked inputs, though this could be off.
See any big hurdles with that usage plan?
As an aside, I've got a backlogged task to explore Serverless, and their moving away from CloudFormation shouldn't be an issue (Terraform, I assume?).
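The "listener triggers lambda.invoke with a JSON payload" step above looks roughly like this in Python with boto3 (a sketch: the function name and job shape are made up, and boto3 is imported lazily so the payload-building part runs without AWS credentials):

```python
import json

def build_invoke_args(job):
    # Separate pure payload construction from the AWS call so it can be
    # unit-tested offline, per the local-testing plan above.
    return {
        "FunctionName": "light-worker",     # hypothetical function name
        "InvocationType": "Event",          # async fire-and-forget, no HTTP
        "Payload": json.dumps(job).encode("utf-8"),
    }

def trigger(job):
    import boto3  # deferred so offline tests never touch AWS
    client = boto3.client("lambda")
    return client.invoke(**build_invoke_args(job))
```

With `InvocationType="Event"` the call returns immediately (HTTP 202) rather than waiting for the function's result, which fits a fire-and-forget worker queue.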
This sounds like a useful metaphor, but I'm unable to fully comprehend it because I suck at thermodynamics and information theory. Could you please elaborate a bit and include some examples of high-entropy names?
(I'm asking because naming is hell, and I'll need to name three products very soon).
tl;dr: if you use names that are similar to other names, you convey less information every time you use them. So pick a name that doesn't require additional bits to disambiguate.
Note that English conveys roughly 1 bit of information per word.
I think that's per character, not word.
You could estimate this by doing a search.
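One crude way to see the distinction: a zero-order (unigram) frequency count gives around 4 bits per character for English text, while Shannon's famous ~1 bit/char estimate accounts for context (letter patterns, word structure) that a naive count ignores. A sketch of the naive estimate:

```python
import math
from collections import Counter

def char_entropy(text):
    """Zero-order entropy in bits per character: an upper bound that
    ignores all context between characters."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

So "1 bit per character" and "4 bits per character" can both be right; they just measure with different amounts of context.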
This tool will be open-sourced in the coming weeks, but at the moment it is in closed beta. I'm looking for people interested in giving it a look. If you are interested, drop me an email at me[at]jorgebastida.com and I'll invite you to the repo.
We haven't had any recorded incidents of lost data. We did have an issue after a security upgrade where older accounts did need to be migrated with new access tokens. If you lost any data drop me an email and I'd be glad to find it.
The internal datastore is mostly for development purposes. We generally recommend persisting data to an external database like DynamoDB.
If it would make you more inclined to try the platform, I'm glad to adjust the tone of the copy. I get easily excited!
I wonder why that is...?
I can suggest giving TJ's project apex.run a try; it saved me a lot of time.
Also, errors and debugging are difficult, but remember there is an EC2 instance behind Lambda in the end. I just mocked the Lambda function and debugged on a real server.
Overall, Java is consistently slow under the same configuration, so you might want to use a higher memory config for Java to get stable performance.
You can redirect to a file the Lambda function writes out. But that sucks.
AWS Lambda is the perfect use case for something like dynamic image resizing. Except if you use it for that, you'll force all your users through a redirect when fetching images, and there's no easy way to clean up when you do it that way, either.
I guess this depends on your setup, but I don't see why you would have to do this. Lambda takes the image in, resizes them and uploads the result to S3.
If you use predictable S3 paths, your clients can just look those up. Of course, there are things you'll need to watch out for, but no redirects needed.
Any other webserver simply does: get image off S3 -> resize it -> send raw binary data to the user (and optionally cache the resized image if you think it'll be requested at that size again).
The Lambda flow is: get image off S3 -> resize it -> upload to S3 -> redirect the user to that image via a CloudFront URL.
Those are unnecessary steps caused by Lambda's inability to get binary data out through API Gateway, particularly from the user's point of view: you double the number of requests they have to make to fetch images.
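The redirect hop on the Lambda side ends up looking something like this (a sketch: the key scheme and CloudFront domain are hypothetical, and the actual resize/upload is elided), since the function can't hand the image bytes back through API Gateway:

```python
# After resizing and uploading to S3, the best the function can do is
# hand back a 302 pointing at the CloudFront copy of the result.
def handler(event, context):
    key = "resized/%s/%s" % (event["width"], event["key"])
    # ... resize and s3.put_object(...) would happen here ...
    return {
        "statusCode": 302,
        "headers": {"Location": "https://dxxxx.cloudfront.net/%s" % key},
    }
```

Which is exactly the second round trip being complained about: the client then has to fetch the Location URL itself.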
Lambda is great for running code in response to events (like SNS messages or S3 file uploads), but I really don't think it should be used for something like handling web requests. Just because you _can_ do something with a million steps and configuring 50 AWS services together, doesn't mean you _should_.
You're far better off sending out an SNS or SQS message triggered by an S3 object-creation event.
AWS in general is not well documented. Well-written drivel mostly.
My guess is that there's little to no feedback between documentation writers & developers actually trying to use that documentation to achieve real outcomes. Same could be said about many of the AWS UIs.
Is there any service out there that does this? Basically I'm willing to pay $1 to rent a c4.4xlarge instance for just 1 minute. Keep in mind that the hourly rate of a c4.4xlarge is only $0.621 in US East, so I'm willing to pay a huge premium here.
You could then have your worker lambdas being triggered off the SNS publication and doing the work.
Joyent's Triton is pretty interesting for running one-off Docker containers, though the last time I tried it, starting one took a variable number of seconds.
AWS Lambda is only going to get better now that even Google has come up with its own implementation, Google Cloud Functions: https://cloud.google.com/functions/docs/
It's more than ready for prime time! It's an awesome tool.
However, I suspect they don't do that because it's built to be fast on a streamed event from another of their services, and if you want that feature you're supposed to implement it yourself by dumping each event to S3 or something... but it should really be available as something you can turn on.
My only two complaints: Python 2 and code deployment. From reading the responses, it sounds like there may be options for both.
Lambda seems best suited for data-processing tasks, one-off 'cron'-style jobs, and synchronous request/response tasks that don't execute frequently.
Data processing (e.g. pulling events from Kinesis, responding to S3 events) seems like the perfect use case for Lambda; we have thousands of Lambda invocations a minute doing exactly that and it works fine.