Serverless Map/Reduce (tothestars.io)
259 points by emilong on Nov 4, 2016 | 152 comments

This is a screenshot of my Google search from 2 days ago:


I've been using Lambda quite a bit, and I think it's SO amazingly useful. Tasks that are highly parallel and CPU intensive can be scaled out practically without limit. I find it weird that their poster-child use case is still always a reactive event like watching S3 and formatting images. There are so many use cases for invoking a lambda directly from your code.

Imagine a case where you had to parse a million documents with a relatively expensive computation, let's say 250 ms per document. Maybe you have a solid machine with a few cores running your server, but even then you can't have the server CPU locked for that long, so naturally you'd need some sort of worker server set up. With a good machine and multiple cores, maybe you get 8 running at once. With Lambda, you can forgo the worker server altogether: just invoke a million lambdas directly from your application server, completely parallelized.

Theoretically, you've taken something that would take 70 hours and had it run in 250ms without having to set up any additional infrastructure.
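A rough sketch of that arithmetic (an illustration, not a benchmark). It assumes every document takes exactly 250 ms, ignores invocation overhead and cold starts, and also shows what the 100-concurrent-execution default limit discussed in this thread would do to the fully parallel case.

```python
import math

DOCS = 1_000_000          # documents to process
SECONDS_PER_DOC = 0.250   # assumed fixed CPU time per document


def wall_clock_hours(workers: int) -> float:
    """Hours to finish if `workers` documents are processed at the same time."""
    rounds = math.ceil(DOCS / workers)  # rounds of parallel work
    return rounds * SECONDS_PER_DOC / 3600


serial = wall_clock_hours(1)              # one core
eight_cores = wall_clock_hours(8)         # a worker box running 8 at once
capped = wall_clock_hours(100)            # default limit of 100 concurrent executions
fully_parallel = wall_clock_hours(DOCS)   # one invocation per document

print(f"serial:         {serial:.1f} h")
print(f"8 workers:      {eight_cores:.1f} h")
print(f"100 concurrent: {capped * 60:.1f} min")
print(f"1M concurrent:  {fully_parallel * 3600:.3f} s")
```

Serial work comes out at roughly 69 hours, 8 workers at under 9 hours, and the default cap of 100 at roughly 42 minutes, which is why the concurrency limit matters as much as the per-invocation speed.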

I just restructured a job to be parallel and run on Lambda and I couldn't be happier. Glad to see this idea is gaining mindshare. I found that it was easy to test, easy to debug, easy to maintain, not very expensive, and just kinda worked.

The only caveat for those considering this path: AWS has an account-wide limit of 100 concurrent lambdas running at once. There's no way to know how many are currently running unless you keep track of it yourself; you'll only find out when you try to kick off a job and it gets throttled. I haven't contacted support yet to find out how hard it is to get the limit raised or what they'll raise it to.
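One way to do the "keep track of it yourself" part is a client-side counter plus a semaphore, sketched here in Python. `fake_invoke` is a hypothetical placeholder for the real call, and the tracker only knows about invocations that go through it, so other producers in the same account still count against the limit unseen.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Caps and counts in-flight invocations made through this object.

    It only sees calls routed through it; other processes or services
    sharing the same AWS account still count against the account limit.
    """

    def __init__(self, limit: int):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def run(self, invoke, payload):
        with self._slots:  # blocks while `limit` calls are already running
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return invoke(payload)
            finally:
                with self._lock:
                    self.in_flight -= 1


def fake_invoke(x):
    # Hypothetical stand-in for a real Lambda call made through an AWS SDK.
    time.sleep(0.01)
    return x * 2


tracker = ConcurrencyTracker(limit=100)
with ThreadPoolExecutor(max_workers=300) as pool:
    results = list(pool.map(lambda x: tracker.run(fake_invoke, x), range(1000)))
print(tracker.peak)
```

Sizing the semaphore somewhat below the account limit leaves headroom for anything else running in the account.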

We had it raised to several thousand concurrent requests with no questions asked. YMMV based on age of account, amount of spend, etc.

Yeah, most of the limits are there just to prevent "whoops, we fired up 500 $2/hour servers and left them running all month, can you refund us?" sort of newbie situations. Very few are technological and AWS has been happy to bump them up for me in each occasion I've had a need.

I've had plenty of cases where it's taken time, and required escalation and justification to get even relatively modest instance limits raised, depending on the data centre and instance types.

It's not technological - it's about ensuring they have a reasonable handle on capacity and what they happen to have plenty of.

It's not a problem as long as you're not asking for astronomical limits, but people need to be aware of it and request increases in advance of starting to hit limits rather than assume it'll be near immediate to get them lifted.

Some are, some aren't. For example, my experience has been that it's harder to get the 100 S3 bucket-per-account limit raised.

That used to be a hard technological limitation but they fixed it a few months back. You should be able to get more now (but you have to have a good justification for it).

100 seems an odd number to be a technological limitation, doesn't it? It's not a power of two or anything so it's not the size of an integer. How could technology have been limiting it?

S3 bucket names are globally scoped, i.e. they are literally scarce resources.

Technology in the sense that it was a hard coded variable, but it was picked arbitrarily.

What would be a good justification? Is there anything you can do with 101 buckets that you can't do with folders in one bucket?

Requester-pays billing (http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketBilling...) for more than a hundred clients, static sites served through CloudFront, etc.

IIRC that was one of the few the AWS docs specifically called out as unraisable.

I once wanted to raise the limit of g2 instances from 2 to 100 in us-east-1, and the reply from the customer agent was that he first "has to check if that does not affect overall region stability". WTF. Anyway, it was raised a couple of days later.

Very useful, thanks!

raising the limit is a pretty easy request to aws support.

So, a Google Compute setup I did a while back used preemptible instances + a Celery queue + some autoscaling based on load...

The guts to make all that work was 50 or so lines of config. I think my auto scale script was 20 lines or so of Python.
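The script itself isn't shown; the following is a hypothetical sizing rule of the kind such a script might compute, in Python: enough nodes to drain the queue within a target window, clamped between a floor (for the base load) and a ceiling. The parameter names and the drain-window idea are assumptions, and the actual call that resizes the instance group is left out.

```python
import math


def desired_nodes(queue_depth: int, jobs_per_node_per_min: int,
                  min_nodes: int, max_nodes: int,
                  drain_minutes: float = 5.0) -> int:
    """Nodes needed to drain `queue_depth` jobs within `drain_minutes`,
    clamped to [min_nodes, max_nodes]. A hypothetical rule, not the poster's script."""
    capacity_per_node = jobs_per_node_per_min * drain_minutes
    needed = math.ceil(queue_depth / capacity_per_node)
    return max(min_nodes, min(max_nodes, needed))


print(desired_nodes(0, 100, 2, 50))          # idle: stays at the floor
print(desired_nodes(10_000, 100, 2, 50))     # burst: scales out
print(desired_nodes(1_000_000, 100, 2, 50))  # huge burst: capped at the ceiling
```

Keeping the drain window comfortably longer than node boot time (about two minutes here) likely avoids ordering capacity that only arrives after a spike has passed.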

I guess the biggest downside was spinning up the new server took about 2 minutes, so for big load spikes it took a bit for it to level out... but with GCE per minute billing, all the napkin math I did says it is a fair bit cheaper per cpu unit to do it this way.

In conclusion, someone good with devops + a normal work queue system could do this years ago. I guess it's cool to lower the barrier to entry, but it does not seem like a game changer. It totally IS cool to scale out backend work to a job farm; it just doesn't seem like the only way to do it.

That's essentially what Google Cloud Dataproc gives you (managed Hadoop/Spark):

- Per-minute billing

- 0-to-cluster in under 90 seconds (aim for 30 seconds)

- Preemptible VMs

- Custom VMs

Now you start with a job, pay a 30 second penalty, and execute it on an entirely ephemeral cluster. The "get a cluster and fill it with jobs and round up to an hour" model is indeed outdated IMHO.

(work at Google Cloud)

Well, it depends on my workload. The setup above on GCE had a stable load of X thousand jobs per minute, then burst loads up to 100x for short periods. So for me it made sense to have a 24/7 Celery cluster for the base load and to add and remove nodes for the variable load. There was never a point where shutting down the cluster made sense.

" I find it weird that their poster child use case is still always a reactive event like watching S3 and formatting images."

Because those are the things that need to be done in the real world, 99% of the time.

Sadly, we don't all get to work on cool stuff :)

But I think as more people get access to the power and find use cases, it will get used more.

The 'next' step for Lambda is serverless hosting for web-sites, something which Lambda is 'almost but not quite' ready for, the limiting factor being transactional latency, especially in some weird cases. Once they have that sorted out, I will never want to see a server again :)

Except for the fact Lambda is limited to 100 concurrent executions.

This isn't a fact. The default limit is 100 concurrent per region.

Also a simple conversation with AWS increases the limit[0].

They obviously don't want people to write absurd runaway structures and 100 is an arbitrarily chosen value to guard against that.

[0]: https://aws.amazon.com/lambda/faqs/

>Theoretically, you've taken something that would take 70 hours and had it run in 250ms without having to set up any additional infrastructure.

And you've spent the cost of building out that 8 server infrastructure in one batch.

One of the nice things about lambda is the billing is super granular - you get billed at 100ms intervals.

Assuming 70 hours at $0.000000834/100ms [1]

The whole job costs $2.10.
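The same figure as a quick check in Python, using only the per-100 ms price quoted above; it leaves out the per-request charge and the free tier.

```python
from decimal import Decimal

PRICE_PER_100MS = Decimal("0.000000834")  # price quoted in the comment above


def duration_cost(total_seconds: int) -> Decimal:
    """Duration charge for `total_seconds` of execution, billed in 100 ms units."""
    units = total_seconds * 10  # ten 100 ms units per second
    return units * PRICE_PER_100MS


cost = duration_cost(70 * 60 * 60)  # 70 hours
print(cost.quantize(Decimal("0.01")))
```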

[1] - https://aws.amazon.com/lambda/pricing/

By comparison (since I was curious), 70 hours on an m3.medium spot instance will run around $0.70. On an on-demand, it's about $5.39. EMR will cost you about $7.00 on top of the EC2 costs.

If you can peg the CPU and don't mind getting interrupted, spot instances are still a fair bit cheaper. But Lambda looks pretty attractive for any other use-case where the statelessness of Lambda doesn't bite you.

Keep in mind that with EC2 you're billed hourly so the fastest a 70 cpu-hour job could finish on m3.medium for $0.70 is 1 hour, and that's ignoring setup time, etc.

Meanwhile, on Lambda, you can actually run 1600 60s jobs (or 27 CPU-hours) in 3 minutes. This is inclusive of setup time, job submission, stragglers, etc. [1]

Of course, if you've got sustained load, it's cheaper to go with spot instances, but the "occasionally I need a buttload of compute," model is well-served by Lambda.

[1] http://ericjonas.com/pywren.html

As a note for people, if your constraints are a bit different then these are some services to check out:

Joyent Manta: https://www.joyent.com/manta

Hyper: http://hyper.sh

Possibly Joyent Triton: https://www.joyent.com/triton

I personally often want to run a bunch of things for ~1-15 minutes, and have too much data or setup to fit neatly in a Lambda function. However, I don't need 1000 things running simultaneously, although Manta would still help there.

I'd love to see some more layers over the top of services like this, hopefully someday getting us back to PiCloud. I miss that service.

How secure are Docker hosts like hyper.sh? I've always been skeptical; the multitenant Docker security story hasn't been very encouraging. Or has that changed?

hyper.sh containers are kernel-isolated like virtual machines.

70 * 60 * 60 * 10 * $0.000000834 = $2.1

As the other poster who replied to you showed, it's actually incredibly cheap. Additionally, it's perfectly elastic. That expensive hardware you provision or buy for your server that can handle this task (at a much, MUCH slower rate) would cost you significantly more. In addition to the cost of the server and running it, you also have a much harder job developing it. If we're talking about multicore, you're now managing the job concurrently on a single machine, and need the code to facilitate it.

I also feel like your comment is implying that the outcomes of the lambda option vs the worker server are comparable. As mentioned, the worker machine will still run in hours, rather than less than a second.

The cost to execute a 250ms Lambda on the most expensive tier 1M times is about $7.70. If your data is in S3 the cost of 1M GETs at the most expensive is $0.40. You don't pay for transfer between S3 and Lambda.
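The $7.70 can be reproduced with a small cost model. The prices below are assumed to match the late-2016 price list (about $0.000002501 per 100 ms for the largest, 1,536 MB memory tier, and $0.20 per million requests); check current pricing before relying on them. Note that a 250 ms run is billed as three 100 ms units.

```python
import math
from decimal import Decimal


def batch_cost(invocations: int, duration_ms: int,
               price_per_100ms: Decimal,
               price_per_million_requests: Decimal) -> Decimal:
    """Duration charge (rounded up to 100 ms per invocation) plus request charge."""
    billed_units = math.ceil(duration_ms / 100)  # 250 ms -> 3 units
    compute = invocations * billed_units * price_per_100ms
    requests = Decimal(invocations) / Decimal(1_000_000) * price_per_million_requests
    return compute + requests


# Assumed late-2016 prices: largest memory tier, plus the per-request fee.
cost = batch_cost(1_000_000, 250, Decimal("0.000002501"), Decimal("0.20"))
print(cost.quantize(Decimal("0.01")))
```

Of the total, about $7.50 is duration and $0.20 is request charges; the S3 GETs mentioned above would add their own $0.40 on top.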

If it's a one-off, fine. But you will not be able to get the limit raised if you do this regularly; even at a rate of 1 req/sec you will be paying close to 1.4 million USD per month just for compute.

Author of the figures used in the blog post here. We wrote https://github.com/ericmjonas/pywren somewhat on a lark, because it seemed to fit well with our research goals and it's fun to push systems to their limit. I'm now a total serverless convert! I'd love more collaborators and feedback, the goal is to make these sorts of computations as easy as possible for python developers, especially on the scientific computing side of things.
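For readers who haven't seen the pattern, the shape of such a computation is "map a function over inputs, get futures back, collect the results". The sketch below reproduces that shape locally with `concurrent.futures` so it runs without an AWS account; it is not pywren's API, and in pywren the function would run remotely in Lambda rather than in a local thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def work(x: int) -> int:
    # Placeholder for the expensive per-item function you would ship to Lambda.
    return x * x


def map_and_gather(func, items, max_workers: int = 32):
    """Submit func(item) for every item, then collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, item) for item in items]
        return [f.result() for f in futures]


results = map_and_gather(work, range(10))
print(sum(results))
```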

OT: I teach computational methods, and even though I dislike conflating it with web dev, I have included "let's build a web app" because students like building and deploying a thing, and because Heroku has a free tier.

I've considered the possibility of having students do things on AWS (beyond web dev), including Lambda, and just expensing the costs. It seems feasible to quickly set up every student with controlled access via IAM... but is there a way to set up rate-limiting, ideally through a policy? That is, shut an IAM user down if a student accidentally invokes a million processes? Or, for that matter, limit the storage capacity of an S3 bucket?

I would just set up a new account for each student, have them use their own billing info, have them use the free tier, teach them how to set up billing alerts, and let them go to town. They're going to need to learn to take cost into account when working at a real job with AWS so this is the best way to teach them to take accountability.

I'm not sure it's reasonable to expect your students to have a credit card.

Depends on the age group. Anyway most could get a secured credit card if needed.

Yeah that's the thing. Most students do have a credit card. But I've had a few who are very much against it, for financial or privacy reasons. For me, it's not worth compelling them to change their ways (which I'm highly sympathetic to) for what could amount to as little as $5 of AWS costs.


Google Cloud gives free edu grants.

I am taking a cloud computing class. We each use our own account and have registered for some free educational AWS credits that come with the GitHub Student Developer Pack. I think teachers can also request credits, since we get an extra $40 or so from the prof. Whenever a student accidentally leaves an m4.xlarge on for a week, which happens occasionally, he just calls up Amazon support, explains what happened, and they'll usually fix it up if it doesn't happen too often.

If you go serverless though, it's probably not going to be an issue. I would be amazed if a student used more than $10 doing anything with lambda.

Check this and the github education pack out: https://aws.amazon.com/education/awseducate/

Oh, very nice! I hadn't heard that AWS had a program. I've used GitHub's education pack, once as an educator to get a free org with 20 private accounts, and once where I asked students to sign up. The former situation was too cumbersome (I had to create a repo for each student, and students constantly confused their own personal account with their org account), and in the latter, only a few students out of 20 got approved by the time the course ended; I think someone told me the process was backlogged.

I'm willing just to expense things on my account for the sake of simplicity but will see if AWS has a more streamlined process for class/per-student approval. Thanks!

AWS has a free tier for a year

It becomes a problem when you're a student and people expect you to use AWS's free tier for different projects in two different years.

You can easily set up multiple accounts with the same credit card, and they will all be eligible for the free tier. I use Gmail's email+alias@gmail.com feature for this.

I wonder if something like AWS Lambda could be applied to multiplayer games? It seems like game-loop based games would be a good domain for such a programming model. The entire game could be expressed as a function that turns tick N into tick N+1. Such a function would be composed of many other functions, of course. So for example, there would also be a function that took as an argument the player at time N and gave the player at time N+1.
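A minimal Python sketch of that model, under the assumption that the world state is an immutable value: `step_world` maps tick N to tick N+1 and is built from a per-player `step_player`. All names here are hypothetical, and real concerns like input handling and collisions are left out.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Player:
    name: str
    x: float
    vx: float


@dataclass(frozen=True)
class World:
    tick: int
    players: Tuple[Player, ...]


def step_player(p: Player, dt: float) -> Player:
    # Pure: the player at tick N -> the player at tick N+1.
    return replace(p, x=p.x + p.vx * dt)


def step_world(w: World, dt: float = 1.0) -> World:
    # Pure: the whole game at tick N -> tick N+1, composed from per-player steps.
    return World(tick=w.tick + 1,
                 players=tuple(step_player(p, dt) for p in w.players))


w0 = World(0, (Player("a", 0.0, 1.0), Player("b", 10.0, -2.0)))
w1 = step_world(w0)
print(w1)
```

Because the functions are pure, the same input tick always produces the same output tick, which is the property that would let a platform retry or relocate a step freely.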

Such a model would allow infrastructure developers to abstract away most of the concerns around networking, collisions, security, etc., and let game developers concentrate their efforts on simply making the game.

I currently have a game server cluster written in Golang, where the locations are instantiated with an idempotent request operation. It doesn't matter if a particular location-instance exists at a particular moment. It's sufficient for the "master control" server to only approximately know the loads of the different cluster server processes. My experience leads me to believe that something like AWS Lambda, but optimized for implementing game loops would work well, so long as game developers could get their heads around pure functional programming and implement with soft real-time requirements in mind. (John Carmack already advocates the use of pure functions, and game devs in general already do the latter.)
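The idempotent-instantiation idea can be sketched like this (in Python for consistency with the other examples here, although the poster's cluster is in Go; `LocationRegistry` and `ensure` are hypothetical names). Asking for a location that already exists returns it instead of creating a second copy, so a master working from stale load information can resend requests without harm.

```python
import threading


class LocationRegistry:
    """Hypothetical registry in which 'ensure this location exists' is idempotent."""

    def __init__(self):
        self._lock = threading.Lock()
        self._instances = {}
        self.creations = 0  # how many instances were actually created

    def ensure(self, location_id: str) -> dict:
        """Return the instance for location_id, creating it only if it's missing."""
        with self._lock:
            instance = self._instances.get(location_id)
            if instance is None:
                instance = {"id": location_id, "players": []}
                self._instances[location_id] = instance
                self.creations += 1
            return instance


registry = LocationRegistry()
first = registry.ensure("harbor")
again = registry.ensure("harbor")  # a duplicate or retried request
print(first is again, registry.creations)
```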


This was exactly the demo they gave at their AWS Dev Day in SF in July: using Lambda and the IoT gateway to run a massively multiplayer game (they had everyone in the room play from the web via their phones).

But was that just a demo, or are they going to make a product out of that?

I mean it was a demo of a product they created using Lambda and the IoT framework. It was a working game, it just wasn't very fun. :)

The code was on GitHub but I can't seem to find it.

What is the latency overhead in AWS Lambda?

Too large for me to want to make a game in it. But if you made a specialized version that had an actual game loop underneath it, there's some potential there.

Latency is around 80ms-200ms per invocation from what I've seen. Definitely not suitable for any real-time game.

How does it compare to Joyent's Manta, which is 3 years old? AFAIK it was designed especially for this kind of purpose. The processing is done directly on the servers storing the data.

Manta is pretty similar to Elastic MapReduce, which also runs the computation on the same node as the data. So it compares pretty much the same as EMR.

The article counts characters in documents stored on S3 - which makes sense since S3 is great for storing documents and can handle unlimited concurrency, priced per usage.
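For reference, the map/reduce shape of such a job, as a local Python sketch. This is a character-frequency variant (an assumption; the article's job may simply total characters), with the S3 reads and Lambda fan-out replaced by an in-memory list.

```python
from collections import Counter
from functools import reduce


def mapper(document: str) -> Counter:
    # Map phase: one document -> partial character counts.
    # In a serverless setup each call could be its own Lambda invocation.
    return Counter(document)


def reducer(total: Counter, partial: Counter) -> Counter:
    # Reduce phase: merge partial counts. Addition is associative and
    # commutative, so partials can be merged in any order or tree shape.
    total.update(partial)
    return total


documents = ["abc", "aab", ""]  # stand-ins for objects read from S3
partials = [mapper(doc) for doc in documents]
totals = reduce(reducer, partials, Counter())
print(totals.most_common())
```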

But what's the solution for structured data? DynamoDB is the obvious main candidate, but it's billed by hour and high concurrency is very expensive, requiring complicated temporary increases and decreases of concurrency that are hard to predict.

Is there a good solution for running massively parallel lambdas on structured data?

If you're doing any sort of table scan op then DDB perf/cost will be less than stellar. If you have an index / range key it works really (like really) well -- even in massively parallel situations.

If you're dealing with a TON (5+ TB) of data, I recommend heading into RDS, BQ, or Redshift.

It's less the total size of the data I'm worried about and more the concurrency. For example, say I had a process that retrieved 1000 tiny records (using index query) and ran some cpu-intensive calculation on them, and I wanted to run 1000 of those processes simultaneously to reduce into a final result. This would require tuning dynamo to thousands of concurrent reads (and maybe writes, depending on the process), then scaling it back down after the operation because it is very costly and priced by hour. This makes it complicated and expensive on dynamo.

It seems the only storage services compatible with variable, unlimited bursts of concurrency are S3 and SimpleDB. S3 comes with many problems for handling structured data (no in-place updates of records, only replacement; locking; listing items is slow/costly; etc.). SimpleDB is no longer being iterated on, is limited to 10 GB per domain, and looks like it's being slowly phased out.

It seems like massively parallel lambdas depend on few fetches of large blobs of data - which is basically batch-processing EMR-style, or better suited to redshift. Not something that opens the door for novel use-cases.

I would have really liked for DynamoDB to be more of a service than a VM. I wish its concurrency were unlimited and you paid for usage rather than time. Basically DynamoDB with SimpleDB pricing.

Just use RDS, and S3 for the blobs. RDS can do tens of thousands of index lookups a second.

If you only need one index, then just name your s3 document by the compound index value and call it a day.
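A sketch of that naming trick, with a hypothetical (customer, timestamp) compound index. S3 returns listed keys in UTF-8 binary (lexicographic) order, so zero-padding the numeric parts makes lexicographic order match numeric order, and a ListObjectsV2 call with `Prefix` or `StartAfter` then behaves like a range query on the index.

```python
def s3_key(prefix: str, customer_id: int, timestamp_ms: int) -> str:
    """Encode a hypothetical (customer, time) compound index in the object name.

    Zero-padding keeps lexicographic order equal to numeric order, which is
    what makes prefix/start-after listings usable as range queries.
    """
    return f"{prefix}/{customer_id:010d}/{timestamp_ms:013d}.json"


keys = [s3_key("orders", c, t) for c, t in [(2, 5), (10, 1), (2, 1478217600000)]]
print(sorted(keys))
```

All objects for customer 2 then share the prefix `orders/0000000002/`; without the padding, customer 10 would sort before customer 2.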

Otherwise, just use RDS for everything.

From the RDS FAQ:

> In order to maximize your workload’s throughput on Amazon Aurora, we recommend building your applications to drive a large number of concurrent queries.

Perrrrfect. Thanks!


I do not agree with the term serverless. Amazon Lambda is a service, therefore there is a server involved.

It's like saying deathless meat, because someone else killed the animal you are consuming.

Fight the good fight, you're not the only one standing up against this ridiculous moniker. Never back down. Keep calling out the bullshit every time and everywhere this term is promoted. People ask "what's the big deal?" Well, the big deal is that calling a service "serverless" is both a lie and misleading, for marketing purposes.

Like when people say that integration tests are unit tests. That gets me going every single time (a unit test is a test on a unit of code, e.g: a function or method).

If you have seen "the interview", this is like when the guy says "on the line", and gets corrected each time.

The name "serverless" is doing, and will likely keep doing, everything I'm sure it was intended to do: provoke curiosity, signal "this is different from just another PaaS", sound cool, and maybe polarize or incite those who get angered by technically less-than-accurate names for frameworks. I'm both surprised and not surprised that HN comments have been so relentlessly focused on this seemingly trivial matter. I can understand that the engineering/scientific mindset would take issue with this name, but the "I'm a human who grew up in an age of omnipresent marketing-fu" part of you needs to realize that all ideas survive or die by finding a place in your brain: a hook, so that the next time you discuss with someone their need to scale infinitely/instantly without provisioning containers, or an IoT device that only needs logic run in the cloud every 5 minutes, you'll be more likely to remember and suggest: "Well, there is this framework with a terribly misleading and inaccurate name that may help you, called serverless."

It's not a lie, and it's not misleading.

The concept of 'server' loses all meaning in this given architecture.

You create 'Lambdas' (units of functionality), and they do what they are supposed to do entirely independently of the underlying architecture.

In fact, using the concept of 'server' in a Lambda situation probably obfuscates the situation and adds unnecessary complexity.

A 'server' is an implementation detail that concerns only those providing the container/Lambda services. As long as the implementation lives up to the SLA (i.e. performance, uptime, security, price, redundancy) that you have agreed to - then it doesn't matter how it works.

All of that is running on a server, connecting to a server, etc. If this is "serverless", then any random cPanel host has been doing "serverless" PHP/MySQL for 20 years.

So it is, in fact, both a lie and misleading.

I think a better term would be FAAS (Functions as a service), since that's what it is. Lambda looks kinda neat as far as tech goes however.

The term 'Lambda' has a basis in Computer Science as a type of 'anonymous function' - in which case the name is reasonably well suited.

'Serverless Computing' as a term to describe the paradigm I think is very neutral, descriptive and apt. It's even a little bit boring but it does the job.

How is it apt and descriptive, and also not blatantly wrong and misleading, if the name explicitly suggests that servers are executing code "without a server"?

From the customers perspective -> there is no server.

The primary difference between a server-based architecture and a Lambda ('serverless') architecture is that the app-maker does not spend any effort, thought, or concern on managing servers.

That there may or may not be servers under the hood is irrelevant.

By that logic, linode, shared webhosts, and facebook are all serverless.


Nobody ever used a social network and had to manage servers to do it, or ever referred to servers, nor were they implied.

Using a 'social network' has nothing to do with servers.

Managing back end infrastructure is fundamentally a 'server oriented' paradigm.

'Electric cars' are a perfectly apt name to refer to cars that are not gas/cylinder engines, just as 'steel roofing' is perfectly apt to differentiate from the standard tar based products at least in consumer roofing (in cold climes).

Salesforce's mantra for a long time was 'no software' - which was reasonable because IT departments didn't have to install and manage software, although obviously salesforce themselves use 'software'.

'Serverless computing' is a perfectly fine name, 99% are fine with it, they know what it means, and it has nothing to do with Amazon's marketing initiatives. In fact - Amazon is way behind in positioning it - they are not even properly catering to hosting websites using the service, they are still mostly focused on IT folks.

> Nobody ever used a social network and had to manage servers to do it, or ever referred to servers, nor where they implied. > Using a 'social network' has nothing to do with servers.

So when I go to Facebook, instead of a server, I connect to a ______?

> 'Electric cars' are a perfectly apt name to refer to cars that are not gas/cylinder engines, just as 'steel roofing' is perfectly apt to differentiate from the standard tar based products at least in consumer roofing (in cold climes).

Electric cars are called so for semantically useful and intuitive reasons: they are electric instead of gasoline. Steel roofing is called so for semantically useful and intuitive reasons: it's a roof made from steel instead of tar. That seems incoherent as a defense of using the term "serverless" to refer to something that is literally several buildings full of servers.

Yes, I also think that's a much more accurate term that better represents the concept.

Great. We now know what color to paint the bike shed. Hoorah, progress!

Yep, that's how progress is made, one step at a time.

As seen in: human history, evolution, etc.

Exactly. The cloud is just someone else's computer.

It's a genuine business case, as running a lean and secure server farm is no child's play, but it's also oversold so much that my eyes hurt.

Actual serverless computing would be implemented as p2p. Possibly interesting in a typical EvilCorpesque large corporation.

Besides, SETI@home (BOINC) has been doing serverless computing for almost 20 years, and it wasn't the first implementation.

Achieving real-world service objectives via cloud infrastructure requires a different approach to architecture, data management, process design, service ownership, availability, cost management, the list goes on.

Some uses of cloud services are superficially comparable to old-school bureau computing. But to say it is "just someone else's computer" is trite, grossly misleading and downright bad advice, because treating it as such will get you burned.

I think we are on the same page here. How about: there's nothing more to it than someone else's computer, with all the risks and benefits.

If you have the right workload, it is super awesome.

If you have a reasonably matching workload and you can spend time on all the things you listed, it's also probably good, but it's far from a clear-cut certainty.

If you have a heap of legacy systems, you will basically need to rewrite a lot of it to make it work, and then it might not be worth it at all. This is the case that is oversold.

P2P would be using peer computers as servers, so it wouldn't be serverless by your definition.

That would be computerless computing, which seems to be a harder problem to solve... P2P was about using non-dedicated computers. It's like the antithesis of the currently marketed cloud solutions, isn't it?

With AWS API gateway + lambda + dynamodb, I maintain exactly zero servers. Zero physical servers. Zero server OSes. Zero sysadmin.

It's perfectly reasonable IMO for that to be called "serverless".

With my car dealer, I maintain exactly zero factories. Zero physical factories. Zero factory workers.

It's perfectly reasonable IMO for that to be called "factory-less car".

If originally you had to procure time at a factory to produce your car and now you can go to a dealer and buy a car, I would call that new way factory-less: the factory has been taken out of your personal car-buying equation.

So I am not paying for the factory infrastructure, workers, etc. when I buy my car? Wow! That's great. We've come a long way with these factory-less cars!

Why is everyone here so upset about the word "serverless"? It's exactly what it says: You aren't responsible for maintaining any OS or server. That's it.

We are upset because we're being told that it is serverless, then there are clearly servers anyways.

No need to dig into the weeds or wax poetic on the term, people call it serverless because you, the user, don't have to manage any servers personally when you use the "serverless" application.

It sounds like the dislike comes from folks here who just want to be contrary is all.

So by that definition, Facebook, Linode, shared webhosts, GCE, etc. are serverless. Airbnb is homeless. Uber is automobileless.

Ok, maybe I'm not being contrarian, maybe I just like semantically useful and intuitive technical terms instead of blatantly wrong and misleading ones? Maybe the problem is that I care about my field of expertise and don't want it to be further watered down by the perpetual sloppiness of "who cares, it's just a name"?

I feel like I'm repeating myself, but serverless means you don't care about servers when running your code at all. That's it. Your examples above don't fit that at all, so they aren't really applicable.

You submit your job, function, whatever and it...just runs. You don't have to worry about servers, hence the name. Not sure why everyone feels like arguing over a simple concept, but alas.

> but serverless means you don't care about servers when running your code At all

We already have semantically useful terms/naming conventions for that: shared host, cloud hosted, $LAYER-as-a-Service, etc. No need to use a blatantly misleading term instead.

With my supermarket, each time I buy a chicken I maintain zero farms. Zero physical farms. Zero farmers.

It's perfectly reasonable IMO for that to be called "farmless chickens".

It's just a term. Don't worry about it.

Don't worry about roman numbers! It's just a representation.

Don't worry about them. Just keep doing your math in roman numbers. It's fine.

It bothers me that some people don't like my roman numbers. They work! Look!!

I + I = II


There's absolutely no way of making this better through representation! Anyone who says otherwise is a bike shedder!

In fact, also don't worry about words either. Instead of serverless, let's call them catgiraffetablechaircthulhu. It doesn't matter!

You're being downvoted but I feel your pain.

I have over 1000 HN karma, I have some karma to spare.

Note: not that it is less convenient. Not saying that. Just going against the terminology.

If anything, this model can provide a more straightforward costing model. e.g: x transactions = x lambda calls = x cost

Also saves you the work of setting up an autoscaling infrastructure.

I've always had one big question about Lambda: is it really worth the cost you pay for the convenience?

Is anyone using it in production that can comment?

I think you can do the math yourself - the costs are published.

FYI, we did some experiments and the limiting factor was latency: 250-300 ms on average. You have to go through their API feature as well, and that's part of the delay. But worse, Lambdas that have not been called for several minutes (I'm assuming they are not 'hot') often take several seconds, up to 5 s, to be called. So it creates a problem for intermittent traffic.

If that kind of latency is acceptable to you, it might work for you so long as the cost equation is right.

I think some other people had issues with versioning, it's a problem we didn't go far enough to observe.

Quick clarification: No need to go through API Gateway if you don't actually need the https endpoint - all AWS SDKs can hit Lambda's REST APIs directly, which also reduces p50 latency.
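A minimal sketch of calling Lambda directly with boto3's `invoke` (Python). The client is passed in rather than created inside the function so the code can be tested with a stand-in; in real use it would be `boto3.client("lambda")`. The function name and payload shape are up to you.

```python
import json


def invoke_direct(client, function_name: str, payload: dict) -> dict:
    """Synchronously invoke a Lambda function through the SDK, skipping API Gateway.

    `client` is expected to behave like a boto3 Lambda client.
    """
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result; "Event" is fire-and-forget
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read()  # a streaming body in boto3
    if "FunctionError" in response:
        raise RuntimeError(f"{function_name} failed: {body!r}")
    return json.loads(body)
```

Switching `InvocationType` to "Event" queues the invocation and returns immediately, which suits fan-out jobs that write their results elsewhere.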

I'm not using it for anything like this, but I am using it for production. I scrape CloudWatch metrics into our standard time series metrics repo. Total cost for every service I care about (ELB, ASG, RDS, ElasticSearch, SNS, and SQS) has been about a dollar for the last 3 months.

Saw the presentation last week at ServerlessConf in London and it really looks very promising. The cost behind this solution is what will really make me check this out :)

P.S. Quoting the author: "As you can see for these queries, the reference implementation performs reasonably well; it's nowhere near Redshift performance for the same queries, but for the price it really can't be beat today"

Does anyone have experience building mobile backends in Lambda? I was looking at an API Gateway / Lambda / Amazon RDS stack for building a central data store and was wondering what people's experience with that setup is?

Using a framework like Serverless or Chalice makes it incredibly easy for an MVP.

Thanks for the recommendations. Any experience with how these hold up as they scale up?

About the site: quite hard to read - almost white text on white background.

Worse than that, doesn't appear for me at all until I use Readability Mode.

Note: the underlying comparison to other systems is from a 2014 blogpost [1] which suggests they used the m2.4xlarge series of EC2 VMs (which were Nehalem class parts from 2010). Nehalem vs Haswell or Broadwell (the likely parts underlying Lambda) is a pretty big jump.

Disclosure: I work on Google Cloud, but I'm just pointing out a fact ;).

[1] https://amplab.cs.berkeley.edu/benchmark/

Implementation guide for Serverless MapReduce: https://aws.amazon.com/blogs/compute/ad-hoc-big-data-process...
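The linked guide builds map/reduce out of Lambda functions. As a local sketch of the same shape (not the guide's implementation), assume a word-count job: each `map_task` call corresponds to what one function invocation would do on its chunk, a `ThreadPoolExecutor` stands in for launching invocations concurrently, and `reduce_task` merges the partial results. All function names here are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_task(chunk):
    # In a serverless setup, this body would run inside one function invocation.
    return Counter(word.lower() for line in chunk for word in line.split())


def reduce_task(partials):
    # Merge the per-chunk counts into a single result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def map_reduce(lines, chunk_size=2, max_workers=4):
    # Split the input into chunks, one per (simulated) invocation.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # max_workers plays the role of a concurrency limit on invocations.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_task, chunks))
    return reduce_task(partials)
```

For example, `map_reduce(["a b", "b c", "c c"])` yields counts of 1 for `a`, 2 for `b`, and 3 for `c`. In the serverless version the map outputs would typically be written to shared storage such as S3 rather than returned in memory.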

I wonder if Amazon will ever open Lambda up to any Docker image? (I know it's possible to run binaries, but it's a bit of a pain to compile them for the Amazon AMI, etc.) Being able to run a bunch of `docker run` jobs with any image would be pretty powerful.

Check out Hyper. https://www.hyper.sh/

"With HyperContainer, we make the performance/overhead of virtualized container similar to linux container --- 130ms launch time, and sharing the read-only part of memory (kernel and init)." -gnawux


Yep, and "hyper func" in the roadmap

This has me exceptionally excited!

Yes. First step was https://aws.amazon.com/blogs/aws/new-amazon-linux-container-.... Layering Lambda's image on top of that to assist people building and testing is definitely on our roadmap.

What about running Lambda on custom AMIs? Is that even remotely feasible one day?

Good to know :-)

I guess the example assumes the data is already somehow in AWS. How is the total cost affected if I wanted to run this setup on a 10TB dataset?

Is there any AWS Lambda equivalent that could be deployed on bare metal?

I've recently come across funktion[1] which is designed to run on Kubernetes (which can run on bare metal). There is also OpenWhisk[2], which was open sourced by IBM. Probably a lot more, but those are the ones I know off the top of my head.

[1] - https://github.com/fabric8io/funktion

[2] - https://developer.ibm.com/openwhisk/

No first hand experience with it, but I believe Joyent's Manta system can be, across a cluster of SmartOS machines.


That's a horrible and flat-out wrong answer. Docker simply provides a transferable containerized execution environment for processes with a simple push/pull workflow.

Lambda is a fully elastic infrastructure on which you can build, deploy, and execute functions.

Simply having docker-engine and docker-cli installed on my machine doesn't give me anything close to lambda.

If it doesn't run on a server, then what does this plumbing-work run on? Clickbait name?

The term "Serverless" means that you don't manage the servers, neither hardware nor virtual machines. Instead, you use exclusively AWS lambda and/or similar services.

I find that term confusing, too, but it seems to be well-established by now, so we have to live with that.

See also: http://martinfowler.com/articles/serverless.html

Doesn't necessarily have to be through AWS. In fact, the Serverless Framework folks are working hard to make their framework suitable for other serverless service providers.

> Doesn't necessarily have to be through AWS

Please refrain from strawman arguments. I wrote "AWS lambda and/or similar services".

In the Fowler article, you find a nice classification of these services:

- BaaS (Backend as a Service)

- FaaS (Function as a Service)

Sorry :P read "exclusively AWS", and stopped reading!

I understand the confusion, but the term "serverless" refers to the developer not having to provision and maintain a server like an EC2 instance themselves.

On AWS's side, yes, there are servers powering Lambda, but developers do not have to worry about them at all. The code just runs.

Oh come on. Is it still not accepted that "serverless" is the colloquial name for AWS Lambda and comparable services? Stop trying to make "FaaS" happen. It's not going to happen.

Just because you abstract something away doesn't mean it no longer exists. I don't manually manage cache on my laptop's HD, but that doesn't mean it's "cacheless".

I'm not saying "serverless" is a good term, I'm saying that it's the term. It's won. You can argue all you want that it's a terribly misleading/incorrect term, but people aren't going to stop using it. So let's move on.

I'm not convinced it's won. Especially since there's still tons of blog posts arguing over the definition

> It's won.

That’s the linguistic descriptivist position.

We can also require all papers and journals and conferences to use "Function as a Service" everywhere, and force all professors to teach "Function as a Service", and require all official publications to use "Function as a Service", by defining an authoritative dictionary, which gets its authority by law.

Then wait a few months, and the term "serverless" will be gone.

Some countries handle their entire language that way – and have an official institution tasked with updating the language every few years, and the updates become mandatory for business communication, press releases, and schools.

Germany and France are some examples.

IMO, having grown up right after one of the largest such changes in recent German history, I think it's a better system than letting the mob decide what to call things, or how to write words, because that leads to pure chaos.

Having a unified way to spell things is quite different than prescribing what to call specific things. Which, luckily, outside of very limited areas (legal terms, protected names and trademarks), doesn't exist in your example countries either.

Yes, it does.

We’ve renamed Camouflieren to Tarnen, and many other words like that.

There are hundreds of cases of words being entirely replaced.

A list of replacements of the past centuries, for example: Distanz → Abstand, Liberey → Bücherei, Moment → Augenblick, Passion → Leidenschaft, Projekt → Entwurf, Adresse → Anschrift, Korrespondenz → Briefwechsel, Komödie → Lustspiel, Dialekt → Mundart, Orthographie → Rechtschreibung, Journal → Tagebuch, Autor → Verfasser, Fundament → Grundlage, Antike → Altertum, Parterre → Erdgeschoss, Universität → Hochschule, Terrorismus → Schreckensherrschaft, Singular → Einzahl, Plural → Mehrzahl, poste restante → postlagernd, Coupé → Abteil, Perron → Bahnsteig, Billet → Fahrkarte, Retourbillet → Rückfahrkarte, download → herunterladen.

And even today there are large companies even funding groups working on replacing parts of the language, be it to replace foreign words, or to simplify words: https://www.rossmann.de/unternehmen/soziale-verantwortung/so...

Obviously, the far-right takes it to an extreme level, even replacing Internet with Weltnetz, but even in the left-wing there’s no opposition to replacing words, or simplifying the language.

And even those opposing those changes (see criticism section here https://de.wikipedia.org/wiki/Reform_der_deutschen_Rechtschr... ) don’t oppose these concepts in general, just would rather like to see different changes.

And most of these examples are in the common dictionaries, used in major newspapers, will be accepted in your high-school German tests (as long as you use them appropriately and spell them correctly) and are used or at least easily understood by all native speakers. The others fell out of use over time.

Recommendations by various authorities on what constitutes "good" use of language change, and in Germany there might be more reliance on the big dictionaries (but I honestly can't accurately gauge how this compares to various parts of society in English-speaking countries), but actual language use does not care all that much. The reforms and the dictionary are very relevant for spelling and grammar, but have not much influence on actual selection of words, even less in specialist subjects.

People try to change language use with all kinds of motivations all the time, but they can't do much more than suggest, individuals and organizations decide what they agree to and what they don't. And this exists in other languages just as much (trying to avoid offensive terms, trying to sound modern, trying to remove foreign influences).

Half of the words in the above list were invented by one single person.

And that person has pushed all of those words into popular use by cooperating with other authors, writers, newspapers at the time.

So, yes, that stuff is possible.

> but they can't do much more than suggest

Except the authoritative dictionaries have authority in Germany because, by definition, tests in schools have to be graded based on them, and official communication of companies has to be written with them.

If a dictionary says "deprecated", these have to switch.

Which, in turn, takes full effect only 13 years later (the maximum time it takes someone to go through school).

Change through collaboration and use is exactly what you argued against: a new term is coined, used first by "experts"/influential people, then goes into widespread use and is codified in dictionaries. Right now we can watch a group of people establishing "serverless" as a word for some kind of PaaS in technical language, as stupid and confusing as we might think it is (I personally hate the trend and would prefer the word be used for P2P or client-side applications, but I think that ship has sailed). Documentation of expert language will soon pick it up, if it hasn't already.

For purposes other than spelling and some grammar rules, dictionaries are nice suggestions, but even in school (where the Rechtschreibreform actually has legal "power", even if it doesn't anywhere else) a word not being in the Duden didn't mean it didn't exist (and conversely, just because it is in there doesn't mean it's good to use). Professional usage has its own conventions (newspaper styleguides, common terms and ways of writing in scientific disciplines, "PR speak"), even if they make for "worse" language, common usage varies even more. And nothing actually enforces language in all those areas, which make up most of our language use. On the contrary, it's used as input for new iterations of guidelines and dictionaries.

Spelling and grammar have been "designed by committee" and relatively successfully legally enforced; the words used are not. They are influenced by motivated groups, but that's just as much part of the descriptive linguistic model.

It really has more to do with what the vendors creating the "serverless" tools are perpetuating at the industry level. You can blame the technical marketing teams and the tech media that also pick up the terminology (whoops, our bad).

Seems like we are getting in the weeds over silly semantics.

When you hear "serverless", just think of it as a service where you don't have to worry about running or maintaining your own OS on a server, whether that is virtual or bare metal. That's it, it's as simple as that.

Is this a parody? Prescriptivism in law is completely unthinkable to UK/US native English speakers.

Well, in France, or Germany (where I’m from), I grew up right after such a change had happened.

I’m not sure why English speakers resist it so much, it simplifies the situation so much, and isn’t an issue anyway.

It's a philosophical red line, really. It's simply not something that belongs to the state, or even to the country. It would be like trying to legislate that the sky was green.

Similarly the metric system - which has actual advantages that mere linguistic change does not, and legal support - has been accepted only very grudgingly. ( https://en.wikipedia.org/wiki/Metric_Martyrs )

Doesn't linguistic reform occasionally cause discontinuities in literature? What happens if you want to use a metaphor or turn of phrase that's been forcibly obsoleted?

Like, I might occasionally refer to something as being "tuppence-ha'penny", despite that coin being abolished over thirty years ago.

> What happens if you want to use a metaphor or turn of phrase that's been forcibly obsoleted?

The same thing that happens when you try to read any old literature, like the Iliad, which talks about the "wine-colored sea" (it was dark blue; the brightness was meant, not the hue), or Shakespeare, etc.

> It's simply not something that belongs to the state, or even to the country.

That likely differs here because the German language was unified in the first place by an academic, and not through consensus.

Sigh. Why isn't it just "cloud"? They thought "cloud" sounded old and lame, so they made up an even dumber name to sound futuristic?

Serverless is definitely the accepted term. FaaS and BaaS (backend as a service) are both serverless implementations, as I understand it.

Call it Function as a Service if you prefer.

I think it is wrong because AWS doesn't provide the function. More correct would be Function Caller as a Service.

How about "worker queue"? Because that is exactly what it is. Obfuscating the details is silly. There is nothing new about the technology. It is only marketed as new (which is reflected in the pricing).

Even that's tricky, because it's also Backend as a Service BaaS/MBaaS at work, various frameworks and other solutions, etc.

It can power a front end. It's like a micro container as a service.

As long as you don't mind 3s startup latency, or 200ms hot path latency?
