Hacker News
Serverless Raspberry Pi Cluster with Docker (alexellis.io)
247 points by alexellisuk on Aug 21, 2017 | 75 comments



> 'Serverless is an architectural pattern resulting in: Functions as a Service, or FaaS'

Then call it FaaS - it's not serverless, and that term is misleading marketing bunk. Even the title of the post describes the servers used in its deployment. I'm quite sick of this term - I don't find it at all helpful when describing application architecture.


It is what it is - and I put a paragraph at the start of the blog for folks who are still confused. It explains that Serverless is a pattern and not a literal term. I even wrote a blog post about it - we've done such a poor job in this industry of explaining it. https://blog.alexellis.io/introducing-functions-as-a-service...


Maybe if a better term is used it wouldn't need so much explaining.

To me, "Functions as a Service" is massively more obvious than "Serverless".


And "the cloud" would be better described as "a server somewhere on the internet", but terms like this get created because it's easier to sell a one-word concept.


Yes... can we all start berating every blog post that uses the word "cloud"? There are no clouds in this server! :-D


It is not to me. Isn't a 'Function as a service' just a... service? Why not just call it a multi-server "Platform for services"?


Exactly, because what can be considered serverless can wildly vary, such as:

Google App Engine, Google BigQuery, AWS Lambda, AWS Athena

The name (which is just a marketing term, not technical) just reflects that you don't deal with servers but with services.


I consider it "serverless" similar to how "stainless" steel only lessens, but isn't a full 100% guarantee. That makes me feel slightly better about the term, ha.


Even more confusing, since the term "server" is currently pretty meaningless IMO (exhibit one: a tiny RPi board is also a server).


I agree with you, to an extent. By the same criteria, some PaaS services could also be considered serverless, but the term does not apply to them. However, since the term evokes the idea of "less infrastructure to manage" and, consequently, "lower costs", it works well to draw the attention of executives.


AWS Lambda is "serverless" to customers. You only pay for running the functions, even though Amazon pays for running the underlying architecture which obviously runs on servers.


Well, the difference between that and running on Elastic Beanstalk would be that you choose the sizing, while in Lambda Amazon chooses the sizing for you. The reason people seem to advocate FaaS is that Lambda can be viewed as a job sent to a queue, which then runs based on event criteria (time-based or event-trigger-based). Think of Lambda like writing and calling a job from a queue (e.g. Celery).

Nothing fancy about serverless.


I disagree; I find it very helpful.

It's not terribly complicated either - if you are only concerned with code, then your deployment is 'serverless' in the sense that _you are not concerned with servers_.

Of course 'servers' are involved; the point is whether that's of concern to your deployment.


Daemon processes (100% software) also get called "servers".

In an old job, in an effort to disambiguate, I would always use "computer" to mean hardware (losing battle).


I'm not sure that distinction is clear. For example, in AWS Lambda, you still have issues like functions being "warm" or not, and people write code specifically to handle that. Meanwhile, nowadays you can acquire and configure even dedicated machines using just code.


If you have to configure a machine, even in code, then you aren't "serverless". To me, serverless means that you do not manage the machines or even have direct access to them.


What's a machine? :) A dedicated server sure is, but a VPS? A Heroku dyno? What's the line?


> What's a machine?

I would say "machine" means the hardware and/or the operating system of the "server" (which is a fancy word for "computer").

The "and/or" part being very important!


Fair points. I'd say it's as clear as 'middleware'. When you are trying to get to the limits of performance (or just optimizing), then you tend to start thinking about the platform it runs on.

This is the problem with most abstractions in IT I guess - they're beautiful and clean until reality bites.


Wouldn't that definition apply to services like Heroku as well though?


For sure.


> I don't find it at all helpful when describing application architecture.

If you can draw the application architecture without anything that is best labeled a server, wouldn't that make it a "serverless" architecture?


This may be a stupid question, but is there an actual use case for building clusters of Raspberry Pis?


Part of me wants to believe that a solar powered raspberry pi cluster has a legitimate use case. The things use like ~1W of power, so the board itself is surprisingly efficient compared to a "real" server.

But really, I think the primary use case is cost. Actually having access to that many physical machines to play with in a classroom or home learning environment is sort of new! The market hasn't really had such accessible linux computers at "Ehh, if it breaks I'll just buy a new one, no big deal" prices. It's educational, and the more stable the ARM support is, the better a student's skills will transfer over into the real world of systems administration.


> The things use like ~1W of power, so the board itself is surprisingly efficient compared to a "real" server.

Try 3.5 watts[1], not counting overhead of most USB power bricks being incredibly inefficient.

A current-gen 35W laptop CPU will be some 10 times faster[2] than a RasPi, have much faster storage available (SATA3 or NVMe versus… USB2), much faster I/O (GBit LAN and GBit Wifi versus… USB2), and a lot of other benefits. (Like an integrated screen and battery and keyboard and …) It also won't need external hardware to communicate with other cluster members – that 10-port ethernet switch will need power, too.

One RasPi is relatively energy efficient; RasPi clusters… not so much.

> But really, I think the primary use case is cost.

Indeed.

[1] http://raspi.tv/2016/how-much-power-does-raspberry-pi3b-use-... , see the numbers for "Multi-threaded CPU Tests", which is the most applicable for server workloads

[2] Running that script manages ~9 runs/second on an i7-6700HQ, vs. ~0.9 run/second on a RPi3.


They're using Pi Zeros in this post, which draw much less than the others, between 0.4W and 1.0W; it's probably safe to assume 0.7W as an average load [1]

And at $5 each, if we're talking hardware costs for setting up a "toy" cluster for, say, self-learning or student labs, that's hard to beat. I suppose you could do better using VMs for a virtual cluster, but that adds other complications unrelated to the clustering task. But I agree there doesn't otherwise seem to be much practical purpose here, and the overhead of running an OS on each Pi really cuts into performance compared to a single chip w/ multicores instead.

[1] https://www.jeffgeerling.com/blogs/jeff-geerling/raspberry-p...


If you had ten of them, ideally you'd use a USB power supply where that overhead is a much lower percentage.


You can probably shave off another .2W by disabling the HDMI and LEDs, but an RPi3 at load will probably be at 4+W from the wall.

At the same time, you're comparing the power consumption of, let's say, 10 whole RPis/platforms to the consumption of a single processor. Stick that processor in a platform (a laptop), and it's going to use much more than 40W.

Like you said, you get a lot more with the laptop, but given your benchmark (10x difference), my guess is that 10x RPis would still be more power efficient than a laptop with a 6700HQ at that specific task.
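One rough way to settle this is to divide throughput by wall power. A minimal sketch, plugging in the numbers quoted in this thread: 0.9 vs 9 runs/s from the benchmark above, ~4W per RPi3 at the wall, and the 6700HQ's 45W TDP used as a (generous) lower bound for the whole laptop. The `Platform` class, `runs_per_joule` helper and the switch overhead value are hypothetical; this is back-of-envelope arithmetic, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """One kind of machine, possibly replicated into a cluster."""
    name: str
    runs_per_second: float  # benchmark throughput of a single unit
    watts: float            # wall power of a single unit under load (assumed)
    units: int = 1          # number of units in the cluster

    @property
    def throughput(self) -> float:
        return self.runs_per_second * self.units

    @property
    def power(self) -> float:
        return self.watts * self.units


def runs_per_joule(platforms, overhead_watts=0.0):
    """Throughput divided by power (runs/s over J/s = runs per joule).

    overhead_watts (e.g. an Ethernet switch) is charged only to multi-unit
    platforms; its value is a caller-supplied assumption.
    """
    result = {}
    for p in platforms:
        extra = overhead_watts if p.units > 1 else 0.0
        result[p.name] = p.throughput / (p.power + extra)
    return result


pi_cluster = Platform("10x RPi3", runs_per_second=0.9, watts=4.0, units=10)
laptop = Platform("i7-6700HQ laptop", runs_per_second=9.0, watts=45.0)
print(runs_per_joule([pi_cluster, laptop]))
print(runs_per_joule([pi_cluster, laptop], overhead_watts=5.0))
```

With these inputs the ten Pis come out at 0.225 runs per joule versus 0.2 for the laptop, and charging just 5W for the switch brings them level, which is roughly this disagreement expressed in numbers.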


6700HQ is 45 W TDP?



/Looks at price...

The price gets you 11 Pis... or only 1 Intel CPU, with no memory, motherboard, heatsink or fans.

Reminds me of the Celeron® Processor J3455... 10W rating on Intel's page. On AVERAGE! Then when you see the real power usage under load for MB + CPU + 16GB memory, it's actually drawing 35W.

Whereas the Pis are drawing 3.7W max apiece. So even with 4 of them to match the performance, you're still under half the wattage.

If Intel really scaled that well in power vs. performance, why are we not seeing x86 phones all the time?


If you do a TCO calc against TDP and performance, you'll find that a cluster of RPIs is lower TCO than a more traditional low power intel solution.


That should be higher tco...
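Whether the Pi cluster's TCO comes out higher or lower depends entirely on the inputs, so it may help to make the calculation explicit. A minimal sketch, assuming TCO is just up-front hardware plus electricity for running flat out; the function names, board and box prices, tariff and the shared 1 run/s throughput are hypothetical placeholders, while the wattages are the ones quoted in this thread.

```python
HOURS_PER_YEAR = 365 * 24


def tco(hardware_cost, watts, years, price_per_kwh):
    """Up-front hardware plus electricity for running flat out.

    Deliberately ignores cooling, networking gear, failures and admin time.
    """
    energy_kwh = watts * years * HOURS_PER_YEAR / 1000.0
    return hardware_cost + energy_kwh * price_per_kwh


def cost_per_million_runs(total_cost, runs_per_second, years):
    """Normalise TCO by work done, so slow and fast platforms compare fairly."""
    total_runs = runs_per_second * years * HOURS_PER_YEAR * 3600
    return total_cost / total_runs * 1_000_000


# Hypothetical prices and tariff; wattages from the comments above.
YEARS, TARIFF = 3, 0.15
pi_cluster = tco(hardware_cost=4 * 35.0, watts=4 * 3.7, years=YEARS, price_per_kwh=TARIFF)
intel_box = tco(hardware_cost=300.0, watts=35.0, years=YEARS, price_per_kwh=TARIFF)
# Per the claim that 4 Pis roughly match the J3455 box, give both the same
# (hypothetical) throughput of 1 run/s.
for name, cost in [("4x Pi", pi_cluster), ("J3455 box", intel_box)]:
    print(name, round(cost, 2), round(cost_per_million_runs(cost, 1.0, YEARS), 2))
```

Swapping in real prices, a measured throughput per platform and your local tariff is what turns either claim into an answer.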


I have a Clusterhat[1] and a Pi Zero cluster in an MPI (Beowulf) configuration.

It's the 3rd beowulf cluster I've ever built, the 2nd being from recycled PowerMacs and the 1st being built with Pentium IIs. It's the most powerful Beowulf I've ever built. It's also the smallest. It fits in my hand and it runs off USB.

Now you know what it is, I'll tell you about what I use it for.

The first problem I used it for was to approximate 1 billion digits of Pi. I started with Monte Carlo methods, but while they scale well they're non-optimal. Eventually I managed to implement a Chudnovsky-type algorithm that worked despite the limitations of the Pi 3 head node and Pi Zero nodes.
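The trade-off described here can be seen in a small sketch. Monte Carlo is embarrassingly parallel (each node counts its own random hits and only a single integer is gathered), but its error shrinks like 1/sqrt(N), so every extra decimal digit costs roughly 100x more samples. The Chudnovsky series adds about 14 digits per term. Both functions below are illustrative, not the commenter's code: the "nodes" run sequentially rather than as MPI ranks, and the Chudnovsky version is the simple term-by-term form, not the binary-splitting variant you would want for millions of digits.

```python
import random
from decimal import Decimal, getcontext


def monte_carlo_hits(samples, seed):
    """One node's share: count random points that land inside the quarter circle."""
    rng = random.Random(seed)  # independent, reproducible stream per node
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def monte_carlo_pi(total_samples, nodes):
    """Scatter equal shares to each node, gather the hit counts, combine.

    The nodes run one after another here; on a real cluster each share would
    go to a separate MPI rank instead.
    """
    share = total_samples // nodes
    hits = sum(monte_carlo_hits(share, seed) for seed in range(nodes))
    return 4.0 * hits / (share * nodes)


def chudnovsky_pi(digits):
    """Term-by-term Chudnovsky series; each term adds roughly 14 digits."""
    getcontext().prec = digits + 10  # a few guard digits (changes the global context)
    k, m, lin, x = 6, 1, 13591409, 1
    total = Decimal(13591409)
    for i in range(1, digits // 14 + 2):
        m = m * (k ** 3 - 16 * k) // i ** 3  # exact integer recurrence for (6i)!/((3i)!(i!)^3)
        lin += 545140134
        x *= -262537412640768000  # (-640320**3)**i
        total += Decimal(m * lin) / x
        k += 12
    return 426880 * Decimal(10005).sqrt() / total


print(monte_carlo_pi(200_000, nodes=4))
print(str(chudnovsky_pi(50))[:52])
```

Two hundred thousand Monte Carlo samples typically get only two or three correct decimals, while a handful of Chudnovsky terms already give 50.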

Most recently I wrote code to explore the Mandelbrot set. Using some custom software I knocked up, I set start and finish values for the x, y, z, w and h coordinates, and it renders individual frames, which are then stitched together with ffmpeg.
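A frame-per-node render like this parallelises naturally. A minimal sketch, assuming x, y is the frame centre, z a zoom factor (pixels per unit of the complex plane) and w, h the frame size in pixels; those meanings, and every function name below, are guesses for illustration rather than the commenter's software. Frames are interpolated between the start and finish settings, dealt out round-robin to nodes, and encoded as binary PGM images.

```python
def render_frame(cx, cy, zoom, w, h, max_iter=100):
    """Escape-time iteration counts for a w x h frame centred on (cx, cy).

    zoom is taken to mean pixels per unit of the complex plane (an assumption
    about what the 'z' parameter was).
    """
    rows = []
    for py in range(h):
        row = []
        for px in range(w):
            c = complex(cx + (px - w / 2) / zoom, cy + (py - h / 2) / zoom)
            z, n = 0j, 0
            while abs(z) <= 2.0 and n < max_iter:
                z = z * z + c
                n += 1
            row.append(n)  # max_iter means "probably inside the set"
        rows.append(row)
    return rows


def zoom_path(start, end, frames):
    """(cx, cy, zoom) per frame: linear pan, geometric zoom so the dive looks steady."""
    (x0, y0, z0), (x1, y1, z1) = start, end
    for i in range(frames):
        t = i / (frames - 1) if frames > 1 else 0.0
        yield (x0 + (x1 - x0) * t, y0 + (y1 - y0) * t, z0 * (z1 / z0) ** t)


def frames_for_node(total_frames, node, nodes):
    """Round-robin frame assignment: node k renders frames k, k+nodes, ..."""
    return list(range(node, total_frames, nodes))


def pgm_bytes(counts, max_iter):
    """Encode iteration counts as a binary greyscale PGM (P5) image."""
    h, w = len(counts), len(counts[0])
    header = b"P5 %d %d 255\n" % (w, h)
    return header + bytes(255 * n // max_iter for row in counts for n in row)
```

Each node would write its frames as `frame_0000.pgm`, `frame_0001.pgm` and so on to shared storage, after which something like `ffmpeg -framerate 24 -i frame_%04d.pgm -pix_fmt yuv420p zoom.mp4` can stitch them into a video.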

I need to rebuild the cluster because I made some booboos with how it was set up, and there have been substantial advances in the HAT configuration. I'm thinking of doing it over Christmas.

What I've found it works best for:

* Learning about problems
* Learning about scaling problems
* Learning about scaling problems with solution constraints
* Learning about scaling problems with solution constraints over a very long period of time.

As long as you're not in a rush to finish calculations and don't mind picking something up, pecking at it and coming back later (like say, a week or so) the Pi is mostly fine. Although ISTR my final Pi approximation was in the order of minutes to run to a million digits.

I know other people host sites, I just like doing basic maths problems to improve my maths and algorithms knowledge.

[1] - https://clusterhat.com/


It's a cheap way to play with clusters while still using pretty common hardware and operating system choices. In addition the pi is small, the hardware is pretty simple and available, and it has enough flexibility for any basic project.

In terms of any performance benefit? No.


Is that very different from running a dozen containers on a single machine?


Containers come with their own set of expectations, needs, requirements and problems. They can be a useful tool but I suspect the cluster of Pis is a much more accurate model for multi-computer modeling.


Probably better models bandwidth constraints, being able to simulate network splits easily, etc.


Learning how to work with computer clusters is an actual use case I believe.


You can do that on any reasonably powerful desktop with virtualization, without the hardware costs.


But then you're missing out on a whole class of hardware related problems that you may need to learn.


If you used the gpu maybe, but in reality you are better off using a modern intel or amd cpu. Not only do they have much better performance, but they even have better performance per watt.


If you still like going "serverful", but want an easy method of deploying to your Raspberry Pi using Docker, check out this open source Docker Hub for ARM alternative

https://marina.io/

(Full disclosure: I am one of the authors)


Marina looks very cool and I'll take a look at it soon. Let's get in contact somehow.


Sure, that would be great. Yeah you have our email and other contacts on – https://cloudfleet.io/ or mine directly on https://metakermit.com/


Does anybody really use this FaaS thing? This is the weirdest of the New Things I've seen recently.


Despite appearing like "magic" it's basically smaller micro-services, but with a different packaging, deployment and monitoring model. The entire Alexa skill set is driven from these functions.
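To make "smaller micro-services with a different packaging model" concrete: the unit you ship is a single handler function, and the platform handles HTTP, scaling and monitoring. A minimal sketch; `handle(req)` follows the shape of the classic OpenFaaS Python template (request body in as a string, return value out as the response), and `lambda_handler(event, context)` follows AWS Lambda's Python handler convention. The word-count logic and the API Gateway proxy event shape are assumptions for illustration.

```python
import json


def handle(req):
    """OpenFaaS-style handler: the request body arrives as a string and the
    return value becomes the response body. Here: a toy word counter."""
    words = req.split()
    return json.dumps({"words": len(words)})


def lambda_handler(event, context):
    """The same logic behind AWS Lambda's Python handler signature.

    Assumes an API Gateway proxy event, where the request body is under 'body'.
    """
    return {"statusCode": 200, "body": handle(event.get("body") or "")}
```

The business logic is identical in both; only the thin adapter for each platform's calling convention differs.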


Been using AWS Lambda for a few things... it works surprisingly well if you have intermittent processes that can fit in the memory and time constraints... for example, one of the lambdas I worked on is triggered from S3 uploads (CSV for processing from a client), the Lambda will parse the CSV into bundles of JSON objects that are then sent into SQS for processing individual items, which can take up to 2 minutes each item.

You can build pipelines from S3, SQS, SNS, Lambda to do a lot of work very quickly in parallel with less overhead than similar self-hosted or self-managed solutions. You don't have to worry about spinning up extra VMs, or dealing with overprovisioning. It all just works.
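The S3 -> Lambda -> SQS step described above might look roughly like this. A minimal sketch, assuming the CSV has a header row, the queue URL comes from a hypothetical `QUEUE_URL` environment variable, and 25 rows per message is a placeholder (SQS messages are capped at 256 KB). `s3.get_object` and `sqs.send_message_batch` are real boto3 calls, and SendMessageBatch accepts at most 10 entries per request; the helper names are illustrative.

```python
import csv
import io
import json
import os
from urllib.parse import unquote_plus

SQS_BATCH_LIMIT = 10  # SendMessageBatch accepts at most 10 entries


def bundle_rows(csv_text, bundle_size):
    """Parse CSV (header row assumed) into JSON strings of up to bundle_size rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [json.dumps(rows[i:i + bundle_size])
            for i in range(0, len(rows), bundle_size)]


def batches(items, size=SQS_BATCH_LIMIT):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def handler(event, context):
    """Lambda entry point for an S3 ObjectCreated notification."""
    import boto3  # imported lazily so the pure helpers work without AWS

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    queue_url = os.environ["QUEUE_URL"]  # hypothetical env var name
    sent = 0
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        for batch in batches(bundle_rows(body, bundle_size=25)):
            resp = sqs.send_message_batch(
                QueueUrl=queue_url,
                Entries=[{"Id": str(n), "MessageBody": msg}
                         for n, msg in enumerate(batch)],
            )
            if resp.get("Failed"):  # partial failures do not raise by themselves
                raise RuntimeError(f"SQS rejected {len(resp['Failed'])} messages")
            sent += len(batch)
    return {"bundles_sent": sent}
```

A separate consumer (another Lambda subscribed to the queue, for instance) would then process each bundle, which is where the up-to-2-minutes-per-item work would live.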


Like most new tech, it has its applicability. Depending on your needs, and the characteristics of your application load, it can save you money.


Could you give an example of a scenario where it can actually save me money?


Chromeless looks promising. Being able to run a load of scrapers in parallel at certain times could be useful if time sensitivity is an issue. E.g someone sends you a batch of URLs to screenshot and each site takes a while to render. You could use lambdas to run all those processes in parallel instead of wasting money keeping capacity lying around just for those spikes.
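The fan-out half of that idea can be sketched independently of the browser part. A minimal sketch, assuming a hypothetical worker Lambda (passed as `function_name`) that takes a list of URLs and renders the screenshots; Chromeless itself is not shown, and the batch size of 5 is arbitrary. `boto3.client("lambda").invoke` with `InvocationType="Event"` is a real asynchronous invocation, so all batches start without waiting for each other.

```python
import json


def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fan_out(urls, function_name, per_invocation=5):
    """Start one asynchronous Lambda invocation per chunk of URLs.

    function_name is a hypothetical worker (e.g. one that drives a headless
    browser and stores screenshots); its code is not shown here.
    """
    import boto3  # imported lazily so chunk() can be used without AWS installed

    client = boto3.client("lambda")
    url_batches = chunk(urls, per_invocation)
    for batch in url_batches:
        client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous: returns once the event is queued
            Payload=json.dumps({"urls": batch}).encode("utf-8"),
        )
    return len(url_batches)
```

The saving comes from paying only while the spike is being processed, instead of keeping enough idle capacity around to absorb it.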


I suspect it makes economic sense only when your workload is relatively elastic, having a relatively low duty cycle. The ability to pay less when you aren't actually using any resources is likely of more economic benefit as it becomes more fine-grained. If you aren't in that position, other models of lease or ownership of computing resources probably merit consideration.


Well, to be honest I pay 40€/month for a bare-metal server (i7 Skylake, 64 GB RAM, 4TB HDD). It's a powerful machine running many services, including a few virtual machines. I consider whatever it does would qualify as low-duty. Now, I know that each month I'm paying 40€ for all this. The last time I read about serverless was when someone discovered that the money-saving part is quite tricky: https://news.ycombinator.com/item?id=14982220.
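The duty-cycle argument can be made concrete by finding the break-even point against a fixed monthly bill. A minimal sketch, assuming a pure pay-per-use model (GB-seconds plus a per-request charge); the prices are illustrative values only and should be checked against the provider's current price sheet, and the comparison treats € and $ as the same unit for simplicity. It also ignores the free tier, billed-duration rounding, API Gateway and data transfer, and the fact that the 40€ box offers far more memory than a 0.5 GB function.

```python
def lambda_monthly_cost(invocations, avg_duration_s, memory_gb,
                        price_per_gb_s, price_per_million_requests):
    """Pay-per-use cost: compute (GB-seconds) plus per-request charge."""
    gb_seconds = invocations * avg_duration_s * memory_gb
    return gb_seconds * price_per_gb_s + invocations / 1_000_000 * price_per_million_requests


def break_even_invocations(fixed_monthly_cost, avg_duration_s, memory_gb,
                           price_per_gb_s, price_per_million_requests):
    """Monthly invocations at which pay-per-use costs as much as a fixed server."""
    per_call = lambda_monthly_cost(1, avg_duration_s, memory_gb,
                                   price_per_gb_s, price_per_million_requests)
    return fixed_monthly_cost / per_call


# Illustrative prices only; check the provider's current price sheet.
PRICE_PER_GB_S = 0.00001667
PRICE_PER_MILLION = 0.20
n = break_even_invocations(40.0, 0.2, 0.5, PRICE_PER_GB_S, PRICE_PER_MILLION)
print(f"break-even at about {n / 1e6:.1f} million invocations/month")
```

With these inputs the break-even is roughly 21 million 200 ms invocations a month: a bursty, low-duty workload stays far below that, while a box that is busy around the clock will usually favour the fixed server.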


When the cost of a Dev Ops or Sys Admin is more than the service.


That's a general answer on how you can save by using the cloud, I specifically meant the "serverless" variant.

(Apart from that, I think this is a misconception - Amazon seems to have convinced people you don't need a sysadmin anymore, whereas in fact once you start exploring the whole AWS infrastructure and its complexity, you quickly realize you still need a sysadmin's knowledge plus an understanding of how their services work and all their quirks.)


I understand that with AWS, you need a sysadmin. That is why I was saying with serverless (in this case, AWS Lambda) you didn't need a sysadmin. I assume it's the same with Google Functions and whatever Azure has.


But you will need other things... you need some kind of fronting system to tie things together, you need some sort of DB/Storage. And while you can get away with fewer admins, someone will be spending part of their time in a sysadmin role. It's more a matter of how much can get done with how many admins.


I get the point you are trying to make, but as AWS offers many services, I don't need a sysadmin for a DB (RDS, DynamoDB, Elasticsearch, S3), for API Gateway, or for any coordination between systems (SNS, SES).

True serverless lets me offload that cost to AWS instead of having a sysadmin.


Who maintains the database, schema, updates? Application deployments, testing, qa, updates? There's someone doing the job, even if it's fewer people, or someone with multiple hats.


> database, schema, updates

Whoever created them

> Application deployments

Build pipeline

> testing, qa

QA / Customer Support

So, still no sysadmin.

Not saying it has to be this way, just saying originally that serverless can save you money.

In fact, we are in the process of moving all our APIs over to AWS Lambda w/ ES and it's going to save us 25-50% of our EC2 costs.

We might be able to do the same, since we are re-writing in another language, but without AWS Lambda we would never have gotten that shot.


This is super awesome and fun, but personally I had to migrate my micro datacenter from Pis to NUCs.

The "armhf tax" is that you tend to have to build your own images for stuff :( Then you need your own build infra (or "Heath Robinson" qemu builds) because Pis run out of memory building a lot of stuff... but mainly if C++ is involved, so YMMV.

That said, I got a rack of 8 pis doing nothing right now, so...

(Unrelated: http://www.bitscope.com/product/BB04/ is handy if you want to rack a lotta Pis. Not affiliated...)

There is probably a micro business for someone running a slick docker build system for armhf handling the qemu emulation or toolchain dirtiness "under the hood" in the cloud somewhere, on x86-64 boxes with a lot more than 1GB of RAM.


You could try running a Raspberry Pi with a network block device (NBD) attached from another machine (xNBD: https://bitbucket.org/hirofuchi/xnbd/wiki/Home ).

Then export a RAM disk from that machine and add the NBD disk as swap on the Raspberry. It would be slow, but builds would complete. Then you'd need only one low-to-moderate power machine (a PC presumably) in your Raspberry cluster, just with lots of RAM in the one PC.


Scaleway has a good array of ARM cloud offerings: https://www.scaleway.com/armv8-cloud-servers/

Could this work for speeding up builds of ARM images and then deploying locally?


Packet.net can go one further - take those 8 cores and upgrade to 96 cores and 120GB RAM.. it's ARMv8 which will be next for OpenFaaS when the Docker support catches up :-)


Can you elaborate on your NUC setup? Which NUC did you choose?

I have a similar RPI rack collecting dust for the same reason, hence the question.


I've built ABS (archlinux) packages by having some swap space on my pogoplug mobile's boot drive (1TB USB). It's a slog, but stuff will finish sooner or later, and by that I mean later or really late.

Now that I'm thinking about it, I'd like to see if going from the RPI's USB3->SATA adapter->M.2 adapter->16GB of Optane ($40+tax locally) would work, and if it did work (a big if), what performance is like.

Edit - Scratch that, I just remembered the Pi 3 is still USB 2.


Serverless meme means that all I got is a chrooted directory with only systemd and /lib but without signals, similar to an apache vhost with a cgi-bin, so I could run my full-stack crap and it is supposed to be so damn cool because actual server maintenance is now someone else's problem?


Great article; I've been a huge fan of your stuff in the past. It helped me get some good ideas of things to use my Pi and Clusterhat with. https://clusterhat.com


So this blog instructs to install commercial docker instead of moby. I wonder how the licensing goes if you were to use this.


I believe it actually installs the Docker community edition (CE) which is under the Apache 2.0 license. Docker (the product) is assembled using the Moby libraries and components.


Correct, Moby is not a distribution of a container runtime.. it's not the "docker" you are looking for. Docker CE is and that's what's used in the guide.


Full Disclosure: I, Manik Taneja, am the Product Manager for all the open source efforts at Docker and work on the Moby Project.

As suggested here, the blog post only talks about installing the Docker Community Edition (CE) that is published under Apache 2.0 License. This is the official incarnation of the Docker Product and is provided to have:

- a consistent user experience across different linux distributions
- strong security guarantees
- regular bug fixes and updates

Moby Project serves as the upstream for the entire Docker Product and includes all open source components that make up Docker, such as runc, containerd, notary, moby, infrakit, linuxkit, libnetwork, hyperkit, vpnkit, datakit, etc.


I imagine the licensing goes something like this:

https://www.docker.com/components-licenses

It's my understanding that the bulk of the components that make Docker work are fully open source, and some extra support and deployment related things are the only things that are commercially licensed. This would include Docker Swarm, as it's a part of Docker itself and not something separate. IANAL though.

I'm pretty sure the reason they go for plain Docker over Moby is sheer ease of use. Despite being a bit weird to understand under the hood, Docker is just dead simple to get up and running with, and the clustering mode that's built right in is easier to teach new readers than Moby, which, judging from its GitHub page, is obviously designed for folks who are already rather comfortable with Docker. Straight from Moby's GitHub page:

"Moby is NOT recommended for: Application developers looking for an easy way to run their applications in containers. We recommend Docker CE instead."

https://github.com/moby/moby


That's wrong - this is the free / open-source Docker version. How did you get that impression?


I'm surprised that the raspberry pi monoculture prevailed even until today.



