Multi-Datacenter Cassandra on 32 Raspberry Pi’s (datastax.com)
276 points by zzzqqq on Aug 21, 2014 | 55 comments

I'd be worried about just switching RPis off. We recently got a Pi for the office to run as a dashboard, and after a couple of power cuts its SD card was corrupted.

Now I'm going to have to set up the system again, and I don't know whether this is going to happen again. The SD card that got corrupted was a Class 4 Kingston.

Maybe I'll look into a Sandisk (possibly Class 10?) next time. But I am worried that it's not the SD card's fault, but rather a combination of a journaling filesystem, an SD card and a sudden power outage.

Edit: Apologies, I now realize that the red button cuts power to the network switch, not to each individual Pi. My concerns about the Pi and power cuts remain, though.

Raspberry Pis are very susceptible to SD card corruption, and it's hard to fix: http://raspberrypi.stackexchange.com/questions/7978/how-can-...

I had one device that just killed all SD cards on power up. I still don't fully understand how this is even possible.

I learned this the hard way with my RasPbx system. One or two power failures and the database is hosed. The solution is a UPS, or even a crude backup battery pack.

I found it very useful to snapshot SD cards frequently.

It's super easy to revert to, and it's an easy way to support less technical people.
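A minimal sketch of taking and restoring such a snapshot, assuming the card is unmounted and readable as a block device. The device path (/dev/mmcblk0 on the Pi itself, often /dev/sdX in a USB reader on a PC) and the file names are assumptions, not anything the commenter specified:

```python
import gzip
import os
import shutil

# Assumption: the card shows up as /dev/mmcblk0 (on a Pi) or /dev/sdX (USB reader).
# Only image a card that is not mounted, otherwise the snapshot may be inconsistent.
CHUNK = 4 * 1024 * 1024  # 4 MiB reads keep memory use small


def snapshot_card(device: str, image_path: str, chunk_size: int = CHUNK) -> int:
    """Copy a whole card (or any file) into a gzip-compressed image.

    Returns the number of uncompressed bytes copied.
    """
    copied = 0
    with open(device, "rb") as src, gzip.open(image_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


def restore_card(image_path: str, device: str, chunk_size: int = CHUNK) -> None:
    """Write a gzip image back over an existing device (or file)."""
    # "r+b" overwrites in place and never tries to create or truncate the target.
    with gzip.open(image_path, "rb") as src, open(device, "r+b") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
        dst.flush()
        os.fsync(dst.fileno())  # push the data all the way to the card
```

Usage would be something like `snapshot_card("/dev/mmcblk0", "dashboard-pi.img.gz")` as root, and `restore_card(...)` onto a spare card of at least the same size. Compressing matters because most of a freshly set up card is empty space, which gzip shrinks well.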

SD cards are much like SSDs - they are a combination of NAND flash and an embedded controller. Upon power-up, the controller has to initialise by reading the block mapping tables (BMT) from the NAND. This wouldn't be a concern, and it wasn't, in the days of large-geometry/high-endurance flash where the controller would power up and just sit idle waiting for commands after initialisation. Any power interruption would just reset the controller and it'd try again.

However, newer higher-capacity (and smaller process size) flash is far less reliable: endurance and retention are orders of magnitude lower, while raw bit error rates are correspondingly higher. MLC makes this even worse, but manufacturers have been masking the problems by using stronger error correction. This strategy mostly works, but combined with another characteristic of dense NAND flash, read disturb, it makes for memory devices that are far more fragile and sensitive to power interruptions than before. Read disturb means that repeatedly reading the same blocks in flash has a writing effect on the bits adjacent to the ones being read, so even read operations are somewhat destructive. This means the block management controller may have to perform a copy (i.e. a write) and an erase after a certain number of reads. Furthermore, blocks that have been idle for a long time also need their contents periodically refreshed, since the data slowly fades away as the electrons leak out.
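To make the read-disturb point concrete, here is a toy model (not how any real controller is implemented) of a controller that relocates a block after a fixed number of reads. The threshold `refresh_after_reads` and the class names are hypothetical; real controllers don't publish their policies:

```python
class Block:
    """One physical erase block: its pages plus a count of reads since it was written."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.reads = 0


class ToyFlashController:
    """Toy model of a controller that refreshes a block after too many reads.

    refresh_after_reads is a made-up parameter for illustration only.
    """

    def __init__(self, refresh_after_reads: int):
        self.refresh_after_reads = refresh_after_reads
        self.mapping = {}   # logical block number -> physical block id
        self.physical = {}  # physical block id -> Block
        self._next_id = 0
        self.internal_writes = 0  # writes the host never asked for

    def _allocate(self, pages) -> int:
        block_id = self._next_id
        self._next_id += 1
        self.physical[block_id] = Block(pages)
        return block_id

    def write(self, logical: int, pages) -> None:
        old = self.mapping.get(logical)
        self.mapping[logical] = self._allocate(pages)
        if old is not None:
            del self.physical[old]  # erase the stale copy

    def read(self, logical: int, page: int):
        block_id = self.mapping[logical]
        block = self.physical[block_id]
        data = block.pages[page]
        block.reads += 1
        if block.reads >= self.refresh_after_reads:
            # Read disturb: copy to a fresh block (a write), remap, erase the old one.
            # A power cut in the middle of these steps is the dangerous window.
            self.mapping[logical] = self._allocate(block.pages)
            del self.physical[block_id]
            self.internal_writes += 1
        return data


if __name__ == "__main__":
    ctl = ToyFlashController(refresh_after_reads=3)
    ctl.write(0, ["a", "b"])
    for _ in range(7):
        ctl.read(0, 0)
    print(ctl.internal_writes)  # the host only read, yet the controller wrote
```

With a threshold of 3, seven reads trigger two relocations: a workload that looks read-only from the host's side still causes program/erase activity inside the card.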

All these characteristics together mean that at any one time, even if the SD card is "idle" or only being read, block erase/program operations may be occurring internally. If a power interruption happens, then depending on what was being written at the time, anything from silent data corruption (if the block was storing user data) to a complete failure of the card (if the block was part of the BMT or other management data) can occur.

Most applications of SD cards don't often cut power abruptly, which is why this problem doesn't usually occur. The RPi is an exception. If you want to reduce this problem as much as possible, my recommendation is to use older, low-capacity SD cards, which may contain large-geometry SLC flash. This is not going to be cheap (per capacity), but it will be cheaper than new "industrial grade" cards (which may actually be worse). I've had good luck with cards from relatively unknown Chinese/Taiwanese OEMs; many of them explicitly specify "100K program/erase cycles", something that the "consumer" brands don't even mention.
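One way to read a "100K program/erase cycles" spec is a rough wear-out estimate: capacity times P/E cycles is roughly the total data the NAND can absorb, assuming perfect wear leveling. The write rate, the write-amplification factor, and the 3,000-cycle comparison figure below are hypothetical inputs, not measurements:

```python
def estimated_lifetime_years(capacity_gb: float, pe_cycles: int,
                             host_writes_gb_per_day: float,
                             write_amplification: float = 1.0) -> float:
    """Rough wear-out estimate assuming perfect wear leveling.

    capacity * P/E cycles approximates the total data the NAND can absorb;
    write amplification inflates host writes into what the NAND actually sees.
    """
    total_nand_writes_gb = capacity_gb * pe_cycles
    nand_writes_per_day = host_writes_gb_per_day * write_amplification
    return total_nand_writes_gb / nand_writes_per_day / 365.0


if __name__ == "__main__":
    # Hypothetical inputs: 1 GB/day of host writes, write amplification of 4.
    print(estimated_lifetime_years(2, 100_000, 1, 4))  # small card rated 100K cycles
    print(estimated_lifetime_years(16, 3_000, 1, 4))   # bigger card, lower rating
```

With these inputs the 2 GB card works out to roughly 137 years and the 16 GB one to roughly 33. This only models gradual wear-out; it says nothing about the sudden power-loss failures described above, which is the main reason for preferring the older cards.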

Is there such thing as a modern high-reliability SD card, resistant to power cuts and tough environmental conditions? It seems like there would be a market for that in embedded industrial or military devices.

They don't do transaction systems on these cards?

That's like "Writing to a Consumer File System 101". Yikes.

I'd say that they certainly try to, but the block-erase and page-program granularity of NAND flash makes it pretty much impossible to guarantee any atomicity of operations without ridiculously wasting the capacity. Furthermore, pages within a block can only be programmed sequentially to avoid corrupting previously programmed pages (this is known as "program disturb"), and each page can only be programmed once, also to avoid program disturb effects.
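The two programming rules above (pages programmed in order, each page programmed only once per erase) can be sketched as a toy erase block. The class and the `update_page` helper are hypothetical illustrations of why even a one-page change turns into a whole-block copy:

```python
class EraseBlock:
    """Toy NAND erase block: pages must be programmed in order, and a page
    cannot be reprogrammed until the whole block is erased."""

    ERASED = None

    def __init__(self, pages_per_block: int):
        self.pages = [self.ERASED] * pages_per_block
        self.next_page = 0  # only this page may be programmed next

    def program(self, page_index: int, data: bytes) -> None:
        if self.next_page >= len(self.pages):
            raise ValueError("block is full; erase it first")
        if page_index != self.next_page:
            # Covers both out-of-order programming and reprogramming an earlier page.
            raise ValueError(f"must program page {self.next_page}, not {page_index}")
        self.pages[page_index] = data
        self.next_page += 1

    def erase(self) -> None:
        self.pages = [self.ERASED] * len(self.pages)
        self.next_page = 0


def update_page(old: EraseBlock, spare: EraseBlock, index: int, data: bytes) -> EraseBlock:
    """Change one already-programmed page by copying the block into an erased spare.

    A one-page update costs a full block copy plus an erase, and a power cut
    between programming the spare and erasing the old block leaves the
    controller to work out which copy is current.
    """
    for i, page in enumerate(old.pages):
        if page is EraseBlock.ERASED:
            break
        spare.program(i, data if i == index else page)
    old.erase()
    return spare
```

Trying to call `program(0, ...)` again on a block whose page 0 is already written raises an error, so the only way to "edit" is copy-then-erase, which is exactly where atomicity gets expensive.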

Interesting. I'm surprised that NAND flash chips don't have some kind of hardware assist for this (e.g., really small pages for handling the much smaller checkpoint writes). But I guess at the level of utterly fungible consumer hardware it's about cost rather than reliability, since the latter doesn't have any absolute metrics.

Read-only filesystem on the SD card for boot, and a read-write USB thumb drive for everything else. It's been a while since I looked, but from memory this avoids corruption on power-off.

In my experience it doesn't eliminate the corruption problem, but it definitely mitigates it. I've had far fewer problems since doing this.

You've had SD card corruption with a system that doesn't write to the SD card at all? That doesn't sound good...

userbinator explained how this is possible above: https://news.ycombinator.com/item?id=8210453

There are some changes you can make to reduce the chances of SD card corruption. That being said, I've run plenty of Pis in production over the past year, and I've only had one SD card corruption issue. That issue was caused by a customer unplugging the Pi over and over trying to fix what turned out to be a network problem.

The easiest change, if you're not really worried about reading the logs after a power failure, is to move /var/log (plus a few other directories that are written to often, like /var/tmp) to memory instead of the SD card. Also disable swap. That way it's less likely that a write is in progress when the power is pulled.
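A small sketch that generates /etc/fstab lines mounting those directories as tmpfs (RAM-backed). The directory list and sizes are assumptions to adjust for your own system:

```python
# Hypothetical choice of directories and sizes; adjust to what your system writes.
RAM_DIRS = {
    "/var/log": "16m",
    "/var/tmp": "16m",
    "/tmp": "32m",
}


def tmpfs_fstab_lines(mounts: dict) -> list:
    """Build /etc/fstab entries that mount write-heavy directories as tmpfs (RAM)."""
    return [
        # fields: device, mount point, fs type, options, dump, fsck pass
        f"tmpfs {mount_point} tmpfs defaults,noatime,nosuid,size={size} 0 0"
        for mount_point, size in mounts.items()
    ]


if __name__ == "__main__":
    print("\n".join(tmpfs_fstab_lines(RAM_DIRS)))
```

The trade-off is the one the comment accepts: everything in these directories vanishes on every reboot or power cut. Some daemons also expect their own subdirectories under /var/log to exist at startup, which an empty tmpfs won't provide.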

Another thing to look at is making the entire card read-only, and setting up a temporary directory in memory that's periodically backed up somewhere remote.
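If you go the read-only route, it helps to verify what actually got mounted read-only. This sketch parses text in the /proc/mounts format (device, mount point, fs type, options, ...); the function names are hypothetical:

```python
def mount_options(mounts_text: str, mount_point: str):
    """Return the option list for mount_point from /proc/mounts-style text, or None.

    The last matching line wins, since a later mount shadows an earlier one
    (e.g. "rootfs /" followed by "/dev/root /"). Note: /proc/mounts escapes
    spaces in paths as \\040, so pass mount points in that escaped form.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")
    return found


def is_read_only(mounts_text: str, mount_point: str) -> bool:
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "ro" in opts


if __name__ == "__main__":
    with open("/proc/mounts") as f:  # Linux only
        print("root read-only:", is_read_only(f.read(), "/"))
```

Running this at boot (or from a monitoring check) catches the case where a config change quietly left the root filesystem read-write again.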

Maybe that's why they don't do that: the Big Red Button "actuates a power relay to cut AC power to the network switch".

Make sure to cleanly shut the Pi down with "sudo shutdown -h now" every time. I'm yet to corrupt a Pi SD card and I have 4 of them which I've abused in a lot of different ways.

The real trouble area is the SD card reader in my experience. The pins are very easy to break off accidentally. The new B+ model uses a microSD card so it's not nearly as troublesome.

The pin issue can be somewhat mitigated with a case. If there are corridors for the SD card to slide down before entering the reader, it makes it much more difficult to contact the pins at bad angles.

This definitely worked for me, anyway. Plus, the case keeps the SD card almost completely still while it's in position (which means you can't suffer corruption from shaking it out of the reader, only from power loss or similar).

The "fix" for inexpensive hardware is to create a card image once the system is set up. Leave a freshly imaged card taped to the RPi, and when the SD card gets corrupted, swap cards, re-image the corrupted card and tape it to the RPi.

An alternative is to boot the RPi diskless. This works, but since everything goes through the USB bus it gets even slower than it normally is, which can make it unsuitable for some applications.

You actually can't PXE boot the RPi. At best, someone might get a port of U-Boot working on the RPi, and that would be what's on the SD card. I was hoping to test bare-metal provisioning systems using RPis, but no love there.

Correct, you need to craft an NFS-booting U-Boot that can boot read-only from the SD card and bring the system up. (http://billforums.station51.net/viewtopic.php?f=1&t=17 is one such example; I found another on Robert Nelon's pages but can't find it at the moment.)

>Leave a freshly imaged card taped to the Rpi, and when it corrupts the SD card, swap cards, take the corrupted card and re-image it and tape it to the RPi.

That's exactly what I've done.

I periodically back up a SQLite database file to S3, and I wrote a script that retrieves the latest backup on boot. Just plug in the new card and everything is back to the way it was, minus at most 3 hours of data.
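A minimal sketch of the backup/restore half of that approach, using the SQLite online backup API in Python's `sqlite3` module so the copy is consistent even while the database is in use. The directory layout and file-name pattern are assumptions, and the S3 upload/download step is left out (you would push the returned file with whatever S3 client you already use):

```python
import shutil
import sqlite3
import time
from pathlib import Path


def backup_db(db_path: str, backup_dir: str) -> Path:
    """Take a consistent copy of a live SQLite database; returns the backup path."""
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    # Zero-padded timestamps sort lexicographically in chronological order.
    target = Path(backup_dir) / f"backup-{time.strftime('%Y%m%dT%H%M%S')}.db"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(str(target))
    try:
        src.backup(dst)  # online backup API, safe while other writers are active
    finally:
        dst.close()
        src.close()
    return target


def latest_backup(backup_dir: str):
    """Newest backup file in backup_dir, or None if there are none."""
    candidates = sorted(Path(backup_dir).glob("backup-*.db"))
    return candidates[-1] if candidates else None


def restore_latest(backup_dir: str, db_path: str) -> bool:
    """Run on boot: copy the newest backup into place. Returns False if none exist."""
    latest = latest_backup(backup_dir)
    if latest is None:
        return False
    shutil.copyfile(latest, db_path)
    return True
```

Running `backup_db` from cron every few hours gives the "at most 3 hours of data lost" bound described above.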

I talked to the authors, they ran into the same issue. They just kill the network instead.

I wonder if there's a hardware fix. A familiar way to make a system more tolerant of blackouts is to detect and use the grace period between when incoming power goes down and when the internal power supply loses regulation. When this condition is detected, it triggers a process to gracefully square things away. Noncritical peripherals, such as the graphics display, are allowed to simply capsize.

Does the Pi already monitor its supply voltages? If not, something could be hung on the GPIO to monitor +5 V. More elaborate circuitry can be used to extend the duration of the grace period if needed. (Diodes and capacitors, nothing exotic).
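The software side of that idea could look like the sketch below: poll a "power good" signal and trigger a clean shutdown after a few consecutive low readings, so a single glitch doesn't shut the Pi down. `read_power_good` and `on_power_loss` are hypothetical callables: the first would wrap a GPIO read of some external detector (the Pi's GPIO pins are 3.3 V, so +5 V must not be wired to them directly), and the second might run `shutdown -h now`:

```python
import time
from typing import Callable, Optional


def watch_power(read_power_good: Callable[[], bool],
                on_power_loss: Callable[[], None],
                consecutive_lows: int = 3,
                poll_interval: float = 0.05,
                max_polls: Optional[int] = None) -> bool:
    """Poll a power-good signal and call on_power_loss after N consecutive lows.

    Returns True if a power loss was handled, False if max_polls ran out first.
    The grace period (e.g. stretched with a capacitor) has to outlast
    consecutive_lows * poll_interval plus the time a clean shutdown takes.
    """
    lows = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if read_power_good():
            lows = 0  # debounce: any good reading resets the count
        else:
            lows += 1
            if lows >= consecutive_lows:
                on_power_loss()
                return True
        time.sleep(poll_interval)
    return False


if __name__ == "__main__":
    import subprocess
    # Hypothetical wiring: replace this stub with a real GPIO read.
    watch_power(lambda: True,
                lambda: subprocess.run(["shutdown", "-h", "now"]),
                max_polls=20)
```

Debouncing is the main design choice here: without it, brief dips on a marginal 5 V supply (common with cheap Pi power adapters) would cause spurious shutdowns.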

It's possible to boot a Raspberry Pi with a read-only filesystem, though I don't know if they've set that up.

It would be nice if Raspberry Pi distros came with an easy way to do that.

You could buy a deep-cycle battery and a charge controller and run several Raspberry Pis off the battery for several days during a power outage, at a pretty reasonable price (~$120).

I've used a USB power bank meant for charging phones; it can run a Raspberry Pi for a few hours, which covers minor power outages.
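Whether a battery or power bank covers an outage comes down to simple energy arithmetic: usable watt-hours divided by the load in watts. Every figure in the example below (battery size, depth of discharge, converter efficiency, per-Pi power draw) is an assumption for illustration, not a measurement:

```python
def runtime_hours(battery_wh: float, load_watts: float,
                  usable_fraction: float = 1.0,
                  converter_efficiency: float = 1.0) -> float:
    """Hours of runtime: usable stored energy divided by the load."""
    return battery_wh * usable_fraction * converter_efficiency / load_watts


if __name__ == "__main__":
    # Assumed: 12 V 100 Ah deep-cycle battery (1200 Wh), discharged to 50%,
    # 85%-efficient 12 V -> 5 V converter, four Pis at ~3 W each.
    print(runtime_hours(1200, 4 * 3.0, usable_fraction=0.5, converter_efficiency=0.85))
    # Assumed: 10,000 mAh phone power bank at 3.7 V (37 Wh), 85% efficient, one Pi.
    print(runtime_hours(37, 3.0, converter_efficiency=0.85))
```

Under these assumptions the deep-cycle battery gives roughly 42 hours for four Pis and the power bank roughly 10 hours for one; real numbers depend heavily on the actual load and battery condition.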

I had a similar issue with my CuBox... not cool at all. If I get around to setting it up again, I'll definitely back up an image/copy.

I just received one of these through the post:


It lets you use a battery pack with a handful of AA batteries as a UPS. You can even detect when one power source disappears and then shut down safely, among various other useful power-related things.

I was just working on the RPi reliability problem myself. There are way more solid small embedded ARM systems out there, like this one:


Yikes, approximately the Pi's specs at 3 times the cost. Might make sense for a business, but not for me. :(

That's nowhere near the Pi's specs, although the FPGA is interesting. The Pi has two orders of magnitude more FLOPS.

Some integer code might run at comparable speed.

Yup, we don't need fast floating point; this is just acting as an RMQ provider for some analog sensors.

What we do need are boards with much better manufacturing QA/QC than Raspberry Pi. After the nth time the USB 5V falls out, or the SD reader loses contact, you rapidly realize they're not targeted towards a production environment. As inexpensive and powerful as possible is a great goal, but you invariably lose some reliability ("pick two").

The only SD cards I've had that have suddenly and catastrophically become completely unreadable are Sandisk class 10 cards, so don't assume they'll do any better (I've owned too few SD cards to tell if this was just bad luck or a problem with the model).

Not the answer you're looking for, but if your dashboard config is under configuration management (Ansible, Puppet), it'll be much easier to recreate.

Or even just back up the entire card to an image somewhere.

I had the same thing happen to me, and not just on a Pi but also on two BeagleBone Blacks.

I bought these also to run dashboards in Kiosk mode - but they were too underpowered to drive my StackDriver graphs. :(

SD cards can have issues when there are many reads/writes per minute.

However, my RPi at home is attached to a UPS and doesn't seem to have any SD card or stability issues.

Yea, I agree. Sounds fishy.

Is there a video we can see? When the button is hit, I'm imagining the circles showing some kind of re-sync animation?

Well, if everything works it will be a very boring video, nothing will happen.

Well, I'm assuming that when half the Pis turn off, half of them will go red.

Yea a video would be great!

Pretty cool. Would like to see it working (video/timelapse/gif)?

Also, any reason for not making the big red button randomly select a "datacenter" to take offline?

Idea: transition this into a 3 or 4 datacenter cluster.

I noticed the mention of FIRST and, at the same time, noticed the red/blue color choice. I'm sure it's just a coincidence, but still entertaining. Project looks awesome.

Demoing the multicluster setup and simulating the failure to various people was the hardest part for me. This will help so much. A video would be nice.

How is the circle of lights set up? What does it show?

The two rings in the middle are from OpsCenter, the admin UI that they provide. The lights on the outside are from the LED on the Pis I believe.

If you hit the button, I believe one of the rings would turn red but the cluster would still be able to function.
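Whether the cluster keeps functioning with one datacenter down depends on the consistency level clients use. A sketch of the quorum arithmetic, assuming (hypothetically; the post doesn't state its keyspace settings here) two datacenters with a replication factor of 3 each, and that every replica in a live datacenter is up. Cassandra's quorum is floor(sum of replication factors / 2) + 1:

```python
def quorum(replicas: int) -> int:
    """Majority of a replica set: floor(n / 2) + 1."""
    return replicas // 2 + 1


def available(consistency: str, rf_per_dc: dict, down_dcs: set, local_dc: str) -> bool:
    """Can a request at this consistency level succeed with whole DCs offline?"""
    live = {dc: (0 if dc in down_dcs else rf) for dc, rf in rf_per_dc.items()}
    if consistency == "ONE":
        return sum(live.values()) >= 1
    if consistency == "LOCAL_QUORUM":
        return live[local_dc] >= quorum(rf_per_dc[local_dc])
    if consistency == "QUORUM":
        return sum(live.values()) >= quorum(sum(rf_per_dc.values()))
    if consistency == "EACH_QUORUM":
        return all(live[dc] >= quorum(rf) for dc, rf in rf_per_dc.items())
    raise ValueError(f"unsupported consistency level: {consistency}")


if __name__ == "__main__":
    rf = {"dc1": 3, "dc2": 3}  # hypothetical datacenter names and RFs
    for cl in ("ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM"):
        print(cl, available(cl, rf, down_dcs={"dc2"}, local_dc="dc1"))
```

With dc2 cut off, ONE and LOCAL_QUORUM from dc1 still succeed, while QUORUM (needs 4 of 6 replicas, only 3 reachable) and EACH_QUORUM fail. That is why multi-DC setups typically use LOCAL_QUORUM.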

Picture of the back of that wall !!

I see RPis everywhere... Are they self-replicating?

And here is the link to the "high res" picture of the setup (4000x2000): http://www.datastax.com/wp-content/uploads/2014/08/cluster_c...

Wicked! I want to see a video.

They need a custom designed enclosure with pretty lights.


They say it was difficult to get a high-performance DB running on a 700 MHz chip with 512 MB of RAM. Perhaps it's just the wording, but that sounds like the opposite of high performance to me.

I think they meant to highlight the high resource requirements.
