Facebook database guru Mark Callaghan posits that Apple’s Cassandra workload likely relates more to iMessage than iTunes, but whatever the project, it’s massive… and it’s not uncommon.
Depends heavily on the use case. I know multiple video-related companies that chunked replay data into 1s and 500ms segments.
Having many keys means you can perform thousands of asynchronous requests for small pieces of data and then piece them together on the client side.
Super low latency is just something else to optimize for.
But then you need to push these segments into partitions, and big partitions are really bad, especially for old versions of Cassandra… although I've met customers with partitions around 100 GB in size…
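A minimal sketch of that pattern, assuming the DataStax Python driver and a made-up video.segments table (the names, bucket size, and addresses are all illustrative, not anyone's production layout): the partition key is bucketed by minute so no partition grows unbounded, and a replay is read back as a fan-out of small asynchronous requests stitched together on the client.

    # Sketch only; requires the DataStax cassandra-driver package.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS video
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)

    # Bucketing the partition key as (stream_id, minute) keeps each partition to a
    # bounded number of small segment rows instead of one giant partition per stream.
    session.execute("""
        CREATE TABLE IF NOT EXISTS video.segments (
            stream_id  uuid,
            bucket     timestamp,   -- segment timestamp truncated to the minute
            segment_no int,
            payload    blob,
            PRIMARY KEY ((stream_id, bucket), segment_no)
        )
    """)

    def fetch_replay(stream_id, bucket, first_seg, last_seg):
        """Fire one small async read per segment, then stitch them together in order."""
        query = ("SELECT segment_no, payload FROM video.segments "
                 "WHERE stream_id=%s AND bucket=%s AND segment_no=%s")
        futures = [session.execute_async(query, (stream_id, bucket, n))
                   for n in range(first_seg, last_seg + 1)]
        chunks = {}
        for fut in futures:
            for row in fut.result():          # blocks only when each result is consumed
                chunks[row.segment_no] = row.payload
        return b"".join(chunks[n] for n in sorted(chunks))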
Well… you don’t actually need to make a computing device automatically report its owner to the authorities for a serious crime, based on a provably flawed automated process, prior to implementing end-to-end encryption (E2EE) for a cloud storage service. That was simply the strategy that Apple chose to pursue. Blaming the users for reacting poorly to this strictly anti-user approach is very backwards.
That wasn't what Apple was proposing, though; they were quite clear [1] that a human would make the call to notify authorities, and only after a scoring algorithm had passed a threshold set pretty high (expected to be a trillion-to-one against false positives).
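To make that trillion-to-one figure concrete, a back-of-the-envelope sketch (the per-image false-match rate, library size, and ~30-image threshold are illustrative assumptions, since Apple published only the overall target, and false matches are modelled as independent, which a real perceptual hash wouldn't exactly be):

    # Toy model: treat false matches as a Poisson process and sum the tail
    # probability of crossing the review threshold by bad luck alone.
    from math import exp, factorial

    def account_false_flag_probability(n_photos, per_image_rate, threshold, terms=100):
        lam = n_photos * per_image_rate            # expected false matches per account
        return sum(exp(-lam) * lam ** k / factorial(k)
                   for k in range(threshold, threshold + terms))

    # e.g. 100k photos, a 1-in-a-million per-image false match, threshold of 30
    print(account_false_flag_probability(100_000, 1e-6, 30))   # ~3e-63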
Anytime this feature does anything, in every case, it will be acting against the interests of the user. There is no requirement, legal or otherwise, to implement this feature in order to enable E2EE for a cloud storage service. Acting as if there is such a requirement is simple gaslighting.
*shrug* Every, and I mean every, cloud provider is currently scanning every image uploaded to it.
If every single provider is doing that, then I'm going to think there's a good reason for that, and maybe (just maybe) there's something behind it. Some reason for it.
But I'm not here to change your mind - believe what you wish.
The claim that I was replying to was that the reaction to CSAM scanning was the reason Apple didn’t implement additional E2EE for iCloud storage, when in fact CSAM scanning was neither a legal nor a technical requirement for expanding E2EE in iCloud.
Your question as to why so many companies gravitate towards the same anti-user patterns is both unrelated and pointless.
Premise [1]: Anything uploaded to the cloud will be scanned for kiddy porn.
Corollary: There are precisely two places where this data can be scanned:
1) On the cloud servers. Anything and everything uploaded can be pushed through a scanner and anything that matches is flagged and sent off to a human to (ugh!) verify before being sent on to 3-letter agencies.
2) Within the privacy of your own device, only on things that are uploaded. Anything that "hits" is flagged and the same process (ugh!) as above is followed.
This is not a false dichotomy. Once you accept that the scanning will happen (and it does), then it either happens at the source or at the destination. Right now everyone offering a cloud service scans at the destination (the cloud servers themselves), and everything is scanned. It is not possible to have e2e if the server can read the data to scan it - this ought to be obvious.
Apple was offering to not do that scanning in their domain, but to trust the device to do it in your own private domain, which could have removed any further requirement for the server side to be able to read the data (to do the scan). That, in turn, could have led to a fully end-to-end encrypted service for data, while still helping prevent ugly crimes.
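A toy sketch of the two placements, with entirely made-up helpers (Apple's actual design used perceptual hashes and threshold cryptography, not exact hashes or an XOR cipher): the only structural difference is whether the check runs before or after the payload leaves the device, and only the device-side variant lets the server hold nothing but ciphertext.

    import hashlib

    KNOWN_BAD_HASHES = set()      # placeholder blocklist of content hashes

    def matches_blocklist(photo: bytes) -> bool:
        return hashlib.sha256(photo).hexdigest() in KNOWN_BAD_HASHES

    def toy_encrypt(data: bytes, key: bytes) -> bytes:
        # XOR "cipher" purely for illustration; not real cryptography
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    def upload_with_server_side_scan(photo: bytes, server_readable_key: bytes):
        # Option 1: the server holds a key it can decrypt with and scans there,
        # so end-to-end encryption is impossible by construction.
        ciphertext = toy_encrypt(photo, server_readable_key)
        plaintext_seen_by_server = toy_encrypt(ciphertext, server_readable_key)
        return ciphertext, matches_blocklist(plaintext_seen_by_server)

    def upload_with_device_side_scan(photo: bytes, device_only_key: bytes):
        # Option 2: the device scans before encrypting; the server only ever
        # receives ciphertext, so the payload can stay end-to-end encrypted.
        flagged = matches_blocklist(photo)
        return toy_encrypt(photo, device_only_key), flagged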
Users chose option (1), that is: scan everything uploaded on the server all the time and deny the ability for end-to-end encryption to occur.
This is why we can't have nice things.
-------------
[1]: This isn't quite a legal requirement, but every cloud service does it because the lawyers won't issue advice to CYA (cover your arse as the service provider) if you don't do it. To mount a successful defence against being sued, you need to make an effort to detect, seems to be the legal opinion.
I’m not a lawyer. My wife is, but she’s my lawyer. Get your own damn lawyer ;)
However, I could see:
- Bad person A is convicted of kiddy porn; as part of a plea deal, he gives up his sources etc.
- Turns out A has been sending stuff to B via iCloud
- B does something nasty to C’s kid and gets caught
- C sues Apple (who has money and certainly doesn’t want to be defending this in court) for making no attempt to stop this from happening to C’s kid, or, worse, claims Apple is culpable for being the medium of transport.
Would this have merit? Probably not, but it’s not something Apple want splashed all over the interwebs. The court that matters is public opinion, in this instance, and mega-corp vs parents-of-abused-kid doesn’t play well whatever the merits of the case.
So Apple (and everyone else) scan, in part for self-interest, and also because I’m sure people at Apple/whoever have kids too, and have just as visceral a reaction as other people when confronted with hard evidence that this shit really happens. It’s easy to play the “think of the children is all bollocks” card - it’s harder when there’s a real abused kid that is front and center.
We send letters and parcels all the time within the USA, and they are not inspected either. People also do horrible things in cars, yet we don't systematically 'inspect' the contents of every single car that drives past a bridge toll or similar. Without specific laws making this a liability, the way SESTA / FOSTA did, potential lawsuits feel like a rather flimsy explanation IMO.
Once it's established that Apple or anyone else simply never has that info, because they deliver things in the equivalent of opaque letters, the precedent of earlier court cases will make these suits happen less and less, if at all.
If I were Apple, I would rather not have the responsibility of inspecting people's content as the medium of transport, because that avoids an entire duty-of-care issue that would otherwise pop up. You prevent more lawsuits by being E2EE, IMO.
You see this avoidance behavior in medicine with malpractice lawsuits, where doctors would rather patients not test speculatively so they don't create duty-of-care issues, and where they outsource some kinds of testing to other firms so that the duty-of-care / malpractice exposure, in case they missed something, falls on the firm instead of them. There is a big 'avoid seeing things if you don't have to' energy in a lot of medicine, and it comes from malpractice anxiety.
So there are things pushing Apple this way IMO; I don't think they would do this by default.
I don’t think it’s fair to say we need to accept either option. Yes, the crime they are trying to stop is horrific and something must be done, but that doesn’t justify unlimited technological spyware.
And the scope for abuse is so large. People in the UK are getting arrested for retweeting mean memes; it’s pretty easy to imagine Google and Apple adding offensive images to their scanning, and you getting arrested for saving something that goes against the current political agenda.
Not to mention the case where Google locked the account of a parent who had taken photos to send to a medical expert.
What do they use this for? iCloud storage related stuff?