> Each file is broken into chunks and encrypted by iCloud using AES-128 and a key derived from each chunk’s contents that utilizes SHA-256. The keys and the file’s metadata are stored by Apple in the user’s iCloud account. The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as S3 and Google Cloud Platform.
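The scheme in that quote is usually called convergent encryption: the key comes from the chunk itself, so identical chunks always encrypt to identical ciphertext. Below is a minimal sketch under a few loud assumptions. Fixed-size chunks are used, the key is the first 16 bytes of SHA-256 of the chunk (the quote only says SHA-256 is involved, not how), and because Python's standard library has no AES, AES-128 is replaced by a toy SHA-256 counter-mode keystream. That toy cipher is not AES and is not meant for real use.

```python
import hashlib
from typing import List, Tuple

CHUNK_SIZE = 4  # hypothetical, tiny so the example is easy to follow


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a file into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_key(chunk: bytes) -> bytes:
    """Derive a 128-bit key from the chunk's own contents using SHA-256."""
    return hashlib.sha256(chunk).digest()[:16]


def toy_keystream(key: bytes, length: int) -> bytes:
    """Stand-in for AES-128: SHA-256 in counter mode. Not AES, not for real use."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]


def encrypt_chunk(chunk: bytes) -> Tuple[bytes, bytes]:
    """Return (key, ciphertext). Equal chunks always give equal output."""
    key = chunk_key(chunk)
    stream = toy_keystream(key, len(chunk))
    return key, bytes(c ^ s for c, s in zip(chunk, stream))


def decrypt_chunk(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream undoes the encryption."""
    stream = toy_keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# Two users uploading the same file produce identical (key, ciphertext) pairs,
# which is what lets the storage layer deduplicate the chunks.
alice = [encrypt_chunk(c) for c in split_chunks(b"same vacation photo")]
bob = [encrypt_chunk(c) for c in split_chunks(b"same vacation photo")]
```

In the quoted design, the per-chunk keys are what Apple keeps with the account metadata, and only the ciphertexts go to the third-party store.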
From what I’ve heard, Apple’s services run on their own cloud platform called Pie. It sounds like this platform abstracts away the underlying storage service, letting Apple use whichever one fits their requirements.
 - https://9to5mac.com/2016/10/06/report-unified-cloud-services...
This is exactly the strategy described at Google Cloud Summit 2017 for how security is managed on their platform, so it lines up.
I think this is just coincidence though. Apple most likely performs their own layer of encryption on top of what Google already does.
They protect against two different attack scenarios: Google doesn't want plaintext data to show up on disks. Apple doesn't want plaintext data to be accessible by anyone at Google (or any other 3rd party provider).
Convergent encryption is a fairly standard way of storing encrypted data that might be duplicated across users. It allows data to be deduplicated while it stays encrypted.
That means that the provider always knows if a file is shared across multiple users (even if they don't know what the file is) and given a cleartext file they can always check if somebody has it stored on the service. It's not ideal at all if you want good privacy.
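The leak described above can be made concrete with a provider-side index keyed by a hash of each ciphertext. This is a minimal sketch, not any provider's actual design. The toy deterministic cipher stands in for real convergent encryption (it is not secure), and tracking which account owns each blob is an assumption; in the quoted Apple design the third party gets no user-identifying information, so it would see only the "already stored" bit.

```python
import hashlib
from typing import Dict, Set


def convergent_encrypt(plaintext: bytes) -> bytes:
    """Toy deterministic encryption: key = SHA-256(plaintext), XOR with a
    SHA-256 counter-mode keystream. Stand-in for AES; NOT secure."""
    key = hashlib.sha256(plaintext).digest()
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


class DedupStore:
    """Provider-side store that only ever sees ciphertext."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.owners: Dict[str, Set[str]] = {}  # assumed: provider tracks references

    def put(self, user: str, ciphertext: bytes) -> bool:
        """Store a blob; return True if it was already present (deduplicated)."""
        blob_id = hashlib.sha256(ciphertext).hexdigest()
        existed = blob_id in self.blobs
        self.blobs.setdefault(blob_id, ciphertext)
        self.owners.setdefault(blob_id, set()).add(user)
        return existed

    def has(self, ciphertext: bytes) -> bool:
        return hashlib.sha256(ciphertext).hexdigest() in self.blobs

    def owner_count(self, ciphertext: bytes) -> int:
        return len(self.owners.get(hashlib.sha256(ciphertext).hexdigest(), set()))


store = DedupStore()
memo = b"contents of some widely circulated PDF"
first = store.put("alice", convergent_encrypt(memo))
second = store.put("bob", convergent_encrypt(memo))  # deduplicated

# Anyone holding a cleartext copy can recompute the ciphertext and ask.
is_stored = store.has(convergent_encrypt(memo))
holders = store.owner_count(convergent_encrypt(memo))
```

The provider never decrypts anything, yet it learns that the file is shared and can confirm who has a file it already possesses in cleartext.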
SpiderOak is explicitly designed not to do that, to avoid these issues; however, I refuse to recommend them until they finally release an open-source client and 2FA support, so caveat emptor.
In this case it's irrelevant anyway, because Apple has access to the keys; the encryption is only supposed to prevent the third party (Google Cloud in this case) from accessing the files.
See https://spideroak.com/resources/encryption-white-paper, in the "Data Deduplication" section.
This is particularly space-saving if a particular file type were chunked cleverly, so that static parts of the file's structure are stored separately from dynamic parts (Microsoft's Office XML formats, for example, could definitely be split this way).
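Whether Apple chunks files this way isn't stated anywhere in the thread. One common technique for getting chunk boundaries that follow content rather than fixed offsets is content-defined chunking with a rolling hash. A minimal sketch, assuming a Rabin-Karp style polynomial hash and tiny made-up parameters (16-byte window, roughly 64-byte average chunks); production chunkers usually also enforce minimum and maximum chunk sizes, which this omits.

```python
import random
from typing import List

WINDOW = 16          # bytes in the rolling window (assumed value)
BASE = 257
MOD = (1 << 31) - 1  # prime modulus for the polynomial hash
DIVISOR = 64         # cut on ~1 in 64 positions -> ~64-byte chunks (assumed)


def cdc_chunks(data: bytes) -> List[bytes]:
    """Cut after byte i when the hash of the last WINDOW bytes hits a target.

    Boundaries depend only on local content, not on absolute offsets.
    """
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks: List[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the outgoing byte
        h = (h * BASE + byte) % MOD                  # add the incoming byte
        if i >= WINDOW - 1 and h % DIVISOR == DIVISOR - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks


rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(20_000))
edited = b"NEW HEADER " + original  # a small change at the start of the file

a = cdc_chunks(original)
b = cdc_chunks(edited)
shared = set(a) & set(b)
print(f"{len(shared)} of {len(set(a))} distinct chunks reused")
```

Because a cut depends only on the last `WINDOW` bytes, prepending data can disturb at most the first chunk; every later chunk of the original reappears unchanged and would deduplicate.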
Hetzner charges less than half of that for their storage solution, and that's without de-duplication.
Does decreasing the security of their users and not offering end-to-end encryption save them even more money? Sure, but I don't see why it wouldn't be profitable without it, seeing as you can easily buy non-deduplicated cloud storage for way less from other providers.
Don't forget that the price includes VAT and other taxes, general infrastructure costs (including traffic), redundancy, and backups. I don't think they're turning much of a profit, but they offer it because it makes their platform more attractive.
I have a 2TB family plan that I pay $10/mo for, of which a total of 135GB is used. Most of that usage is from iCloud Photo Library, and while I could use the 200GB plan, I'd rather pay the extra to have effectively unlimited space and not worry about needing to upgrade the plan, especially as I now record video in 4K by default.
I'm sure I'm not the only person who does this; $10/mo is probably nothing to the vast majority of people on HN compared to the hassle of worrying about whether we have enough storage.
HEVC/H.265, introduced in iOS 11, largely makes 4K footage shot on an iOS device cost _roughly_ the same in storage as 1080p H.264 footage did, so recording in 4K doesn't really move the storage needle like it used to, assuming of course you have an iPhone 7 or newer.
Same boat, 100%... Family Sharing, 2TB plan, with my wife and mother in the family. Combined we use less than 200GB, but I'm happy to pay for more storage rather than deal with tech support calls from the two of them when we go over 200GB and error messages start popping up. :P
I do also pay for Office 365 though, $100/yr is significantly cheaper than purchasing Office for every computer in my house.
Apple’s $10 is absolutely not a good deal.
Personally, I'm using Seafile on a server; it's cheaper than iCloud or Dropbox, with more storage and more custom functionality. Many NAS devices nowadays support Nextcloud out of the box.
You don’t have to dumb things down unnecessarily. If a person can buy a smart home device, plug it in and configure it, or set up an Apple TV or a computer, then they can also get one of the simpler NAS devices and sync with that.
Geographical redundancy and low-latency data transport are nontrivial factors in pricing cloud storage.
(Which is kind of a shame because a truly bulletproof, user-friendly NAS might be an interesting product. Who is going to be the Apple to Synology's Microsoft?)
It would have to be very, very large to justify the claim that it turns a previously non-cost-effective service into a cost-effective one, especially as prices have been dropping for some time and, hypothetically, would simply not have dropped as fast if this were the case.
* Offer to charge less for usage if a duplicate block is detected, but allow users to pay full price for privately-salted storage. This is similar in principle to how Data Saver works on your mobile device (opting in to let a man-in-the-middle compress or downsample your data).
a) Once you know the chunk size, you can determine based on pricing whether another customer has that data, which can have huge privacy implications.
E.g. let's say we both work at the same company and get the same salary statement PDF aside from the dollar number & your name (which I know).
I can simply brute-force it: craft files with that number changed, upload each one to iCloud, and when I stop being charged for storage I know I've cracked what's on your drive.
In any case, I'd be surprised if Apple's not already leaking this information due to caching in a way that could be revealed via timing attacks.
b) It'll lead to hugely erratic pricing for consumers. E.g. let's say you download 100TB of movies from BitTorrent: now you pay almost nothing for it, but if everyone else deletes their copies, your price will go up.
Apple could mitigate that by never raising the price on a given chunk, but that just leaves them paying for it, and it's easily abused. Open two accounts, upload the same data, then delete it from one account, pay 1/2 for storage.
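The salary-statement attack in (a) only needs a yes/no signal, and a discounted bill is one. A minimal sketch against a hypothetical service that bills duplicates at a discount; the template, names, and amounts are made up. The blob ID is a hash of the plaintext, standing in for a convergent ciphertext (both are deterministic functions of the content, so dedup behaves the same). The `per_user_salt` option models the "privately-salted storage" idea from the bullet above.

```python
import hashlib
import secrets
from typing import Dict, Iterable, Optional, Set

# Hypothetical document: everything except the amount is known to the attacker.
TEMPLATE = "Salary statement for {name}: ${amount}\n"


class DedupPricedStore:
    """Hypothetical service that bills duplicate uploads at a discount."""

    def __init__(self, per_user_salt: bool = False) -> None:
        self.per_user_salt = per_user_salt
        self.salts: Dict[str, bytes] = {}
        self.blobs: Set[str] = set()

    def _blob_id(self, user: str, plaintext: bytes) -> str:
        salt = b""
        if self.per_user_salt:
            if user not in self.salts:
                self.salts[user] = secrets.token_bytes(16)  # private per-user salt
            salt = self.salts[user]
        return hashlib.sha256(salt + plaintext).hexdigest()

    def upload(self, user: str, plaintext: bytes) -> bool:
        """Store the blob; return True when it is billed as a duplicate."""
        blob_id = self._blob_id(user, plaintext)
        duplicate = blob_id in self.blobs
        self.blobs.add(blob_id)
        return duplicate


def brute_force_amount(store: DedupPricedStore, attacker: str, name: str,
                       candidates: Iterable[int]) -> Optional[int]:
    """Upload one guess per candidate; a discounted bill reveals the match."""
    for amount in candidates:
        guess = TEMPLATE.format(name=name, amount=amount).encode()
        if store.upload(attacker, guess):
            return amount
    return None


victim_doc = TEMPLATE.format(name="Alice", amount=87_000).encode()
guesses = range(50_000, 150_001, 1_000)

shared = DedupPricedStore()
shared.upload("alice", victim_doc)
leaked = brute_force_amount(shared, "mallory", "Alice", guesses)

salted = DedupPricedStore(per_user_salt=True)
salted.upload("alice", victim_doc)
not_leaked = brute_force_amount(salted, "mallory", "Alice", guesses)
```

With cross-user dedup the guess loop stops at Alice's real figure; with a private salt, Mallory's uploads never collide with Alice's, so nothing is discounted and nothing leaks, at the price of losing cross-user dedup.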
Not completely, I would think. Google may be able to discover whether any user stores file F in iCloud by creating an iCloud account for themselves and uploading the file to it. If that doesn’t create a new file, it already was there before. Depending on what exactly Apple stores on iCloud, they may even be able to detect how many users or (unlikely) even which users store the file.
I don’t see how they could use it, and don’t think they would use it, but Google also has a large data set of email messages and attachments sent between iCloud and gmail accounts that they could somehow use to correlate activity between their gmail servers and the “iCloud on Google cloud” servers.
I assume some other people do this as well.
The devil is in the details: each provider offers vendor-specific features, and you want to take advantage of them. For example, Reduced Redundancy Storage is, IIRC, an Amazon-specific offering, or if others offer it, it's probably under different SLA terms/measurements. This rapidly breaks many generic abstractions; maybe this is why everyone ends up writing their own little shim layer for their situation.
In some sense it reminds me a bit of building database connection pools in the '90s: before they were really standardized, everyone rolled their own and learned all the awful lessons about reference counting along the way. Then along came ODBC, then JDBC, and things were so much easier because you only had to deal with one API, and the databases would conform to their side of it. So I think, isn't that what OpenStack (or something?) is supposed to be for cloud services? But whoa, the depth and complexity of these services far exceed that of a '90s database. It will take a while, but over time, once patterns of common use are well established, a stable base of standard APIs will abstract away most differences, making things so much nicer. I can dream.
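One shape such a shim often takes: a lowest-common-denominator interface, plus a capability check so callers can ask for a vendor-specific feature and fall back when a backend lacks it. A minimal sketch; the storage-class labels are hypothetical (loosely modeled on S3's Reduced Redundancy Storage), and the in-memory backends stand in for real provider SDKs, whose actual APIs are not shown here.

```python
from abc import ABC, abstractmethod
from typing import Dict, FrozenSet


class BlobStore(ABC):
    """Lowest-common-denominator interface every backend implements."""

    # Storage classes this backend offers (hypothetical labels).
    storage_classes: FrozenSet[str] = frozenset({"standard"})

    @abstractmethod
    def put(self, key: str, data: bytes, storage_class: str = "standard") -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class InMemoryBackend(BlobStore):
    """Fake provider used instead of a real SDK so the sketch runs anywhere."""

    def __init__(self, name: str, storage_classes: FrozenSet[str]) -> None:
        self.name = name
        self.storage_classes = storage_classes
        self.objects: Dict[str, bytes] = {}
        self.classes: Dict[str, str] = {}

    def put(self, key: str, data: bytes, storage_class: str = "standard") -> None:
        if storage_class not in self.storage_classes:
            raise ValueError(f"{self.name} has no storage class {storage_class!r}")
        self.objects[key] = data
        self.classes[key] = storage_class

    def get(self, key: str) -> bytes:
        return self.objects[key]


class CapabilityShim:
    """Asks for a vendor-specific feature, falls back to the common one."""

    def __init__(self, backend: BlobStore) -> None:
        self.backend = backend

    def put(self, key: str, data: bytes, prefer: str = "standard") -> str:
        chosen = prefer if prefer in self.backend.storage_classes else "standard"
        self.backend.put(key, data, chosen)
        return chosen  # callers learn which guarantee they actually got

    def get(self, key: str) -> bytes:
        return self.backend.get(key)


provider_a = InMemoryBackend("provider-a", frozenset({"standard", "reduced_redundancy"}))
provider_b = InMemoryBackend("provider-b", frozenset({"standard"}))

used_a = CapabilityShim(provider_a).put("chunk-1", b"ciphertext", prefer="reduced_redundancy")
used_b = CapabilityShim(provider_b).put("chunk-1", b"ciphertext", prefer="reduced_redundancy")
```

Returning the storage class actually used is the design point: a generic API either hides the difference, so callers silently get a different SLA, or surfaces it, so the abstraction leaks. That tension is why these shims end up bespoke.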
I think Tahoe-LAFS is the open-source solution for this.
Pretty much table stakes in large enterprises at this point; and as usual, the IBM / SAP / Oracle / Software AG cabal has bought up everything that matters.
Don't make me laugh.
Saying they “use GCP for iCloud” sounds intentionally misleading... there’s a missing qualifier there. But I guess “uses GCP for some stuff in iCloud” isn’t nearly as click-worthy.
Also, there's a re:Invent talk from several years ago with "8 exabytes" in the title, so surely Google and Amazon both have many exabytes by this point.
(it has other useful information that might not have been mentioned anywhere before, like the crazy Colossus on Colossus... or D, the GFS chunkserver replacement)
You can use that to revisit Randall Munroe's estimates: https://what-if.xkcd.com/63/ You can infer from some of the comments that e.g. very dense disk capacity is not a good idea. There's more, of course, but I can't go into details (ex-Googler).
Like, a hellabyte?
I'm still trying to make hella- happen as an SI prefix.
Snowflake customers require too many custom features, and that leads to less competitive prices. When you reach the point where you pay Amazon/Google/Apple as a solution provider (think SAP, IBM), you need to bite the bullet and build the expertise in-house to avoid unhealthy ties between your business and your provider's.
See also, Apple Maps.
Prior to iCloud's launch, Apple built a massive $1bn datacenter in North Carolina and has continued to build new facilities at an aggressive pace.
I presume they are leaning into large public cloud platforms when their own capacity runs low.
> From Apple’s actual iCloud security document:
>> The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as S3 and Google Cloud Platform.
Apple also views data storage as another aspect that depreciates quickly and offers no strategic advantage in owning all of it.
Ever since Apple merged and moved all(?) of its cloud operations onto Mesos, things have been great. The last time they talked about it, they said it was the world's largest Mesos cluster in operation. I suspect it is even bigger by now.
I wonder what happened to Project McQueen.
But how do we reconcile Apple's strategy with the fact that Amazon did invest in an "asset-heavy" approach? I have been under the impression that Amazon makes a substantial percentage of its profits from S3 and other cloud services.
But then Apple TV is the complete opposite of their asset-less approach, as they decided to spend more money to create their own TV assets.
One of them, despite having all the automatic update and backup features we could find turned off, regularly attempted to upload gigabytes of data to multiple IP addresses that resolved to Google Cloud. Since there were fairly few apps installed (none of them suspicious) and a large number of photos and videos on the device, my conclusion is that it was a spurious iCloud photo backup.
Unfortunately, iOS does not appear to provide a way to see what's using data on Wifi, only mobile, nor to designate a Wifi network as metered.
Apple uses software / cloud services as a way of selling expensive hardware, and the margin (if any) made on its cloud services is a rounding error on the overall P&L.
EDIT: Generally, the 'Apple insources everything' meme is a massive oversimplification. Apple insources when there is a competitive advantage to doing so. Apple insources its chip development because it gets better thread performance on its phones when the software is tightly aligned to the chipset. Apple insources cobalt because you need cobalt to make smartphones, and if others find it difficult to source cobalt, they're going to find it hard to make smartphones, driving up their unit costs. It gets no such advantage from insourcing compute, bar a slightly lower TCO, which isn't going to be a huge deal to them anyway.
The product I would like is a photo album with indexing and web-interface components, plus plugins to make sense of different phone platforms. Nannies love to text videos, and getting those off the phone has been a pain; I end up saving the whole iOS backup and scraping anything that looks like media from it.
I have a 2TB iCloud plan for my iPhone X. The phone is set to "Optimize Storage", so it will de-scale old "local" pictures once I hit ~256GB.
In the background, I also have the free and unlimited version of Google Photos running. It automatically uploads everything, including videos. Videos will be scaled to 16 MP / 1080p, but I'm not complaining since it's free.
I haven't hit 2TB yet, but so far it works. I also have a 20TB Synology NAS setup, but have little desire to run my own photo storage/hosting. The Photos app on my MacBook Pro also is set to download everything via iCloud Photo Library, so that's also backed up to the NAS via Time Machine.
I have to imagine your usage is an outlier here.
I doubt my usage is an outlier. But I suspect that I keep more video than most people just because I can.
At some point I might start thinking of re-encoding the originals, but so far storage is cheap to have locally, and I'm going to see how long I can keep this up.
Frequently there are proprietary things that aren't included in the library. For example, if you become dependent on Google Photos' ability to recognize faces and objects, you tend to stop bothering to tag photos manually, so when you export everything, all you have is a big pile of unorganized photos.
Why would you want to increase risk of losing them all at once?
Speaking of cost, Apple's war chest is not small; if anyone can afford a premium, it'd be them.
CloudKit and iCloud Drive use Account Keys, but it isn't clear whether those are key encryption keys or data encryption keys. It also isn't clear whether those are protected by information known only to the client, or whether the cloud is able to freely read them. The differences are massive with regard to privacy, and this document really doesn't have the technical information needed to make an informed decision.
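The key-encryption-key (KEK) vs data-encryption-key (DEK) distinction is the usual envelope-encryption pattern. A minimal sketch, not Apple's documented design: each record gets a random DEK, and only the DEK is encrypted under the KEK (here called a hypothetical `account_key`). The XOR keystream is a toy stand-in for a real AEAD such as AES-GCM and is not secure. The privacy question above comes down to who can read `account_key`: the server stores everything else, so if it can also read the KEK, it can decrypt everything.

```python
import hashlib
import secrets


def toy_stream_cipher(key: bytes, data: bytes, nonce: bytes) -> bytes:
    """Stand-in for a real AEAD: XOR with SHA-256(key || nonce || counter).
    The same call encrypts and decrypts. NOT secure; illustration only."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def encrypt_record(kek: bytes, plaintext: bytes) -> dict:
    dek = secrets.token_bytes(16)  # fresh data encryption key per record
    data_nonce = secrets.token_bytes(12)
    wrap_nonce = secrets.token_bytes(12)
    return {
        "ciphertext": toy_stream_cipher(dek, plaintext, data_nonce),
        "data_nonce": data_nonce,
        "wrapped_dek": toy_stream_cipher(kek, dek, wrap_nonce),  # DEK under KEK
        "wrap_nonce": wrap_nonce,
    }


def decrypt_record(kek: bytes, record: dict) -> bytes:
    dek = toy_stream_cipher(kek, record["wrapped_dek"], record["wrap_nonce"])
    return toy_stream_cipher(dek, record["ciphertext"], record["data_nonce"])


def rewrap(old_kek: bytes, new_kek: bytes, record: dict) -> dict:
    """Rotate the KEK without touching the (possibly huge) ciphertext."""
    dek = toy_stream_cipher(old_kek, record["wrapped_dek"], record["wrap_nonce"])
    new_nonce = secrets.token_bytes(12)
    return {**record,
            "wrapped_dek": toy_stream_cipher(new_kek, dek, new_nonce),
            "wrap_nonce": new_nonce}


account_key = secrets.token_bytes(16)  # hypothetical KEK
record = encrypt_record(account_key, b"photo bytes")
restored = decrypt_record(account_key, record)

new_account_key = secrets.token_bytes(16)
rotated = rewrap(account_key, new_account_key, record)
restored_after_rotation = decrypt_record(new_account_key, rotated)
```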
At the beginning of 2017, Dropbox claimed a $1 billion run rate; however, at the end of 2017 Dropbox had a net loss of $110 million. When your entire year is a loss of $110 million, saving $75 million is very important to the company's survival.
Since Google is working to challenge AWS, it probably makes sense for them to strike deals with companies like Apple with even small profit margins if this increases their efficiency and allows them to then make more profit from other customers.
And then there's also the possibility that Apple simply can't build its own data centers fast enough to accommodate all their needs. Encrypted storage blocks are a safe thing to distribute, so they'd rather distribute those than some Siri voice recognition stuff.
Dropbox and Apple are completely different businesses with different business models. If Dropbox suffers downtime, they lose revenue. If Apple suffers downtime, they can point at a cloud provider while they keep selling their products.
It gives Apple flexibility and avoids secondary technical and real estate type debts.
It makes sense for Amazon, Google, and Microsoft to build out datacenters because they use them for their core business and also sell compute and storage resources to third parties.
Unless Apple intends to get into the cloud computing business, eating the cost of tooling up a datacenter puts a big construction and maintenance cost on the "Liabilities" side of the T-sheet that they can avoid via a smaller rental cost.
Now, if I had a fleet of cars, the calculus may change on that.
Owning and operating a datacenter is a whole kettle of fish that they don't want to get into if they don't need to. The real estate taxes (wherever they set it up) will matter, but the larger costs are likely to be in day-to-day operation, upgrading and maintaining hardware (and the entire process for that), dealing with actual natural disasters like a flood (or a bird flying into a transformer house and blowing power to N% of the datacenter, which means they need a backup generator, which means they need to test and maintain the backup generator, etc., etc.).
There's definitely a break-point where it's cheaper to pay someone else (in Apple's case, multiple someone-else's) to deal with that hassle.
We are in a construction boom nationally and it's currently difficult to get enough people to construct DCs. The ability to scale out is limited by construction.
Apple is building DCs, but they are late to the game and probably can't scale their infrastructure fast enough, so they need to use cloud services to keep scaling without impacting their customers.
That definitely depends on where and on which kind of data center. If you're willing to take a derelict warehouse and put in containers like Google did, all you need to do is provide power and fiber (which should be plentiful in any industrial setting), and you're set in a matter of weeks to months.
If you're aiming higher, as in designing a DC, buying the land, constructing the building itself, and then installing everything needed, you're in for much more money and time. Depending on local politics and laws as well as power/fiber infrastructure, you can cut some corners, but the worst case (uncooperative politicians who need to be brought in line by the courts, the nearest 110kV transformer being at capacity, no fiber and no empty conduits in the ground, which means digging yourself) is the benchmark there.
So you essentially trade off time to market for PUE.
In a large compute region, the higher PUE of the shipping-container approach is going to limit how many servers you can put in that location, because power to a particular site is often a limiting factor. Also, not everywhere has access to water from the Columbia River.
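The power argument as back-of-the-envelope arithmetic, assuming the site's power feed is the binding constraint. PUE is total facility power divided by IT equipment power, so the IT budget is the site power divided by PUE. The 10 MW feed, 500 W per server, and the PUE values 1.1 and 1.5 are made-up illustrative numbers, not figures for any real container or building.

```python
import math


def max_servers(site_power_w: int, server_power_w: int, pue: float) -> int:
    """Servers that fit under a fixed site power cap.

    PUE = total facility power / IT power, so the IT budget is
    site_power / PUE; every watt spent on overhead is a watt not spent on servers.
    """
    it_budget_w = site_power_w / pue
    return math.floor(it_budget_w / server_power_w)


efficient = max_servers(10_000_000, 500, 1.1)  # → 18181
quick_build = max_servers(10_000_000, 500, 1.5)  # → 13333
```

Under these assumptions, the same feed supports roughly a third more servers at the lower PUE, which is the trade-off against time to market.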
They will most likely work with the local utility to build an entire substation for the DC, so they won't have to share capacity on an existing substation's transformer (XFMR).
The tenant dictates the location and design, but DFT builds and runs it. Fifteen years ago, every large player wanted into the datacenter game. Then they realized it sucks to own a lot of land all over the place, and that employing mainly security guards and janitors there isn't worth it. Wholesale DC to the rescue.
And yeah, I know that Digital Realty bought Dupont.
Yeah, no. With very few exceptions FB builds and owns their own datacenters.
Basically, anything you read about in the news is entirely self-owned and operated. Yes, even employing security guards. E.g. this http://money.cnn.com/2017/08/15/technology/facebook-ohio-dat...
They also buy on the wholesale market taking over entire buildings and campuses. DFT had a great business model. Leasing out entire campuses to hyperscale players on 30 year leases before even breaking ground. Pre-determined interest rates means you know exactly how much you can charge per CRSF and make a good living. Honestly, I was pretty pissed when DRT bought DFT.
FB et al. do still build their own DCs, especially where municipalities drop their pants and offer huge tax breaks. But there's still a tremendous amount of "single-tenant" purchasing done at the wholesale level by the hyperscale players.
I guess my point is that even at the top tier, people value flexibility.
There are plenty of datacenter-owning real estate investment trusts that build huge facilities expressly for the purpose of turning around and immediately subleasing space to large third parties. With facilities that are already built and online this can reduce your 3-year timespan to a couple of months from contract execution to thousands of live servers, if the people doing the contracts and technical specs are sharp.
For a company the size of Apple, if they do have intentions to 100% own their datacenters, this sort of thing would be a stopgap solution. But it's a possibility.
Looking at Backblaze's posts about designing dense rack-mounted storage units, and at their reliability tests, is very informative about how specialized it really can be.
This is a SWAG, but I imagine more content is stored on Dropbox than on iCloud, potentially lowering the savings even more.
I was at mesoscon a few (several?) years ago when they announced it and gave a talk regarding the tech side of things.
Why is it stupid for Apple to use AWS and not GCP? Personally, I don't think using either is stupid.
I'm wondering if this is a capacity thing, or maybe a locality thing? It would make sense to use the cloud providers if you don't have a data centre in a region near the user, I guess.
However, farming out some of the costs to other cloud providers seems like a good strategy to eliminate a single point of failure, or to avoid losing all data if one provider somehow loses it. And maybe then they can focus on adding compute units rather than storage units and their backups.
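The single-point-of-failure argument, as a minimal sketch: write each encrypted chunk to several providers and read from whichever one answers. The `Provider` class is an in-memory stand-in and all names are hypothetical; nothing in the thread says Apple replicates across providers this way.

```python
from typing import Dict, List


class Provider:
    """In-memory stand-in for a third-party object store (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, bytes] = {}
        self.available = True

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ConnectionError(self.name)
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ConnectionError(self.name)
        return self.objects[key]


class ReplicatedStore:
    """Writes every chunk to all providers; reads from the first that works."""

    def __init__(self, providers: List[Provider], min_copies: int = 2) -> None:
        self.providers = providers
        self.min_copies = min_copies

    def put(self, key: str, data: bytes) -> int:
        copies = 0
        for provider in self.providers:
            try:
                provider.put(key, data)
                copies += 1
            except ConnectionError:
                continue  # skip an unreachable provider
        if copies < self.min_copies:
            raise RuntimeError(f"only {copies} copies written for {key}")
        return copies

    def get(self, key: str) -> bytes:
        for provider in self.providers:
            try:
                return provider.get(key)
            except (ConnectionError, KeyError):
                continue  # fall through to the next replica
        raise KeyError(key)


provider_a, provider_b = Provider("provider-a"), Provider("provider-b")
store = ReplicatedStore([provider_a, provider_b])
copies = store.put("chunk-1", b"ciphertext")

provider_a.available = False  # simulate losing one provider entirely
recovered = store.get("chunk-1")
```

The cost side is the flip side of the same design: every extra replica multiplies the raw storage bill, so this only saves money if it replaces redundancy that would otherwise be paid for in-house.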
In short, despite Apple's user base, I still don't think it is on the scale of AWS or Google.
Now hopefully there's a Rust vs Go blog posted or something gets merged into systemd today so we can get back to business as usual here ;)
There is also the possibility that Apple eventually plans to move iCloud to their own storage solution but hasn't scaled up to it yet.
Would it really require a datacenter to store that small amount of data?
Apple has enough money to tread carefully and roll things out slowly.
If you are a big player like Apple, the cloud is a horrible choice.
Ouch. For Apple to move to Google's cloud infrastructure, they must have been truly disappointed with Microsoft's Azure...
Though I think their average quality has gone down lately (from the days of Excel p-code to the current days of Office 365); I've been seeing PowerShell exceptions thrown directly in the UI in the Office 365 Admin...
Also this doesn't imply that user privacy is less secure, since they probably still encrypt user data before it goes to Google's servers.
>But in the latest version, the Microsoft Azure reference is gone, and in its place is Google Cloud Platform.
To me, this seems to imply that they still use Amazon.
> The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as S3 and Google Cloud Platform.
Five years ago, selling SaaS hosted on AWS to retailers was an instant no-go; "we're not funding our competitor" is what many of my competitors were told.
Nowadays, they've come to the realization that by refusing to do business with companies relying on AWS, they were in effect losing more money.
(I am specifically referring to retailers on the order of magnitude of Walmart; mom-and-pop shops never cared or asked questions.)
It still looks pretty bad for them.
They take security for these things seriously. You can store data safely even in adversarial clouds.
As cloud providers go, I really like GCP compared to the others: it's clean, consistent, predictable, and easy to use.
Speaking personally, in my employment AWS always sent salespeople and Google always sent engineers; this makes me more comfortable too.