And there is every reason to be skeptical about Apple's ability to design even mildly complex crypto given iMessage's flaws. Although the break in iMessage wasn't practically exploitable, that was luck: the only way to detect whether a mauled ciphertext decrypted successfully required attachment messages. The cryptographic mistakes were bad. Given any way to detect decryption of mauled ciphertexts for standard messages (e.g. sequence numbers, timing, actively syncing messages between devices, delivery receipts from iMessage instead of APSD), Apple's crypto design bugs would have eliminated nearly all of the E2E security of iMessage.
Remember, this isn't a boon for user privacy. Apple is now collecting far more invasive data about users under the claim that they have protections in place. At best it preserves the status quo and does so only if Apple both picked the parameters correctly and implemented it correctly.
At this point Apple's position is best summed up as: "we have drastically reduced your privacy, except not, because of magic that we (i.e. Apple) do not fully understand."
Granted, my comment is pretty harsh on Apple, who have a better track record than most on privacy and encryption. But remember: this is Apple making you less secure by grabbing data and then saying they took care of the issue.
With iMessage, FaceTime, and FileVault, if the crypto failed, you were no worse off than if you had used something else. In this case, you actually are. That means there is a higher burden to getting it right than just name-dropping magic crypto pixie dust.
What would someone be better off using?
It's important to clearly distinguish what DP can and cannot do. DP is just a technique for taking a database and outputting some statistic or fact about it. The output has some noise added to it.
The guarantee of DP is (roughly) that anyone looking at the output alone won't learn much about anyone in the database. This also holds for anything you do with that statistic.
Think about this carefully when thinking about what DP does and doesn't promise. Also think about the difference between "privacy" and "security".
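For intuition, the textbook construction behind "a statistic with some noise added" is the Laplace mechanism. This is a generic sketch only (the toy database and epsilon are made up, and this says nothing about what Apple actually deployed):

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise by inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    """Epsilon-DP count: a counting query has sensitivity 1, so Laplace
    noise with scale 1/epsilon satisfies epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical database: did each user download some app?
downloads = [True, False, True, True, False]
noisy = dp_count(downloads, lambda d: d, epsilon=0.5)  # 3 plus noise
```

Smaller epsilon means more noise and stronger privacy; anyone seeing only `noisy` learns very little about any one entry in `downloads`.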
Example of what DP does protect against: If Apple is recommending products to people based on others' download habits, and this recommendation is based on differentially private statistics, then no other user or group of users can infer anything about my downloads. In fact, even engineers at Apple, if they can only see the statistics and not the original database, cannot infer anything about my downloads.
Example of what DP does not protect against: government accessing the data. The database still has to exist on Apple's servers. The government can get to it just as easily as before via warrants or so on. DP is not cryptography.
My assessment: On one hand it is awesome that Apple is taking a lead in using differential privacy and thinking about mathematical approaches to privacy. On the other, there are many facets of privacy and right now I think people are more concerned about security of their data and privacy from the government, or else privacy from companies like Apple itself. DP doesn't address these; it only addresses the case where Apple has a bunch of data and wants the algorithms it runs not to leak much info about that data to the world at large.
It doesn't, which is part of the reason Apple wants to do this. You can still do differential privacy without collecting all the data; you just get less accurate results. See page 232 in , re "The Local Model".
Edit: the article even says this when it talks about RAPPOR.
If Apple really did implement some sort of randomized response (or a more sophisticated variant), I think that would be a real breakthrough for user privacy, since they'd be giving up control of the data.
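For reference, the simplest local-model mechanism is classic two-coin randomized response, which looks roughly like this (a textbook sketch; nobody outside Apple knows what variant, if any, they use):

```python
import random

def randomized_response(truth: bool) -> bool:
    """Classic two-coin randomized response: a fair coin decides whether
    to answer truthfully; otherwise a second fair coin is reported.
    Any single answer is plausibly deniable."""
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

# A user who really did download the app still reports False 25% of
# the time, so no individual report proves anything about that user.
answer = randomized_response(True)
```

The key property: the noise is added on the device, before anything is sent, so the server never holds the true bit at all.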
They store all iCloud sync material (backups, photos, contacts, calendars, mail, documents, etc.) without end to end encryption, and have all of the iMessage metadata.
The official explanation so far is that if the user forgot their password, a user-encrypted backup would just become useless junk.
This is (officially) the sole remaining non-user-encrypted personal data on Apple's servers that the authorities can obtain with a warrant.
However, after the San Bernardino FBI mess, Apple started considering encrypting iCloud backups as well.
So if you think you are right, proof please...
Take note of what the wording leaves out. Apple holds the decryption keys for just about everything.
> iCloud Keychain encryption keys are created on your devices, and Apple can't access those keys. Only encrypted keychain data passes through Apple's servers, and Apple can't access any of the key material that could be used to decrypt that data
This wording is used for nothing else than the keychain.
Fully consistent with the behavior where disconnecting all your devices from your account, then doing a password reset and logging in on a new device, makes all your data available to you again in plaintext. Including iCloud backups of iMessage chats.
How come they are accessible from iCloud.com? Decrypted by the browser on the fly?
However, I reckon that technically Apple could access, or hand over to the NSA/FBI, data stored on iCloud, because they actually still hold the keys for that part too (not only backups, as I thought).
Only the password/credit-card Keychain is now claimed to be fully user-encrypted and unrecoverable by any means, even by Apple.
For anything other than a warrant, they'd "just" have to breach every commitment they made in their contract, which would, as far as I know, constitute a pretty solid legal case and could only lead to a public walk of shame that would compromise the whole company's future.
If you don't trust them, don't use their cloud; I totally respect that. In the end it always comes down to some degree of trust. Even GitHub could be spying on paid private repositories under the hood if they really wanted to. But for what gain?
This is something I doubt. It would be rather easy to change the software to sync passwords, even for an individual user. If this came out, it would be a big marketing problem and could result in sales losses of 10-20%.
I said "could", but to be honest I think 2-3% is more realistic. Most people don't care. They want their data to be safe in case of theft, and have a backup in case of loss. Here on HN it's a big thing, but most users don't know, don't care.
> All your iCloud content like your photos, contacts, and reminders is encrypted when sent and, in most cases, when stored on our servers. All traffic between any email app you use and our iCloud mail servers is encrypted. And our iCloud servers support encryption in transit with other email providers that support it.
> If we use third-party vendors to store your information, we encrypt it and never give them the keys. Apple retains the encryption keys in our own data centers, so you can back up, sync, and share your iCloud data. iCloud Keychain stores your passwords and credit card information in such a way that Apple cannot read or access them.
I always find it amusing when people downvote me for telling them Apple is doing what they admit to be doing.
Of course that will never be as audit-friendly as open-source code. But don't call it a secret when you actually just didn't search for the information...
I don't think Apple has ever claimed it does not collect data.
You can also add noise to the input samples, so your database doesn't contain statistically significant information about any single individual. But aggregate queries will still work, as the noise evens out. See http://research.google.com/pubs/pub42852.html
Edit: Just saw that the linked article actually explains this in the last section as well.
Also, Apple is woefully low on details; theoretical privacy should be accompanied by openly published, peer-reviewed research papers. I understand they won't release the source, but would you trust Apple if they said they invented a new encryption algorithm but refused to publish an academic paper on it? I'd be interested precisely in what they're doing. Are they claiming they're doing federated learning, by gathering anonymous image data from photos, uploading it to their cloud, training DNNs on it, and then shipping the results back down to clients for local recognition? Surely they're not training on device, as that is very RAM- and CPU-intensive.
Google indeed has RAPPOR (and other projects, I'm sure), but the cultural difference Apple claims is "we consider privacy in everything we do" instead of "we add privacy where we can."
I'm pretty sure that should be interpreted as "we've determined privacy is a differentiator in the market, so as of some indeterminate time in the past, ranging from a few years ago to our inception, we consider privacy in everything we do."
Now, there's nothing wrong with that, and that's not to say they haven't been privacy conscious in the past, but let's not confuse the current stance as entirely altruistic, when there are multiple incentives at play, one of which is concern for the user.
Edit: s/months/years/, that's much more accurate.
> His comments arrived as Apple started to identify Google, and its ascending Android operating system, as its chief competitor. Here we see the first signs of the hardware seller deploying its privacy position as a branding and competitive tactic, a strategy that has come to the fore during its current standoff with the feds.
Like I said, there's nothing wrong with this. We just need to be sure we don't fall into the trap of thinking we can take what is presented at face value as the whole story, just as you can't when dealing with individuals much of the time. Apple is not our trusted old friend, that will look out for our best interests. They are at best an acquaintance that we have a business relationship with. That doesn't mean they won't act in a manner we appreciate, but it does mean we should not assume they will act as a good friend.
What a bummer!
These devices are never on the same network; the only shared parameter is an exchange account. As a rule I always log out of the only Google service I rarely use, so this must be some cookie/tracker dark magic.
Sadly I have no proof, but I've used Ghostery to block trackers since then. (Side note: Ghostery also claims to use DP, btw.)
Connections are a two way street. Facebook can assume that if he has connections to you, then you may have a connection to them.
Nothing nefarious is necessary.
1. Collect personal data, send it to the mainframe, use it to profile and deliver custom-tailored services. When sharing, hash so that individual records can't be extrapolated.
2. Collect personal data, use it locally, sometimes using mainframe-provided "models". Return hashed records to the mainframe to improve the models.
Google is clearly using 1, while Apple claims to target 2. (I don't think that's actually the case yet, because I thought Siri still stores some personal data in the cloud so far.)
But this is not how privacy should work, because there are a lot of people out there who don't read HN and only recently found out that there is a lot of pastry inside their computers.
Uh, the graph just shows a 25% increase in estimated mortality risk from Warfarin, nothing close to "killing patients". Complete exaggeration, since the mortality baseline is probably very low in the first place.
It's roughly analogous to a doc saying "should I deviate from the baseline treatment?" and when the data say "dunno" the doc prescribes a totally random medicine rather than the baseline, because that is what "dunno" means.
The amount of noise that "theoretically guarantees" privacy protection for a given epsilon renders any reasonable analysis impossible. E.g., imagine the CDC telling you that there are 0-2000 cases of Ebola in Massachusetts.
There are theoretical guarantees provided by differential privacy, and then there are actual requirements for conducting public health or biostatistical research with a certain evidentiary value. The gap between the noise added by the former and tolerated by the latter is enormous.
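To put numbers on that gap: for a simple count query (sensitivity 1), the Laplace mechanism adds noise of scale 1/ε, so error bars blow up fast as ε shrinks. A back-of-the-envelope sketch (the ε value is invented to roughly match the Ebola example):

```python
import math

def laplace_count_error(epsilon: float, confidence: float = 0.95):
    """For an epsilon-DP count (sensitivity 1, Laplace scale b = 1/epsilon),
    return the noise standard deviation and the half-width of the
    symmetric confidence interval, using P(|noise| > t) = exp(-t / b)."""
    b = 1.0 / epsilon
    std = math.sqrt(2.0) * b
    half_width = b * math.log(1.0 / (1.0 - confidence))
    return std, half_width

# A strict privacy budget makes a rare-disease count nearly useless:
std, hw = laplace_count_error(epsilon=0.003)
# hw is roughly +/- 1000 cases -- "0 to 2000" territory.
```

At ε = 0.003 the 95% interval is about ±1000 cases, which is exactly the kind of answer no epidemiologist can use; at ε = 1 it shrinks to about ±3, which may leak too much. That trade-off is the whole argument.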
It's trained doctors and researchers who get to decide the quality of medical evidence. There is a reason why we have detailed sets of protocols and levels used for assessing evidence.
Adding noise and fuzzing have a long history in statistics going back to the '70s, and while it does work for large numbers, it almost always messes up the details, i.e. the error bars.
DP is essentially a cheap ripoff of the ideas implemented in ARGUS. Cf. Dalenius (1977); see also the "Do Not Fold, Spindle or Mutilate" movement and earlier.
Disagree. Data is why databases exist.
> Any "average" is simply readily apparent, therefore irrelevant for serious in depth analysis.
I said "aggregate", not "average". There are many kinds of useful aggregate analysis (in astrophysics, you can take many different samples from different stars and use the aggregate to compute commonalities in the sample that you wouldn't detect with a single measurement). There is more to aggregate analysis than averaging data.
As for the rest of your points, I'm not a statistician so I can't comment. Also, I didn't downvote you (HN rules).
But as you say: your "aggregate analysis" NEEDS "many different samples from different stars". Commonality is the result of your analysis based on different samples. But since they are common, you can go and sample and have the result without doing mass surveillance on every star.
ps: I am fully aware of photo stacking, but also note, that stars are not humans, see context of privacy. Please look at argus or sdcMicroGUI from CRAN to get a feeling for data utility vs. reidentification risk.
"Mass surveillance" reduces noise and lets you get more data in a shorter period of time (telescopes have large fields of view, but they can't make time pass faster). Stacking (which is what the technique is called in astrophysics) is very useful in this case. Not to mention that you can also do individual analysis as well.
Actually, most interesting of all is that you can do this type of analysis on objects like neutron stars that we can't observe directly because they're too faint. Because noise in telescopes can be modelled as a Poisson process, stacking actually increases S/N in a way you can't do without making much bigger telescopes.
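The sqrt(N) argument behind stacking is easy to write down. A toy sketch (the photon rate is invented):

```python
import math

def stacked_snr(rate_per_frame: float, n_frames: int) -> float:
    """Photon-counting SNR after stacking n frames: total counts are
    Poisson(n * rate), so mean = n * rate, std = sqrt(n * rate),
    and SNR = mean / std = sqrt(n * rate) -- it grows like sqrt(n)."""
    return math.sqrt(n_frames * rate_per_frame)

single = stacked_snr(0.5, 1)     # ~0.7: buried in the noise
stacked = stacked_snr(0.5, 400)  # ~14: a clear detection
```

So 400 stacked frames buy a 20x improvement in SNR, which is why faint sources that are invisible in any single exposure become detectable in the stack.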
PS. I'm not a statistician, so I can only speak to what I know. But my whole point is that researchers do know how to deal with noisy data, regardless of whether or not that noise is man-made or not. Interestingly enough, I found out recently that the NASA pipeline actually breaks certain data sets they have released (which have papers written about them) so man-made noise is a problem regardless of whether or not it's intentional.
This is the key point to argue against in the context of people, privacy and mass surveillance.
It is the touchstone of privacy, anonymity and crowd protection.
Regarding noise suppression: yes, the more queries (available data whether raw or extracted) the more you can filter (ask a Kalman student) to reduce your error bars and margins. This is a reason why DP is overhyped. Also, if there are no differences between queries, then data is redundant. See deduplication (database) or scaling (measurement).
About the analysis pipeline: this is why the mantra "know your detector". Coincidentally, this is why releasing only recorded datasets is next to useless for people outside the given research group. You would need to capture detailed knowledge of your data taking operations and instruments, which happens rarely, if ever. Please cite a thing such as "the NASA pipeline", perhaps you mean a given mission/experiment? In any case, detector recalibration is a usual, almost daily activity...
The specific pipeline I was referring to is the Kepler pipeline that NASA uses to take their raw pixel data and produce photon counts that everyone uses for their research (this wasn't a detector issue, it was a software bug at the final stage of the data publishing process). The point was not the pipeline issue, it was that noise is everywhere.
But as to your point, yeah okay. Maybe I shouldn't talk about statistics when that's not my field. :D
The law of large numbers says that after gathering statistics from many values of Y, they will converge (for continuously differentiable functions of X) to the values for X.
Meanwhile each individual user will not send so many samples as to identify the true values of X with any useful accuracy.
That's never going to happen. Apple sells a 'User Experience' not just hardware - having a complete and mostly closed product is an inevitable consequence of the former - and the number of Linux users that would buy a Macbook isn't a large enough part of the market for them to worry about.
With that said I've had a Macbook Pro and it was pretty much a better piece of hardware (at least as far as build quality) than any other notebook I've used.
How can you say this looking at the hardware landscape?
The recent WWDC obviously shows a big shift towards AI and ML applications within the company. Some things are possible on the device, but many neural nets just cannot be served from an iPhone reasonably. Hence, the move towards more data collection. I really wish they give out more information here. Until then, I'm not sure how much they are actually collecting after their realization that they do need the data to do AI well.
You can find all the videos from WWDC 2016 some time after the session is done. I usually check the next day. They have the videos for several previous WWDCs up as well.
If a recommender system for iTunes can predict the likelihood of me appreciating movies that contain violence against women, that information could be subpoenaed when I am falsely accused of having strangled my girlfriend.
I appreciate that Apple is trying to protect our privacy where they can. But if we want them to make predictions about our behavior, we have to be aware of the fact that we are necessarily giving up some privacy.
I understand that the database Apple wants to build does not contain accurate information about individual users. But if that database allows them to make predictions of our behavior, then there is a privacy issue. If the purpose is not prediction, then what is it?
So Apple can (for example) predict that listening to band A means you are likely to like band C, and then send a list of correlations to your device so the predictions can be made there by examining your library locally. A more probable use is analytics for marketing purposes. Another is selling just these correlations and other aggregate statistics to other parties; this is actually how Mint makes money.
And how is that different from my iTunes example?
That's what makes the data useful and that's what makes it a privacy issue at the same time.
How does sending the same list of conditional probabilities for liking pairs of bands to everyone's device and then having the device pick out the ones actually pertinent to your library compromise your privacy?
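Concretely, something like this (band names and probabilities are invented): the table shipped to devices is identical for everyone, and the matching against your library happens locally.

```python
# Shipped identically to every device; computed server-side from
# differentially private aggregates (all values here are invented).
cooccurrence = {
    ("Band A", "Band C"): 0.81,
    ("Band A", "Band D"): 0.12,
    ("Band B", "Band C"): 0.07,
    ("Band B", "Band E"): 0.64,
}

def recommend(local_library, table, threshold=0.5):
    """Pick recommendations on-device: nothing about the local
    library ever leaves the phone."""
    recs = set()
    for (liked, suggestion), p in table.items():
        if liked in local_library and suggestion not in local_library \
                and p >= threshold:
            recs.add(suggestion)
    return recs

print(recommend({"Band A"}, cooccurrence))  # {'Band C'}
```

Since every device downloads the same table, observing the download reveals nothing about any individual's library.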
What I'm saying is that if Apple keeps data on its servers that is sufficient to predict some of my actions or likes with any accuracy greater than 50%, then that is a privacy concern.
But if you're saying that the data in Apple's database does not have any predictive power on its own, then I agree that it is not a privacy concern.
In that case, my device would have to download some of Apple's data and combine it with data that resides only on my device in order to make a prediction locally on my device.
If that's how it works then I have no concerns.
They even limit the number of samples they get from a specific person so they can't filter out the noise for that person and get their individual response.
But, keep in mind that Apple will have records of all your iTunes rentals and purchases at least for billing purposes. However, at least in the US there's a law about keeping that data private (because of Robert Bork).
My impression always has been that Apple does not collect data that can lead you to be personally identified. I never got any impression that "Apple does not collect data".
Do they care about their bottom line? Of course. It's for that very reason they are investing the time now to secure the trust of generations of consumers.
For example, Apple could easily upload all the data, compute the DP impact of adding it to the existing aggregate, clean out identifiers, and add it to the database. The key here is that Apple would hold all the data and then purge the identifiers, which is completely different from removing the identifiers before sending anything to Apple.
(Apple:) "Hi, I'm Apple, trust me! Don't mind the black bag, I just like being mysterious. It's cool, right?"
(Me:) "Umm, no, no thanks!"
Apple needs to let go of the whole security through secrecy ploy, since it looks more and more shady.
Imagine if the security modules for devices were public and the non-secure sections of devices had to be encapsulated for EmSec and made tamper-proof. If that were the case, security literally wouldn't be an issue: either everyone is impacted, or no one is impacted.