> To accomplish this task, Safari receives a list of websites known to be malicious from Google, and for devices with their region code set to mainland China, it receives a list from Tencent.
From Apple's statement, Tencent's API functions exactly the same as Google's k-anonymity model, and you can read more about the API and how it works (also via this JAMIA paper).
On the softer side, the shutter sound can’t be muted in the Camera app on iPhones sold in Japan. I can’t find any official doc (as opposed to customer discussions) mentioning it.
edit: there was some doc, just with a vague “shutter sound might be disabled depending on your region” mention.
In the meantime, any pervert (the claimed reason for the law) can use any non-phone camera or, for that matter, any camera app other than the built-in one.
It's par for the course. Laws in response to moral panics tend to suck and inconvenience reasonable people while having simple workarounds.
Only yesterday I was reading a Reddit thread where someone new to Japan asked whether to inform the police of the apparently violent arguments his neighbours were having and the responses varied from "don't bother, the police won't care" to "make a noise complaint, at least the police will be interested". That's the least scary anecdote I have.
⁎ Probably not about Japan specifically :)
As far as I remember, Jphone, the company that first heavily advertised camera phones, forced the shutter sound on its own phones to preemptively clear itself of any potential issue.
Every phone carrier from then on had the choice to forgo the mandatory shutter sound but risk backlash if anything were to happen, as Jphone had set a precedent. And they chose the safe move.
When the iPhone came, it would have just been a marketing quagmire to go with a mutable sound.
But of course that's also part of the issue. If 3rd party apps don't have to make a sound then what was the point of the law? The perverts can just use a 3rd party app and 99.999% of users have to deal with a stupid shutter sound for no actual protection.
2nd, if you are in Japan pretty much everyone is used to the shutter sound.
Here's the thing you refuse to acknowledge (or perhaps didn't think through before writing) -- when someone is holding a P+S camera or bigger, everyone knows they're taking a picture. But when someone is holding a smartphone, you assume they are not taking a picture because 99.9% of the time phones are not in camera mode.
The point wasn't to ensure that all cameras made noise, it was to ensure that a phone couldn't be used as a secret camera in public.
Seriously, I struggle to parse the logic here: Do you think that creepers are taking actual full size cameras into subways for creepshots? Or are you thinking they're using "Spy Camera" type devices? Because the people the law was meant to deter generally aren't that level of creeper, and their creepshots are a crime of convenience not a premeditated act of sexual assault.
Their whole argument boils down to regular users not being able to download an app from the app store?
Wow. You're right, I assumed their argument was a lot better than that. Oof.
I require consistency, so either EVERYONE is inconvenienced, or EVERYONE can circumvent it without issue.
The part I missed is the justification for inconsistent and illogical arguments where magically your ability to be inconvenienced is based only on your desire to be a creep.
Where the hell did this come from?
In the safe browsing example, you'd just lose it completely because it would be replaced by Google's safe browsing list (also used by Chrome and Firefox), which you couldn't connect to from within China. So the end result is the same as turning off safe browsing.
You cannot begin to understand what it means to be forced to use one of those non-Google Play stores. Even if you set your device to English, Firefox will be in Chinese and you get a weird Skype version as well.
I mean it is a pretty interesting experience living Google-free on Android, and you could resort to APKPure to install some stuff like WhatsApp. But honestly, unless you get a Xiaomi phone or equivalent that you can switch to an international ROM, the whole experience is pretty annoying. Spam call and text blockers on Android are the main highlight though.
However, I don't know anything about it. Maybe you can share.
Gab's apps weren't accepted to either app store, they were booted by several hosting companies, they had to switch domain registrar and their Stripe account was suspended. The switch to a Mastodon fork is a recent development.
Gab is essentially only used by, and caters to, white supremacists now.
Any sources you can share?
To my mind, no good-faith claim that such things do not need to be ostracised exists.
Like Gab: can't use Twitter? We'll make our own echo chamber.
Them being on Twitter is bad, but being on their own platform is arguably worse.
> To my mind, no good-faith claim that such things do not need to be ostracised exists.
(BTW, I really don't like it, either.)
I fully agree with this. It will end up being a self-fulfilling prophecy for those arguing against true freedom of speech. When you deplatform a viewpoint entirely, then the people with that viewpoint are going to coalesce onto a much smaller platform, which will end up being an echo chamber for the viewpoints that the rest of the world desires to quash. Then, it will end up becoming an argument to prevent such echo chambers from even forming in the first place, which would effectively make it impossible for smaller players to get an inroad, as they'll just be preemptively accused of being inherently alt-right.
These people don't understand that, when you centralize control over avenues of speech like that, then one day that centralized control can do a 180 and start banning any type of speech, and not just "harmful" or "problematic" speech.
It's literally the same type of tactic that China uses to suppress dissent, just on a different scale (and with a different culture behind it). It's so transparent, yet the prevailing "public narrative" is still highly supportive of deplatforming.
F-Droid would need corporate backing or SEO to get popular in China, not free software activism.
Get an Amazon Fire device
But just sideloading would not do much.
So, if you set Traditional Chinese for language and United States for region (https://developer.apple.com/library/archive/documentation/Ma...), [[NSLocale currentLocale] countryCode] (which Apple compares to "CN" for this feature according to this comment https://news.ycombinator.com/item?id=21242628) will return US, and [[NSLocale currentLocale] localeIdentifier] will return zh_US.
localeIdentifier also returns some other interesting things, e.g. mine is "en_US@currency=EUR", because I customized Region formats.
(Can't argue with the second point.)
They seem to be different settings and you can have Chinese language with (for example) US region.
I doubt Apple is actually "handing over IP addresses" - rather, it's a consequence of connecting directly to Tencent over the internet. Though they could conceivably provide some sort of proxying, caching, and batching if they wanted to improve privacy.
Similarly Apple isn't "handing over the timestamps" - Tencent can just look at the clock.
So, for example, if someone wanted to visit freehongkong.com with their language set to mainland Chinese, Safari would hash the URL, which would give something like "ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad", and then Google/Tencent would send back every hash and URL that matches only "ba7816bf", which could be on the order of thousands of URLs, or even potentially millions. Safari would then check the URLs that are returned to see which one is the actual matching URL, and it blocks the content in the browser without sending any more information to Google/Tencent.
Unless Tencent keeps an active profile and uses machine learning to create models for every single user of its systems and attempts to tie those models to IP addresses that are assumed to be static, this still maintains complete privacy for the end user. I fail to see how this method is in any way equivalent to what the article is claiming.
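A minimal sketch of the lookup described above, in Python. The blocklist entries, the `server_lookup` helper and the 4-byte prefix length are assumptions for illustration, not Apple's or Tencent's actual code. (The example digest in the comment is, as it happens, the SHA-256 of the test string "abc", so that string is used as a stand-in URL here.)

```python
import hashlib

def full_hash(url: str) -> bytes:
    """SHA-256 digest of the (already canonicalized) URL."""
    return hashlib.sha256(url.encode("utf-8")).digest()

def prefix(digest: bytes, n_bytes: int = 4) -> bytes:
    """The truncated hash that is actually sent to the provider."""
    return digest[:n_bytes]

# Hypothetical provider-side blocklist, bucketed by 4-byte prefix.
blocklist = ["abc", "http://malware.example/installer"]
server_buckets = {}
for entry in blocklist:
    digest = full_hash(entry)
    server_buckets.setdefault(prefix(digest), []).append(digest)

def server_lookup(p: bytes) -> list:
    """Provider returns every full hash sharing the prefix."""
    return server_buckets.get(p, [])

def is_blocked(url: str) -> bool:
    digest = full_hash(url)
    candidates = server_lookup(prefix(digest))  # only the prefix leaves the device
    return digest in candidates                 # final comparison happens locally

print(full_hash("abc").hex()[:8])  # ba7816bf
print(is_blocked("abc"))
```

The provider in this sketch still sees which prefix was asked for, when, and from which address, which is the part the rest of the thread argues about.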
I think it would be difficult to get a strong signal as to what sort of browsing you're doing from this information, but I also think that if life and death ride on what websites you visit, you should turn this off.
You also have to be careful about targeted malware, which can give up far more information than these hashes. So I am not really sure where the balance is. It is more concerning when Tencent is running the service instead of Google. A well-targeted ad is better than being disappeared.
Maybe it won't matter anymore.
You would need a very, very, very large database to keep track of all the permutations necessary to determine a domain name.
Furthermore, I'm unclear how exactly Safe Browsing is implemented, but if it checks every resource on a page, the set of prefixes for all resources on a given page may leak more data than one query on its own, so you can look for requests for that set of queries in a short time frame to identify a particular page.
Note also that mass surveillance isn't particularly concerned about false positives.
P(baduser | hashA )= ~0
P(baduser | hashA and hashB and ... hashF) = 0.75
One prefix doesn't do it, but a bunch together in a smallish time period can fingerprint a site quite well. This is analogous to browser fingerprinting.
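To make the "bunch together in a smallish time period" idea concrete, here is a rough Python sketch of how a provider could flag clients from its lookup log. The log format, the prefix values, the window and the threshold are all made up for illustration.

```python
from collections import defaultdict

def flag_clients(lookups, fingerprint, window_s=60, min_hits=3):
    """Return client IPs that requested at least `min_hits` distinct
    fingerprint prefixes within any `window_s`-second window.

    lookups: iterable of (timestamp_s, client_ip, prefix_hex) tuples.
    fingerprint: set of prefixes known to belong to one site's pages.
    """
    by_client = defaultdict(list)
    for ts, ip, p in lookups:
        if p in fingerprint:
            by_client[ip].append((ts, p))

    flagged = set()
    for ip, hits in by_client.items():
        hits.sort()
        for i, (start, _) in enumerate(hits):
            seen = {p for ts, p in hits[i:] if ts - start <= window_s}
            if len(seen) >= min_hits:
                flagged.add(ip)
                break
    return flagged

# Hypothetical data: three prefixes that together identify one site.
site_fingerprint = {"0a1b2c3d", "4e5f6071", "8293a4b5"}
log = [
    (100, "203.0.113.5", "0a1b2c3d"),
    (110, "203.0.113.5", "4e5f6071"),
    (125, "203.0.113.5", "8293a4b5"),   # three hits inside 60 s
    (130, "198.51.100.2", "0a1b2c3d"),  # a single hit proves little
    (0, "192.0.2.9", "0a1b2c3d"),
    (4000, "192.0.2.9", "4e5f6071"),
    (8000, "192.0.2.9", "8293a4b5"),    # same prefixes, spread over hours
]
print(flag_clients(log, site_fingerprint))
```

A single prefix hit flags nobody here; only the cluster does, which matches the P(baduser | hashA and hashB and ...) intuition.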
From The Register?!?! Surely you’re joking!
I'm really curious how you've come to that conclusion and I mean that sincerely. What makes you say that?
What is not to understand?
Apple should send precisely nothing to the CCP if they care about their users' privacy on principle.
I paid for the device, and then you prevent me from doing a perfectly legal thing and notify the authorities about it. That's what it is, and coming from the privacy hero, Apple.
But that being said, I do not believe that hashes are enough, because all they really need is to know you visited one (hash of) a certain website out of thousands. They do not need proof, and they don't need it to work reliably, and they do not need the model to be precise or good.
For the system to work in China, this is sufficient. It need not be perfect, it just needs to loom in the background.
People will do the rest themselves.
This, I think, one should not forget, especially if you really, really don't see an issue with what Apple is doing.
Apple could have tried to avoid helping the Great Firewall (e.g. a proxy could hide times and anonymize IPs) but chose not to.
* fictional names inserted only for example's sake.
If you would like to disable this particular check, you can do so in settings on both Mac and iOS.
How do you figure? What is Apple’s target market anyway, Mr Confidently Speaking On My Grandmother’s Behalf?
For example, the hash for reddit.com/r/gifs would be different from reddit.com/r/funny and so the prefixes would be different for both of them. Unless the requested hashes are saved for every single user, it would be way too computationally expensive for them to get anything useful out of that. Not to mention the fact that hashes would return the same prefix for any thousands of URLs. Narrowing down which domains those URLs are rooted on would be incredibly hard.
I don't know why or what part you think this is that hard. Do you think a map from User -> set [Requested URL Hashes] is hard to build? Or that building the URL Hash -> set [possible domains] is hard?
Maybe I'm missing a piece of this.
Building something simple to start guessing domains visited seems pretty easy. If a user has 10 URL hashes and the same domains show up in each hash's possible domains, you're probably requesting pages on that domain. If you're lucky and all the pages from a domain fall into a single hash, all it takes is two or three hashes from known outbound links to show up to confirm this.
It's not foolproof but hardly infeasible? Or maybe I don't fully understand the algorithm.
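A quick Python sketch of the two maps mentioned above (user -> requested prefixes, prefix -> possible domains). The index contents are invented, and a real index would be far larger, so treat this as an illustration of the counting idea rather than of its feasibility.

```python
from collections import Counter

# Hypothetical inverted index: prefix -> domains with a known URL under it.
prefix_to_domains = {
    "a1": {"news.example", "shop.example", "forum.example"},
    "b2": {"forum.example", "bank.example"},
    "c3": {"forum.example", "video.example"},
    "d4": {"shop.example"},
}

def rank_domains(user_prefixes, index):
    """Count how often each domain appears among a user's prefixes."""
    counts = Counter()
    for p in user_prefixes:
        counts.update(index.get(p, ()))
    return counts.most_common()

# One user's requested prefixes over some period.
print(rank_domains(["a1", "b2", "c3"], prefix_to_domains)[0])
# ('forum.example', 3)
```

The domain that shows up under every requested prefix floats to the top; the others appear once each.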
It's 4 bytes per request. Google is keeping my whole history, much more than 4 bytes per request, and they do it for advertisers. I have no trouble believing a company partially owned by the government could afford 4 bytes per request.
Let's say there are a billion addresses for each 4-byte prefix. You make 2 requests; how many of them will be on both lists? Let's be generous and say half! You would only have to visit 30 unique pages on that site to find the domain. How often do you visit 30 different pages of a website? I feel it's quite regularly.
That's for a billion matches for each 4-byte prefix; in reality it would be much less than that, and there would certainly be far fewer than 50% matches between each prefix.
You can do it in reverse even more easily. Go to a bunch of forums whose users you want to silence. Find the URLs used to post on them. Get the 4-byte prefixes. Now you have a bunch of timestamped comments with usernames, and the prefixes are most likely pretty unique in the Tencent database. Now you've found which IP belongs to which username.
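The arithmetic behind the "30 pages" figure, as a small Python sketch. It assumes the commenter's model, where every extra lookup keeps only a fixed fraction of the remaining candidates; the 1% figure is just an illustrative alternative.

```python
import math

def pages_needed(candidates: float, keep_fraction: float) -> int:
    """Lookups needed to shrink `candidates` to about one, if each
    lookup keeps only `keep_fraction` of the remaining candidates."""
    return math.ceil(math.log(candidates) / math.log(1 / keep_fraction))

print(pages_needed(1e9, 0.5))   # 30, the "generous" half-overlap case
print(pages_needed(1e9, 0.01))  # 5, if only 1% of candidates overlap
```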
On short enough time frames an IP is often a good enough approximation for a person.
That opens up forging SSL certificates, signing Debian packages, and a whole slew of other things that you could pull off with more computing power than has existed in the whole of the universe.
I don't think there are enough PGP keys in the world that simply trying those keys in a certain prioritized order wouldn't find a lot of keys.
This way you could recognize who someone is talking to based solely on the hashes sent for their 'malicious browsing' checks.
In general, anytime you know a secret in the URL comes from a low (min-) entropy distribution you can use the truncated SHA-hash as an oracle to (approximately) figure out the secret.
Obviously, if the secret is a 128 bit session cookie, you won't find that by brute forcing. Heck, in that case your 32 or 48 bits of hash prefix won't even be enough to uniquely identify the session cookie.
When you get to a 64 bit secret (say again a session cookie), with a 48 bit prefix, things already get dicey. It seems possible that some state actors can already brute force 64 bit secrets, and with a 48 bit prefix, you'd get about 2^16 = 65536 candidate session cookies. Those can actually be tried at the website to see if they work.
I vaguely remember that back when maps.app used Google Maps, the app would contact Apple's servers (presumably a proxy) rather than Google's directly.
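A toy Python version of the oracle idea, using a deliberately low-entropy secret. The URL template and the 4-digit PIN are hypothetical; the point is only that a truncated hash is enough to filter a small candidate space.

```python
import hashlib

def prefix_of(url: str, n_bytes: int = 4) -> bytes:
    """Truncated SHA-256 of a URL, as an observer of lookups would see it."""
    return hashlib.sha256(url.encode("utf-8")).digest()[:n_bytes]

# Hypothetical URL carrying a 4-digit secret (about 13 bits of entropy).
TEMPLATE = "https://site.example/reset?pin={:04d}"
observed = prefix_of(TEMPLATE.format(4821))  # what the provider logged

# Enumerate the whole secret space and keep whatever matches the prefix.
candidates = [pin for pin in range(10_000)
              if prefix_of(TEMPLATE.format(pin)) == observed]
print(4821 in candidates)  # True

# For a 64-bit secret seen through a 48-bit prefix, the surviving
# candidate count is 2^(64-48).
print(2 ** (64 - 48))  # 65536
```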
> Similarly Apple isn't "handing over the timestamps" - Tencent can just look at the clock.
If Apple proxied, cached, and batched as you described, then looking at the clock wouldn't give Tencent a timestamp. (Sure, an upper bound.)
If it's a hot new URL that hasn't been firewalled yet, then it's also not going to be in the safe browsing database, which means that your browser won't make that second request (with the full hash) to verify it.
You can think of various other what-if scenarios. But the other thing is that all internet providers within China are in bed with the government and can cough up your network history on demand. Why bother going through Tencent?
People are pillorying Apple over a vague theoretical risk that is only really practical in the free world.
> If it's a hot new URL that hasn't been firewalled yet, then it's also not going to be in the safe browsing database, which means that your browser won't make that second request (with the full hash) to verify it.
Who says you need the full hash? Keep all the prefixes with the timestamps and IP addresses and believe me, you can get a pretty good picture.
All you need is to keep a bunch of URLs you don't block but consider harmful, and a bunch of other URLs from these same domains.
Here's a partial hash that I'm pretty sure would be quite useful:
It maps to this URL:
From your database, find everyone that requested 18ec. It may match many other URLs, but use your backup of Reddit and the timestamps. You now have the username linked to their IP.
Someone doesn't post? Use 1483, B5E7, 13F9, 5B58, 0859, etc...
The funniest part is that you could probably find the URL yourself using these prefixes... but yeah, the goal is to find out who went to these URLs, not really which URLs they went to. You don't need that many matches to confirm they truly went there.
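A hedged Python sketch of the join described above: public post timestamps on one side, the provider's prefix log on the other. The usernames, IPs, timings and the two-minute window are invented; "18ec" is just the partial value from the comment.

```python
def link_users(lookups, posts, post_prefix, max_delay_s=120):
    """Match public posts to prefix lookups made shortly before them.

    lookups: (timestamp_s, client_ip, prefix) rows from the provider's log.
    posts:   (timestamp_s, username) rows scraped from the public forum.
    Returns username -> IP where exactly one client fits the window.
    """
    links = {}
    for post_ts, user in posts:
        ips = {ip for ts, ip, p in lookups
               if p == post_prefix and 0 <= post_ts - ts <= max_delay_s}
        if len(ips) == 1:  # ambiguous matches are simply skipped here
            links[user] = next(iter(ips))
    return links

lookup_log = [
    (1000, "198.51.100.7", "18ec"),
    (1010, "203.0.113.9", "ffff"),
    (5000, "203.0.113.9", "18ec"),
]
forum_posts = [(1030, "alice_hk"), (5020, "bob_sz")]
print(link_users(lookup_log, forum_posts, "18ec"))
```

On a busy page many clients would fall inside the same window, so a single match is weak; repeating it across several posts by the same username is what would narrow things down.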
Indeed, especially considering this: https://news.ycombinator.com/item?id=21241712
I think the only difference is that it also applies to the region set, so don't do that if you're not in China.
It's a way of reporting the use of a VPN.
If Tencent added ihatetheccp.com and ihatetheccp.com/about and ihatetheccp.com/news and ihatetheccp.com/blog and ihatetheccp.com/donate to their "unsafe page" list - I suspect the pattern of fetching of the full list of urls for some of those specific hash prefixes in a short time would allow Tencent (or Google) to make reasonably accurate assumptions about which site you're actually visiting.
The URLs are canonicalized first and then the full URL is hashed.
But that is required for the system to work. Otherwise you have to choose between having really poor granularity (if you only check the domain), or really large blacklists (if you only check the whole URL). Point is, there's no way for this system to work effectively and be anonymous.
>all it would take is a change in the hash and most of the old model would be trash
Why would the hash change?
Without digging into details myself, I'm under the impression from recent reporting that Safari is sending the first 32 bits of a SHA256 hash of the (canonicalised somehow) URL. So there are 2^224 possible full hashes that you might be wanting to check in the returned list. Presumably some small percentage of those will actually be on the list; what I'm not sure about is how many of them are plausible and completely innocent URLs.
If Apple wants to create a privacy-oriented safe browsing experience, they need to ship that dataset with the phones themselves.
How many times (probability) do you think a 32-bit hash prefix matches more than one 256-bit hash? Very very unlikely; it's one-to-one practically speaking.
The issue is, as mentioned in that comment, with visiting a handful of pages you can get a clearer picture of similar URLs that are being returned among these hash prefixes, which can be used to build a profile of your browsing history.
You got 32 bits, thus 2^32 = 4 294 967 296 possibilities
30 000 000 000 000 / 4 294 967 296 = 6984.9193
That doesn't seem to me like a 100% probability at all.
So for 6,984 URLs per 32-bit hash, wouldn't that be evenly distributed since it's the result of a hashing function? Therefore we'd expect fairly close to 6,900 URLs per prefix? In what situation would you expect a 1-to-1 of 32-bit prefix to URL? Note: happy to be disproven, this is not my specialty at all.
We are working in bits, so use bits instead; that will avoid these kinds of mistakes.
> In what situation would you expect a 1-to-1 of 32-bit prefix to URL?
Oh yeah, sorry, I misunderstood. Yes, it's pretty unlikely that you would get only one URL for a prefix (but still possible). I would have to get out my old probability books to work that out, but it's not worth it; the probability would be way too tiny.
I thought it was about being certain you could match it with the real URL. In theory it would take only a few page hits to be certain of the domain and thus the URL (if there's no unknown string in the URL).
That database would be ridiculously huge. If they truly are using the same method for Tencent as they are for Google, the prefix hash would potentially give you several thousand different domains. According to Google's docs, the full URL is hashed, not just the domain. (https://developers.google.com/safe-browsing/v4/urls-hashing)
The top return for a google search "how many pages does google have in it's index" says Google has "30 trillion web pages" (that's a 2013 article, but let's roll with that number for now).
The Google Safe Browsing docs say to send the first 32 bits of a 256-bit SHA256 hash. So there are 2^32 = 4e9 possible prefixes (and 2^224 = 2.7e67 possible hashes for each prefix).
30 trillion = 3e13 webpages. If you assume they're evenly distributed across all the hash prefixes (a reasonable assumption for a cryptographically strong hash function) there's about 1e4 or 10,000 urls matching each prefix. (And that's a lower bound, using 2013 vintage estimates of the number of known urls...)
I _think_ that means visiting 13-14 unique urls from a site in Tencent's lists would be enough to guarantee they could tell which site and pages you'd just visited? (since 2^14 > 1e4)
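The numbers above are easy to check in Python. This only reproduces the back-of-envelope arithmetic (30 trillion URLs, 32-bit prefixes, and the assumption that each extra lookup halves the candidate set); it says nothing about whether the halving model is realistic.

```python
import math

INDEXED_URLS = 30e12   # the 2013 "30 trillion pages" estimate
PREFIX_BITS = 32

urls_per_prefix = INDEXED_URLS / 2 ** PREFIX_BITS
print(round(urls_per_prefix, 1))                  # 6984.9

# Halvings needed to get from one prefix bucket down to a single URL.
print(math.ceil(math.log2(urls_per_prefix)))      # 13
# Same, using the rounded-up 10,000 figure.
print(math.ceil(math.log2(1e4)))                  # 14
```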
1: I'm pretty sure browsers keep a local list of known malicious hash prefixes, and only request a list if the URL matches it. So those 13-14 URLs would all have to happen to have a prefix on the safe browsing list (presumably quite unlikely). I guess Tencent could just advertise every hash as being on the list, but I feel like that would've been discovered if that was the case.
2: Even with 13-14 hashes I don't think that _guarantees_ a match, it's just the average you'd expect to need in order to find a match.
3: This becomes significantly harder when a user browses multiple websites at a time (which most people do, even if it's just because they click a link from one website to go to another).
2) Right. I was imagining some sort of binary search through the hash space - which might be way off base. It might actually take all 10,000 pages to guarantee it? That seems wronger in my head than binary search?
3) I don't think that matters so much, the order or sequences don't matter, only a cluster of visits within a chosen timeframe. I don't mind if you browse to https://bank.tld/ in between https://free-hk.org/news and https://free-hk.org/next-protest...
When the link to the privacy document worked, it basically said we collect everything, but in a fuzzy jumble of poorly worded English that looked like it took a few laps through Google Translate.
However, I suppose China has rules against that and wants Safe Browsing requests made directly to Tencent.
If this is not possible in China due to the Great Firewall, at least it could be done in other areas. Unless it’s against Google/Tencent’s TOS’es?
That makes me doubt the effectiveness of their k-anonymity scheme.
> Apple insists that Safari doesn't reveal a different bit of information, the webpages Safari users visit
Is it normal journalism to just say "Company X says ..." without stating who of the company stated it and how?
If it is true what the article claims, that they give hashes to Tencent, then this statement is false:
> Apple insists that Safari doesn't reveal a different bit of information, the webpages Safari users visit.
Before bitching, actually read the article.
"In a statement emailed to The Register (!), an Apple spokesperson said:"
Is today really the first day all of y'all are hearing about Safe Browsing and how it works? The _prefix_ of the hash is sent to the blacklist service, not the entire thing.
With all due respect, this makes no sense whatsoever. You can't make any meaningful conclusion (in isolation, or even with small enough/random enough datasets) about the content that was hashed from only a part of the resultant hash, that's the whole point of how hashing works.
With the recent removal of HKmap.live, this is an EPIC FAIL, amigos. It seems like China has now bought most of Apple's shares and made it the new Huawei. Just imagine this 3-4 years ago.
But don't forget that cryptographic techniques are already used to protect the privacy of users; Matthew Green just wrote a good analysis.
> To address these concerns, Google quickly came up with a safer approach to, um, “safe browsing”. The new approach was called the “Update API”, and it works like this:
> * Google first computes the SHA256 hash of each unsafe URL in its database, and truncates each hash down to a 32-bit prefix to save space.
> * Google sends the database of truncated hashes down to your browser.
> * Each time you visit a URL, your browser hashes it and checks if its 32-bit prefix is contained in your local database.
> * If the prefix is found in the browser’s local copy, your browser now sends the prefix to Google’s servers, which ship back a list of all full 256-bit hashes of the matching URLs, so your browser can check for an exact match.
> At each of these requests, Google’s servers see your IP address, as well as other identifying information such as database state. It’s also possible that Google may drop a cookie into your browser during some of these requests. The Safe Browsing API doesn’t say much about this today, but Ashkan Soltani noted this was happening back in 2012.
> It goes without saying that Lookup API is a privacy disaster. The “Update API” is much more private: in principle, Google should only learn the 32-bit hashes of some browsing requests. Moreover, those truncated 32-bit hashes won’t precisely reveal the identity of the URL you’re accessing, since there are likely to be many collisions in such a short identifier. This provides a form of k-anonymity.
> The weakness in this approach is that it only provides some privacy. The typical user won’t just visit a single URL, they’ll browse thousands of URLs over time. This means a malicious provider will have many “bites at the apple” (no pun intended) in order to de-anonymize that user. A user who browses many related websites — say, these websites — will gradually leak details about their browsing history to the provider, assuming the provider is malicious and can link the requests. (Updated to add: There has been some academic research on such threats.)
> And this is why it’s so important to know who your provider actually is.
> What does this mean for Apple and Tencent?
> That’s ultimately the question we should all be asking.
> The problem is that Safe Browsing “update API” has never been exactly “safe”. Its purpose was never to provide total privacy to users, but rather to degrade the quality of browsing data that providers collect. Within the threat model of Google, we (as a privacy-focused community) largely concluded that protecting users from malicious sites was worth the risk. That’s because, while Google certainly has the brainpower to extract a signal from the noisy Safe Browsing results, it seemed unlikely that they would bother. (Or at least, we hoped that someone would blow the whistle if they tried.)
> But Tencent isn’t Google. While they may be just as trustworthy, we deserve to be informed about this kind of change and to make choices about it. At very least, users should learn about these changes before Apple pushes the feature into production, and thus asks millions of their customers to trust them.
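The Update API steps quoted above can be sketched in Python roughly as follows. The class names and the in-memory "provider" are assumptions made for illustration; a real client syncs the prefix database incrementally and talks to the provider over the network.

```python
import hashlib

def sha256(url: str) -> bytes:
    return hashlib.sha256(url.encode("utf-8")).digest()

class Provider:
    """Stand-in for the list provider (Google or Tencent)."""
    def __init__(self, unsafe_urls):
        self.full_hashes = [sha256(u) for u in unsafe_urls]
        self.seen_prefixes = []  # everything the provider gets to observe

    def prefix_database(self):
        # Truncated 32-bit prefixes shipped down to the browser.
        return {h[:4] for h in self.full_hashes}

    def hashes_for(self, prefix):
        self.seen_prefixes.append(prefix)
        return [h for h in self.full_hashes if h[:4] == prefix]

class Browser:
    def __init__(self, provider):
        self.provider = provider
        self.local_prefixes = provider.prefix_database()

    def is_unsafe(self, url):
        digest = sha256(url)
        if digest[:4] not in self.local_prefixes:
            return False  # no request is made at all
        # Prefix hit: ask for the full hashes and compare locally.
        return digest in self.provider.hashes_for(digest[:4])

provider = Provider(["http://malware.example/installer"])
browser = Browser(provider)
print(browser.is_unsafe("http://harmless.example/"), len(provider.seen_prefixes))
print(browser.is_unsafe("http://malware.example/installer"), len(provider.seen_prefixes))
```

In this sketch the provider learns nothing about URLs whose prefix is absent from the local database, which is why the choice of what goes into that database, and who chooses it, matters so much.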
I for one am appalled that nobody really cares too much about this because of the convenience provided by Google tools, but we clearly have a situation building up where another RMS will come up with the equivalent of the Free Software Foundation, but from a privacy perspective.
You called it spot on and then came to a conclusion that fundamentally disagrees with the article:
* This is Apple and Tencent, not Google
* The author does not believe that Safe Browsing (Update API) is a way for Google to spy; or if it is, its privacy risk is marginal compared to the huge security win
* The author's beef is not at all with this functionality, but in how it was communicated
sure, there have been issues under jobs. but those were of a different quality. cook counts beans and has no balls. schiller goes on stage and says „courage“. ive has nobody measuring his and his team's ridiculous design decisions: flat it must be!
i would bet money steve would have put the iphone 6 down, noticed it wobbles, pulled up his eyebrows and sent the team back down to the design bunker, telling them not to emerge until that bullcrap is fixed.
Wow. As the corporate types would say - the optics on this are not great.
Since Tencent is used for a malware list instead of Google for all users who have their iOS region set to China (even if they aren’t physically in China) this would technically allow them to falsely flag websites as malware even for some people outside of the country. But, this is a user configurable setting, and you can always bypass the warning.
In fact, there may be cases where being blocked by the Chinese authorities lends credibility to a site, whereas being flagged for malware does not.
These are definitely edge cases, and no doubt the Chinese government would prefer to completely block a site in most circumstances, but it is another tool they could use. It also increases their coverage - the current censorship tools only cover China, but this covers any _device_ where the region is set to China (meaning it can apply to some people outside of China, and continues to apply to people who have temporarily left China).
Canonicalize("http://www.evil.com/blah#frag") = "http://www.evil.com/blah";
Canonicalize("http://evil.com/foo?bar;") = "http://evil.com/foo?bar;";
So fragments get dropped (as expected) but query params do not (also, in retrospect, what I'd expect to make it work at all...)
So https://news.ycombinator.com/reply?id=21254732 will not end up hashing "https://ycombinator.com", but the whole thing including the path and query string.
This lets Google flag the actual problem, without freaking out for users who aren't in harm's way. Millions of users visiting the discussion site see nothing, but everybody directed to an actual malware installer gets a warning.
So a "prefix" operation happens in the code twice: once in processing the SHA-256 hash to get a 32-bit prefix, which is all Google / Tencent are ever shown, but also before that, during canonicalisation, to figure out a list of hashes (not a single hash) for each URL visited.
There are other annoyances, I forget. Every time I set up a new user or new profile with Firefox I'm reminded how much worse things are and have to go change a bunch of obnoxious stuff.
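Here is a simplified Python sketch of that "list of hashes per URL" step. It follows my reading of the host-suffix / path-prefix rules in Google's v4 URL hashing docs, and it deliberately skips most of real canonicalisation (percent-escaping, IP address hosts, dot-segment removal), so treat it as an approximation rather than the reference algorithm.

```python
import hashlib
from urllib.parse import urlsplit

def host_suffixes(host):
    """Exact host plus up to four shorter suffixes (never the bare TLD)."""
    parts = host.split(".")
    suffixes = [host]
    start = max(1, len(parts) - 5)  # work from the last five components
    for i in range(start, len(parts) - 1):
        suffixes.append(".".join(parts[i:]))
    return suffixes[:5]

def path_prefixes(path, query):
    """Exact path with and without query, plus up to four directory prefixes."""
    prefixes = []
    if query:
        prefixes.append(f"{path}?{query}")
    prefixes.append(path)
    directories = ["/"]
    current = "/"
    for component in [c for c in path.split("/") if c][:-1]:
        current += component + "/"
        directories.append(current)
    for d in directories[:4]:
        if d not in prefixes:
            prefixes.append(d)
    return prefixes

def expressions(url):
    parts = urlsplit(url)  # the #fragment is parsed out and never used
    host = parts.hostname or ""
    path = parts.path or "/"
    return [h + p for h in host_suffixes(host)
            for p in path_prefixes(path, parts.query)]

def hash_prefixes(url):
    """One 32-bit (4-byte) prefix per host/path expression."""
    return [hashlib.sha256(e.encode("utf-8")).digest()[:4].hex()
            for e in expressions(url)]

for expression in expressions("http://a.b.c/1/2.html?param=1#frag"):
    print(expression)
```

So one page visit can turn into several prefixes to check, from the full path-and-query form down to the bare registered host.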
I miss Iceweasel.
A problem which already begins if you merely switch browsers. For instance, the only browser other than Safari that has the close buttons on a tab right is Opera. I think you can tweak Firefox to do it, but you are out of luck with Brave or Chrome. That is just one example.
Besides, I didn't call KDE shitty but the UI of a Linux laptop in general. Here's the thing, the commenter I responded to did express ethical considerations as the driving motivation for a switch to Linux/GNU from BSD/Mac. Not technical ones.
I can relate to that. I myself would rather switch to an open OS, but the truth is nothing compares to the comfort of the Mac. Currently at least.