Update: the code for Tencent Safe Browsing seems to be very similar to that which talks to Google, down to it being under a "Google" namespace, the API endpoints being named the same, and performing hashing which seems to match the "Update API" here: https://developers.google.com/safe-browsing/v4/update-api. I think this is just "whatever Google could see before, Tencent can see now, if you're in China". I'm no expert, so I have no idea if that's k-anonymous or whatever if Tencent/Google decide they want to track you, but in either case it's just shifting who's getting your hashes.
Nope, [[NSLocale currentLocale] countryCode] returns the Region country code from Settings. The same code is also used for the language region, so you can end up with something like zh_US.
I wonder if there are similar oversights with the US locale, seeing as a lot of developers prefer English interfaces to the somewhat craptastic localizations.
CN is the country code, not the language code. zh-CN is Simplified Chinese localized to mainland China. If you want Simplified Chinese try something like zh-HK or zh-SG.
But there is valid criticism to be had that Apple should be signposting more visibly the differences between its settings for CN and outside CN.
Hong Kong, Macao and Taiwan all have their own ISO 3166 codes and users there are unlikely to accidentally set the region to CN, since the difference between simplified and traditional characters is quite obvious.
Google is blocked in China, so naturally they'd need a Chinese alternative. With everything going on it's easy to fear monger, but people need to chill out a bit.
Locale is probably one of the least intrusive ways to determine location; using GPS would probably cause an even bigger problem if people realised that there's a backdoor around the location permission.
Any company that markets/releases in China and relies on some Google service (maps/safe search/safety net/Google sign-in/Firebase/etc.) needs to find an alternative, not because everyone is on the Chinese payroll but because, more often than not, these services are business critical.
It's "decompilation" of a block invoke for Backend::Google::SSBUtilities::shouldConsultWithTencent() taken by opening /System/Library/PrivateFrameworks/SafariSafeBrowsing.framework/SafariSafeBrowsing in Hopper Disassembler.
bash$ c++filt <<< ____ZN7Backend6Google12SSBUtilities24shouldConsultWithTencentEv_block_invoke_2
invocation function for block in Backend::Google::SSBUtilities::shouldConsultWithTencent()
Edit: what are downvotes for? That is the standard way to decipher C++ mangling, using built-in (binutils) tools.
You’re looking at the method name after the compiler got done mangling type information into it for the linker. The block (which appears to be anonymous) most likely lives inside a function that the source code names “shouldConsultWithTencent”, in a namespace (class?) “Backend::Google::SSBUtilities”.
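For anyone curious how the mangled name breaks down, here is a rough Python sketch that decodes only the length-prefixed parts of a nested name. It is an illustration, not a replacement for c++filt: the function name demangle_nested_name is made up here, and it ignores templates, substitutions, argument types and the block-invoke suffix.

```python
def demangle_nested_name(symbol: str) -> str:
    """Decode the length-prefixed components of an Itanium-style nested name.

    Handles only the simple form _ZN<len><name>...E; everything after the
    closing 'E' (argument types, block suffixes) is ignored.
    """
    body = symbol.lstrip("_")  # symbols on Apple platforms carry extra leading underscores
    if not body.startswith("ZN"):
        raise ValueError("not a nested mangled name")
    pos = 2
    parts = []
    while pos < len(body) and body[pos] != "E":  # 'E' closes the nested name
        start = pos
        while pos < len(body) and body[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("expected a length prefix")
        length = int(body[start:pos])
        parts.append(body[pos:pos + length])
        pos += length
    return "::".join(parts)


symbol = "____ZN7Backend6Google12SSBUtilities24shouldConsultWithTencentEv_block_invoke_2"
print(demangle_nested_name(symbol))
```

Each number is just the length of the identifier that follows it (7 for Backend, 6 for Google, and so on), which is why the name is still readable by eye.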
This is really a "damned if you do, damned if you don't" kind of situation.
They can either use Tencent's Safe Browsing API as a drop-in replacement for Google's API, relying on k-anonymity to leak as little information as possible. That leaves them open to accusations that they allow Tencent (or, for that matter, Google) to track the browsing history of Safari users.
Or they can essentially turn off Safe Browsing in China. (Google's API is collateral damage of the Great Firewall.) That leaves their users unprotected against all kinds of malware and scams.
I think they made the right call here by protecting users against the most common threat (most people are not dissidents), while giving advanced users with a different threat model the opportunity to opt out.
"Or they can essentially turn off Safe Browsing in China."
The OP as well as the associated blog post[1] as well as the Apple-provided fine-print language do not make it clear to me that this "feature" is exclusively enabled for Chinese users (or, perhaps Chinese IPs).
Could someone point to a source that confirms a US person, in the US, with a US-purchased iPhone, would not have their browsing history transformed and sent away for analysis to Tencent?
You can try it yourself by going to one of the iOS Safe Browsing test pages on your phone, and when the warning pops up click "Show Details". It'll either say Google or Tencent on the warning message, which should let you know which one got chosen for you.
Great. I disabled safe browsing probably back when it first appeared on my iPhone 3G or 4 and this test confirms I’m still not sending urls to anyone whilst surfing on my iPhone 11. Nice job preserving these settings over countless device upgrades.
For anyone else wanting to disable it (or at least learn more about it), the feature is labeled in iOS settings under Safari > Fraudulent Website Warning.
You've probably switched off Safe Browsing. I wouldn't advise anybody to do that unless they're _so_ sure that they don't need Safe Browsing that when (most likely rather than if) they get infected by Malware or fooled by Phishing they are confident they'd tell everybody they know what an idiot they are.
I have it switched off on my home PCs (but on for work). But then I also don't carry home insurance and when I was flooded I believe my first Facebook message began "This is probably a good time for you to say 'I told you so'" because that seems like the right sentiment.
Alternatively, they could purchase the data from Tencent or another company, and operate their own version of the service. That may even be what they’re doing, but we don’t know, since they launched the service with no details or publicity.
* it exposes to the bad actors exactly which of their scams is detected, so they can simply refine their methods until their sites don’t make “the list”
Bloom filters can give false positives, and to eliminate them, you'd need to send "data (hashed, anonymized, truncated, or otherwise)" to some entity that has the full list. That's exactly how Google's Safe Browsing API works.
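To make the false-positive point concrete, here is a minimal Bloom filter sketch in Python. The class name, bit count and hash count are arbitrary choices for illustration and have nothing to do with what Google or Tencent actually ship.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: 'no' is definite, 'yes' only means 'maybe'."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # True can be a false positive; False is always correct.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


bad_sites = BloomFilter()
bad_sites.add("evil.example/login")
print(bad_sites.might_contain("evil.example/login"))
```

A True answer only means the bits happen to be set, so a client that wants to avoid blocking innocent pages still has to confirm the hit against something that holds the full list.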
OS vendors already make a habit of regularly sending gigantic OS updates. I'd have a hard time believing that a compressed list of malware URLs would be noticeably bigger, by comparison.
Also, once the list is sent the first time (or just included with the OS so it'd be already present on your device when you bought it), they could just send the deltas as the list changed, and those deltas (especially once compressed) should be relatively small even compared to the original (probably not that large) list.
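A rough sketch of what "just send the deltas" could look like, assuming the list is versioned and the client reports the version it already has. ListServer, Client and their methods are hypothetical names for illustration, not any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class ListServer:
    # changes[i] holds the (added, removed) sets that produced version i + 1
    changes: list = field(default_factory=list)

    def publish(self, added=(), removed=()):
        self.changes.append((set(added), set(removed)))

    @property
    def version(self) -> int:
        return len(self.changes)

    def delta_since(self, client_version: int):
        """Collapse every change after client_version into one net delta."""
        added, removed = set(), set()
        for add, rem in self.changes[client_version:]:
            added = (added - rem) | add
            removed = (removed - add) | rem
        return self.version, added, removed


@dataclass
class Client:
    version: int = 0
    entries: set = field(default_factory=set)

    def sync(self, server: ListServer) -> None:
        self.version, added, removed = server.delta_since(self.version)
        self.entries = (self.entries - removed) | added


server = ListServer()
server.publish(added={"a", "b"})
client = Client()
client.sync(server)
server.publish(added={"c"}, removed={"a"})
client.sync(server)
print(client.version, sorted(client.entries))
```

The only thing the server learns from a sync is which version the client had, not which pages it visited.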
Google Safe Browsing transparency report lists 40k new bad URLs per week - how large do you think the list is now? It is far, far too large for local processing but k-anonymity is perfectly trustworthy when used with cryptographic hashing.
Wouldn’t it be pretty easy for the bad actors to check the database anyway? I can’t imagine they would need to query often enough to hit any rate limits.
I haven't run the numbers, but I am guessing that a clientside solution would have a lot of bandwidth sucking and avoiding false positives is very important.
Also with a clientside solution, how are new phishing URLs detected?
PS: perhaps try to assume HNers know what a Bloom filter is (I've seen them come up lots of times in comments).
Google's safe browsing API is probabilistic too. The idea is that you do so many rounds of checking to get closer and closer to the mark. You start with a fairly high false positive probability, high-privacy check, then if you get a positive, you try a lower false positive rate check that also loses you some privacy, and the trade-off is that you don't have to have the full malicious site DB with you at all times (and keep it up to date).
Why did you assume I'd not know about false positives?
400,000 sounds like a lot, but I wonder how many new URLs Tencent adds to its database each month. I expect they don't add every phishing URL but some small subset of them (possibly even a very small subset... we'll probably never know).
But let's say it is 400,000. I took the URL you linked and made a file of 400,000 copies of it. The file size was 28 MB. I didn't bother compressing that particular file since the URL is the same in each instance, but I expect a file full of actual phishing URLs would probably compress pretty well, so it would probably be significantly less than 28 MB.
Considering that OS vendors regularly ship multi-gigabyte size updates, having to download less than 28 MB extra every month shouldn't even be noticeable. If updates needed to be done more frequently, the client could subscribe to get regular updates as they become available.
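Spelling out the arithmetic behind the 28 MB figure, under the assumptions that "28 MB" means 28 * 10^6 bytes and that a list of hashes would use 32-byte SHA-256 digests or 4-byte prefixes of them (the digest and prefix sizes are my assumption, not something measured from Tencent's list):

```python
URLS_PER_MONTH = 400_000
FILE_SIZE_BYTES = 28_000_000  # assumption: 28 MB taken as 28 * 10**6 bytes

# Average size of one line in the test file (URL plus newline).
bytes_per_url = FILE_SIZE_BYTES / URLS_PER_MONTH

# The same number of entries stored as hashes instead of raw URLs.
full_hash_bytes = URLS_PER_MONTH * 32  # full SHA-256 digests
prefix_bytes = URLS_PER_MONTH * 4      # 4-byte hash prefixes

print(f"{bytes_per_url:.0f} bytes per URL")
print(f"{full_hash_bytes / 1e6:.1f} MB as full hashes")
print(f"{prefix_bytes / 1e6:.1f} MB as 4-byte prefixes")
```

So even before compression, a month of entries stored as hashes would come in well under the raw-URL file size.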
> 28 MB extra every month shouldn't even be noticeable
Parent comment suggests phishing site life-cycle <15hrs, at 400k a month that's 8333 every 15 hrs. To give an idea of how frequency sensitive this is, assume URLs are added equidistributed in time: that would be a new one roughly every 6.5 s (54,000 s / 8,333) - for such time critical information it makes no sense to attempt to synchronize clients, it would require constant polling or push updates to have _any_ chance of catching a malicious URL.
At such a frequency, efficiency becomes less about bandwidth and more about the overhead of continuously synchronising so many clients (think of that 28 MiB spread out over 400k separate messages over one month, one every 6.5 s; that not only inflates the size, but causes constant network usage and processing that is far less efficient than a single 28 MiB download).
Or you could just send the URL hash when you visit a URL... (do you request anywhere near 8k URLs every 15hrs, 1 URL every 6.5 s? no). It's so clearly a simpler solution that will be faster for everyone without letting bad URLs slip through before a lagging sync.
There's no need to sync every time a new phishing URL is added - only every time a URL is visited by a client.
The delta can be derived just from the version number of the client's URL database, and should be a total of 1 MB in size for a whole day's worth of updates. So ~1 MB for the 1st URL visited in a day, and considerably less afterwards. Compared to average webpage size, that's nothing.
Really, only thing that changes is instead of sending a URL hash, you send the URL DB version, and the reply is the list of changes since that version.
> Really, only thing that changes is instead of sending a URL hash, you send the URL DB version, and the reply is the list of changes since that version.
Or none at all and a simple confirmation that the list is up to date. Yes this is a way better idea.
Although it's always going to be less efficient. For instance I'm not sure how it would scale into the future. Checking URLs server side is optimal: the cost is always roughly proportional to the URL size. With DB deltas the cost of each lookup depends on both the URL size and the DB update frequency, i.e. as the malicious URL rate increases over time, individual URL lookups will incur greater network cost... this is probably not a big deal for the client, but it would make a significant difference for the provider of the deltas - or maybe network caching would dissolve it again? I mean there would be a lot of duplicate deltas flying around every minute... basically a content distribution problem but with a high frequency twist.
Do you really think Tencent is detecting a new phishing site every 6.5 seconds?
I'd seriously question how many of the total new phishing sites they detect to start off with, and then how frequently they do so.
If a user only downloads the deltas periodically they'd risk being out of sync with the master list (which might not be updated even once a month or at all, for all we know), but that's the price they'd need to pay to avoid having to send any information about their web browsing to parties they don't trust.
One other thing to consider is the likelihood that the URL you happen to be surfing is both a phishing URL to begin with and one of the ones that just appeared since the last delta download you did, compared to the likelihood that it's one of those already in the entire phishing URL database you've already downloaded. I'd expect those odds to be very low.
> the likelihood that the URL you happen to be surfing is both a phishing URL to begin with and one of the ones that just appeared since the last delta download you did, compared to the likelihood that it's one of those already in the entire phishing URL database you've already downloaded. I'd expect those odds to be very low.
Ignoring the first condition (otherwise why bother with a list at all)... Consider that this information is very transient (average 15hrs), this is pretty simple: deltaT / 54000
This is still horrible, because your safety is determined by how frequently you can sync with the DB.
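Plugging a few sync intervals into that deltaT / 54000 estimate, taking the 15 hr average lifetime at face value (the function name and the cap at 1.0 are mine; the formula is just the one from the comment, not a measured result):

```python
LIFETIME_SECONDS = 15 * 3600  # 54000, the assumed average life of a phishing URL


def stale_fraction(sync_interval_seconds: float) -> float:
    """Estimated chance that a live phishing URL is newer than your last sync."""
    return min(1.0, sync_interval_seconds / LIFETIME_SECONDS)


for label, seconds in [("1 minute", 60), ("1 hour", 3600), ("1 day", 86400)]:
    print(f"sync every {label}: {stale_fraction(seconds):.1%}")
```

Under this estimate a daily sync misses essentially everything that is still live, while an hourly sync misses a few percent.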
> If a user only downloads the deltas periodically they'd risk being out of sync with the master list (which might not be updated even once a month or at all, for all we know).
Being updated monthly doesn't match the statistic of average 15hr lifecycle, because it would be useless after that length of time. And while I don't claim to know 15hrs as a fact, it is intuitive that the average will become ever shorter as malicious URL checkers become updated ever faster.
> but that's the price they'd need to pay to avoid having to send any information about their web browsing to parties they don't trust.
Full URL information need not be sent, a hash of the URL domain and path would probably suffice... if that's not enough then it's a dilemma, but that doesn't make continuous syncing a good or fail safe replacement.
"Being updated monthly doesn't match the statistic of average 15hr lifecycle, because it would be useless after that length of time."
And maybe it is useless. We don't actually know, but we should at least recognize that there may be a difference between how frequently phishing sites allegedly appear and how frequently they appear in Tencent's malware URL database.
"This is still horrible, because your safety is determined by how frequently you can sync with the DB."
And being identified by the Chinese government as someone who surfs to forbidden websites might be even more horrible, for some.
> This is really a "damned if you do, damned if you don't" kind of situation.
Why? Simply, when setting up the device/browser, let the user choose what safe browsing API the browser shall use (both, one of them, or none).
Letting the user make a conscious choice is the best way to handle a "damned if you do, damned if you don't" kind of situation. To make the choice as conscious as possible, provide additional material that explains the advantages and disadvantages of each option, so that the user is well-informed before making the choice.
That’s against the entirety of Apple’s modus operandi. It’s always to give the user as little choice as possible and assume that users are idiots. The only exception they made is for developers with CLI abilities, and even that they have begun to restrict.
Microsoft didn’t make this assumption. Apple did. My entire extended family are now all Apple users. My tech support calls per year can now be counted on one hand.
In my extended family, the users are idiots, and thanks to Apple, I’m poorer financially but significantly richer in free time.
I didn't say it was a poor choice, rather, it's a great choice.
I'd rather be an idiot in the areas that I don't have an expertise in, I have no interest in plumbing and I would blindly follow the suggestion that the plumber who came to my house made. And I guess this strategy worked well for Apple.
I believe an argument could have been made that it was the right call if they had publicized it. Seeing as how it was implemented in the background, I am less inclined to give them the benefit of the doubt.
I'm curious if, as @thefalken brought up [0], this is illegal under the GDPR, given that it's a hidden opt out and should apply to EU citizenry with browser language set to Chinese.
Very doubtful, even with the "hidden opt out" that seems to be sufficiently poorly "hidden" that lots of people here have indeed opted out.
Safe Browsing uses very little data (pretty much the least they could get away with to make it work) and you'd have to establish both that Tencent is lying about how it uses that data AND that Apple knew or reasonably should have known that it was misused.
URLs never leave your browser, so "Apple is sending URLs" is wrong. The Update API is used, so the URLs stay on your browser but under some circumstances hash prefixes of some URLs are sent to Google/Tencent.
If you choose to assume that Google/Tencent are bad actors then they can probably manipulate this data to target a few URLs and discover who (IP addresses) browsed those URLs. In less well designed browsers like Safari they might be able to tie that to a Google Account independent of the IP address because those browsers don't isolate Safe Browsing API calls from normal web browsing activity (this won't work in e.g. Firefox). If a bad actor did this, it would make performance worse for all users, and the accuracy of the trick would be sabotaged unless the set of target URLs tracked is fairly small. If you were looking for a single PDF filename on a single web site it's definitely possible; if you want to track six thousand different articles about Xi's resemblance to Pooh Bear across tens of thousands of sites, that's going to cause a lot of false positives you have to weed out somehow.
That doesn't mean that the user is in China. It means that the user wants their interface in Chinese as it is written in mainland China. In other words, the CN means simplified Chinese instead of traditional Chinese, which is what the TW region code corresponds to.
The GP poster is incorrect; the Region setting has nothing to do with setting the region code of the Language setting (each language+region pair being its own listing in Languages.) The Region you choose during initial device setup does determine your default Language region, but you can pick a different one while keeping the same Region.
The Region setting in iOS is literally just the question "what Country [or Country-equivalent political region] would you like to be considered to be in, when we make certain OS features be dependent on your country?"
This is separate from what country the phone treats you like you're actually in, geographically, which is determined moment-to-moment by geolocation and cellular profiles. (Time zone? Geolocation. Maps domestic/foreign feature display granularity? Geolocation.)
Whereas, Region is for things like, say, whether you see certain apps or features that are in partial progressive rollout; or whether you see features offered that don't make sense outside of certain regions.
Re: the first example, the News app, which rolled out in the US first, could be made to appear in other countries by setting your Region to the US. When this was done, the News app, if launched, would still detect what country you were actually in (geolocation-wise), and would make a best-effort attempt at showing news from the few sources Apple had made agreements with so far from that country.
Re: the second example, iOS has social-network "Accounts" integration with Sina Weibo, QQ, etc. just like it has integration with Facebook/Twitter/etc. It just doesn't display these sign-in options unless your phone is set to the China "Region." Because, if you're not in China or from China, why would you ever use these networks? (Note that Apple designs iOS under the assumption that people won't bother to change their Region when they travel; so it really is more of a "where are you from" rather than "where are you now" question.)
This is incorrect. en_GB doesn't mean you're in or from Great Britain. It means you want the device to show English as it is used in Great Britain, with extraneous "u"s and rearranged month and day. A user in the US can request that locale instead of en_US if that is the language they prefer. Locale is for localization of the interface, not for telling where you are from.
Now maybe iOS sets the locale based on where the user is from instead of based on how the user would like their interface localized. If it does, it is doing it wrong. Sending a user's data to Tencent based on a setting instead of based on their location is absolutely wrong.
> This is incorrect. en_GB doesn't mean you're in or from Great Britain.
You misinterpreted. "Region" is a setting in iOS. But iOS "Region" has nothing to do with the "region" part of a locale. Setting your iOS "Region" to "Great Britain" and setting your "Locale" to "English (Great Britain)" are separate things. "Region" is just what iOS happens to call a completely distinct thing. If you like, to lessen your confusion, pretend it is called something different.
> Sending a user's data to Tencent based on a setting instead of based on their location is absolutely wrong.
You wouldn't want your phone to start sending data to Tencent as soon as you cross the border into China, right? And, vice-versa, you would expect a person from China, who thinks Tencent is a great brand, to not want to stop sending their data to Tencent just because they cross the border out of China, right?
> You wouldn't want your phone to start sending data to Tencent as soon as you cross the border into China, right?
You most likely would. Google's service will be unreachable from within China. If it didn't switch providers, you would have no Safe Browsing protection. The key thing is to obtain consent from the user the first time this happens.
They should send to Tencent based on network location. If you're inside the Great Firewall, Google's safe browsing service will be unreachable. If you're outside the Great Firewall, you really don't want to use services through it if possible. https://arstechnica.com/information-technology/2015/04/ddos-...
I think their point is that changing your device language is not the same thing as changing your region. Changing your language is a simple setting, but changing your region involves re-accepting the ToS for that region. So technically they would have to click Agree on the document linked in the tweet in the OP.
“en_US” is “American English”, not “English on a phone in America”. The alternative “zh-*” codes are SG, TW, or HK. It’s checking if the user has their region set to “Mainland Chinese”, not that their phone is “Chinese on a phone in China”.
Actually, American English is "en-US"; "en_US" means English with the region set to the US, at least on iOS. But yes, it is checking that their region is set to mainland China.
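To show why the language part and the region part are independent, here is a small Python sketch that splits an identifier into the two. split_locale and region_is_cn are made-up helpers for illustration; this is not Apple's code and says nothing about how shouldConsultWithTencent is really implemented.

```python
import re


def split_locale(identifier: str):
    """Split e.g. 'zh_US' or 'zh-Hans-CN' into (language, region).

    Assumption: the region is the last two-letter alphabetic subtag;
    script subtags such as 'Hans' are skipped.
    """
    parts = re.split(r"[-_]", identifier)
    language = parts[0].lower()
    region = None
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha():
            region = part.upper()
    return language, region


def region_is_cn(identifier: str) -> bool:
    # Hypothetical check that looks only at the region, never the language.
    return split_locale(identifier)[1] == "CN"


for ident in ["zh_US", "en_CN", "zh-Hans-CN", "zh-TW"]:
    print(ident, split_locale(ident), region_is_cn(ident))
```

With a check like this, an English-language phone with its region set to China matches, while a Chinese-language phone with its region set to the US does not.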
The code appears to be used for fraud related purposes, meaning, to my understanding, Apple would likely argue it has a legitimate interest.
There’s a lot of legal language around this exception, but fraud is directly called out as a legitimate interest and means that the group controlling the data would not need to obtain user consent.
Is Apple the data controller here since it's all happening on the users' device? And does "legitimate interests" extend beyond the data controller's interests? I.e. if it's only about fraud against Apple then safe browsing (which is supposed to protect the user from fraud) would not necessarily be a legitimate interest of Apple. It might have to be opt-in at least.
Great questions, which I know I'm not equipped to answer authoritatively - prior comment was just my two-cents on how I'd expect Apple to argue the issue (And even that argument may be a losing one).
In opposition to the fraud argument, one could argue users wouldn't reasonably expect to have their data forwarded to China. The counter-argument to that would likely be along the lines that users who have their localization set to China might have more of an expectation of this. And so the lawyer fees continue to increase in what would be an incredibly interesting case, honestly.
If it's illegal under the GDPR to send the data of EU citizens with browser language set to Chinese to Tencent, it's also illegal to send the data of EU citizens with browser language set to anything else to Google. Chrome, Firefox, Safari and probably all Chromium-based browsers (unless they disable Safe Browsing by default) use Google's API and would be in violation, too.
That's true, but it's probably covered in the privacy notice. It doesn't make a difference that the data is shared outside of EU, it just has to be communicated to the user.
Also, the data shared here is not personal information, unless it's connected with personal information such as IP address or a tracking cookie.
This is a pretty gray area. Apple isn't necessarily sharing information with Google, it's just the property of Internet traffic that Google / Tencent can collect the IP address from the request. Same happens when websites include resources from other websites (images, scripts, etc.), and these are not typically taken into account in GDPR privacy notices.
That's not necessarily true, since the GDPR imposes extra restrictions on sending data to countries that are not covered by the GDPR (essentially, outside the EEA) and are not deemed by the EU to offer equivalent protection. I don't know where Tencent has these servers, but Google has servers in the EU and managed by an EU-based subsidiary.
> Before visiting a website, Safari may send information calculated from the website address to Google Safe Browsing and Tencent Safe Browsing to check if the website is fraudulent. These safe browsing providers may also log your IP address.
we should be linting code to say whether it phones home or not, and what it uploads when it does. plain language privacy policies and ever-changing browser settings are leaving huge gaps.
when the US government bought chinese drones they hired a consultant to prove that the drones never call home.
> we should be linting code to say whether it phones home or not
Is that possible? How do you differentiate it from expected API calls?
(Not convinced black/white-listing strings is any different from code review in this case - the code will just be changed on demand if the list prevents adding whatever someone was trying to add.)
it's theoretically possible. I don't know of any tools that do it (which could be a comment on my research skills rather than the state of the art).
in theory you can do dataflow analysis on all external inputs to the program (geo, filesystem, text) and monitor where that goes in the program. For something more complicated like a browser, you might want to do the analysis per component (URL bar in this case).
wouldn't be perfect, but it's a starting point.
linting is tougher on closed-source software than open-source, but if a company certified a linter output and was found to be lying I'm comfortable with using the law to resolve that.
Except you'd never have a good enough dataflow analysis to work on arbitrary code without burying people with false positives. Especially in C++ code, where things like function pointers just destroy call graph precision (and therefore taint analysis precision).
Linting doesn't even give you this much. All it'd be able to tell you is "where in the program are calls to networking APIs being made" and maybe determining parameters if they are defined in the same function as the call.
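As a toy illustration of that limit, here is a Python sketch of a lint pass that only reports where known networking calls appear. It works on Python source rather than C++ purely to keep it short, and the NETWORK_CALLS list and sample snippet are invented for the example.

```python
import ast

# Assumption: a hand-maintained list of call names we consider "networking".
NETWORK_CALLS = {"urllib.request.urlopen", "requests.get", "requests.post"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_network_calls(source: str):
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in NETWORK_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


SAMPLE = '''import requests
def check(url):
    return requests.get("https://example.invalid/lookup", params={"u": url})
fetch = requests.post
fetch("https://example.invalid/upload")
'''

print(find_network_calls(SAMPLE))
```

The direct call on line 3 is reported, but the call made through the fetch alias is missed, which is the same precision problem function pointers cause in C++.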
This is where they need to sacrifice some computer security for physical security. By turning this off, a few people who don't follow good security practices might get malware. But no one will be sent to prison or "disappeared".
Apple has done a lot for privacy in its products and its public statements. But I believe that if it has to have a better impact and be trusted, it needs someone dedicated to privacy who will (ensure that it will) publish details of its products, apps and activities in an honest form in an accessible place (and updated more often than a once-a-year OS upgrade cycle). This kind of commitment to more transparency will help the company be trusted and also held up to questions. Said trust is already eroding with recent events. Apple shouldn’t be complacent and stick to its old ways.
Sadly, Apple also has a history of brushing things away or ignoring uncomfortable questions.
Are those Google/Tencent API requests done only when browsing with Safari, or are they done for any SFSafariViewController?
That would imply it’s also inside Brave/Firefox/Chrome...
Again I feel like I'm reaching out to be educated here.. but if Safari is attempting to validate URLs for safe browsing using the Google API (which it states it will do, quite openly), and Google products are quite clearly blocked in China so it resorts to Tencent's API (which it states it will do, quite openly).. why does this seem to provoke anger?
I mean this in the most equitable way possible, I'm more trying to understand where Apple has done anything wrong here?
I think the audience on HN has gone crazy. Why would you prefer Google over Tencent for an API that serves the same purpose? Should all Chinese users be scared that the iPhone sends all their logs back to California? Should they be scared that Tesla sends all their driving data back to the US?
If you don't trust anything from China, would you destroy any electronics Made In China, including your smartphones, laptop, TV etc, or even some food?
It should be noted that Apple could very well proxy those requests to Google and Tencent to protect their customers' IP addresses, or even implement safe browsing on their own altogether.
The fact that they don't means that either they trust Google and Tencent, or that they don't care about privacy.
Just elaborating on the method Google uses here. The client sends a hash prefix of the URL if there is a match in the local DB. The server then sends back the full URL hashes for that prefix. Other than your IP address, there is not much data that can be collected here.
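A simplified sketch of that exchange in Python, assuming SHA-256 hashes and 4-byte prefixes. Provider and is_flagged are made-up stand-ins, and the real protocol also canonicalizes URLs and checks several host/path combinations, which is skipped here.

```python
import hashlib

PREFIX_LEN = 4  # assumption: 4-byte prefixes


def full_hash(url_expression: str) -> bytes:
    return hashlib.sha256(url_expression.encode("utf-8")).digest()


class Provider:
    """Stand-in for the remote service that holds the full hashes."""

    def __init__(self, bad_urls):
        self.full_hashes = {full_hash(u) for u in bad_urls}
        self.seen_prefixes = []  # everything the provider learns from clients

    def prefix_list(self):
        # What the client downloads ahead of time.
        return {h[:PREFIX_LEN] for h in self.full_hashes}

    def full_hashes_for(self, prefix: bytes):
        self.seen_prefixes.append(prefix)
        return {h for h in self.full_hashes if h.startswith(prefix)}


def is_flagged(url_expression: str, local_prefixes, provider: Provider) -> bool:
    h = full_hash(url_expression)
    prefix = h[:PREFIX_LEN]
    if prefix not in local_prefixes:
        return False  # no local match, so nothing is sent at all
    # Local match: send only the prefix, compare full hashes on the device.
    return h in provider.full_hashes_for(prefix)


provider = Provider(["evil.example/login"])
local_prefixes = provider.prefix_list()
print(is_flagged("evil.example/login", local_prefixes, provider))
print(provider.seen_prefixes)
```

The provider's log contains only the 4-byte prefix (plus whatever the network layer reveals, such as the IP address), and only for URLs that matched the local list.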
It's designed to avoid leaking URLs, but I'd be a lot more comfortable if Safe Browsing worked by downloading a list of hashes to my computer and checking locally. That way, data never leaves my device.
That means giving you all the hashes, which is a lot of data, and you'd need to constantly update it because the whole point of Safe Browsing is the dynamism.
Whereas today your browser only needs the prefix list, which is much shorter and so can feasibly be updated more often without awful bandwidth costs. The full hashes in a prefix are only fetched (which is where we get "Apple is sending URLs" by squinting really hard at the facts) if you visit a URL with a hash with a known-bad prefix.
Interesting. I do remember people complaining about a large file in Firefox profiles, in the context of multi-user host admins wanting to be able to have it in a centrally managed location rather than replicated to every profile. I recall it being a sizable sqlite DB for Safe Browsing. I wonder what that was about, then.
Probably this is the page of Tencent safe browsing: https://urlsec.qq.com/ I don’t understand why you trust Google so much. It’s as untrustworthy as Tencent for me.
Based on the Twitter conversation, it's NOT China only. It's Chinese localization only. Big difference. That means anyone anywhere in the world who set their computer to Chinese has their data sent. Including Europe, which is likely a GDPR violation.
The Google servers apparently take a URL hash prefix. Does Tencent do the same? If so, is it still considered a GDPR violation? There is not much info in a URL hash prefix.
Suppose peeps going to HN are suspect. Then anyone who often produces hash prefixes that match HN is suspect.
When you start getting sequences, you could possibly start matching how people navigate a website.
Essentially, a hash-prefix allows you to rule out / semi confirm guesses about browsing behavior.
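A sketch of the "rule out / semi confirm" idea in Python, from the point of view of whoever receives the prefix. It assumes SHA-256 and 4-byte prefixes, uses made-up helper names, and assumes the observer already has a list of URLs they care about; it does not show how a prefix ends up being sent in the first place.

```python
import hashlib


def prefix_of(url_expression: str, length: int = 4) -> bytes:
    return hashlib.sha256(url_expression.encode("utf-8")).digest()[:length]


def consistent_guesses(observed_prefix: bytes, candidates):
    """Keep only the guessed URLs whose hash starts with the observed prefix."""
    return [
        url for url in candidates
        if prefix_of(url, len(observed_prefix)) == observed_prefix
    ]


observed = prefix_of("news.ycombinator.com/")  # what the server would receive
guesses = ["news.ycombinator.com/", "example.com/", "example.org/"]
print(consistent_guesses(observed, guesses))
```

A prefix never proves which URL was visited, since many URLs share each prefix, but it does eliminate every guess that does not match.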
Remarkable when people become upset when it is explicitly stated your mobile tracking device sends information to third party servers, but deep down we all know the dangers are in what is not explicitly stated.