Hacker News
Apple Is Sending URLs to Tencent? (twitter.com)
572 points by mathieutd 30 days ago | 144 comments



The author of the tweet goes into more depth in a blog post: https://blog.cryptographyengineering.com/2019/10/13/dear-app...


Took a quick look, and this appears to be enabled if [NSLocale.currentLocale.countryCode isEqualToString:@"CN"]:

  char ____ZN7Backend6Google12SSBUtilities24shouldConsultWithTencentEv_block_invoke_2(void * _block) {
      rax = [NSLocale currentLocale];
      rax = [rax retain];
      r14 = [[rax countryCode] retain];
      [rax release];
      rbx = [r14 isEqualToString:@"CN"] != 0x0 ? 0x1 : 0x0;
      [r14 release];
      rax = rbx;
      return rax;
  }
Update: the code for Tencent Safe Browsing seems to be very similar to that which talks to Google, down to it being under a "Google" namespace, the API endpoints being named the same, and performing hashing which seems to match the "Update API" here: https://developers.google.com/safe-browsing/v4/update-api. I think this is just "whatever Google could see before, Tencent can see now, if you're in China". I'm no expert, so I have no idea if that's k-anonymous or whatever if Tencent/Google decide they want to track you, but in either case it's just shifting who's getting your hashes.
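
For anyone wondering what "just shifting who's getting your hashes" looks like in practice, here is a minimal Python sketch of the prefix check the Update API describes. It assumes SHA-256 over an already-canonicalized URL expression and 4-byte prefixes; the function names and the example hosts are made up for illustration, and this is not Apple's or Google's actual code.

```python
import hashlib

PREFIX_LEN = 4  # bytes; assumed here, the Update API allows longer prefixes too


def hash_prefix(url_expression: str, length: int = PREFIX_LEN) -> bytes:
    """Return the first `length` bytes of SHA-256(url_expression)."""
    return hashlib.sha256(url_expression.encode("utf-8")).digest()[:length]


def needs_full_hash_lookup(url_expression: str, local_prefixes: set) -> bool:
    """True if the locally stored prefix list contains this URL's prefix.

    Only in that case would the client send the prefix (not the URL) to the
    provider and ask for the matching full hashes.
    """
    return hash_prefix(url_expression) in local_prefixes


# Hypothetical local list, as if downloaded from the provider earlier.
local_prefixes = {hash_prefix("malware.example/bad.html")}

print(needs_full_hash_lookup("malware.example/bad.html", local_prefixes))
print(needs_full_hash_lookup("benign.example/", local_prefixes))
```

The provider only learns the short prefix and the requesting IP address, and only when there is a local match.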


> if [NSLocale.currentLocale.countryCode isEqualToString:@"CN"]:

So even for US and EU based users the data is sent to Tencent just because they enabled Chinese language support? Who programmed that?


iOS has separate region and language settings. Quick look at Apple docs suggests that this is the former.


No, it's the latter.

In NSLocale, "region" is a subtype of language, as in a regional dialect, not an independent dimension.

https://developer.apple.com/documentation/foundation/nslocal...

https://developer.apple.com/library/archive/documentation/Ma...


Nope, [[NSLocale currentLocale] countryCode] returns the Region country code from Settings. The same code is also used for the language region, so you can end up with something like zh_US.
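
To make the language/region split concrete, here is a rough Python stand-in for the check. It is not NSLocale: split_locale and should_consult_tencent are hypothetical helpers that just pull the two-letter region out of an identifier string, and they ignore numeric region codes and other subtleties.

```python
def split_locale(identifier: str):
    """Split an identifier like 'zh_US' or 'zh-Hans_CN' into (language, region).

    Simplified assumption: the region is the first two-letter alphabetic
    component after the language; script subtags such as 'Hans' are skipped.
    """
    parts = identifier.replace("-", "_").split("_")
    language = parts[0].lower()
    region = None
    for part in parts[1:]:
        if len(part) == 2 and part.isalpha():
            region = part.upper()
            break
    return language, region


def should_consult_tencent(identifier: str) -> bool:
    """Mirror of the decompiled check: only the region is compared to 'CN'."""
    return split_locale(identifier)[1] == "CN"


for ident in ("zh_US", "en_CN", "zh-Hans_CN", "zh_TW"):
    print(ident, split_locale(ident), should_consult_tencent(ident))
```

With this reading, a Chinese-language interface with a US region does not trigger the check, while an English interface with the region set to China does.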


I wonder if there are similar oversights with the US locale, seeing as a lot of developers prefer English interfaces to the somewhat craptastic localizations.


CN is the country code, not the language code. zh-CN is Simplified Chinese localized to mainland China. If you want Simplified Chinese try something like zh-HK or zh-SG.

But there is valid criticism to be had that Apple should be signposting more visibly the differences between its settings for CN and outside CN.


Your data is sent to Google otherwise.


...which is what I strongly prefer (rule of law and all that). Of course, as others have said, on-device processing would be the best.


Not just EU and US based users, but also Hong Kong and Taiwan based users.


Hong Kong, Macao and Taiwan all have their own ISO 3166 codes and users there are unlikely to accidentally set the region to CN, since the difference between simplified and traditional characters is quite obvious.


This sounds like the more dangerous story here. What the heck?


Google is blocked in China, so naturally they'd need a Chinese alternative. With everything going on it's easy to fearmonger, but people need to chill out a bit. Locale is probably one of the least intrusive ways to determine location; using GPS would probably cause an even bigger problem if people realised that there's a backdoor to avoid the location permission.

Any company that markets/releases in China and relies on some Google service (maps/safe search/safety net/google sign in/firebase/etc) needs to find an alternative, not because everyone is on the Chinese payroll but because more often than not these services are business critical.


Wouldn't the locale be set to CN for phones which are in non-china countries too?


Language and Locale are separate preferences. You can mix and match however you want on iOS.


Oh I see. That is where I got confused then.


What kind of code am I looking at? It seems pretty cool. Is this some automatically 'reverse compiled' assembly?

In any case, I'd love to know how you generated this. Would be very cool to get something similar out of an executable.


It's "decompilation" of a block invoke for Backend::Google::SSBUtilities::shouldConsultWithTencent() taken by opening /System/Library/PrivateFrameworks/SafariSafeBrowsing.framework/SafariSafeBrowsing in Hopper Disassembler.


SafariSafeBrowsing.framework


> ____ZN7Backend6Google12SSBUtilities24shouldConsultWithTencentEv_block_invoke_2

I'm glad I don't use Objective-C... That's some Java level function naming there.

Edit: may have spoken too soon, this appears to be reverse engineered / decompiled?


  bash$ c++filt <<< ____ZN7Backend6Google12SSBUtilities24shouldConsultWithTencentEv_block_invoke_2
  invocation function for block in Backend::Google::SSBUtilities::shouldConsultWithTencent()
Edit: what are downvotes for? That is the standard way to decipher C++ mangling, using built-in (binutils) tools.


You’re looking at the method name after the compiler got done mangling type information into it for the linker. What appears to be an anonymous block likely lives, in the source code, inside a function named “shouldConsultWithTencent” in a namespace (class?) “Backend::Google::SSBUtilities”.

The other line noise encodes return and argument types via the process of Name Mangling: https://en.wikipedia.org/wiki/Name_mangling
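
If you don't have c++filt handy, the nested-name part of the symbol is easy to pick apart by hand: after "_ZN" each component is a decimal length followed by that many characters, and "E" closes the list. A small Python sketch of just that piece (parse_nested_name is a made-up helper, and it ignores templates, substitutions and everything else in the Itanium ABI):

```python
from typing import List


def parse_nested_name(symbol: str) -> List[str]:
    """Extract the length-prefixed components of an Itanium '_ZN...E' name."""
    start = symbol.find("_ZN")
    if start < 0:
        raise ValueError("no Itanium nested name found")
    i = start + len("_ZN")
    parts = []
    while i < len(symbol) and symbol[i] != "E":
        j = i
        while j < len(symbol) and symbol[j].isdigit():
            j += 1
        if j == i:
            raise ValueError("expected a length prefix at offset %d" % i)
        length = int(symbol[i:j])
        parts.append(symbol[j:j + length])
        i = j + length
    return parts


symbol = ("____ZN7Backend6Google12SSBUtilities"
          "24shouldConsultWithTencentEv_block_invoke_2")
print("::".join(parse_nested_name(symbol)))
```

The trailing "v" after "E" encodes an empty parameter list, and "_block_invoke_2" is the suffix the compiler adds for the block's invoke function.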


This is a block invoke for a mangled C++ function. You'd know it as a lambda inside of Backend::Google::SSBUtilities::shouldConsultWithTencent().


It appears to be the symbol in the binary. C++ also does similar things. It's called mangling. https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling

It's also the reason you sometimes need to use extern "C", or otherwise mark symbols being exported via a C ABI in C++, so that they don't get mangled.


This is really a "damned if you do, damned if you don't" kind of situation.

They can either use Tencent's Safe Browsing API as a drop-in replacement for Google's API, relying on k-anonymity to leak as little information as possible. That leaves them open to accusations that they allow Tencent (or, for that matter, Google) to track the browsing history of Safari users.

Or they can essentially turn off Safe Browsing in China. (Google's API is collateral damage of the Great Firewall.) That leaves their users unprotected against all kinds of malware and scams.

I think they made the right call here by protecting users against the most common threat (most people are not dissidents), while giving advanced users with a different threat model the opportunity to opt out.


"Or they can essentially turn off Safe Browsing in China."

The OP as well as the associated blog post[1] as well as the Apple-provided fine-print language do not make it clear to me that this "feature" is exclusively enabled for Chinese users (or, perhaps Chinese IPs).

Could someone point to a source that confirms a US person, in the US, with a US-purchased iPhone, would not have their browsing history transformed and sent away for analysis to Tencent?

[1] https://blog.cryptographyengineering.com/2019/10/13/dear-app...


If this source is to be believed, it's either going to Google or Tencent, but never both:

https://twitter.com/eromang/status/1183422784082530304/photo...

You can try it yourself by going to one of the iOS Safe Browsing test pages on your phone, and when the warning pops up click "Show Details". It'll either say Google or Tencent on the warning message, which should let you know which one got chosen for you.

https://testsafebrowsing.appspot.com

I just tried it, and it says Google for me in the US.


Great. I disabled safe browsing probably back when it first appeared on my iPhone 3G or 4 and this test confirms I’m still not sending urls to anyone whilst surfing on my iPhone 11. Nice job preserving these settings over countless device upgrades.


For anyone else wanting to disable it (or at least learn more about it), the feature is labeled in iOS settings under Safari > Fraudulent Website Warning.


Interestingly, none of those links triggered a warning for me on my Mac…


You've probably switched off Safe Browsing. I wouldn't advise anybody to do that unless they're _so_ sure that they don't need Safe Browsing that when (most likely rather than if) they get infected by Malware or fooled by Phishing they are confident they'd tell everybody they know what an idiot they are.

I have it switched off on my home PCs (but on for work). But then I also don't carry home insurance and when I was flooded I believe my first Facebook message began "This is probably a good time for you to say 'I told you so'" because that seems like the right sentiment.


I haven't.



Alternatively, they could purchase the data from Tencent or another company, and operate their own version of the service. That may even be what they’re doing, but we don’t know, since they launched the service with no details or publicity.


Yet another approach is to send the entire list of all malware URLs to each client and let the client do all the processing on their end.

This way no data (hashed, anonymized, truncated, or otherwise) would need to be sent to Tencent, Apple, or Google, or anyone else.


Why is this getting downvoted? I'm also interested in why this approach isn't taken.


Chrome does something like this, you can learn more about it here: https://codereview.chromium.org/6286072/


A couple possibilities I can think of:

* the list may be prohibitively large

* it exposes to the bad actors exactly which of their scams is detected, so they can simply refine their methods until their sites don’t make “the list”


Bad actors can also occasionally poll the safebrowsing API.


Bloom filters take care of the first. There will always be an arms race between attack and defense, so I'm not concerned about the second issue.


Bloom filters can give false positives, and to eliminate them, you'd need to send "data (hashed, anonymized, truncated, or otherwise)" to some entity that has the full list. That's exactly how Google's Safe Browsing API works.
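
For readers who haven't built one, a toy Bloom filter in Python shows why this is so: membership tests can never miss an added item, but they can say "maybe" for an item that was never added. The sizes and the SHA-256-with-seed hashing below are arbitrary choices for illustration, not what any browser ships.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, tunable false positive rate."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bad_urls = [f"phish{i}.example/login" for i in range(1000)]
bloom = BloomFilter(num_bits=8192, num_hashes=4)
for url in bad_urls:
    bloom.add(url)

false_positives = sum(bloom.might_contain(f"ok{i}.example/") for i in range(1000))
print("false positives among 1000 clean URLs:", false_positives)
```

Any "maybe" answer still has to be confirmed against something that holds the real list, which is the round trip being discussed.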


Can't the bad actors already check each of their sites individually by pretending to be a normal user?


Exactly how large is it?

OS vendors already make a habit of regularly sending gigantic OS updates. I'd have a hard time believing that a compressed list of malware URLs would be noticeably bigger, by comparison.

Also, once the list is sent the first time (or just included with the OS so it'd be already present on your device when you bought it), they could just send the deltas as the list changed, and those deltas (especially once compressed) should be relatively small even compared to the original (probably not that large) list.


Google Safe Browsing transparency report lists 40k new bad URLs per week - how large do you think the list is now? It is far, far too large for local processing but k-anonymity is perfectly trustworthy when used with cryptographic hashing.


Wouldn’t it be pretty easy for the bad actors to check the database anyway? I can’t imagine they would need to query often enough to hit any rate limits.


Local handling seems more Appley too.


Downvoters who think that might be too much data don't know about Bloom filters.


Bloom filters are likely useless in this situation - following facts for phishing only:

1. Phishing sites have a lifecycle of about 15 hours.

2. Most malicious links are hidden within benign domains.

3. About 400,000 phishing sites are created each month.

From: https://www.itgovernance.co.uk/blog/4-eye-opening-facts-abou...

I haven't run the numbers, but I am guessing that a clientside solution would consume a lot of bandwidth, and avoiding false positives is very important.

Also with a clientside solution, how are new phishing URLs detected?

PS: perhaps try to assume HNers know what a Bloom filter is (I've seen them come up lots of times in comments).


Google's safe browsing API is probabilistic too. The idea is that you do so many rounds of checking to get closer and closer to the mark. You start with a fairly high false positive probability, high-privacy check, then if you get a positive, you try a lower false positive rate check that also loses you some privacy, and the trade-off is that you don't have to have the full malicious site DB with you at all times (and keep it up to date).

Why did you assume I'd not know about false positives?


400,000 sounds like a lot, but I wonder how many new URLs Tencent adds to its database each month. I expect they don't add every phishing URL but some small subset of them (possibly even a very small subset... we'll probably never know).

But let's say it is 400,000. I took the URL you linked and made a file of 400,000 copies of it. The file size was 28 MB. I didn't bother compressing that particular file since the URL is the same in each instance, but I expect a file full of actual phishing URLs would probably compress pretty well, so it would probably be significantly less than 28 MB.

Considering that OS vendors regularly ship multi-gigabyte size updates, having to download less than 28 MB extra every month shouldn't even be noticeable. If updates needed to be done more frequently, the client could subscribe to get regular updates as they become available.


> 28 MB extra every month shouldn't even be noticeable

Parent comment suggests phishing site life-cycle <15hrs, at 400k a month that's 8333 every 15 hrs. To give an idea of how frequency sensitive this is, assume URLs are added equidistributed in time: that would be a new one every 154ms - for such time critical information it makes no sense to attempt to synchronize clients, it would require constant polling or push updates to have _any_ chance of catching a malicious URL.

At such a frequency, efficiency becomes less about bandwidth and more about the overhead of continuously synchronising so many clients (think of that 28 MiB spread out over 400k separate messages over one month, one every 154ms, that not only inflates the size, but causes a constant network usage and processing that is far less efficient than a single 28MiB download).

Or you could just send the URL hash when you visit a URL... (do you request anywhere near 8k URLs every 15hrs? 1 URL every 154ms? No.) It's so clearly a simpler solution that will be faster for everyone without letting bad URLs slip through before a latent sync.
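
A quick check of the rate arithmetic in Python, assuming a 30-day month and evenly spaced arrivals as above. Under those assumptions the 0.154 figure comes out as a rate (URLs per second), which corresponds to roughly one new URL every 6.5 seconds rather than one every 154 ms.

```python
SITES_PER_MONTH = 400_000      # figure quoted from the linked article
LIFECYCLE_HOURS = 15           # quoted average phishing-site lifetime
HOURS_PER_MONTH = 30 * 24      # assumption: 30-day month

# New sites appearing within one 15-hour lifecycle window.
sites_per_lifecycle = SITES_PER_MONTH * LIFECYCLE_HOURS / HOURS_PER_MONTH

# Arrival rate and the gap between arrivals, assuming even spacing.
sites_per_second = SITES_PER_MONTH / (HOURS_PER_MONTH * 3600)
seconds_between_sites = 1 / sites_per_second

print(f"{sites_per_lifecycle:.0f} new sites per {LIFECYCLE_HOURS} h")
print(f"{sites_per_second:.3f} new sites per second")
print(f"one new site every {seconds_between_sites:.2f} s")
```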


There's no need to sync every time a new phishing URL is added - only every time a URL is visited by a client.

The delta can be derived just from the version number of the client's URL database, and should be a total of 1 MB in size for a whole day's worth of updates. So ~1 MB for the 1st URL visited in a day, and considerably less afterwards. Compared to average webpage size, that's nothing.

Really, only thing that changes is instead of sending a URL hash, you send the URL DB version, and the reply is the list of changes since that version.
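
A rough Python sketch of that exchange, with everything in memory. DeltaServer and Client are hypothetical names, the "version" is simply the number of changes published so far, and entries could be URLs or hashes; a real service would also need compaction, authentication and so on.

```python
from typing import List, Set, Tuple

Change = Tuple[str, str]  # ("add" | "remove", entry)


class DeltaServer:
    """Keeps an append-only change log; version N means N changes applied."""

    def __init__(self) -> None:
        self.changes: List[Change] = []

    def publish(self, op: str, entry: str) -> None:
        self.changes.append((op, entry))

    def changes_since(self, client_version: int) -> Tuple[int, List[Change]]:
        # The reply is just the tail of the log after the client's version.
        return len(self.changes), self.changes[client_version:]


class Client:
    def __init__(self) -> None:
        self.version = 0
        self.entries: Set[str] = set()

    def sync(self, server: DeltaServer) -> int:
        """Send only our DB version; apply the returned delta. Returns its size."""
        new_version, delta = server.changes_since(self.version)
        for op, entry in delta:
            if op == "add":
                self.entries.add(entry)
            else:
                self.entries.discard(entry)
        self.version = new_version
        return len(delta)


server = DeltaServer()
server.publish("add", "phish-a.example/")
server.publish("add", "phish-b.example/")

client = Client()
print(client.sync(server), client.version)  # first sync pulls everything so far
print(client.sync(server), client.version)  # nothing new: empty delta
```

The server sees which version each client is on and when it syncs, but nothing about which pages the client visits.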


> Really, only thing that changes is instead of sending a URL hash, you send the URL DB version, and the reply is the list of changes since that version.

Or none at all and a simple confirmation that the list is up to date. Yes this is a way better idea.

Although it's always going to be less efficient. For instance I'm not sure how it would scale into the future. Checking URLs server side is optimal: the cost is always going to be relatively constant in proportion to the URL size. With DB deltas, the cost of each URL lookup depends on both the URL size and the DB update frequency, i.e. as the malicious URL rate increases over time, individual URL lookups will incur greater network cost... this is probably not a big deal for the client, but it would make a significant difference for the provider of the deltas - or maybe network caching would dissolve it again? I mean there would be a lot of duplicate deltas flying around every minute... basically a content distribution problem but with a high frequency twist.


Do you really think Tencent is detecting a new phishing site every 154ms?

I'd seriously question how many of the total new phishing sites they detect to start off with, and then how frequently they do so.

If a user only downloads the deltas periodically they'd risk being out of sync with the master list (which might not be updated even once a month or at all, for all we know), but that's the price they'd need to pay to not have to send any information about their web browsing to parties they don't trust.

One other thing to consider is the likelihood that the URL you happen to be surfing is both a phishing URL to begin with and one of the ones that just appeared since the last delta download you did, compared to the likelihood that it's one of those already in the entire phishing URL database you've already downloaded. I'd expect those odds to be very low.


> the likelihood that the URL you happen to be surfing is both a phishing URL to begin with and one of the ones that just appeared since the last delta download you did, compared to the likelihood that it's one of those already in the entire phishing URL database you've already downloaded. I'd expect those odds to be very low.

Ignoring the first condition (otherwise why bother with a list at all)... Consider that this information is very transient (average 15hrs), this is pretty simple: deltaT / 54000
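
Spelled out in Python, under the simplifying assumptions that every phishing URL lives exactly 15 hours (54000 seconds), that URLs are listed the instant they go live, and that you hit a live one at a uniformly random point in its lifetime (miss_probability is a made-up helper name):

```python
LIFETIME_SECONDS = 15 * 3600  # 54000, the quoted average lifecycle


def miss_probability(seconds_since_sync: float) -> float:
    """Chance that a currently live phishing URL is newer than your last sync."""
    return min(1.0, max(0.0, seconds_since_sync) / LIFETIME_SECONDS)


for label, delta_t in (("1 minute", 60), ("1 hour", 3600), ("1 day", 86400)):
    print(f"last sync {label} ago: {miss_probability(delta_t):.3f}")
```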

This is still horrible, because your safety is determined by how frequently you can sync with the DB.

> If a user only downloads the deltas periodically they'd risk being out of sync with the master list (which might not be updated even once a month or at all, for all we know).

Being updated monthly doesn't match the statistic of average 15hr lifecycle, because it would be useless after that length of time. And while I don't claim to know 15hrs as a fact, it is intuitive that the average will become ever shorter as malicious URL checkers become updated ever faster.

> but that's the price they'd need to pay to not have to send any information about their web browsing to parties they don't trust.

Full URL information need not be sent, a hash of the URL domain and path would probably suffice... if that's not enough then it's a dilemma, but that doesn't make continuous syncing a good or fail safe replacement.


"Being updated monthly doesn't match the statistic of average 15hr lifecycle, because it would be useless after that length of time."

And maybe it is useless. We don't actually know, but we should at least recognize that there may be a difference between how frequently phishing sites allegedly appear and how frequently they appear in Tencent's malware URL database.

"This is still horrible, because your safety is determined by how frequently you can sync with the DB."

And being identified by the Chinese government as someone who surfs to forbidden websites might be even more horrible, for some.


People who think they know about Bloom filters should consider what attack vectors a false positive with such would allow.

(The end result of the thought experiment will be basically what Google does now.)


Running their own service would be most ideal. They are definitely not. https://github.com/Igalia/webkit/blob/9777baa3db09cad7ed5b2c....


> This is really a "damned if you do, damned if you don't" kind of situation.

Why? Simply, when setting up the device/browser, let the user choose what safe browsing API the browser shall use (both, one of them, or none).

Letting the user make a conscious choice is the best way to handle "damned if you do, damned if you don't" kind of situation. To make the choice as conscious as possible for the user, provide additional material that explains the advantages and disadvantages of each option for the user so that the user is well-informed before he/she makes his/her choice.


That’s entirely against Apple's modus operandi. It’s always to give the user as little choice as possible and assume that users are idiots. The only exception they made is for developers with CLI abilities, and even that they have begun to restrict.


> assume that the users are idiots

Microsoft didn’t make this assumption. Apple did. My entire extended family are now all Apple users. My tech support calls per year can now be counted on one hand.

In my extended family, the users are idiots, and thanks to Apple, I’m poorer financially but significantly richer in free time.


I didn't say it was a poor choice, rather, it's a great choice.

I'd rather be an idiot in the areas that I don't have an expertise in, I have no interest in plumbing and I would blindly follow the suggestion that the plumber who came to my house made. And I guess this strategy worked well for Apple.


That is exactly my experience too. My parents just kept calling me when they were using Android phones. I bought them iPhones to get my free time back.


There is not much difference there. iPhone might be more confusing for people who are used to android.


On an Android device, there are lots of relatively simple to follow guides to be able to download 'free games'.

Rooting your phone is all well and good if you know what you're doing. It's not so good when you don't, and I'm your tech support phone call.


I believe an argument could have been made that it was the right call if they had publicized it. Seeing as how it was implemented in the background, I am less inclined to give them the benefit of the doubt.


Do you think a decentralized database of unsafe URLs could exist?


I'm curious if, as @thefalken brought up [0], this is illegal under the GDPR, given that it's a hidden opt out and should apply to EU citizenry with browser language set to Chinese.

[0] https://mobile.twitter.com/thefalken/status/1183445477645312...


Very doubtful, even with the "hidden opt out" that seems to be sufficiently poorly "hidden" that lots of people here have indeed opted out.

Safe Browsing uses very little data (pretty much the least they could get away with to make it work) and you'd have to establish both that Tencent is lying about how it uses that data and that Apple knew or reasonably should have known that it was misused.

URLs never leave your browser, so "Apple is sending URLs" is wrong. The Update API is used, so the URLs stay in your browser, but under some circumstances hash prefixes of some URLs are sent to Google/Tencent.

If you choose to assume that Google / Tencent are bad actors then they can probably manipulate this data to target a few URLs and discover who (IP addresses) browsed those URLs. In less well designed browsers like Safari they might be able to tie that to a Google Account independent of the IP address, because those browsers don't isolate Safe Browsing API calls from normal web browsing activity (this won't work in e.g. Firefox). If a bad actor did this, it would make performance worse for all users, and the accuracy of the trick would suffer unless the set of target URLs tracked is fairly small. If you were looking for a single PDF filename on a single web site it's definitely possible; if you want to track six thousand different articles about Xi's resemblance to Pooh Bear across tens of thousands of sites, that's going to cause a lot of false positives you have to weed out somehow.


Forget the URLs, it’s my IP address I’m worried about.


If region is set to China, not just language. Locale has two components, like in en_US.


That doesn't mean that the user is in China. It means that the user wants their interface in Chinese as it is written in mainland China. In other words, the CN means simplified Chinese instead of traditional Chinese, which is what the TW region code corresponds to.


The GP poster is incorrect; the Region setting has nothing to do with setting the region code of the Language setting (each language+region pair being its own listing in Languages.) The Region you choose during initial device setup does determine your default Language region, but you can pick a different one while keeping the same Region.

The Region setting in iOS is literally just the question "what Country [or Country-equivalent political region] would you like to be considered to be in, when we make certain OS features be dependent on your country?"

This is separate from what country the phone treats you like you're actually in, geographically, which is determined moment-to-moment by geolocation and cellular profiles. (Time zone? Geolocation. Maps domestic/foreign feature display granularity? Geolocation.)

Whereas, Region is for things like, say, whether you see certain apps or features that are in partial progressive rollout; or whether you see features offered that don't make sense outside of certain regions.

Re: the first example, the News app, which rolled out in the US first, could be made to appear in other countries by setting your Region to the US. When this was done, the News app, if launched, would still detect what country you were actually in (geolocation-wise), and would make a best-effort attempt at showing news from the few sources Apple had made agreements with so far from that country.

Re: the second example, iOS has social-network "Accounts" integration with Sina Weibo, QQ, etc. just like it has integration with Facebook/Twitter/etc. It just doesn't display these sign-in options unless your phone is set to the China "Region." Because, if you're not in China or from China, why would you ever use these networks? (Note that Apple designs iOS under the assumption that people won't bother to change their Region when they travel; so it really is more of a "where are you from" rather than "where are you now" question.)


This is incorrect. en_GB doesn't mean you're in or from Great Britain. It means you want the device to show English as it is used in Great Britain, with extraneous "u"s and rearranged month and day. A user in the US can request that locale instead of en_US if that is the language they prefer. Locale is for localization of the interface, not for telling where you are from.

See the Australian English example in https://en.m.wikipedia.org/wiki/Locale_%28computer_software%...

Now maybe iOS sets the locale based on where the user is from instead of based on how the user would like their interface localized. If it does, it is doing it wrong. Sending a user's data to Tencent based on a setting instead of based on their location is absolutely wrong.


> This is incorrect. en_GB doesn't mean you're in or from Great Britain.

You misinterpreted. "Region" is a setting in iOS. But iOS "Region" has nothing to do with the "region" part of a locale. Setting your iOS "Region" to "Great Britain" and setting your "Locale" to "English (Great Britain)" are separate things. "Region" is just what iOS happens to call a completely distinct thing. If you like, to lessen your confusion, pretend it is called something different.

> Sending a user's data to Tencent based on a setting instead of based on their location is absolutely wrong.

You wouldn't want your phone to start sending data to Tencent as soon as you cross the border into China, right? And, vice-versa, you would expect a person from China, who thinks Tencent is a great brand, to not want to stop sending their data to Tencent just because they cross the border out of China, right?


> You wouldn't want your phone to start sending data to Tencent as soon as you cross the border into China, right?

You most likely would. Google's service will be unreachable from within China. If it didn't switch providers, you would have no Safe Browsing protection. The key thing is to obtain consent from the user the first time this happens.


What should they do?


They should send to Tencent based on network location. If you're inside the Great Firewall, Google's safe browsing service will be unreachable. If you're outside the Great Firewall, you really don't want to use services through it if possible. https://arstechnica.com/information-technology/2015/04/ddos-...


I think their point is that changing your device language is not the same thing as changing your region. Changing your language is a simple setting, but changing your region involves re-accepting the ToS for that region. So technically they would have to click Agree on the document linked in the tweet in the OP.


The code is checking the region part of the locale, which is CN for China. The language code for Chinese is zh.


“en_US” is “American English”, not “English on a phone in America”. The alternative “zh-*” codes are SG, TW, or HK. It’s checking if the user has their region set to “Mainland Chinese”, not that their phone is “Chinese on a phone in China”.


Actually, American English is "en-US"; "en_US" means English with the region set to the US, at least on iOS. But yes, it is checking that their region is set to mainland China.

https://developer.apple.com/library/archive/documentation/Ma...


Didn't realize this, thank you for clarification!


The code appears to be used for fraud related purposes, meaning, to my understanding, Apple would likely argue it has a legitimate interest.

There’s a lot of legal language around this exception, but fraud is directly called out as a legitimate interest and means that the group controlling the data would not need to obtain user consent.

For additional reading, I’d recommend the following post: https://www.gdpreu.org/the-regulation/key-concepts/legitimat...


Is Apple the data controller here, since it's all happening on the users' device? And does "legitimate interests" extend beyond the data controller's interests? I.e. if it's only about fraud against Apple then safe browsing (which is supposed to protect the user from fraud) would not necessarily be a legitimate interest of Apple. It might have to be opt-in at least.


Great questions, which I know I'm not equipped to answer authoritatively - prior comment was just my two-cents on how I'd expect Apple to argue the issue (And even that argument may be a losing one).

In opposition to the fraud argument, one could argue that users wouldn't reasonably expect to have their data forwarded to China. The counter-argument to that would likely be along the lines of users who have their localization set to China might have more of an expectation of this. And so the lawyer fees continue to increase in what would be an incredibly interesting case, honestly.


If it's illegal under the GDPR to send the data of EU citizens with browser language set to Chinese to Tencent, it's also illegal to send the data of EU citizens with browser language set to anything else to Google. Chrome, Firefox, Safari and probably all Chromium-based browsers (unless they disable Safe Browsing by default) use Google's API and would be in violation, too.


That's true, but it's probably covered in the privacy notice. It doesn't make a difference that the data is shared outside of EU, it just has to be communicated to the user.

Also, the data shared here is not personal information, unless it's connected with personal information such as IP address or a tracking cookie.

This is pretty gray area. Apple isn't necessarily sharing information with Google, it's just the property of Internet traffic that Google / Tencent can collect the IP address from the request. Same happens when websites include resources from other websites (images, scripts, etc.), and these are not typically taken into account in GDPR privacy notices.


Apple says that they share your IP address and that it “may” be recorded by Tencent and Google.


That's not necessarily true, since the GDPR imposes extra restrictions on sending data to countries that are not covered by the GDPR (essentially, outside the EEA) and are not deemed by the EU to offer equivalent protection. I don't know where Tencent has these servers, but Google has servers in the EU, managed by an EU-based subsidiary.



Would you mind linking to the upstream repository instead? GitHub doesn’t let you search in forks.


https://github.com/WebKit/webkit/blob/master/Source/WebKit/U...

(For some reason "search in this repo" doesn't work for keyword `malwareDetailsBase` [1], but it's there)

[1] https://github.com/WebKit/webkit/search?q=malwareDetailsBase...


URL to same code search on Sourcegraph (which works): https://sourcegraph.com/search?q=repo%3Awebkit%2Fwebkit+malw...

(Disclaimer: I am the Sourcegraph CEO.)


Unrelated, but I like the fact that you support prefers-color-scheme!


It shows

Search timed out Try narrowing your query, or specifying a longer "timeout:" in your query.

right now.


Sorry about that. There must’ve been a brief blip during a moment of intense load or a redeploy. Is it working for you now?


It literally says it’s going to send links to Google Safe Browsing and Tencent Safe Browsing in the Safari setting page under “Safari and Privacy”


That’s not what it says.


Which one of these comments is right? You can’t both be.


> Before visiting a website, Safari may send information calculated from the website address to Google Safe Browsing and Tencent Safe Browsing to check if the website is fraudulent. These safe browsing providers may also log your IP address.

This is quite different from sending links.


every form of software phone-home is sleazy

we should be linting code to say whether it phones home or not, and what it uploads when it does. plain language privacy policies and ever-changing browser settings are leaving huge gaps.

when the US government bought chinese drones they hired a consultant to prove that the drones never call home.


> we should be linting code to say whether it phones home or not

Is that possible? How do you differentiate it from expected API calls?

(Not convinced black/white-listing strings is any different from code review in this case - the list will just be changed on demand if it prevents someone from adding what they were trying to add.)


it's theoretically possible. I don't know of any tools that do it (which could be a comment on my research skills rather than the state of the art).

in theory you can do dataflow analysis on all external inputs to the program (geo, filesystem, text) and monitor where that goes in the program. For something more complicated like a browser, you might want to do the analysis per component (URL bar in this case).

wouldn't be perfect, but it's a starting point.

linting is tougher on closed-source software than open-source, but if a company certified a linter output and was found to be lying I'm comfortable with using the law to resolve that.


Except you'd never have a good enough dataflow analysis to work on arbitrary code without burying people with false positives. Especially in C++ code, where things like function pointers just destroy call graph precision (and therefore taint analysis precision).

Linting doesn't even give you this much. All it'd be able to tell you is "where in the program are calls to networking APIs being made" and maybe determining parameters if they are defined in the same function as the call.
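
To make that concrete, here is a rough sketch of what such a lint amounts to, written in Python against Python source rather than C++ since that keeps it short. The module list in NETWORK_MODULES and the SAMPLE snippet are my own assumptions for illustration, not taken from any real tool.

```python
import ast

# Assumption: this short list of module names stands in for "networking APIs".
NETWORK_MODULES = {"socket", "urllib", "http", "requests"}


def _dotted_name(node):
    """Turn an ast.Name / ast.Attribute chain into 'a.b.c', else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_network_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, call name) pairs for calls rooted in a network import."""
    tree = ast.parse(source)

    # Pass 1: collect the local names bound by imports of network modules.
    network_names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in NETWORK_MODULES:
                    network_names.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in NETWORK_MODULES:
                for alias in node.names:
                    network_names.add(alias.asname or alias.name)

    # Pass 2: report every call whose leftmost name is one of those bindings.
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name and name.split(".")[0] in network_names:
                hits.append((node.lineno, name))
    return sorted(hits)


# Hypothetical code under inspection.
SAMPLE = '''\
import urllib.request
from socket import create_connection

def report(data):
    urllib.request.urlopen("https://example.invalid/collect", data)

def ping():
    return create_connection(("example.invalid", 80))

def local_only(x):
    return len(x)
'''

if __name__ == "__main__":
    for lineno, name in find_network_calls(SAMPLE):
        print(f"line {lineno}: call to {name}")
```

It flags the two call sites in SAMPLE and nothing else. It has no idea what ends up in `data`, and it is defeated by something as simple as storing the function in a variable first, which is the precision problem in miniature.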


Trial use case: a small FOSS codebase in a pointer-less language. The goal isn't perfect safety, it's to be safer than we are now.


Feel free to use any of the dozens or hundreds of such tools developed by the academic community and experience the imprecision yourself.


examples pls


This is where they need to sacrifice some computer security for physical security. By turning this off, a few people who don't follow good security practices might get malware. But no one will be sent to prison or "disappeared".


Apple has done a lot for privacy in its products and its public statements. But I believe that if it has to have a better impact and be trusted, it needs someone dedicated to privacy who will (ensure that it will) publish details of its products, apps and activities in an honest form in an accessible place (and updated more often than a once-a-year OS upgrade cycle). This kind of commitment to more transparency will help the company be trusted and also held up to questions. Said trust is already eroding with recent events. Apple shouldn’t be complacent and stick to its old ways.

Sadly, Apple also has a history of brushing things away or ignoring uncomfortable questions.


Are those Google/Tencent API requests done only when browsing with Safari, or are they done for any SFSafariViewController? That would imply it’s also inside Brave/Firefox/Chrome...


Again I feel like I'm reaching out to be educated here.. but if Safari is attempting to validate URLs for safe browsing using the Google API (which it states it will do, quite openly), and Google products are quite clearly blocked in China so it resorts to Tencent's API (which it states it will do, quite openly).. why does this seem to provoke anger?

I mean this in the most equitable way possible, I'm more trying to understand where Apple has done anything wrong here?


We can’t tell whether non-China data goes to Tencent—intentionally or by some bug or adversarial problem.


The code [1], along with this explanation [2], does seem to show that it only happens for devices with the country code set to CN.

[1]: https://github.com/Igalia/webkit/blob/9777baa3db09cad7ed5b2c... [2]: https://news.ycombinator.com/item?id=21242628


I think the HN audience has gone crazy. Why would you prefer Google over Tencent for an API that serves the same purpose? Should all Chinese users be scared that the iPhone sends all their logs back to California? Should they be scared that Tesla sends all their driving data back to the US? If you don't trust anything from China, would you destroy every piece of electronics made in China, including your smartphones, laptops, TVs etc., or even some food?


It should be noted that Apple could very well proxy those requests to Google and Tencent to protect their customers' IP addresses, or even implement safe browsing on their own altogether. The fact that they don't means that either they trust Google and Tencent, or that they don't care about privacy.


Wait, Apple is Sending URLs to Google?


Just elaborating on the method Google uses here. The client sends a hash prefix of the URL only if there is a match in the local DB. The server then sends back the full URL hashes for that prefix. Other than your IP address, there is not much data that can be collected here.


I think a lot of browsers do that for the "Safe Browsing" checks.


The Safe Browsing API is deliberately designed to avoid leaking the contents of URLs to Google. You can read about how it works here: https://developers.google.com/safe-browsing/v4/update-api


It's designed to avoid leaking URLs, but I'd be a lot more comfortable if Safe Browsing worked by downloading a list of hashes to my computer and checking locally. That way, data never leaves my device.


That means giving you all the hashes, which is a lot of data, and you'd need to constantly update it because the whole point of Safe Browsing is the dynamism.

Whereas today your browser only needs the prefix list, which is much shorter and so can feasibly be updated more often without awful bandwidth costs. The full hashes in a prefix are only fetched (which is where we get "Apple is sending URLs" by squinting really hard at the facts) if you visit a URL with a hash with a known-bad prefix.
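
A toy version of that flow, in Python, for anyone who wants to see the shape of it. This is a sketch under my own assumptions, not the real protocol: ToyServer and ToyClient are made-up names, the 4-byte prefix length is assumed, and the real API also canonicalizes the URL and hashes several host/path variants, which is skipped here.

```python
import hashlib

PREFIX_LEN = 4  # bytes; assumed prefix length for this sketch


def full_hash(url_expression: str) -> bytes:
    """SHA-256 of an (already canonicalized) URL expression."""
    return hashlib.sha256(url_expression.encode("utf-8")).digest()


class ToyServer:
    """Stand-in for the provider: holds full hashes of known-bad entries."""

    def __init__(self, bad_expressions):
        self.bad_hashes = {full_hash(e) for e in bad_expressions}
        self.seen_prefixes = []  # everything the server gets to observe

    def prefix_list(self):
        # The short list that clients download and refresh periodically.
        return {h[:PREFIX_LEN] for h in self.bad_hashes}

    def full_hashes_for(self, prefix):
        # Only called when a client had a local prefix match.
        self.seen_prefixes.append(prefix)
        return {h for h in self.bad_hashes if h[:PREFIX_LEN] == prefix}


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.local_prefixes = server.prefix_list()

    def is_unsafe(self, url_expression: str) -> bool:
        h = full_hash(url_expression)
        prefix = h[:PREFIX_LEN]
        if prefix not in self.local_prefixes:
            return False  # decided locally, nothing is sent
        # Local match: fetch the full hashes for this prefix and compare.
        return h in self.server.full_hashes_for(prefix)


if __name__ == "__main__":
    server = ToyServer(["evil.example/phish"])
    client = ToyClient(server)
    print(client.is_unsafe("good.example/"), len(server.seen_prefixes))
    print(client.is_unsafe("evil.example/phish"), len(server.seen_prefixes))
```

The thing to notice is `seen_prefixes`: it stays empty for ordinary pages and only grows when the visited URL's hash shares a prefix with something on the list.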


Firefox downloads a big blob of unsafe URLs and checks against that, last I saw.


No, Firefox uses exactly the same Safe Browsing protocol. In fact, the protocol was co-developed by Google and Mozilla.


Interesting. I do remember people complaining about a large file in Firefox profiles, in the context of multi-user host admins wanting to be able to have it in a centrally managed location rather than replicated to every profile. I recall it being a sizable sqlite DB for Safe Browsing. I wonder what that was about, then.


Default configuration of Chrome sends whatever you type, while you are typing it, in the location bar to Google.

And Firefox can do that too, no idea what their default configuration is.


If I remember right, Firefox asks if you want to enable search suggestions right where they would appear, in the drop-down menu.


In Firefox, search suggestions are only enabled by default in the Search field, not in the URL bar.


Probably this is the page of Tencent safe browsing: https://urlsec.qq.com/ I don’t understand why you trust Google so much. It’s as untrustworthy as Tencent for me.


Safe Browsing seems to work in private mode, or am I missing something?


Why is this more controversial than Apple sending URLs to Google?


Tim Apple better have an explanation for this one.


(edit: the source in question has removed the tweet, so I have too)


> China only.

Based on the Twitter conversation, it's NOT China only. It's Chinese localization only. Big difference. That means anyone anywhere in the world who set their computer to Chinese has their data sent. That includes users in Europe, which is likely a GDPR violation.


The Google servers apparently take a URL hash prefix. Does Tencent do the same? If so, is it still considered a GDPR violation? There is not much info in a URL hash prefix.


Suppose peeps going to HN are suspect. Then anyone who often produces hash prefixes that match HN is suspect. When you start getting sequences, you could possibly start matching how people navigate a website.

Essentially, a hash-prefix allows you to rule out / semi confirm guesses about browsing behavior.
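
A minimal Python sketch of that guess-confirmation idea. The 4-byte prefix length, the function names and the candidate URLs are all assumptions for illustration; and per the protocol description upthread, an observer would only ever see prefixes that are on the provider's list in the first place.

```python
import hashlib


def prefix(url_expression: str, n: int = 4) -> bytes:
    """First n bytes of the SHA-256 of a URL expression (n=4 is assumed)."""
    return hashlib.sha256(url_expression.encode("utf-8")).digest()[:n]


def consistent_candidates(observed_prefixes, candidates):
    """Keep the guessed URLs whose prefix shows up among the observed ones."""
    observed = set(observed_prefixes)
    return [c for c in candidates if prefix(c) in observed]


if __name__ == "__main__":
    # What the observer logged for one client (hypothetical).
    observed = [prefix("news.ycombinator.com/")]
    # The observer's guesses about where that client might have been.
    guesses = ["news.ycombinator.com/", "example.com/"]
    print(consistent_candidates(observed, guesses))
```

A match doesn't prove the visit, since many URLs share any given short prefix, but a non-match rules the guess out, and a run of matches along a plausible click path narrows things quickly.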


Looks like the twitter post you referenced was deleted. That user's only recent post is about super mario maker 2...


I'm getting a "Page doesn't exist" for your link FYI.


More details on how this works, for Google at least, here:

https://developers.google.com/safe-browsing/v4/update-api#ch...


This is only one of the APIs. Full doc at:

https://developers.google.com/safe-browsing/v4


I am getting a "Sorry, that page doesn’t exist!" for the twitter link you shared.


It's better than sending URLs to Google, in my view.


Remarkable when people become upset when it is explicitly stated your mobile tracking device sends information to third party servers, but deep down we all know the dangers are in what is not explicitly stated.



