> 28 MB extra every month shouldn't even be noticeable
The parent comment suggests a phishing site life-cycle of <15 hrs; at 400k new URLs a month, that's ~8,333 every 15 hrs. To give an idea of how frequency-sensitive this is, assume URLs are added uniformly in time: that's a new one roughly every 6.5 seconds. For such time-critical information it makes no sense to attempt to synchronize clients; it would require constant polling or push updates to have _any_ chance of catching a malicious URL.
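For concreteness, a quick back-of-envelope check of that rate (the 400k/month and 15 hr figures come from the thread, they're not verified):

```python
# Back-of-envelope: how often does a new phishing URL appear,
# assuming 400k new URLs per month, added uniformly in time?
URLS_PER_MONTH = 400_000               # figure quoted upthread, not verified
SECONDS_PER_MONTH = 30 * 24 * 3600     # ~one month

interval_s = SECONDS_PER_MONTH / URLS_PER_MONTH
per_15h = URLS_PER_MONTH / (SECONDS_PER_MONTH / (15 * 3600))

print(f"one new URL every ~{interval_s:.1f} s")    # ~6.5 s
print(f"~{per_15h:.0f} new URLs per 15 h window")  # ~8333
```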
At such a frequency, efficiency becomes less about bandwidth and more about the overhead of continuously synchronising so many clients: think of that 28 MiB spread across 400k separate messages over a month, one every ~6.5 seconds. That not only inflates the total size, it also causes constant network usage and processing that is far less efficient than a single 28 MiB download.
Or you could just send a hash of the URL when you visit one... (do you request anywhere near 8k URLs every 15 hrs, one every 6.5 seconds? no). It's so clearly the simpler solution, faster for everyone, and it doesn't let bad URLs slip through while waiting on a lagging sync.
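A minimal sketch of what that per-visit lookup could look like, assuming a hypothetical check endpoint that accepts a URL hash and returns a verdict (the endpoint and response format are invented for illustration, not any real safe-browsing API):

```python
import hashlib
import json
import urllib.request

# Hypothetical lookup service; not a real API.
CHECK_ENDPOINT = "https://example.invalid/check"

def is_flagged(url: str) -> bool:
    """Send only a hash of the visited URL and ask whether it's on the blocklist."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    req = urllib.request.Request(f"{CHECK_ENDPOINT}?h={digest}",
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=2) as resp:
        return json.load(resp).get("flagged", False)
```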
There's no need to sync every time a new phishing URL is added - only every time a URL is visited by a client.
The delta can be derived just from the version number of the client's URL database, and a whole day's worth of updates should total about 1 MB. So ~1 MB for the first URL visited in a day, and considerably less afterwards. Compared to the average webpage size, that's nothing.
Really, the only thing that changes is that instead of sending a URL hash, you send your URL DB version, and the reply is the list of changes since that version.
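A sketch of that version-based exchange, with an invented wire format (version in, added/removed entries plus new version out); this only illustrates the idea, it is not anyone's actual protocol:

```python
import json
import urllib.request

# Hypothetical delta endpoint; the request/response shape is made up.
DELTA_ENDPOINT = "https://example.invalid/deltas"

def sync_blocklist(local_version: int, local_db: set) -> int:
    """Send our DB version, apply whatever changed since then, return the new version."""
    with urllib.request.urlopen(f"{DELTA_ENDPOINT}?since={local_version}",
                                timeout=5) as resp:
        delta = json.load(resp)  # e.g. {"version": N, "added": [...], "removed": [...]}
    local_db.update(delta["added"])
    local_db.difference_update(delta["removed"])
    return delta["version"]      # becomes the 'since' value next time
```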
> Really, the only thing that changes is that instead of sending a URL hash, you send your URL DB version, and the reply is the list of changes since that version.
Or no changes at all, just a simple confirmation that the list is up to date. Yes, this is a much better idea.
Although it's always going to be less efficient, and I'm not sure how it would scale into the future. Checking URLs server side is optimal: the cost stays roughly constant, proportional to the URL size. With DB deltas, each lookup instead depends on both the URL size and the DB update rate, i.e. as the malicious-URL rate increases over time, individual URL lookups will incur a greater network cost. That's probably not a big deal for the client, but it would make a significant difference for the provider of the deltas... or maybe network caching would dissolve it again? There would be a lot of duplicate deltas flying around every minute; it's basically a content-distribution problem, but with a high-frequency twist.
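Roughly, the scaling concern looks like this (all numbers are illustrative, extrapolated from the 28 MB / 400k-per-month figures above):

```python
# Rough cost model: bytes transferred per URL lookup.
# A hash lookup is constant per visit; a delta sync grows with the update rate.
HASH_LOOKUP_BYTES = 300                    # one small request/response, roughly constant
BYTES_PER_DB_ENTRY = 28_000_000 / 400_000  # ~70 bytes/entry from the 28 MB/month figure

def delta_bytes(new_urls_per_hour: float, hours_since_last_sync: float) -> float:
    """Delta size for the first lookup after a sync gap."""
    return new_urls_per_hour * hours_since_last_sync * BYTES_PER_DB_ENTRY

print(f"hash lookup: ~{HASH_LOOKUP_BYTES} B per visit, regardless of rate")
print(f"delta, 1 h stale, ~556 URLs/h (today's alleged rate): ~{delta_bytes(556, 1)/1e3:.0f} KB")
print(f"delta, 1 h stale, 10x that rate: ~{delta_bytes(5560, 1)/1e3:.0f} KB")
```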
Do you really think Tencent is detecting a new phishing site every 6.5 seconds?
I'd seriously question how many of the total new phishing sites they detect to start with, and then how frequently they do so.
If a user only downloads the deltas periodically they'd risk being out of sync with the master list (which might not be updated even once a month or at all, for all we know), but that's the price they'd need to pay to avoid sending any information about their web browsing to parties they don't trust.
One other thing to consider is the likelihood that the URL you happen to be visiting is both a phishing URL to begin with and one that only appeared since your last delta download, compared to the likelihood that it's one of those already in the entire phishing URL database you've already downloaded. I'd expect those odds to be very low.
> the likelihood that the URL you happen to be visiting is both a phishing URL to begin with and one that only appeared since your last delta download, compared to the likelihood that it's one of those already in the entire phishing URL database you've already downloaded. I'd expect those odds to be very low.
Ignoring the first condition (otherwise why bother with a list at all)... Given that this information is very transient (average life-cycle 15 hrs, i.e. 54,000 seconds), the estimate is pretty simple: deltaT / 54000, where deltaT is the number of seconds since your last sync.
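To put numbers on that, assuming the 15 hr (54,000 s) average life-cycle quoted upthread:

```python
AVG_LIFECYCLE_S = 15 * 3600  # 54,000 s; figure quoted upthread, not verified

def fraction_missed(seconds_since_sync: float) -> float:
    """Rough fraction of currently-live phishing URLs that appeared after your last sync."""
    return min(seconds_since_sync / AVG_LIFECYCLE_S, 1.0)

print(fraction_missed(5 * 60))      # synced 5 min ago  -> ~0.6%
print(fraction_missed(60 * 60))     # synced 1 hour ago -> ~6.7%
print(fraction_missed(15 * 3600))   # synced 15 hrs ago -> 100%
```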
This is still horrible, because your safety is determined by how frequently you can sync with the DB.
> If a user only downloads the deltas periodically they'd risk being out of sync with the master list (which might not be updated even once a month or at all, for all we know).
Being updated monthly doesn't match the statistic of an average 15 hr life-cycle, because entries would be useless after that length of time. And while I don't claim to know the 15 hrs as a fact, it's intuitive that the average will only get shorter as malicious-URL checkers are updated ever faster.
> but that's the price they'd need to pay to avoid sending any information about their web browsing to parties they don't trust.
The full URL need not be sent; a hash of the URL's domain and path would probably suffice... If that's not enough, then it's a dilemma, but that doesn't make continuous syncing a good or fail-safe replacement.
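For illustration, one way to hash only the domain and path so that query strings and fragments never leave the device (the exact canonicalisation here is an assumption, not how any real safe-browsing scheme does it):

```python
import hashlib
from urllib.parse import urlsplit

def domain_path_hash(url: str) -> str:
    """Hash only host + path; query parameters and fragments are dropped."""
    parts = urlsplit(url)
    canonical = f"{parts.hostname}{parts.path or '/'}".lower()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

print(domain_path_hash("https://example.com/login?session=secret#token"))
# hashes only "example.com/login"
```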
"Being updated monthly doesn't match the statistic of average 15hr lifecycle, because it would be useless after that length of time."
And maybe it is useless. We don't actually know, but we should at least recognize that there may be a difference between how frequently phishing sites allegedly appear and how frequently they appear in Tencent's malware URL database.
"This is still horrible, because your safety is determined by how frequently you can sync with the DB."
And being identified by the Chinese government as someone who surfs to forbidden websites might be even more horrible, for some.