
First look at Apple/Google contact tracing framework - dmvaldman
https://twitter.com/moxie/status/1248707315626201088
======
est31
Note that years ago, Moxie studied a similar problem: how to let users know
whether their contacts use Signal without uploading the whole address book
like e.g. WhatsApp does [0]. It's similar because in both cases you want to
"match" users in some fashion via a centralized service while preserving
their privacy.

He ruled out having clients download megabytes of data (something that the
Google/Apple proposal would imply) and couldn't find a good solution beyond
trusting Intel's SGX technology, which is arguably not really a good solution
but better than nothing [1].

You have a computation/download/privacy tradeoff here. You can increase the
lifetime of the daily keys from a day to a week: that gives you less to
download, but devices have to compute more hashes to verify whether they have
been in contact with other devices. Or you can increase the 10-minute
identifier rotation to an hour: that means less privacy and more
trackability, but also less computation needed.
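As a back-of-envelope sketch of that tradeoff (all parameter values below are
hypothetical, chosen only to show the scaling, not taken from the spec):

```python
KEY_SIZE = 16  # bytes per published key, per the proposal

def daily_download_bytes(infected, window_days, key_lifetime_days):
    """Bytes every phone downloads when `infected` people each publish
    keys covering `window_days`, one key per `key_lifetime_days`."""
    keys_per_person = window_days // key_lifetime_days
    return infected * keys_per_person * KEY_SIZE

def derivations_per_key(key_lifetime_days, id_rotation_minutes):
    """Identifiers a phone must derive from each downloaded key."""
    return key_lifetime_days * 24 * 60 // id_rotation_minutes

# Daily keys with 10-minute identifiers:
print(daily_download_bytes(10_000, 14, 1))   # 2,240,000 bytes (~2.2 MB)
print(derivations_per_key(1, 10))            # 144 hashes per key

# Weekly keys: 7x fewer bytes to download, 7x more hashes per key.
print(daily_download_bytes(10_000, 14, 7))   # 320,000 bytes
print(derivations_per_key(7, 10))            # 1008 hashes per key
```

The product of the two is roughly constant; the knobs only move work between
bandwidth and CPU while changing how linkable the identifiers are.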

My guess to why Google/Apple didn't introduce rough location (like US state or
county) into the system was to prevent journalists from jumping onto that
detail and sensationalizing it into something it isn't (Google/Apple grabbing
your data). Both companies operate the most popular maps apps on the planet as
well as OS level location services that phone home constantly so they are
already in possession of that data.

[0]: [https://signal.org/blog/contact-discovery/](https://signal.org/blog/contact-discovery/)

[1]: [https://signal.org/blog/private-contact-discovery/](https://signal.org/blog/private-contact-discovery/)

~~~
tinus_hn
The proposed system requires a download of 16 bytes per infected user per
day. Unless this really gets out of hand, that's not in the megabytes range.

~~~
jeremyjh
Yes, this is where OP lost me.

> Published keys are 16 bytes, one for each day. If moderate numbers of
> smartphone users are infected in any given week, that's 100s of MBs for all
> phones to DL.

"Moderate" rate of infections is not millions of new cases per week worldwide.
That would be such a catastrophe that contact tracing would be useless.

~~~
ralfd
Currently there are 1.2 million active infections. Doesn't this mean every
smartphone in the world would need to download about 19 MB per day?

If more cases were tested in India, Africa, or South America, a tenfold
increase wouldn't be unthinkable.

~~~
jeremyjh
No, my understanding is you would only download two weeks' worth of keys when
a new infection is reported. There is an assumption in any method of contact
tracing that once people test positive, they isolate themselves. If they
don't, there is no reason to do the tracing, since the virus will simply
spread exponentially.

------
hn_throwaway_99
Regardless of the technical issues with this, I think the "prank" issue Moxie
brings up is much more serious. We've already seen the phenomenon of "Zoom
bombing"; I can imagine "tracer bombing" would be a much more serious issue.
The only way I could see this working is if, when you enter a positive
result, you have to enter some sort of secret key from the testing authority,
but that's simply not tenable given that a lot of (most?) testing these days
is done by private providers.

~~~
Reelin
Why wouldn't the patient provide their framework info (if they so chose) at
the time of sample collection? Then the medical authority could report it to
the local government on the patient's behalf in the event of a positive test.
Other end users then decide which (if any) "reporting authorities" to pull
data from and check against.

This also seems to address Moxie's concern about public location data being
necessary (unless I've missed something). If I only pull all the positive
tests from my local county or state, that should hopefully be a small enough
dataset to be manageable even on fairly resource constrained low end devices.

~~~
ehsankia
My understanding too was that there was a middleman involved in collecting and
distributing the keys, to avoid people spamming the system. You want to be
100% sure it's a positive, and not put the trust in the user. Otherwise random
people could just say they have it. The local government would have to submit
the keys as you mention and act as moderators for that region.

~~~
Reelin
> The local government would have to submit the keys as you mention and act as
> moderators for that region.

There's a big difference between a centralized and decentralized model here.

* Centralized, there's a single (or only a few) worldwide APIs that you need approval to work with. This also hinders interoperability of different end-user app implementations.

* Decentralized, anyone can set up a distribution server and require whatever authentication they'd like for it. A local government, a hospital, the Red Cross, etc. The framework becomes nothing more than a decentralized protocol that can potentially even be repurposed for other novel uses.

For the decentralized approach, bear in mind that there's nothing preventing a
third party from hosting and managing a distribution server on behalf of
someone else. So (for example) the CDC could host a server (and handle
authentication) for a state or county government that didn't feel up to the
task.

Another example, say the local hospital has their own database (possibly
hosted by the state or Google or whoever). They can feed their (authenticated,
locally collected) data to a local authority (the city or county), which only
needs to accept data from trusted institutions (ie all the hospitals in the
area). They can in turn feed this inherently trustworthy data to a state
system, and so on. If each entity in this hierarchy makes their dataset
publicly available, then users can independently decide which datasets are
relevant to them (perhaps they traveled recently?) and check them on a daily
basis.

~~~
ehsankia
It doesn't really matter who hosts the database. I specifically was talking
about middleman, as in someone who confirms the person is infected and then
takes care of passing 14 days of keys to the server. Where the server is isn't
really relevant here, just that the end-user doesn't have direct access to it.

------
krcz
> So first obvious caveat is that this is "private" (or at least not worse
> than BTLE), _until_ the moment you test positive.
>
> At that point all of your BTLE mac addrs over the previous period become
> linkable.

Linkable over a period of 14 days. Or really only linkable within one day:
each day means a new key, so linking across days could only be attempted on
the basis of behavioral correlations.

And what would you do with such data? Microanalysis of customer behaviors? It
won't be possible to use it for future customer profiling, as it won't be
possible to match the history with identifiers after the infection. This data
is practically worthless.

~~~
dbbk
Yes that's the point...?

------
olliej
Let's just answer these:

* Use stationary beacons to track someone’s travel path

Doesn't work because there's no externally visible correlation between
reported identifiers until after the user chooses to report their test
result.

* Increased hit rate of stationary / marketing beacons

Doesn't work because those depend on coherence between beacon sightings, and
the identifiers roll every 10 or so minutes. Presumably any rolling of the
Bluetooth MAC would also roll the reported identifier.

* Leakage of information when someone isn’t sick

The requests for data simply tell you someone is using an app, which you can
already tell anyway from the fact that they're using the app.

The system can encourage someone to get tested; if your app wants to tell
people to get tested, then fair play to that app (though good luck in the
US).

* Fraud resistance

Not a privacy/tracking concern, though I'm sure devs will have to do
something to limit spam/DoS.

~~~
FartyMcFarter
> Doesn't work because there's no externally visible correlation between
> reported identifiers until after the user chooses to report their test
> result.

So you're saying it works after the user reports their test result.

~~~
olliej
I'm not sure what you're saying "works" here.

To be very very clear

* The only things published by someone when they report a positive test result are the day keys for whatever length of time is reasonable (I assume ~14 days?)

* Given those day keys it is possible for your device to generate all the identifiers that the reporter's device would have broadcast.

* From that they can go through their database of seen identifiers and see if they find a match.

That means your device can determine when you were in proximity to the
reporter, so it would in theory be able to know approximately where the
contact happened, but you can't determine anything beyond that.
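To be concrete, the matching step above can be sketched like this (a
simplified illustration only; the real spec derives identifiers with
HKDF/HMAC constructions whose exact inputs and info strings differ from this
toy version):

```python
# Toy sketch of on-device matching: derive every identifier a reporter's
# device would have broadcast from their published day keys, then
# intersect with what we observed locally. The "CT-RPI" label and the
# single-byte interval counter are simplifications, not the spec's exact
# derivation.
import hmac
import hashlib

ID_BYTES = 16
INTERVALS_PER_DAY = 144  # one identifier per 10-minute window

def proximity_ids(day_key: bytes):
    """Yield every rolling identifier derivable from one 16-byte day key."""
    for interval in range(INTERVALS_PER_DAY):
        mac = hmac.new(day_key, b"CT-RPI" + bytes([interval]), hashlib.sha256)
        yield mac.digest()[:ID_BYTES]

def check_exposure(published_day_keys, observed_ids):
    """On-device match: did we ever see an identifier derived from a
    reported key? Nothing here is sent back to the server."""
    observed = set(observed_ids)
    return any(rpi in observed
               for key in published_day_keys
               for rpi in proximity_ids(key))
```

Note the asymmetry this buys: the server only ever sees day keys, while the
set of observed identifiers never leaves the device.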

The server that collects and serves reported day keys doesn't have the list of
identifiers any devices have encountered, so it can't learn anything about the
reporters from the day keys they upload.

Let's say there's a passive fixed beacon (whatever) in a public space: it
can't connect the identifiers to any specific device either, but you could
see it being a useful public health tool ("we saw carriers at [some park] at
[some times]"). It still would not know which specific devices were
broadcasting those identifiers. Even if a device went by again after the day
keys were published, there's no way to know that it's a device that's been
seen before.

Only the server is able to link published day keys together, because it
receives them and so presumably knows who published them. The spec explicitly
disallows an implementation from doing this, but it also assumes a malicious
server, so it works to ensure that the only information the server can get is
day keys with no other information attached.

~~~
FartyMcFarter
It is pretty clear that a single piece of data gathered by this system is
fairly useless. But the more data an entity has gathered, the better it can be
used to paint a whole picture.

If the server is malicious (think a government doing surveillance), it is
possible that the data from passive fixed beacons gets linked with the
identity of the person uploading keys, via IP address (when keys are uploaded)
or facial recognition gathered by cameras next to Bluetooth receivers. This
data can also be linked with data from fixed beacons in other places, which
would allow for tracking someone throughout a variety of places.

------
antpls
Again, this solution _cannot_ work, and it _threatens_ a permanent loss of
privacy.

This is like the government and the adtech companies sleeping in the same
bed, with no opposing power in the balance.

1) The "solution" is created by a monopoly of two American private
corporations.

2) It can only work reliably if everyone carries an (Apple or Android) phone
at all times and consents to share data.

3) You are not necessarily infected if you cross paths with an infected
person in the street at 5 meters. This will produce too many false positives
and give people fuzzy information.

4) It doesn't help people who are already infected and _dying_.

It just _doesn't make sense_. To me, it looks like electronic voting, but
worse. No one besides experts can understand how it works.

Today it is reviewed, but then the app will be forgotten and updated in the
background with "new features" for adtech.

We are forgetting what we are fighting: a biological virus. All effort should
go toward understanding the biological machinery of the virus and its hosts,
in order to _cure_ the disease. We should be 3D-printing ventilators,
analysing DNA sequences, building nanorobots, and synthesizing new molecules.

~~~
fabian2k
From looking at the specification, I don't see any serious loss of privacy
there, if this is implemented as stated.

2) You don't need 100%, you only need enough to drop the R0 below 1. You'll
likely need a majority of people using this, which is hard enough, but you
don't need everyone using it.

3) The apps are not supposed to count every single registered contact, only
contacts sustained over a somewhat longer timeframe. A typical value I've
heard is 15 minutes of close contact; that is what's considered a high-risk
contact when contact tracing.

~~~
vermilingua
Minor nitpick: R0 is the basic reproduction number; it describes the
infectivity if _no measures are taken_ to control the spread.

You're looking for R, the effective reproduction number, which is R0 adjusted
for all controls.

------
Reelin
Is there an official document somewhere?

Also, how does it compare to DP-3T?
([https://github.com/DP-3T/documents](https://github.com/DP-3T/documents))
([https://ncase.me/contact-tracing/](https://ncase.me/contact-tracing/))

Edit: Apple's preliminary specification was linked in another HN comment.
([https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ContactTracing-CryptographySpecification.pdf](https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ContactTracing-CryptographySpecification.pdf))

~~~
tastroder
More technical links in here:
[https://news.ycombinator.com/item?id=22836871](https://news.ycombinator.com/item?id=22836871)

------
pferde
What is it with people making long, split-up Twitter threads like this?
They're cumbersome and hard to read. Be an adult: write and publish an
article on your blog.

It feels weird to have to criticize Marlinspike about this, but stupid
practices are stupid no matter how prestigious the person doing them is.

~~~
searchableguy
Because it gets more visibility on Twitter than a blog would, and it's
already something many Twitter users do.

You can use ThreadReaderApp to get a blog post out of it.

------
femto113
The system doesn't need to ship every key to every phone; much more compact
structures like Bloom filters could be used instead. If we assume about 1000
positives per day, each uploading 14 days of keys at 4 keys per hour, that's
a bit over 1.3 million keys per day. A Bloom filter with a false positive
rate of 1/1000 could store that in a couple of megabytes. The phone downloads
the filter each day and checks its observed keys, and only needs to download
the actual keys if there's a potential match.
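Plugging those assumptions (1000 positives/day, 14 days of keys, 4 keys per
hour, p = 1/1000) into the standard Bloom filter sizing formula, as a sanity
check:

```python
import math

def bloom_bytes(n_items, fp_rate):
    """Optimal Bloom filter size in bytes: m = -n*ln(p)/(ln 2)^2 bits."""
    bits = -n_items * math.log(fp_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)

n_keys = 1000 * 14 * 4 * 24        # 1,344,000 keys/day under these assumptions
print(bloom_bytes(n_keys, 1e-3))   # ~2.4 million bytes (~2.4 MB)
```

At p = 1/1000 the filter costs about 14.4 bits per key regardless of the key
size, which is where the compression over shipping raw 16-byte keys would
come from.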

~~~
est31
The main issue of bloom filters is this:

> only needs to download the actual keys if there's a potential match.

One of the design constraints of the service was that it should not know your
(suspected) infection status unless you give consent that it should be shared.

> Matches must stay local to the device and not be revealed to the Diagnosis
> Server.

[https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ContactTracing-CryptographySpecification.pdf](https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ContactTracing-CryptographySpecification.pdf)

The better the bloom filter is (the lower its false positive rate), the more
strongly a positive filter hit implies you have actually been in contact with
an infected person, so the follow-up key request reveals your suspected
status to the server.

Furthermore, the bloom filter has to deal with far more keys than necessary.
In your example of 1000 positives per day uploading 14 days of keys, each
person only needs to upload 14 keys, as they rotate once per day. At 16 bytes
per key (as the link above specifies), you'd have to download
14 * 1000 * 16 = 224 kB, much less than the bloom filter needs. And this
scheme tells you with 100% certainty whether there has been a match, so at
least in your example it's strictly better than bloom filters.

The scalability issues only manifest at numbers much larger than 1000
infections per day, say upper tens to lower hundreds of thousands, where it
starts becoming a problem.
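Those numbers, sketched with the spec's 16-byte daily keys and a 14-day
reporting window:

```python
KEY_SIZE, WINDOW_DAYS = 16, 14  # per the linked spec

def download_bytes(daily_new_cases):
    """Bytes every phone pulls per day if each new case publishes
    14 daily keys of 16 bytes each."""
    return daily_new_cases * WINDOW_DAYS * KEY_SIZE

print(download_bytes(1_000))     # 224,000 bytes (~224 kB): fine
print(download_bytes(100_000))   # 22,400,000 bytes (~22 MB): starts to hurt
```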

So yes, rough location, as Moxie suggests, is the best way to improve the
scheme. Instead of checking the IDs of people hundreds or thousands of km
away from you, you could just check the IDs of people in your US state or
county. But it has to be smart enough to handle movement: you need to
upload/download for all the areas you've been in, and people living near
borders automatically stand out because they download two or three areas.

~~~
femto113
I see lots of ways to keep requests for the keys from disclosing much
information:

\- you could set the false positive rate higher than the chance of
encountering a case in the wild (which also makes the filter smaller)

\- the phone could be programmed to sometimes randomly request keys even when
the filter doesn't match

\- the keys could be distributed across many static mirrors, and your phone
could pick one at random if the filter matched

------
zeckalpha
> Published keys are 16 bytes, one for each day. If moderate numbers of
> smartphone users are infected in any given week, that's 100s of MBs for all
> phones to DL.

Seems like a usecase for bloom filters or k-anonymity.

~~~
sneak
16-byte keys for a quarter million people are only 4 MB per day.

We aren't seeing anywhere close to a quarter million infections per day. The
data sizes are reasonable, even if you multiply them by n days for the
backward tracing.

I think his post is a bit more fearmongering than necessary.

~~~
jeremyjh
> I think his post is a little bit more fearmongering than is necessary.

I think that is being unnecessarily charitable because of his high status.
His math and his assumptions on this point are completely broken. So broken
that if he were some rando, no one would even have read the rest of his
thread, much less commented on it on the front page of Hacker News.

------
daenz
An important question here is: will this framework go away once the pandemic
is over? Something tells me it won't.

~~~
tastroder
To ease off on the fearmongering front: this proposal relies on an app
implementing these protocols, and you're free to uninstall the app after the
pandemic, or not install it in the first place. It is furthermore trivial to
check whether your device sends out these BTLE packets.

It's not a "can we put the genie back in the bottle" scenario if the genie is
wearing a bright warning vest announcing its presence everywhere. You can
directly measure if it's still there. All other concerns are not technical
ones. If you acknowledge digital contact tracing to be a thing, this is better
for privacy than any other proposal so far. The framework is designed to
prevent abuse even in case it would not go away.

~~~
IanCal
I'm not sure I'd count this as fearmongering. I think I know which way the
tradeoffs cut in my mind, but there's a not-unreasonable set of paths that
lead to this becoming permanent.

Given the broad powers passed recently in the UK, they could make having this
app a legal requirement for entering any shop if they wanted, and whether
apps can reasonably be uninstalled is down to whoever controls the OS.

Wouldn't it make sense to require everyone who is able to install and use
this? Or require Google and Apple to force-install it?

~~~
tcd
> they could make having this app a legal requirement to go in any shop if
> they wanted

If they can legally mandate an "app", they can mandate me having a device to
run said "app".

It'd be absolutely absurd to mandate carrying a spy with you at all times in
order to exist in society.

"Sorry, don't have a phone, kthx"; "Sorry, my phone doesn't use gapps, it's
running a custom ROM"; and what about those Linux phones?

Yeah, that's not going to work.

~~~
IanCal
The idea that the vast majority of people would not be running a stock OS
seems less likely to me than measures like that being introduced.

------
severine
[https://threadreaderapp.com/thread/1248707315626201088.html](https://threadreaderapp.com/thread/1248707315626201088.html)

------
grumple
Yikes, this is prep for big brother's guilt by association. I wouldn't want to
test positive for anything the state can track (radical ideas? you're now a
positive in this system). Opt out.

~~~
DagAgren
Or, it's just what it says: a way to implement test-and-trace, something that
is absolutely needed to stop a pandemic like this from killing hundreds of
thousands, if not millions, of people.

Everything isn't a slippy slope. Everything isn't about your privacy.
Everything isn't a grand conspiracy that only you can see and the sheeple are
too dumb to understand.

Sometimes, extreme measures are needed.

~~~
grumple
> Everything isn't a slippy slope. Everything isn't about your privacy.
> Everything isn't a grand conspiracy that only you can see and the sheeple
> are too dumb to understand.

Crises are often used by despots to seize power. That's not a conspiracy,
that's historical fact. In the United States, we've seen it recently - 9/11
was used to degrade our rights across a large number of issues, and we've
never gotten them back.

Implementing systems to track everyone people come into contact with is
absolutely a huge invasion of privacy, and obviously not necessary.

> Sometimes, extreme measures are needed.

Extreme times do not justify all extreme measures.

Every time you lose rights or privacy, assume it's permanent. Our government
is not suited for repealing law.

~~~
DagAgren
Crises are far, far more often NOT used by despots to seize power.

------
themark
Seems like a lot of processing. I wonder how much battery performance will be
affected.

------
kome
that's the new electronic voting: making easy stuff more complicated and
dangerous...

the problem is not a technological problem, it's a political problem.

------
bobowzki
Goodbye last shred of privacy.

"The road to hell is paved with good intentions" is an expression that comes
to mind.

------
redis_mlc
Can somebody address the issue that we have almost no testing ability in the
US?

~~~
grandmczeb
We’re testing like 140k per day with over 2.5M tests total. Obviously more
would be better but that’s hardly “almost no testing ability.”

~~~
hn_throwaway_99
I think it may differ by region. I can say factually that the testing in Texas
is absolutely abysmal. Many people who have lots of symptoms are being turned
away for testing.

On Tuesday our illustrious governor announced with lots of fanfare that
Walgreen's would be expanding drive-thru testing using Abbott's 15-minute
testing devices. It's now Friday evening and still no word on even where the
locations will be for these testing sites. I'm sick of these BS press
conferences and press releases, just STFU if you're not actually ready to _do_
anything.

There is _no way_ we'll be able to restart our economy without at least a 10X
or more increase in testing. For right now the lack of testing isn't that huge
of a deal when most people are quarantining at home anyway, but it will become
a huge deal when people start going back to work. I'm still kind of amazed I
haven't seen any convincing plan about how this eventually ends. Everything
will just flare back up again once social distancing ends without tons of
robust testing.

~~~
grandmczeb
This may be a bit of an optimistic take, but there's at least some evidence
the IFR is ~0.37%. Given the current number of deaths in NYC, that would imply
at least ~20% of the city population has already been infected, likely more
given the lag between infection and death. If that's true, the best strategy
will probably be to keep vulnerable groups isolated and loosen some
restrictions until herd immunity is reached. How long that takes depends on
the hospitalization rate, since ideally we'd have hospitals just barely at
max capacity, but I don't think it's implausible that by the end of the
summer we'd reach ~75% infected, at which point all restrictions could be
lifted.

~~~
fyfy18
Further to this, once we have antibody tests rolling out en masse, they will
give us a much clearer picture of how many people have been infected (and are
now hopefully immune). Until then we just need to sit back and wait.

------
Zenbit_UX
No clue who/what a Moxie is (presumably some guy), which makes this thread's
title seem even more absurd.

OP feeling like we all need to know what Moxie thinks about this reminds me
of this [Chappelle Show skit](https://www.youtube.com/watch?v=Mo-ddYhXAZc)
about getting Ja Rule's hot take on current events.

~~~
DyslexicAtheist
He's the person behind the double-ratchet algorithm used in WhatsApp and the
creator of Signal.

~~~
Zenbit_UX
Ah, that makes sense then. Though I do think "Founder of Signal comments on
new Google/Apple contact tracing proposal" would be a far less absurd title
than "Moxie's take on...". Further, putting "private" in quotes is a bit
cheeky and definitely imparts bias to the discussion.

None of us developers here are dumb enough to think an API whose goal is
literally to track and trace human beings is 100% private. The real question
is: is it private _enough_?

~~~
redis_mlc
Moxie is very well-known in the computer security industry.

[https://en.wikipedia.org/wiki/Moxie_Marlinspike](https://en.wikipedia.org/wiki/Moxie_Marlinspike)

------
mc32
Of course Google promises [1]:

"...adhering to our stringent privacy protocols and protecting people's
privacy. No personally identifiable information, such as an individual's
location, contacts or movement, will be made available at any point."

[1] [https://turnto10.com/news/local/privacy-advocates-raise-concerns-about-googles-mobility-report](https://turnto10.com/news/local/privacy-advocates-raise-concerns-about-googles-mobility-report)

------
howmayiannoyyou
Finally, a decent use case for blockchain, and nobody is paying attention. It
seems to make a lot more sense to reconcile location and proximity from a
shared, user-controlled, anonymous ledger.

~~~
tastroder
There are plenty of blockchain-based proposals for the backend of this. None
of them takes off, because this is another of those imaginary use cases: you
can just leverage existing centralization without wasting time on the
problems that introducing a decentralized blockchain architecture brings with
it.

------
Uhhrrr
A modest proposal: since almost everyone is going to get this and a much
smaller percentage is vulnerable, perhaps we should just use this system to
track those who choose to register as vulnerable.

