Based on my limited understanding* of differential privacy, it falls short on exactness (of aggregate values) and robustness (against malicious clients). I've lately been studying the literature on function secret sharing and I think it is a better alternative to DP.
Prio: Private, Robust and Scalable Computation of Aggregate Statistics
Data collection and aggregation is performed by multiple servers. Every user splits up her response into multiple shares and sends one share to each server. I've understood how private sums can be computed. Let me explain it with a straw-man scheme.
Example (slide 26):
x_1 (user 1 is on Bay Bridge): true == 1 == 15 + (-12) + (-2)
x_2 (user 2 is on Bay Bridge): false == 0 == (-10) + 7 + 3
...
If all users send shares of their data to the servers in this manner, AND as long as at least one server doesn't reveal the identities of the people who sent it responses, each server can publish the sum of the shares it has received. Adding the three per-server sums tells the servers how many users are on Bay Bridge without revealing who they are.
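The straw-man scheme above can be sketched in a few lines of Python. This is only an illustration, not the paper's construction: I'm assuming a large prime modulus (Prio works over a finite field) and three servers, and the `share` helper is my own name.

```python
import random

PRIME = 2**31 - 1  # assumed modulus; Prio operates over a finite field

def share(x, n=3):
    """Split x into n additive shares that sum to x mod PRIME.
    Any n-1 shares together look uniformly random."""
    shares = [random.randrange(PRIME) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % PRIME)
    return shares

# Two users, as in the slide example: user 1 is on the bridge (1), user 2 is not (0).
users = [1, 0]
server_totals = [0, 0, 0]
for x in users:
    for i, s in enumerate(share(x)):
        server_totals[i] = (server_totals[i] + s) % PRIME

# Servers exchange only their running totals; summing them reveals the count.
count = sum(server_totals) % PRIME
print(count)  # 1 user on the bridge
```

No single server learns any user's bit, because each individual share is uniformly random; only the sum of all three totals carries information.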
This system can be made robust by using Secret-shared non-interactive proofs (SNIPs)**, which let the servers test whether Valid(x) holds without learning anything else about x.
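To see why such a validity check is needed at all: in the straw-man scheme, nothing stops a malicious client from submitting shares of an arbitrary value instead of 0 or 1, and the shares still look uniformly random to every server. A toy sketch of the attack (same assumed additive sharing as the straw-man example; not Prio's actual protocol):

```python
import random

PRIME = 2**31 - 1  # assumed modulus, as in the straw-man sketch

def share(x, n=3):
    """Additively split x into n shares mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % PRIME)
    return shares

honest = [1, 0, 1]      # three honest users answering the 0/1 question
malicious = 1_000_000   # one cheater inflates the count

totals = [0, 0, 0]
for x in honest + [malicious]:
    for i, s in enumerate(share(x)):
        totals[i] = (totals[i] + s) % PRIME

# The cheater's shares are indistinguishable from honest ones,
# yet the aggregate is wildly wrong.
print(sum(totals) % PRIME)  # 1000002 instead of 2
```

SNIPs close this hole: the servers jointly verify Valid(x) (e.g. that x is 0 or 1) before folding a client's shares into their totals, without any server seeing x itself.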
The authors also survey the literature on computing more interesting aggregates from private sums: average, variance, most popular (approx.), min and max (approx.), quality of a regression model (R^2), least-squares regression, and stochastic gradient descent.
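Variance is a nice concrete case of reducing an aggregate to private sums: if each client submits (shares of) both x and x^2, the servers only ever learn the two sums, and variance follows from Var(x) = E[x^2] - E[x]^2. A plain sketch of just that arithmetic, with made-up data (the secret-sharing layer is omitted):

```python
# Each client would submit shares of (x, x**2); the servers only
# ever reconstruct the two aggregate sums below.
data = [4.0, 7.0, 2.0, 9.0]  # hypothetical client values
n = len(data)

sum_x = sum(data)                   # first private sum:  sum of x
sum_x2 = sum(v * v for v in data)   # second private sum: sum of x^2

mean = sum_x / n
variance = sum_x2 / n - mean ** 2   # Var(x) = E[x^2] - E[x]^2
print(mean, variance)  # 5.5 7.25
```

The same pattern (have clients encode a few derived values, sum them privately, post-process the sums) underlies the other aggregates the paper lists.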
Bottom line: I found the discussion of deployment scenarios very interesting: data servers with jurisdictional/geographical diversity, app-store/app-developer collaborations to reduce the risk of telemetry data analysis, enterprises contracting with external auditors to analyze customer data, etc.
* - I understand randomized response and, to some extent, the RAPPOR technique (used for collecting Chrome telemetry data), but the other literature in that community goes over my head.
** - This technique is a black box to me at the moment.

The slides I'm referencing: https://www.henrycg.com/files/academic/pres/nsdi17prio-slide...