Learn more about the Differential Privacy Game.

This is a game designed to better understand the effect of differential privacy on ads measurement. The Private Ads Technology Community Group (PATCG) in the World Wide Web Consortium (W3C) is actively working on a new web standard for private ads measurement that can enable aggregate measurement of ad effectiveness based on cross-site activity while preventing user tracking.

There is initial consensus in that group that differential privacy will be part of the solution, and this effort attempts to better understand the effect of the noise it introduces on the utility of ads measurement.

Differential Privacy

We often think of privacy as a binary attribute: either a system is private or it is not. Contrary to this idea, differential privacy does not tell us if a system is private or not.

Instead, it's a measure of a system that tells us how much information is revealed about individual data points within that system.

If you're interested in the mathematical details, check out Wikipedia. Here, we'll instead motivate it with an example.

Motivating Example

Suppose we had access to a database that stores potentially sensitive data about people, such as whether they have a certain health condition. The database is operated by some trusted source such as a health care provider, and it only allows aggregate queries, e.g., for the purpose of tracking prevalence among the general population.

Now, let's say the aggregation requirement enforced by the database is that any query covering fewer than 100 people results in an error. For a single query, this is a pretty reasonable solution: for example, a query of 100 people could tell us that 7 people have the condition. For everyone in that query, we'd uniformly learn that they have a 7% chance of having the condition.

Example Query Dataset 1

An example of a query of 100 rows of data with people and their health condition status.


      Name      Has health condition?   Inferred Odds of Condition
1     Garrus    ███                     7.0%
2     Liara     ███                     7.0%
3     Mordin    ███                     7.0%
4     Anderson  ███                     7.0%
5     Bailey    ███                     7.0%
...   ...       ...                     ...
100   Jack      ███                     7.0%
Count: 7
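To make the setup concrete, here is a minimal Python sketch of such an aggregate-only query. The record format, the names, and the 100-person threshold are illustrative assumptions, not a real database API.

```python
# A minimal sketch of an aggregate-only query with a minimum query size.
# The record format and the 100-person threshold are illustrative.

MIN_QUERY_SIZE = 100

def count_with_condition(records):
    """Count how many records have the condition, refusing small queries."""
    if len(records) < MIN_QUERY_SIZE:
        raise ValueError(f"query must include at least {MIN_QUERY_SIZE} people")
    return sum(1 for r in records if r["has_condition"])

# 7 of 100 people have the condition, so each person in the query is
# inferred to have a 7 / 100 = 7.0% chance of having it.
cohort = [{"name": f"person_{i}", "has_condition": i < 7} for i in range(100)]
count = count_with_condition(cohort)
print(count, f"{count / len(cohort):.1%}")  # 7 7.0%
```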

Now, suppose we wanted to figure out if another person, Tali, has this condition. Let's rerun the above query, but with one more person, Tali. For demonstration, the following toggle controls Tali's value in the database. Notice how the count changes exactly with her value.

Example Query Dataset 2

An example of a query of 101 rows of data with people and their health condition status.

Tali's status


      Name      Has health condition?   Inferred Odds of Condition
1     Garrus    ███                     7.9%
2     Liara     ███                     7.9%
3     Mordin    ███                     7.9%
4     Anderson  ███                     7.9%
5     Bailey    ███                     7.9%
...   ...       ...                     ...
100   Jack      ███                     7.9%
101   Tali      ███                     7.9%
Count: 8

This tells us, definitively, Tali's status with respect to the health condition. With just two queries, albeit carefully constructed, we've violated the aggregation requirement.

This is called a differencing attack, which is where differential privacy gets its name. And because we were able to exactly infer Tali's value in the database, we would say that this querying algorithm offers no differential privacy, or ε=∞.
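A minimal sketch of the differencing attack, reusing the count_with_condition helper and cohort from the sketch above; Tali's value here is an illustrative assumption.

```python
# The differencing attack: two valid aggregate queries whose difference
# reveals a single person's value, despite the size threshold.

tali = {"name": "Tali", "has_condition": True}

count_without_tali = count_with_condition(cohort)         # 7
count_with_tali = count_with_condition(cohort + [tali])   # 8

# The difference is exactly Tali's value.
print(count_with_tali - count_without_tali)  # 1 -> Tali has the condition
```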

Adding Noise

If we want to make this query differentially private, i.e., with a finite ε, one way is to add noise to the count, sampled from a Laplace distribution. The plot below shows the probability distribution of the count with noise added. For an actual query, we'd get a random value sampled from this distribution.
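As a rough sketch of how that noise could be added (assuming NumPy and a counting query, where adding or removing one person changes the count by at most 1; the ε value is illustrative):

```python
import numpy as np

# The Laplace mechanism for a counting query: sensitivity = 1,
# so the Laplace scale is sensitivity / epsilon.

def noisy_count(true_count, epsilon, sensitivity=1.0):
    scale = sensitivity / epsilon
    return true_count + np.random.laplace(loc=0.0, scale=scale)

# One random draw from the distribution plotted below.
print(noisy_count(8, epsilon=1.414))
```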

In our example, we're trying to differentiate between a count of 7 and a count of 8. A simple rule would be: if we see a value less than 7.5, assume the true value is 7; if we see a value greater than 7.5, assume the true value is 8. But because the noise is random, this rule will sometimes be wrong.

Interactive plot: probability distribution of the noisy count, given Tali's toggled status.

Laplace Variance: 1.000, ε: 1.414
Probability(Count < 7.5) = 0.247
Probability(Count > 7.5) = 0.753
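The readout above can be reproduced directly from the Laplace distribution's CDF. A small sketch, assuming the true count is 8 (i.e., Tali has the condition) and sensitivity 1:

```python
import math

def laplace_cdf(x, mu, b):
    """CDF of a Laplace distribution centered at mu with scale b."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)

variance = 1.0
b = math.sqrt(variance / 2)   # Laplace variance is 2 * b**2, so b ~= 0.707
epsilon = 1.0 / b             # sensitivity / b ~= 1.414
true_count = 8

p_below = laplace_cdf(7.5, true_count, b)  # ~0.247: we'd wrongly infer 7
p_above = 1.0 - p_below                    # ~0.753: we'd correctly infer 8
print(f"epsilon={epsilon:.3f}  P(<7.5)={p_below:.3f}  P(>7.5)={p_above:.3f}")
```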

Playing around with this tool, you'll notice that as the variance decreases (and ε increases), the probability of making the correct inference converges towards 1. Similarly, as the variance increases (and ε decreases), that probability converges towards 0.5, i.e., random chance.
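A quick sweep over ε, reusing the laplace_cdf helper from the sketch above, illustrates that convergence; the ε values are arbitrary.

```python
# Probability of correctly inferring Tali's value as epsilon varies:
# toward 1 as epsilon grows, toward 0.5 (random chance) as it shrinks.

for epsilon in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    b = 1.0 / epsilon  # sensitivity = 1
    p_correct = 1.0 - laplace_cdf(7.5, 8, b)
    print(f"epsilon={epsilon:5.1f}  P(correct inference)={p_correct:.3f}")
```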

This is the core nature of differential privacy being a measure and not a binary attribute. There is no magic ε where things become private. (Except, one may argue, at ε=0, which is purely random noise. Such a system is certainly private, but it would also provide no utility in any scenario.)