2020 Census and Privacy

This may not be news to many of you who make frequent use of census data, but it came as a surprise to me and to several other people I’ve talked with. It takes the privacy-versus-accuracy debate over the data we work with to a new level, and it likely has implications for many of our data choices.

The 2020 Census will include privacy protections that go far beyond those used in the 2010 Census.

A short background: the US Census Bureau has demonstrated that its publicly released 2010 data, combined with commercially available data sets, can be used to reconstruct the confidential records it collected, leading to the identification of individuals. The Bureau interprets this as a violation of its responsibility under Title 13 to protect the anonymity of respondents to its census and surveys. As a result, it is applying a technique called “Differential Privacy,” which quantifies the privacy risk of every data product and limits that risk by injecting calibrated levels of noise into the published statistics. There is a total privacy budget (epsilon, ε), and each statistic uses a portion of that budget: the smaller the share a statistic receives, the more noise it gets. The Census Bureau has applied a draft version of these protections to the 2010 Census data and released the results.
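To make the budget idea concrete, here is a minimal Python sketch using the textbook Laplace mechanism. This is not the Census Bureau’s production algorithm (their TopDown Algorithm uses different noise distributions and extensive post-processing); the counts, the total epsilon of 1.0, and the split_budget weighting rule are all hypothetical, chosen only for illustration. Each count gets noise with scale 1/ε, so a statistic that receives a smaller share of the budget comes back noisier.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # For a count, adding or removing one person changes the answer by at
    # most 1, so sensitivity = 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def split_budget(total_epsilon, weights):
    # Hypothetical allocation rule: give each statistic a share of the total
    # budget proportional to its weight. Under basic sequential composition,
    # the per-statistic epsilons add up to total_epsilon.
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}


if __name__ == "__main__":
    rng = random.Random(2020)
    # Hypothetical true counts for one small area (not real census data).
    true_counts = {"total_population": 1200, "age_under_18": 310, "housing_units": 480}
    # Hypothetical weights: total population gets half of the budget.
    budget = split_budget(1.0, {"total_population": 2, "age_under_18": 1, "housing_units": 1})
    for name, count in true_counts.items():
        eps = budget[name]
        print(f"{name}: true={count}, epsilon={eps:.2f}, "
              f"noisy={noisy_count(count, eps, rng):.1f}")
```

Running it with a few different seeds shows the trade-off directly: total_population, which gets half the budget here, tends to land closer to its true value than the other two statistics, whose noise scale is twice as large.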

A simple summary of implications:

  1. The state totals used to reapportion the House of Representatives will be the actual enumeration; that is, the total population of each state will be reported exactly as enumerated.
  2. All values below that level will have Differential Privacy applied to them. (Many products have had disclosure protections applied in the past, too.)
  3. The Census Bureau is evaluating which data will be published and what portion of the privacy budget each data element will use.
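The first two points interact: the state total is exact, but the counts below it are noisy. One simple way to picture reconciling them is the toy Python sketch below, which assumes hypothetical county names and counts and a single ε for every county. It is not how the Bureau does it; it just shifts every noisy county count by the same amount so that they sum to the exact state total, and it ignores things a real system would have to handle, such as keeping counts non-negative whole numbers.

```python
import random


def noisy_counties_with_exact_total(county_counts, epsilon, rng):
    # Toy sketch (hypothetical, not the Census Bureau's TopDown Algorithm).
    # The state total is treated as an invariant and released exactly.
    state_total = sum(county_counts.values())
    scale = 1.0 / epsilon
    # Laplace noise on each county count (difference of two exponentials).
    # Counties are disjoint, so by parallel composition each person's
    # privacy loss from this step is epsilon, not epsilon * number_of_counties.
    noisy = {
        name: count + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for name, count in county_counts.items()
    }
    # Spread the discrepancy evenly so the noisy counties sum to the exact total.
    gap = (state_total - sum(noisy.values())) / len(noisy)
    return state_total, {name: value + gap for name, value in noisy.items()}


if __name__ == "__main__":
    rng = random.Random(2020)
    # Hypothetical county populations for an imaginary state.
    counties = {"County A": 5000, "County B": 1200, "County C": 300}
    total, released = noisy_counties_with_exact_total(counties, 0.5, rng)
    print("exact state total:", total)
    for name, value in released.items():
        print(f"{name}: true={counties[name]}, released={value:.1f}")
```

Because the adjustment only uses the published state total and the already-noised county values, the exact total comes "for free," while the individual county figures still carry noise.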

Here’s the Census Bureau’s page on it.

https://www.census.gov/about/policies/privacy/statistical_safeguards/disclosure-avoidance-2020-census.html

A National Academies workshop: https://sites.nationalacademies.org/DBASSE/CNSTAT/DBASSE_196518

The presentation by Beth Jarosz is particularly pertinent to California’s housing and transportation planning and mandates.

Here’s an ESRI Story Map comparing the 2010 Census with the version that has Differential Privacy applied.

The National Conference of State Legislatures (NCSL) seems to have good coverage, with some additional resources at the bottom of the page. The state responses are particularly interesting, as is the response from Caliper (which makes travel demand modeling software that depends on population data).

https://www.ncsl.org/research/redistricting/differential-privacy-for-census-data-explained.aspx