By Dr. Stefaan Verhulst

Prepared to inform “Meeting the Challenge of Measurement,” a NAPA Technology Standing Panel, CODE, and Bridge Alliance webinar (June 30, 2026). This piece was originally posted on Medium.

Introduction

Official statistics have long served as the bedrock of evidence-based policymaking in the United States, but the ground beneath them is shifting. Survey response rates are falling; collection costs are rising; privacy concerns are mounting; and public trust in government information has eroded just as the questions policymakers must answer—about digital inclusion, financial resilience, climate impacts, mobility, and economic opportunity—have grown more complex and time-sensitive. The statistical systems built for a slower, more uniform economy were never designed to keep pace with a society this dynamic.

Out of this tension has come a quiet but consequential shift: the rise of the re-use of non-traditional data, or NTD. Generated continuously through commercial transactions, digital platforms, connected devices, satellites, financial institutions, mobility services, and online interactions, this data was never collected with statistical production in mind. Yet when it is responsibly governed and woven together with surveys, censuses, and administrative records, it gives statistical agencies something they have always wanted but rarely had: a near-continuous, granular window into economic and social life. The promise is not that non-traditional data will replace official statistics but that it will make them faster, cheaper, and more relevant—while asking less of the public in the process.

As in other countries, the United States has become a testing ground for this hybrid approach. Federal statistical agencies, including the Census Bureau, the Bureau of Labor Statistics, and the Bureau of Economic Analysis, have increasingly partnered with universities, nonprofits, philanthropies, and private companies to explore what non-traditional data can offer. Outside government, organizations such as Opportunity Insights, the JPMorganChase Institute, Microsoft Research, Mastercard’s Center for Inclusive Growth, and Meta’s Data for Good program have shown that privately held, passively generated data can produce policy-relevant indicators that official statistics alone cannot.

What emerges from these efforts is not competition between data sources but complementarity. Surveys still offer representativeness (“ground truth”) and rich context; administrative records still offer comprehensive population coverage; and non-traditional data contributes something that is often limited in both: timeliness, granularity, and the ability to see change as it happens rather than months later.

The examples that follow provide a snapshot of current experimentation in the United States. They are not intended to be exhaustive, but rather to illustrate the diversity of approaches through which non-traditional data is being integrated with surveys, censuses, and administrative records to strengthen official statistics and inform public decision-making.

A Snapshot of Current Practices

Measuring Opportunity, Connection, and Inclusion

Few examples illustrate this hybrid model better than the work of Raj Chetty and Opportunity Insights. What began as a project linking anonymized IRS tax records with Census and education data has grown into something far more expansive, incorporating Meta’s social-network data, commercial housing information, geospatial datasets, and consumer credit data. The result — tools like the Opportunity Atlas, the Social Capital Atlas, and the Economic Tracker — offers neighborhood-level measures of mobility and connectedness that would simply be impossible to produce from surveys alone, and that gave policymakers a near real-time read on the economy during recent shocks.

Meta’s Social Connectedness Index extends this logic into a domain official statistics have always struggled to capture: social capital. By aggregating anonymized Facebook friendship ties, the index measures how socially — not just geographically — close two communities are, a signal researchers have used to study disaster recovery, labor mobility, and political polarization.

Microsoft’s analysis of telemetry from roughly 40 million Windows devices does something similar for the digital divide, showing that meaningful digital inclusion is about engagement and skill, not merely whether a household has a broadband connection.

Mastercard’s Inclusive Growth Score takes a parallel approach to economic opportunity itself. By blending anonymized payment transaction data with open data on housing, employment, education, and health, it produces a census-tract-level picture of whether growth is actually shared — a question that GDP and household income figures, on their own, cannot answer.

Tracking Markets and Labor in Real Time

Housing and labor markets move faster than most official statistics can follow, which is exactly where non-traditional data has found some of its clearest use cases. Zillow’s indices—drawn from millions of property listings, valuations, and transactions—are updated monthly down to the ZIP code and neighborhood level and have become trusted enough to be disseminated through the Federal Reserve Bank of St. Louis alongside government series.

ADP’s payroll data, covering millions of workers, offers economists an early read on labor-market shifts well before official employment figures are released.

The Census Bureau’s Own Experiments

Perhaps most tellingly, the Census Bureau itself has institutionalized this experimentation. Its Experimental Data Products program tests new methods in the open, with full methodological documentation, before deciding whether they belong in official production. One example pairs high-resolution satellite imagery with computer vision to estimate monthly housing starts. This does not replace field surveys but provides supplemental visibility into the dynamics of housing. The Bureau has applied the same blended logic to retail sales, combining point-of-sale transaction data with traditional surveys and payroll records, and to business conditions through products like the Business Trends and Outlook Survey. Even the Household Pulse Survey, built and deployed within weeks during the COVID-19 pandemic, showed that official statistics can be agile when the moment demands it.

Making the Re-Use of NTD More Adopted and Reliable

For all its promise, the re-use of NTD remains largely additive rather than foundational: a useful complement that sits alongside official statistics rather than a load-bearing component of the system itself. Turning non-traditional data into a genuinely reliable pillar of national statistics will require resolving several hard problems that none of the case studies above have fully solved.

The first is awareness and understanding. While interest in non-traditional data has grown considerably, many policymakers, statistical leaders, and public officials remain unfamiliar with both the opportunities and the limitations of these emerging data sources. In some cases, non-traditional data sources are viewed as a panacea capable of replacing surveys and official statistics; in others, they are dismissed as inherently biased or unreliable. Neither perspective tells the whole story. Building greater awareness of when non-traditional data types are, and are not, fit for purpose is therefore an essential first step. This includes strengthening data literacy among decision-makers, promoting evidence on successful use cases, encouraging collaboration between statistical agencies and external data providers, and fostering a more nuanced understanding of how surveys, administrative records, and non-traditional data can complement one another within a hybrid statistical ecosystem.

The second is representativeness. Windows telemetry says nothing about households that do not own a PC; payment and payroll data say little about the cash economy or the unbanked; Facebook friendship graphs say nothing about people who are not on the platform. Each source carries the demographic fingerprint of whoever generated it, and those fingerprints rarely match the population a national statistic is meant to describe. Building reliable bias-correction methods—benchmarking commercial data against the population totals that surveys and the decennial census still uniquely provide—has to be solved before these sources can stand on their own rather than borrow legitimacy from the surveys they sit beside.
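To make the benchmarking idea concrete, here is a minimal sketch of post-stratification, one common bias-correction technique: each record in a non-traditional sample is reweighted so that group shares match known population totals. It is an illustration only, not a method used by any of the organizations named above. The group labels, counts, values, and the function names `poststratify` and `weighted_mean` are all hypothetical.

```python
from collections import Counter


def poststratify(sample_groups, population_totals):
    """Return one weight per record so weighted group counts match population totals."""
    counts = Counter(sample_groups)
    # Weighting cannot repair zero coverage: a group absent from the sample
    # has no records to upweight, so fail loudly instead of guessing.
    missing = set(population_totals) - set(counts)
    if missing:
        raise ValueError(f"no sample records for groups: {sorted(missing)}")
    unknown = set(counts) - set(population_totals)
    if unknown:
        raise ValueError(f"no population total for groups: {sorted(unknown)}")
    return [population_totals[g] / counts[g] for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample of 10 households: 8 own a PC, 2 do not.
sample_groups = ["owns_pc"] * 8 + ["no_pc"] * 2
# Hypothetical indicator (1 = household has the trait being measured).
values = [1] * 6 + [0] * 2 + [0] * 2
# Hypothetical benchmark totals, standing in for census or survey counts.
population_totals = {"owns_pc": 600, "no_pc": 400}

weights = poststratify(sample_groups, population_totals)
raw = sum(values) / len(values)
adj = weighted_mean(values, weights)

print(f"unweighted estimate: {raw:.2f}")  # 0.60
print(f"post-stratified estimate: {adj:.2f}")  # 0.45
```

In this toy case the sample over-represents PC owners, so the unweighted estimate overstates the indicator. The sketch also shows the limit of the technique: if a group is entirely absent from the source, as households without a PC are from device telemetry, no reweighting can recover it, and the function raises an error rather than producing an estimate.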

The third is access and continuity. Much of what makes these data sources valuable also makes them fragile: they are privately owned, can be withdrawn, repriced, or restructured at a company’s discretion, and were never built with the multi-decade continuity that a national statistical series requires. Durable data sharing agreements and sustainable partnerships are therefore a prerequisite for treating these sources as dependable public infrastructure as opposed to temporary arrangements.

The fourth is methodological transparency. Official statistics earn public trust because their methods are published, replicable, and open to scrutiny. Proprietary algorithms and commercial data-processing pipelines are often the opposite. Closing that gap will require sufficient transparency around methodology, sampling, processing, and limitations to enable independent validation while protecting legitimate commercial interests.

The fifth is public trust and social license. Legal authority to access and reuse data is not the same as public legitimacy. Even where data sharing is lawful and privacy-preserving, people may still question whether particular uses align with their expectations and values. Building a social license for the reuse of non-traditional data requires transparency, meaningful public engagement, clear articulation of public value, robust privacy protections, and accountable governance. Without public confidence and legitimacy, technically sound data initiatives may struggle to achieve long-term sustainability.

The sixth is institutional capacity and data stewardship. Statistical agencies built around survey methodology now require additional capabilities in data engineering, privacy-enhancing technologies, machine learning, legal and contractual negotiation, and partnership management. More fundamentally, they need to build data stewardship capacity — the ability to responsibly identify, access, govern, integrate, and reuse data generated outside government in ways that maximize public value while minimizing risks. The Census Bureau’s Experimental Data Products program illustrates how these capabilities can be developed incrementally through experimentation, rigorous evaluation, and public engagement, but scaling this approach across the federal statistical system remains a significant undertaking.

Finally, there is the question of governance and standards. Each initiative described above currently operates under its own definitions, access arrangements, quality frameworks, and governance mechanisms. For non-traditional data to become a genuine third pillar of national statistics alongside surveys and administrative records, the field will require shared quality standards, common frameworks for assessing fitness for purpose, interoperable governance practices, and clearer federal guidance on when and how non-traditional data can appropriately contribute to official statistics rather than simply complement them.

In short, the challenge is not simply to obtain more data. It is to build the awareness, methods, institutions, governance, and public legitimacy needed to make non-traditional data a trusted and durable component of the nation’s statistical infrastructure.

Author

Stefaan Verhulst

Course Lead · Data Stewards Founder

Dr. Stefaan Verhulst is Co-Founder of the DataTank and The GovLab and the main lecturer of the data stewardship academy. In addition, he is a Research Professor at the Center for Urban Science and Progress at the Tandon School of Engineering of New York University and a Senior Advisor to the Markle Foundation, where he spent more than a decade as Chief of Research. He is also the Editor-in-Chief of the open-access journal Data & Policy (Cambridge University Press); the Research Director of the MacArthur Research Network on Opening Governance; Chair of the Data for Children Collaborative with UNICEF; a member of the High-Level Expert Group to the European Commission on Business-to-Government Data Sharing; and a member of the Expert Group to Eurostat on using Private Sector data for Official Statistics. He is also a member of the UNESCO Information Ethics Working Group; a Researcher at the ISI Foundation (Torino, Italy); and a Senior Researcher at SMIT (Studies in Media, Innovation and Technology) at the Free University of Brussels (VUB). In 2018 he was recognized as one of the 10 Most Influential Academics in Digital Government globally (by the global policy platform Apolitical). Previously at Oxford University, he was the UNESCO Chairholder in Communications Law and Policy and co-founded and was the Head of the Program in Comparative Media Law and Policy at the Center for Socio-Legal Studies. He was the Socio-Legal Fellow at Wolfson College, and is still an emeritus fellow at Oxford. He also taught for several years at the London School of Economics and was Co-Founder and Co-Director of the International Media and Info-Comms Policy and Law Studies (IMPS) at the University of Glasgow School of Law. He has published widely, including seven books, and his writings and work have appeared in the Harvard Business Review, Stanford Social Innovation Review, Project Syndicate, Wall Street Journal, and The Conversation (among many other outlets).
He is regularly asked to present at international conferences, including TED, Collision, and the UN World Data Forum. Numerous organizations have sought his counsel on a variety of topics, including data and AI governance; among them are the World Bank, IDB, CAP, USAID, DFID, IDRC, AFP, the European Commission, Council of Europe, the World Economic Forum, UNICEF, OECD, UN-OCHA, UNDP, UNESCO and several other international and national private and public organizations. He is also a LinkedIn Learning instructor seeking to democratize the practice of data stewardship globally.