[{"data":1,"prerenderedAt":63},["ShallowReactive",2],{"page-config-/news/research-radar-statgpt-and-the-fourth-wave-of-open-data":3,"news-post-research-radar-statgpt-and-the-fourth-wave-of-open-data":38},{"id":4,"title":5,"body":6,"description":10,"extension":13,"hero":14,"meta":32,"navigation":15,"path":33,"sections":34,"seo":35,"stem":36,"__hash__":37},"pages/pages/news.md","News",{"type":7,"value":8,"toc":9},"minimark",[],{"title":10,"searchDepth":11,"depth":11,"links":12},"",2,[],"md",{"showHero":15,"title":16,"badge":17,"description":18,"image":19,"imageCredit":20,"backgroundImage":21,"links":22},true,"Recent News & Updates","Latest Posts","Stay updated with the latest news, events, interviews, and videos from our community.","/images/news.webp","Sebastian Pfütze","/images/news-hero-bg.png",[23,28],{"label":24,"to":25,"target":26,"type":27},"Subscribe","https://mailchi.mp/4337f0e3e319/9262kjy9ck","_blank","primary",{"label":29,"to":30,"type":31},"Contact Us","/contact","secondary",{},"/pages/news",null,{"title":5,"description":10},"pages/news","RH8LB8jvO_3y-r_SCCELEeG-ZVvaq9CqOUgvw53iVMY",{"id":39,"title":40,"author":41,"body":42,"category":53,"date":43,"description":46,"duration":34,"endTime":34,"extendedContent":34,"extension":55,"featured":56,"heading":45,"image":54,"location":34,"locationType":34,"mainContent":47,"meta":57,"navigation":15,"path":58,"registrationLink":34,"seo":59,"slug":44,"startTime":34,"stem":60,"tags":61,"videoUrl":34,"__hash__":62},"blogposts/blogposts/research-radar-statgpt-and-the-fourth-wave-of-open-data.json","Research Radar Statgpt And The Fourth Wave Of Open Data","Stefaan Verhulst & Adam Zable",{"date":43,"slug":44,"heading":45,"description":46,"mainContent":47,"author":41,"tags":48,"category":53,"image":54},"2026-04-07T12:00:00","research-radar-statgpt-and-the-fourth-wave-of-open-data","Research Radar: StatGPT and the Fourth Wave of Open Data","Stefaan Verhulst and Adam Zable argue that the biggest challenge in open data is no 
longer access, but usability, as official statistics remain difficult to find, interpret, and apply. Drawing on the International Monetary Fund’s new StatGPT research, they show how artificial intelligence could transform access through natural language interfaces, while warning that accuracy and trust depend on retrieving authoritative data rather than generating answers. The article situates this shift within a broader “Fourth Wave of Open Data,” calling for new data systems and governance approaches that make information truly usable and reliable.","\u003Cp dir=\"ltr\">Despite decades of investment in statistical systems and open data initiatives, official data remains difficult to discover, interpret, and apply in practice. The challenge is no longer one of availability, but of (re)usability. This persistent gap underscores a broader paradox at the heart of contemporary data governance: data may be open, yet it remains functionally inaccessible for many intended users.\u003C/p>\n\u003Cp dir=\"ltr\">In this context, the&nbsp;\u003Ca href=\"https://www.imf.org/en/home\" target=\"_blank\" rel=\"noopener noreferrer\">International Monetary Fund\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>&nbsp;has been a pioneer in exploring how artificial intelligence and open data can intersect to address this usability challenge. 
Its&nbsp;\u003Ca href=\"https://www.imf.org/-/media/files/publications/dp/2026/english/saiosea.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">StatGPT: AI for Official Statistics\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>&nbsp;report, by&nbsp;\u003Ca href=\"https://www.imf.org/en/blogs/authors/james%20tebrake\" target=\"_blank\" rel=\"noopener noreferrer\">James Tebrake\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>,&nbsp;\u003Ca href=\"https://www.linkedin.com/in/el-bachir-boukherouaa-7b378531/\" target=\"_blank\" rel=\"noopener noreferrer\">Bachir Boukherouaa\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>,&nbsp;\u003Ca href=\"https://www.linkedin.com/in/jeff-danforth-07781a186/\" target=\"_blank\" rel=\"noopener noreferrer\">Jeff Danforth\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>, and&nbsp;\u003Ca href=\"https://www.linkedin.com/in/nira-h/\" target=\"_blank\" rel=\"noopener noreferrer\">Niva Harikrishnan\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>, offers a timely and important contribution to this evolving conversation - pointing toward a future where AI can make official data more navigable, interpretable, and actionable.\u003C/p>\n\u003Cblockquote>\n\u003Cp dir=\"ltr\">The data challenge is no longer just about availability, but about (re)usability.\u003C/p>\n\u003C/blockquote>\n\u003Cp dir=\"ltr\">The report provides a detailed account of the friction users face across the data lifecycle. 
Even highly motivated users must navigate fragmented portals, inconsistent terminology, and siloed datasets, often spending significant time assembling information that should be readily accessible.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">The result is a fragmented ecosystem in which metadata and data are&nbsp;\u003Ca href=\"https://www.bfs.admin.ch/bfs/en/home/statistics/catalogue.assetdetail.33487133.html\" target=\"_blank\" rel=\"noopener noreferrer\">distributed\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>&nbsp;across institutions and platforms, forcing users to navigate multiple systems and standards&mdash;and to reconstruct context&mdash;before they can assess whether the data is re-usable.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">This resonates strongly with broader observations across the open data ecosystem: access alone does not guarantee impact. Without the ability to meaningfully engage with data, openness risks becoming performative rather than transformative.\u003C/p>\n\u003Ch3 id=\"heading-1\" dir=\"ltr\">The Promise and Limits of AI-Mediated Data Access\u003C/h3>\n\u003Cp dir=\"ltr\">In this context, the report turns to generative AI. At first glance, large language models appear to offer a breakthrough. They can enable conversational access to complex datasets, lowering the barriers to entry. Tasks that previously took hours&mdash;searching, filtering, downloading, and combining datasets&mdash;can now be completed in minutes.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">This aligns closely with what we have described in our work as the emerging &ldquo;\u003Ca href=\"https://arxiv.org/abs/2405.04333\" target=\"_blank\" rel=\"noopener noreferrer\">Fourth Wave of Open Data\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>&rdquo;: a shift from static access to dynamic, interaction-based engagement with data. 
This builds on earlier phases of open data, from transparency to open-by-default publication to purpose-driven reuse, toward a model in which data is accessed and reused through AI-mediated interaction.\u003C/p>\n\u003Cp dir=\"ltr\">However, the report is equally clear about the limitations of current AI approaches.&nbsp;\u003C/p>\n\u003Cblockquote>\n\u003Cp dir=\"ltr\">When asked to retrieve numerical data, general-purpose models frequently produce results that are not just incorrect, but &lsquo;reasonably incorrect&rsquo; and plausible enough to pass casual scrutiny.\u003C/p>\n\u003C/blockquote>\n\u003Cp dir=\"ltr\">When asked to retrieve numerical data, general-purpose models frequently produce results that are not just incorrect, but &lsquo;\u003Ca href=\"https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">reasonably incorrect\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>&rsquo;: values that appear plausible enough to pass casual scrutiny. This outcome is not surprising, since these systems are fundamentally&nbsp;\u003Ca href=\"https://rss.org.uk/RSS/media/File-library/Policy/2026/AI-is-Statistics-FINAL.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">statistical in nature\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>, meaning they are designed not to verify whether a number is authoritative or exact, but to predict plausible outputs from patterns in data.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">This is a critical insight. In domains such as official statistics, where precision is non-negotiable, even small deviations can&nbsp;\u003Ca href=\"https://mitsloan.mit.edu/ideas-made-to-matter/what-happens-when-us-economic-data-becomes-unreliable\" target=\"_blank\" rel=\"noopener noreferrer\">undermine trust and lead to flawed decisions\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>. 
The report&rsquo;s empirical testing&mdash;showing relatively low accuracy rates across repeated queries&mdash;underscores the risks of relying on generative AI as a source of truth.&nbsp;\u003C/p>\n\u003Ch3 id=\"heading-2\" dir=\"ltr\">StatGPT and the Shift to AI as Interface\u003C/h3>\n\u003Cp dir=\"ltr\">These limitations point to a deeper question about responsibility. If AI systems are not designed to distinguish between plausible and authoritative answers, then who is responsible for ensuring that users receive correct information? Should the companies behind these systems be expected to return authoritative data, or should the onus fall on data producers to ensure their data is structured in ways that AI systems can reliably access and interpret?&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">In practice, the report suggests that while both sides play a role, the burden increasingly falls on data producers to ensure that official statistics can be accessed and understood correctly in an AI-mediated environment.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">This reflects a&nbsp;\u003Ca href=\"https://webtv.un.org/en/asset/k1z/k1z77fpw1i\" target=\"_blank\" rel=\"noopener noreferrer\">broader shift\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>&nbsp;in the role of national statistical offices, from passive data producers to active stewards of trusted data, which requires them to design outputs not only for human users but also for machine consumption, strengthen attribution, and engage more actively with technology platforms and AI developers.\u003C/p>\n\u003Cp dir=\"ltr\">It is this reframing away from expecting AI to produce authoritative answers and toward ensuring that it can reliably retrieve them that helps explain the logic behind the proposed solution, StatGPT.&nbsp;\u003C/p>\n\u003Cblockquote>\n\u003Cp dir=\"ltr\">AI becomes an interface layer, not a knowledge producer.\u003C/p>\n\u003C/blockquote>\n\u003Cp dir=\"ltr\">Rather than using AI to 
generate answers, the system uses AI to interpret user intent and translate it into structured queries against authoritative data sources. In other words, AI becomes an interface layer, not a knowledge producer. This distinction is subtle but profound. It preserves the usability benefits of natural language interaction while anchoring outputs in verified, source-of-truth data.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">This approach strongly echoes the direction outlined in our own work on the Fourth Wave of Open Data. In this new phase, data is no longer something users download and analyze separately; it becomes something they interact with through intelligent systems. But for this interaction to be meaningful, data must be &ldquo;\u003Ca href=\"https://blogs.worldbank.org/en/opendata/from-open-data-to-ai-ready-data--building-the-foundations-for-re\" target=\"_blank\" rel=\"noopener noreferrer\">AI-ready\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>&rdquo;&mdash;structured, interoperable, richly annotated, and accessible through machine-readable interfaces.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">The IMF report reinforces this point by emphasizing the need for robust APIs, improved metadata standards, and harmonized data models.\u003C/p>\n\u003Cp dir=\"ltr\">This is not simply a matter of formatting, but of ensuring that AI systems can reliably connect user intent to the correct data. For AI systems to function effectively, they must be able to map natural language queries to precise statistical concepts, something that depends on rich, consistent metadata and clearly defined data structures. Metadata plays a central role not only in locating data, but in enabling systems to assess its relevance, reliability, and comparability across sources.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">Without this, even well-designed systems can return the wrong series, units, or time frames. 
In practice, gaps in metadata can lead systems to make implicit assumptions&mdash;for example, equating a general concept like &ldquo;inflation&rdquo; with whichever indicator is most readily available&mdash;introducing subtle but significant errors. Formats designed primarily for human navigation, such as spreadsheets or static tables, further limit AI systems' ability to retrieve authoritative values, whereas APIs enable precise, structured queries that return the exact published figure.\u003C/p>\n\u003Cp dir=\"ltr\">These technical limitations, in turn, have important governance implications.&nbsp;\u003C/p>\n\u003Ch3 id=\"heading-3\" dir=\"ltr\">Toward Trustworthy and Governable Data Ecosystems\u003C/h3>\n\u003Cp dir=\"ltr\">As data is increasingly accessed through intermediated AI systems, questions of ownership, attribution, and accountability become more pressing. The current ecosystem already suffers from issues such as&nbsp;\u003Ca href=\"https://www.ibm.com/think/insights/data-quality-issues\" target=\"_blank\" rel=\"noopener noreferrer\">duplicated datasets\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>,&nbsp;\u003Ca href=\"https://www.techradar.com/pro/what-is-version-drift-in-ai\" target=\"_blank\" rel=\"noopener noreferrer\">outdated versions\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>, and&nbsp;\u003Ca href=\"https://arxiv.org/html/2404.12691v1\" target=\"_blank\" rel=\"noopener noreferrer\">unclear provenance\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>, often obscuring the original source of the data and making it difficult for users to verify, interpret, or update what they are using.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">In many cases, data is copied and redistributed across platforms in ways that break the chain of ownership, leaving users uncertain about who produced the data or where to turn for clarification. 
The absence of clear standards for attribution, versioning, and ownership undermines trust in both the data and the systems that rely on it.&nbsp;\u003C/p>\n\u003Cblockquote>\n\u003Cp dir=\"ltr\">The Fourth Wave of Open Data is not only about technological innovation; it is also about rethinking the institutional and governance arrangements that underpin data access.\u003C/p>\n\u003C/blockquote>\n\u003Cp dir=\"ltr\">Addressing this requires making ownership explicit. Systems must not only return data, but clearly indicate who produced it and, where possible, retrieve it directly from the original source rather than from secondary aggregators.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">This is where the connection to data stewardship efforts becomes particularly salient. The Fourth Wave of Open Data is not only about technological innovation; it is also about rethinking the institutional and governance arrangements that underpin data access.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">Concepts such as&nbsp;\u003Ca href=\"https://one.oecd.org/document/DSTI/CDEP%282022%296/FINAL/en/pdf\" target=\"_blank\" rel=\"noopener noreferrer\">data stewardship\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>,&nbsp;\u003Ca href=\"https://blog.thegovlab.org/reimagining-data-governance-for-ai-operationalizing-social-licensing-for-data-reuse\" target=\"_blank\" rel=\"noopener noreferrer\">social license\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>, and&nbsp;\u003Ca href=\"https://joint-research-centre.ec.europa.eu/jrc-news-and-updates/data-intermediaries-more-inclusive-data-governance-how-do-they-work-2023-10-04_en\" target=\"_blank\" rel=\"noopener noreferrer\">trusted intermediaries\u003Cspan class=\"sr-only\">(opens in new window)\u003C/span>\u003C/a>&nbsp;become critical in ensuring that increased accessibility does not come at the expense of legitimacy. 
These concepts matter because increased access alone guarantees neither public trust nor public value; legitimacy and utility both depend on whether data is shared and reused through institutions and practices that are responsible, accountable, and oriented toward solving real-world problems.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">From a policy and practice perspective, the report carries several important implications.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">\u003Cstrong>First\u003C/strong>, investments in open data must now extend beyond publication toward enabling interaction.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">\u003Cstrong>Second\u003C/strong>, AI adoption strategies must be grounded in an understanding of the limitations of current models, particularly in high-stakes domains.&nbsp;\u003C/p>\n\u003Cp dir=\"ltr\">\u003Cstrong>Third\u003C/strong>, building AI-ready data systems requires coordinated action across technical, organizational, and governance dimensions. Achieving this will require collaboration between data producers, technology providers, and standards-setting bodies to ensure that data systems and AI systems evolve together.\u003C/p>\n\u003Cblockquote>\n\u003Cp dir=\"ltr\">The challenge and opportunity lie in designing data ecosystems where usability and trust are not in tension, but mutually reinforcing.\u003C/p>\n\u003C/blockquote>\n\u003Cp dir=\"ltr\">Ultimately,&nbsp;StatGPT&nbsp;reinforces a central insight of the Fourth Wave: the future of data is not simply more openness, but more meaningful and trustworthy use. AI can play a powerful role in unlocking the value of data, but only when coupled with systems that ensure accuracy, provenance, and accountability. The challenge&mdash;and opportunity&mdash;lies in designing data ecosystems where usability and trust are not in tension, but are mutually reinforcing.\u003C/p>\n\u003Cp dir=\"ltr\">In that sense, the report is both a technical proposal and a broader call to action. 
It invites the official statistics community and the wider data ecosystem to move beyond access as an end in itself toward a model of data engagement fit for an AI-driven world.\u003C/p>\n\u003Cp dir=\"ltr\">\u003Cem>Header image credit: Elise Racine &amp; The Bigger Picture / https://betterimagesofai.org / https://creativecommons.org/licenses/by/4.0/\u003C/em>\u003C/p>",[49,50,51,52],"opendata","datastewardship","AI","StatGPT","blog","https://cms.thegovlab.com/assets/345b82d2-7e2c-40cb-b2e9-c97aaf0fbeaf","json",false,{},"/blogposts/research-radar-statgpt-and-the-fourth-wave-of-open-data",{"title":40,"description":46},"blogposts/research-radar-statgpt-and-the-fourth-wave-of-open-data",[49,50,51,52],"kDuNbTwuH9n3ajf1XP8yLluO_dRRismyfjrL6Uxc3w8",1775568952403]