[{"data":1,"prerenderedAt":63},["ShallowReactive",2],{"page-config-/news/ai-summer-data-winter-what-the-ai-index-reveals-and-what-it-doesnt-yet-measure":3,"news-post-ai-summer-data-winter-what-the-ai-index-reveals-and-what-it-doesnt-yet-measure":38},{"id":4,"title":5,"body":6,"description":10,"extension":13,"hero":14,"meta":32,"navigation":15,"path":33,"sections":34,"seo":35,"stem":36,"__hash__":37},"pages/pages/news.md","News",{"type":7,"value":8,"toc":9},"minimark",[],{"title":10,"searchDepth":11,"depth":11,"links":12},"",2,[],"md",{"showHero":15,"title":16,"badge":17,"description":18,"image":19,"imageCredit":20,"backgroundImage":21,"links":22},true,"Recent News & Updates","Latest Posts","Stay updated with the latest news, events, interviews, and videos from our community.","/images/news.webp","Sebastian Pfütze","/images/news-hero-bg.png",[23,28],{"label":24,"to":25,"target":26,"type":27},"Subscribe","https://mailchi.mp/4337f0e3e319/9262kjy9ck","_blank","primary",{"label":29,"to":30,"type":31},"Contact Us","/contact","secondary",{},"/pages/news",null,{"title":5,"description":10},"pages/news","RH8LB8jvO_3y-r_SCCELEeG-ZVvaq9CqOUgvw53iVMY",{"id":39,"title":40,"author":41,"body":42,"category":53,"date":43,"description":46,"duration":34,"endTime":34,"extendedContent":34,"extension":55,"featured":56,"heading":45,"image":54,"location":34,"locationType":34,"mainContent":47,"meta":57,"navigation":15,"path":58,"registrationLink":34,"seo":59,"slug":44,"startTime":34,"stem":60,"tags":61,"videoUrl":34,"__hash__":62},"blogposts/blogposts/ai-summer-data-winter-what-the-ai-index-reveals-and-what-it-doesnt-yet-measure.json","Ai Summer Data Winter What The Ai Index Reveals And What It Doesnt Yet Measure","Stefaan Verhulst",{"date":43,"slug":44,"heading":45,"description":46,"mainContent":47,"author":41,"tags":48,"category":53,"image":54},"2026-04-09T12:00:00","ai-summer-data-winter-what-the-ai-index-reveals-and-what-it-doesnt-yet-measure","AI Summer, Data Winter: What the AI Index 
Reveals — and What It Doesn’t Yet Measure","The AI Index Report 2026, released this week by Stanford HAI, offers a compelling portrait of what can only be described as an ongoing AI Summer. The indicators are striking: rapid adoption reaching more than half the population within three years, surging investment, near-human performance across multiple domains, and widespread deployment in science, medicine, and the economy. By nearly every conventional metric — capability, capital, and diffusion — AI is accelerating. \n\nImage copyrights: Picture by Deborah Lupton / https://betterimagesofai.org / https://creativecommons.org/licenses/by/4.0/ ","\u003Cp id=\"6d72\" class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">The&nbsp;\u003Ca class=\"z om\" href=\"https://hai.stanford.edu/assets/files/ai_index_report_2026.pdf\" target=\"_blank\" rel=\"noopener ugc nofollow\">AI Index Report 2026,&nbsp;\u003C/a>released this week by&nbsp;\u003Ca class=\"z om\" href=\"https://hai.stanford.edu/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Stanford HAI\u003C/a>, offers a compelling portrait of what can only be described as an ongoing AI Summer. The indicators are striking: rapid adoption reaching more than half the population within three years, surging investment, near-human performance across multiple domains, and widespread deployment in science, medicine, and the economy. 
By nearly every conventional metric &mdash; capability, capital, and diffusion &mdash; AI is accelerating.&nbsp;\u003C/p>\n\u003Cp class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">\u003Cimg src=\"https://cms.thegovlab.com/assets/1e785412-9146-44af-ab54-2b4e682fcaf3.webp?width=1502&amp;height=1128\" alt=\"1 Da Fvd A5 Icm Yv0 Pd Oxkovg\">\u003C/p>\n\u003Cp>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AI Index Report 2026 &mdash; \u003Ca class=\"z om\" href=\"http://v/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Figure 2.1.1\u003C/a>.\u003C/p>\n\u003Cp id=\"7c76\" class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">Yet, embedded within the report is a quieter but more consequential story: the deepening of a \u003Cem class=\"oo\">data winter\u003C/em>. Nowhere is this more clearly articulated than in the report&rsquo;s own section on the potential exhaustion of training data (page 25).\u003C/p>\n\u003Cp id=\"e7ba\" class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">The report notes growing concern among leading researchers that we may be approaching &ldquo;peak data&rdquo;&mdash;a point at which access to high-quality human-generated text and web data is effectively exhausted.&nbsp;\u003Ca class=\"z om\" href=\"https://epoch.ai/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Some projections\u003C/a>&nbsp;suggest that this depletion could occur sometime between 2026 and 2032. This is not a marginal issue. 
Data exhaustion directly challenges the scaling paradigm that has underpinned AI&rsquo;s recent breakthroughs. What appears as exponential growth in capability may, in fact, be approaching a structural ceiling&ndash;not due to limits in compute or model design, but due to constraints in&nbsp;\u003Cem class=\"oo\">data availability\u003C/em>. In other words, the AI summer may be running on finite fuel.\u003C/p>\n\u003Cp id=\"c392\" class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">The report further underscores that synthetic data &mdash; often proposed as a solution to data scarcity &mdash; has not yet proven to be a full substitute for real-world data, particularly in pre-training contexts. While hybrid approaches combining real and synthetic data can accelerate training, they do not surpass the performance of models trained on high-quality real data. Purely synthetic training, meanwhile, remains effective only in narrower or smaller-scale settings (e.g., for specialized RAG applications or sector-specific models). 
The implication is clear: the quality and diversity of real-world data remain irreplaceable at the frontier.\u003C/p>\n\u003Cp class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">\u003Cimg src=\"https://cms.thegovlab.com/assets/4eb77131-7cc7-44a6-a143-5d22fc6a2ecf.webp?width=2002&amp;height=1056\" alt=\"1 C Y71vfzyo Cksh Mmz Qd Ffyg\">\u003C/p>\n\u003Cp class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AI Index Report 2026 \u003Ca class=\"z om\" href=\"https://documentcloud.adobe.com/gsuiteintegration/index.html?state=%7B%22ids%22%3A%5B%2211--f8eAaRl76UdFtkoPzrBvEOjhBM2t_%22%5D%2C%22action%22%3A%22open%22%2C%22userId%22%3A%22102870410349615907057%22%2C%22resourceKeys%22%3A%7B%7D%7D\" target=\"_blank\" rel=\"noopener ugc nofollow\">Figure 1.1.17\u003C/a>\u003C/p>\n\u003Cp class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">At the same time, the nature of available data is changing in ways that further complicate the picture. The report highlights that more than 50% of new online content may now be AI-generated (see Figure). This introduces a feedback loop in which models are increasingly trained on synthetic or derivative data, raising concerns about model collapse, degradation, and the erosion of informational diversity. 
The internet, once a vast reservoir of human-generated data, is becoming a recursive system of machine-generated outputs.\u003C/p>\n\u003Cp id=\"26b3\" class=\"pw-post-body-paragraph mw mx ho my b mz nb nc nd nf ng nh nj nk nl nn no np nr ns ta nt hh bg\" data-selectable-paragraph=\"\">Faced with these constraints, and often as a direct response to the &ldquo;\u003Ca class=\"z om\" href=\"https://sverhulst.medium.com/the-weaponisation-of-openness-toward-a-new-social-contract-for-data-in-the-ai-era-fb9f49ef6109\" rel=\"noopener\" data-discover=\"true\">weaponisation of openness\u003C/a>&rdquo; that has been associated with data-extractive behavior, data holders are adapting&mdash;not by expanding open data access, but by limiting it. This marks a critical transition: from an era of relatively open web-scale data to one of negotiated, exclusive, and often opaque data ecosystems.\u003C/p>\n\u003Cp id=\"844a\" class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">This enclosure is the essence of the data winter. 
As&nbsp;\u003Ca class=\"z om\" href=\"https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5660451&amp;__cf_chl_tk=Jejtw5pji.xEMo1qTGgYdumfzRuILfxCm2bNNP_ZhNM-1776106595-1.0.1.1-AX4rOuqw0LqdenFAgA5PbhBEaAINgzRK1N2ZyL3ub9Q\" target=\"_blank\" rel=\"noopener ugc nofollow\">I have written elsewhere\u003C/a>, the data winter stems not from an absence of data but from the shrinking availability of&nbsp;\u003Cem class=\"oo\">accessible, high-quality, and reusable data,&nbsp;\u003C/em>especially for actors outside a small set of well-resourced organizations.\u003C/p>\n\u003Cp id=\"6d0d\" class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">While the report meticulously tracks compute growth, model performance, and investment flows, it does not systematically measure these enclosures and erosions of information openness (apart from three pages on data bottlenecks, pages 25&ndash;27). Nonetheless, several of its findings point directly to this dynamic. The increasing opacity of frontier models &mdash; where training datasets, methodologies, and even parameter counts are withheld &mdash; signals a broader enclosure of the data ecosystem. The dominance of API-based access models reinforces a shift to ownership and controlled usage. 
Meanwhile, the limited use of real-world data in domains such as clinical AI underscores how persistent governance and access barriers are already constraining impact, resulting in systems that do not generalize well, miss critical edge cases, and lack robust evidence of real-world effectiveness.\u003C/p>\n\u003Cp id=\"f1ff\" class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">All told, the report highlights that the AI ecosystem is experiencing unprecedented expansion in capability and adoption, while simultaneously facing growing constraints in its foundational resource (data). This creates a form of&nbsp;\u003Cem class=\"oo\">asymmetric acceleration,&nbsp;\u003C/em>marked by rapid progress within a concentrated set of actors but increasing barriers for the broader research, policy, and public interest communities.\u003C/p>\n\u003Cp id=\"05db\" class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">From a measurement perspective, this suggests the need to include data access more prominently in future AI indices. While the report&rsquo;s current format excels at capturing the dynamics of the AI summer, it lacks equivalent indicators for diagnosing the data winter. 
These indicators could include:\u003C/p>\n\u003Cul class=\"\">\n\u003Cli id=\"d167\" class=\"mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt oq or os bg\" data-selectable-paragraph=\"\">Accessibility of training and evaluation data for independent researchers\u003C/li>\n\u003Cli id=\"62f7\" class=\"mw mx ho my b mz ot nb nc nd ou nf ng nh ov nj nk nl ow nn no np ox nr ns nt oq or os bg\" data-selectable-paragraph=\"\">Availability of high-quality, real-world datasets for public interest applications\u003C/li>\n\u003Cli id=\"9f30\" class=\"mw mx ho my b mz ot nb nc nd ou nf ng nh ov nj nk nl ow nn no np ox nr ns nt oq or os bg\" data-selectable-paragraph=\"\">Degree of openness vs. enclosure in data ecosystems\u003C/li>\n\u003Cli id=\"35f1\" class=\"mw mx ho my b mz ot nb nc nd ou nf ng nh ov nj nk nl ow nn no np ox nr ns nt oq or os bg\" data-selectable-paragraph=\"\">Legal and governance barriers to cross-sector data sharing\u003C/li>\n\u003Cli id=\"e7f2\" class=\"mw mx ho my b mz ot nb nc nd ou nf ng nh ov nj nk nl ow nn no np ox nr ns nt oq or os bg\" data-selectable-paragraph=\"\">The presence (or absence) of data intermediaries enabling reuse\u003C/li>\n\u003C/ul>\n\u003Cp id=\"af44\" class=\"pw-post-body-paragraph mw mx ho my b mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt hh bg\" data-selectable-paragraph=\"\">Metrics such as these&ndash;and many more&ndash;would provide a more complete picture of the AI ecosystem&rsquo;s true health and trajectory. The&nbsp;\u003Cem class=\"oo\">AI Index Report 2026\u003C/em>, while informative and in many ways accurate, also reveals, perhaps unintentionally, a significant blind spot in the researcher and practitioner landscape. 
Ultimately, responsible access to AI-ready data may determine whether this AI summer can endure.\u003C/p>",[49,50,51,52],"ai","aiindex","humanagency","measurement","blog ","https://cms.thegovlab.com/assets/b71f02f2-b81f-446d-b9fa-977eb618ce6c","json",false,{},"/blogposts/ai-summer-data-winter-what-the-ai-index-reveals-and-what-it-doesnt-yet-measure",{"title":40,"description":46},"blogposts/ai-summer-data-winter-what-the-ai-index-reveals-and-what-it-doesnt-yet-measure",[49,50,51,52],"lGHKkNT-XoNeDHpfCase6AA8dHjeBuqcegLZRDGKr0U",1776247492678]