The Role of Open Source and Open Data in Responsible Data Stewardship
Data stewards across industries have used a wide array of operational and technical means to enable the responsible re-use of private-sector data and to create public value through data collaboratives. A key function of data stewardship, in fact, is identifying a fit-for-purpose approach to partnership and community engagement.
Some data collaboratives are highly cooperative and maintain restrictive data access controls. The Gender Gaps in Urban Mobility project, launched by The GovLab, UNICEF, Universidad del Desarrollo, Telefónica R&D Center, ISI Foundation, and DigitalGlobe with support from Data2X, is one such example. Among other datasets, that project leveraged call detail records (CDRs), which were analyzed in a secure environment by specific, pre-selected partners. This approach was the most responsible and feasible pathway to gain insight into the gendered aspects of urban mobility in Santiago, Chile, and to assess how those insights could inform more inclusive urban planning.
Other data collaboratives are built on less sensitive data and do not require equally restrictive data access controls. A recent post from Hal Varian, Google’s Chief Economist, reflects on the more open approaches available to data stewards for unlocking the public value of their companies’ private-sector data. Varian specifically describes Google’s efforts to make its research, data, and code as “universally accessible and useful” as possible.
“There’s currently an ongoing debate about the value of data and whether internet companies should do more to share their data with others. At Google we’ve long believed that open data and open source are good not only for us and our industry, but also benefit the world at large.
Our commitment to open source and open data has led us to share datasets, services and software with everyone. For example, Google released the Open Images dataset of 36.5 million images containing nearly 20,000 categories of human-labeled objects. With this data, computer vision researchers can train image recognition systems. Similarly, the millions of annotated videos in the YouTube-8M collection can be used to train video recognition.”
Andrew Young is the Knowledge Director at The GovLab, where he leads research efforts focusing on the impact of technology on public institutions. Among the grant-funded projects he has directed are a global assessment of the impact of open government data; comparative benchmarking of government innovation efforts against those of other countries; a methodology for leveraging corporate data to benefit the public good; and an experimental design for testing the adoption of technology innovations in federal agencies.