The Role of Open Source and Open Data in Responsible Data Stewardship

Data stewards across industries have used a wide array of operational and technical means to enable responsible re-use of private-sector data and to create public value by establishing data collaboratives. In fact, a key function of data stewardship is identifying a fit-for-purpose approach to partnership and community engagement.

Some data collaboratives are highly cooperative and maintain restrictive data access controls. The Gender Gaps in Urban Mobility project, launched by The GovLab, UNICEF, Universidad del Desarrollo, Telefónica R&D Center, ISI Foundation, and DigitalGlobe with support from Data2X, is one such example. Among other datasets, that project used call detail records (CDRs), which specific, pre-selected partners analyzed in a secure environment. This approach was the most responsible and feasible way to gain insight into the gendered aspects of urban mobility in Santiago, Chile, and to assess how those insights could inform more inclusive urban planning.

Other data collaboratives are built on less sensitive data and do not require equally restrictive access controls. A recent post by Hal Varian, Google’s Chief Economist, reflects on these more open approaches that data stewards can use to unlock the public value of their company’s private-sector data. Varian specifically describes Google’s efforts to make its research, data, and code as “universally accessible and useful” as possible.

From the post, “Open Source and Open Data”:

“There’s currently an ongoing debate about the value of data and whether internet companies should do more to share their data with others. At Google we’ve long believed that open data and open source are good not only for us and our industry, but also benefit the world at large.

Our commitment to open source and open data has led us to share datasets, services and software with everyone. For example, Google released the Open Images dataset of 36.5 million images containing nearly 20,000 categories of human-labeled objects. With this data, computer vision researchers can train image recognition systems. Similarly, the millions of annotated videos in the YouTube-8M collection can be used to train video recognition.”
