Grants

Wikimedia Foundation

To transform Wikimedia Commons' media files from free text into machine-readable, structured data, enabling new uses for millions of media files on Wikipedia and across the web

  • Amount $3,015,000
  • City San Francisco, CA
  • Investigator Katherine Maher
  • Year 2016
  • Program Digital Technology
  • Sub-program Universal Access to Knowledge

Wikimedia Commons is the world's largest repository of freely licensed educational media, with 34 million photo, video, and audio files. It is growing by some five million files a year, faster than Wikipedia itself, as individuals submit photos and image-rich institutions contribute their collections. Unfortunately, most of those files are not accessible either to Wikipedia text searches or to the rest of the internet because they lack good metadata.

To address this, the Wikimedia Foundation has launched the Structured Data on Commons Project, an ambitious attempt to create infrastructure and tools that will transform all the media files on Wikimedia Commons into an accessible, machine-readable form known as structured, linked data. This will enable easy search of the Commons by Wikipedia readers and contributors; by educational, cultural, and scientific organizations; and by anyone with access to the web. Once cleaned and integrated, the structured data for each file can be understood by machines and linked to other content on the wider internet. The structured data can also be made instantly available in any language, answering a huge need for the 289 languages in which Wikipedia is written and facilitating greater interoperability among language communities.

Structured data will also allow developers both within and outside Wikipedia to create software tools that help with the use and reuse of these files. It will help contributors illustrate Wikipedia content more effectively, and it will enable readers to find and share the right media more quickly and easily. It will also allow for more partnerships with content providers and give those providers an incentive to structure their media when releasing it to the public.
