Grants

Scuola Superiore di Studi Universitari e di Perfezionamento Sant'Anna

To design and implement new algorithmic solutions for compression and code-based search in Software Heritage

  • Amount: $624,947
  • City: Pisa, Italy
  • Investigator: Paolo Ferragina
  • Year: 2025
  • Program: Technology
  • Sub-program: Open Source in Science

This initiative aims to enhance the Software Heritage (SWH) archive, the world's largest collection of software source code, by developing advanced compression and search capabilities. The work is particularly timely given the increasing use of generative AI in software development and maintenance, as well as the growing use of code to train better-performing generative AI tools. SWH currently relies on basic compression and search functionality, which limits users to searching by metadata rather than within the code itself. The project will pursue two parallel research tracks: creating more efficient lossless compression methods and developing novel "code-to-code" search functionality. These improvements would enable researchers to perform sophisticated searches within the SWH codebase and would make its computational infrastructure more sustainable and scalable. Beyond improving SWH itself, this research has the potential to benefit the broader open-source ecosystem by lowering barriers to using SWH in training next-generation AI models, providing essential tools for verifying the provenance of code (including possibly AI-generated code), and supporting new cybersecurity applications.
