The report summarizes the state of the art of research software infrastructures and identifies best practices as well as open problems. Concrete recommendations are formulated for a global architecture of infrastructures that will allow EOSC to put software source code on a par with articles and data.

Scholarly infrastructures for research software

Report from the EOSC Executive Board Working Group (WG) Architecture Task Force (TF) SIRS

Edited by: the EOSC Executive Board

December 2020

Executive Summary

The TF on Scholarly Infrastructures of Research Software, as part of the Architecture WG of the European Open Science Cloud (EOSC) Executive Board, has established a set of recommendations to allow EOSC to include software, next to other research outputs like publications and data, in the realm of its research artifacts. This work is built upon a survey and documentation of a representative panel of current operational infrastructures across Europe, comparing their scopes and approaches.

This report summarizes the state of the art, identifies best practices and open problems, and paves the way for federating the different approaches in support of the software pillar of EOSC.

As the fuel of innovation, the engine of our industries and a fundamental pillar of academic research, software is a necessary component of modern scholarly research. Hence, software developments emerge across many fields and disciplines. Unfortunately, it is often forgotten that software is a special form of knowledge: in the form of source code, it is designed by humans to be read by humans and executed by machines. Software source code allows the description of data visualisation, data analysis, data transformation, and data processing in general with a level of precision that goes far beyond what can be achieved in scholarly articles. It is now well recognized that without access to the software used in research projects, it is extremely difficult to reproduce scientific results and to build upon the results obtained by other researchers.

Over the past decade, awareness has been raised about the importance of software in the scholarly world. Several infrastructures have started to be built or adapted to address some of the following key challenges, which need to be tackled to put software on an equal footing with other research outputs:

  1. Archiving software to ensure research software artifacts are not lost.
  2. Referencing software to ensure research artifacts can be precisely identified.
  3. Describing software to easily discover and identify research software artifacts.
  4. Crediting all authors to ensure their contributions are recognized.

To start addressing these challenges, the TF was formed by representatives of the EOSC Architecture WG together with representatives from current operational infrastructures across Europe (presented in Section 2.2 Infrastructures Participating in the TF). The TF covers the full spectrum of archives, publishers, and aggregators (including catalogues) and is considered a representative panel based on its members' wide-ranging experience in addressing some of the challenges involved in building the four pillars (archive, reference, describe, credit).

The TF considers that addressing these needs will require establishing standards, developing tools, improving and interconnecting infrastructures, training, outreach, and involvement with the publishing community. Proper funding will need to be provided both for the development, communication, and outreach efforts, and for the operational costs.

The TF concretely delivered a set of recommendations that emerged from the analysis of the current needs and state of the art, and the design of the future architecture. They include short term actionable items, broader policy recommendations for the EOSC, as well as a longer-term perspective.

Short term recommendations are foreseen to be turned into concrete development projects in a 2–4-year time-frame. The concrete recommendations detailed at the end of this report aim to (i) strengthen interactions between archives, publishers, and aggregators, (ii) adopt metadata standards, (iii) generalize the use of extrinsic and intrinsic identifiers for software, (iv) ensure appropriate citations for research software source code, (v) foster standardization through policy and guidelines, and (vi) ease adoption of the processes and tools for the research community at large.

The TF foresees that the EOSC has a key role to play in ensuring that the overall architecture is built in a way that best caters to the needs of the research community. To ensure openness, transparency, and good governance, the EOSC should elaborate a set of criteria of excellence, incorporating these principles, for its participating infrastructures, and should provide concrete recommendations. Additionally, the EOSC should actively get involved with the key infrastructures for software, take part in their strategic evolution and earmark proper funding to ensure their long-term sustainability.

The longer-term perspectives include objectives that should be taken up in the roadmap to be addressed over a 4–7-year horizon. Of importance is the development of advanced technology, such as open plagiarism detection technology and advanced search engines for software source code. Moreover, technology and tools should be explored to address a proper integration between different research outputs: articles, data, and software.

Lastly, the TF strongly recommends including a clause in all future research funding programs requiring that research software be made available under an Open Source license by default, and that any deviation from this default be duly motivated. While EOSC subscribes to the general statement that all research output should be “as open as possible, as closed as necessary”, it is believed that stimulating this default is needed for software to be put on an equal footing with other research outputs.

The consultation period ran from October 21 until November 10. All comments received were considered.

 

European Commission
Directorate-General for Research and Innovation

© European Union, 2021

Table of Contents

1 Executive Summary

2 Introduction

2.1 Scope and Goals

2.1.1 Archive, Reference, Describe, Credit: The Four Pillars

2.2 Infrastructures Participating in the TF

2.2.1 Archives
2.2.2 Publishers
2.2.3 Aggregators

3 State of the Art

3.1 Survey on Related Initiatives and Related Works

3.1.1 Archives
3.1.2 Publishers
3.1.3 Aggregators

3.2 Summary of the State of the Art Presentations in the Group

3.2.1 Archives
3.2.2 Publishers
3.2.3 Aggregators

3.3 Best Practices and Open Problems

3.3.1 Best Practice Principles for Archives
3.3.2 Best Practice Principles for Publishers
3.3.3 Best Practice Principles for Aggregators

3.4 Cross-cutting Concerns

3.4.1 Metadata
3.4.2 Identifier
3.4.3 Quality and Curation
3.4.4 Metrics
3.4.5 Guidelines
3.4.6 Tools and Workflows

4 The Road Ahead

4.1 General Requirements

4.1.1 Archive
4.1.2 Reference
4.1.3 Describe
4.1.4 Cite/Credit
4.1.5 Easing Adoption

4.2 Exemplarity Criteria for Participating Infrastructures

4.2.1 Accommodating Innovation

4.3 Possible Workflows

4.3.1 Self-Archiving
4.3.2 Scholarly Publication with Associated Source Code
4.3.3 Aggregators

5 Recommendations

5.1 Funding Development of Tools, Standards, and Guidelines

5.1.1 Interactions
5.1.2 Metadata About Software
5.1.3 Identifiers
5.1.4 Credit
5.1.5 Policy/Guidelines
5.1.6 Easing Adoption

5.2 Broader Policy Recommendations for the EOSC

5.2.1 Criteria of Excellence, and Sustainability of the Architecture

5.3 Longer Term Perspectives

5.3.1 Advanced Technology Development
5.3.2 Policy

6 Annexes

6.1 Glossary

6.2 Bibliography

6.3 Task Force Participants

6.3.1 Roberto Di Cosmo (Chair TF SIRS)
6.3.2 Jose Benito Gonzalez Lopez (Co-Chair TF SIRS)
6.3.3 Jean-François Abramatic (Chair WG Architecture)
6.3.4 Kay Graf
6.3.5 Miguel Colom
6.3.6 Paolo Manghi
6.3.7 Melissa Harrison
6.3.8 Yannick Barborini
6.3.9 Ville Tenhunen
6.3.10 Michael Wagner
6.3.11 Wolfgang Dalitz
6.3.12 Jason Maassen
6.3.13 Carlos Martinez-Ortiz
6.3.14 Elisabetta Ronchieri
6.3.15 Sam Yates
6.3.16 Moritz Schubotz
6.3.17 Leonardo Candela
6.3.18 Martin Fenner
6.3.19 Eric Jeangirard