Recherche Data Gouv, the federated national research data platform

News from committee
15/09/2021

Recherche Data Gouv: a national federated research data platform will be available in the first quarter of 2022, as the Minister announced at the launch of the Second National Plan for Open Science.

To respond to the challenges of opening data, or at a minimum of making data citable, the strategy is to build on French strengths, namely: disciplinary ‘big science’ research infrastructures with a strong international presence, supported in the framework of Equipex+ (EBI, GEO, CDS-ESO, etc.); and five establishments with strong experience in institutional data repositories, particularly the National Research Institute for Agriculture, Food and Environment (INRAE).

The aim is to provide a multi-disciplinary repository for all researchers who do not have a trusted repository for their data. This sovereign solution will offer an alternative to researchers who currently have no choice but to deposit their data in the repositories of the publishers of their articles. The initiative also aims to take its place in the international landscape, which is currently evolving as a result of the work on structuring the EOSC and the considerable investments in research data made in Germany and the Netherlands.

A controlled sovereign solution for data dissemination
Support services for researchers

The Recherche Data Gouv platform is made up of 5 modules aimed at providing researchers with:

  • A deposit and dissemination service (repository) dedicated to data for which none of the existing disciplinary repositories could provide a suitable solution; this service will be hosted in a labelled national data centre;
  • A catalogue of French research data deposited in national or international thematic and disciplinary repositories;
  • Data support services:
    • Data workshops: these are set up in the framework of site policies and provide researchers with a first level of expertise and services for the preparation and dissemination of data;
    • Thematic reference centres: these support the data workshops by designing and implementing thematic/disciplinary reference systems for individual scientific fields. They are supported by national research organisations and/or infrastructures. The reference centres provide the disciplinary expertise required to determine the best practices defined by the scientific community, such as the embargo period, data description standards, disciplinary/thematic repositories, etc., in compliance with the international ecosystem;
    • Resource centres attached to Recherche Data Gouv: these provide various services linked to the national generic data repository, the catalogue, e-learning, the assignment of unique identifiers for datasets (DOIs), data management plan tools, etc.

Services available as of March 2022

The project has a 3-year working schedule:

  • From the start of 2022: the progressive launch of data workshops as they are created at different sites throughout France
  • March 2022: centralised services
    • Opening of the data repository and catalogue
    • Resource centre for the repository and catalogue
    • Resource centres for support services, training, communication to encourage uptake of the data deposit and identifier services, etc.
  • 2022-2023: campaigns to populate the repository, run by the data workshops and by publishing and open archive actors
  • 2023-2024: harvesting of national and international data repositories to flag disciplinary/thematic data

A system developed by and for the research community

INRAE has been asked to manage the ‘repository’ and ‘catalogue’ modules as it is the actor with the most experience of generic repositories in France. INRAE has developed multidisciplinary expertise by disseminating data from different scientific fields. For the past 5 years, it has used Dataverse, the open source solution developed by Harvard and widely adopted by the French community.

The repository and catalogue modules will be based on the service currently provided within INRAE, which will be adapted to the needs of the whole community.

The INRAE team of 8.2 full-time equivalent (FTE) employees provides expertise for the national project alongside 5 FTEs from other institutions that also have repository expertise: University of Grenoble Alpes, University of Lille, Université de Lorraine, Université de Paris, University Paris Nanterre, University of Strasbourg and the CNRS.

These institutions will be part of the governance of the project.

The data workshops will be based on existing initiatives or developed by data experts who work closely with the researchers concerned and will offer a first level of expertise at the local scale. They will be managed in association with universities, schools and research organisations in the framework of site policies. The first call for expressions of interest will be launched in September 2021 with the first data workshops scheduled to be labelled in January 2022. Three calls per year will then be organised in 2022 and 2023.

Subsidiarity, trust, sharing and visibility: the guiding principles of Recherche Data Gouv

  • Principle of subsidiarity with existing national or international disciplinary repositories
  • Principle of subsidiarity in providing support for researchers and in the moderation of repositories: data workshops on the sites
  • Principle of recognition of data producers
  • Depositing research data whether associated with publications or not
  • Quality platform: data curation
  • Trust platform (CoreTrustSeal)
  • Pooling resources for the modules: repository, catalogue and resource centres
  • The visibility of the production of institutions (logo, etc.), research structures and researchers
  • Monitoring and reporting indicators (number of data sets deposited, downloaded, etc.) by institution, scientific field, etc.
  • International visibility
  • Integration into the national and international research data ecosystem and the French contribution to the EOSC