Monitoring the open access policy of Horizon 2020
Athena Research & Innovation Center, PPMI and UNU-MERIT
Open access to publications and the Open Research Data Pilot (ORDP) have been key policies throughout Horizon 2020. To further strengthen Open Science and integrate it into all programmes within Horizon Europe, the European Commission has commissioned a study to: (i) measure compliance with the existing policy under Horizon 2020; (ii) investigate which aspects of the policy have worked and which have not, in order to plan future interventions; and (iii) pilot all aspects of a monitoring mechanism, providing lessons learnt that can be used to optimise the European Commission’s internal monitoring platform.
The key findings of this study indicate that the European Commission’s leadership in Open Science policy has paid off. Uptake has steadily increased over the past four years, achieving an average success rate of 83% in Horizon 2020 for open access to scientific publications, which places the European Commission at the forefront globally. What is also apparent from the study is that monitoring, particularly with regard to the specific terms and requirements of the policy, cannot be achieved through reporting alone, or without the European Commission collaborating closely with other funding agencies across Europe and beyond to agree on and promote common standards and common elements of the underlying infrastructure. In particular, the European Open Science Cloud (EOSC) should encompass all the components needed to foster a linked ecosystem in which information is exchanged on demand, easing the process for both researchers (who only need to deposit once) and funders (who only need to record information once).
A key objective of the study was to use an open, transparent and reproducible methodology, as a driver to improve the operationalisation of open access monitoring in the EC, with rules that can be shared with and accepted by the research community. To authoritatively assess open access compliance among Horizon 2020 peer-reviewed publications and datasets, and to assess the specificities of Articles 29.2 and 29.3 of the Model Grant Agreement (MGA), this study has explored and combined a number of public and proprietary datasets. For the first time, open data sources were considered as the primary sources for such monitoring (OpenAIRE https://www.openaire.eu/, Unpaywall https://unpaywall.org/, CrossRef https://www.crossref.org/, OpenAPC https://openapc.net/, DataCite https://datacite.org/, ORCID https://orcid.org/, DOAJ https://doaj.org/, re3data https://www.re3data.org/, to name a few). These were then validated against proprietary databases (Scopus https://www.scopus.com/home.uri and WoS https://webofknowledge.com) as secondary sources, when necessary.
Although working with and merging open sources was often painstaking, the process ultimately proved flexible and agile, and allowed the study team to interact with the community and propose changes to the underlying public infrastructure. Moreover, most of the data and metadata contained in the open sources proved to be of good quality, justifying the adoption and curation of open, community-driven standards.
Fully in line with Open Science practices, the study produced an authoritative and open access database covering all aspects related to Horizon 2020 publications, accompanied by a data management plan and detailed documentation. All three are deposited in Zenodo under a CC BY licence and are also published on the European Commission’s portal. The Zenodo links are as follows: database https://zenodo.org/record/4899767, documentation https://zenodo.org/record/4900100, and data management plan https://zenodo.org/record/4900110.
Efficiency of the Horizon 2020 open access policy
Overall, the estimated level of compliance with the open access mandate for scientific publications under Horizon 2020 stood at 83%, which is among the highest open access success rates of funders globally. Compliance with and uptake of open access to research data have a success rate of 95%. Here, compliance refers to adherence to the requirements of Article 29.3 among projects that were obliged to follow it (those in the ORDP), while uptake refers to adherence to Article 29.3 among all projects, whether or not they were obliged to follow it. This achievement is doubly impressive when considering the context in which the policy is implemented: a decentralised European environment in which Member States and Associated Countries have different policies and infrastructures (or lack thereof). With a clear upward trend in open access to publications (from 65% in 2014 to 86% in 2019), and a commitment to the policy from projects that participated in the ORDP, the potential exists to reach 100% within the early stages of, or midway through, Horizon Europe.
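The distinction between compliance and uptake can be illustrated with a minimal Python sketch. The `Project` record, its field names and the four sample projects are hypothetical and purely illustrative; they are not taken from the study database.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Project:
    project_id: str           # hypothetical identifier
    in_ordp: bool              # True if the project was obliged to follow Article 29.3
    meets_article_29_3: bool   # True if the project adhered to Article 29.3


def rate(projects: Iterable[Project]) -> Optional[float]:
    """Share of projects adhering to Article 29.3, or None if there are no projects."""
    projects = list(projects)
    if not projects:
        return None
    return sum(p.meets_article_29_3 for p in projects) / len(projects)


def compliance_rate(projects: Iterable[Project]) -> Optional[float]:
    """Adherence among projects that were obliged to follow the article (ORDP only)."""
    return rate(p for p in projects if p.in_ordp)


def uptake_rate(projects: Iterable[Project]) -> Optional[float]:
    """Adherence among all projects, whether obliged or not."""
    return rate(projects)


# Invented sample data, for illustration only.
sample = [
    Project("P1", in_ordp=True, meets_article_29_3=True),
    Project("P2", in_ordp=True, meets_article_29_3=False),
    Project("P3", in_ordp=False, meets_article_29_3=True),
    Project("P4", in_ordp=False, meets_article_29_3=True),
]

print(compliance_rate(sample))  # 1 of the 2 ORDP projects adheres
print(uptake_rate(sample))      # 3 of all 4 projects adhere
```

The two indicators differ only in their denominator: compliance is restricted to projects bound by the article, while uptake covers every project.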
When compared with other research funders, Horizon 2020 ranks among the top funders in terms of the level of open access achieved. In terms of the percentage of publications that are openly accessible, Horizon 2020 came 12th out of the 47 non-discipline-specific funders included in the analysis. On average, Horizon 2020 performs better than some of the largest non-discipline-specific research funders in Europe (Switzerland, Sweden, Germany, Italy, Spain, Ireland, Portugal) and some of the largest in the US (e.g., the National Science Foundation [NSF]). At the same time, the percentage of publications under Horizon 2020 that were openly accessible was somewhat lower than that of some of the largest research funders in the Netherlands, Hungary, Denmark, Austria and Belgium, which have a similar tradition in open access policies but accompany this with well-established and connected national infrastructures.
In terms of article processing charges (APCs), we estimated the average cost of a ‘gold’ open access article to be around EUR 2,200. ‘Hybrid’ open access articles, a category that will no longer be reimbursed under Horizon Europe, have a higher average cost of EUR 2,600. Our analysis of six large research funders showed that, on average, APCs under Horizon 2020 were similar to the average for other funders in Europe and the USA for which the required data were available.
Qualitative evidence also revealed some key sources of inefficiency, as well as potential areas for improvement in the efficiency of the Horizon 2020 open access policy. To increase open access to research outputs, some beneficiaries expressed a need for funding to cover the article processing charges (APCs) or book processing charges (BPCs) of post-project publications that resulted from the grant activities. In many cases, publications based on Horizon 2020 activities are actually published after the project has formally ended (this is particularly common in the humanities and social sciences, where books and book chapters are common research outputs). In addition, one of the key sources of cost-inefficiency relates to a lack of awareness and knowledge on the part of beneficiaries with regard to Horizon 2020’s open access requirements. In some cases, project budgets were used to cover APC costs because, at the time, beneficiaries were unaware of alternative open access routes. The available evidence also confirms that excluding APCs for hybrid journals from eligible costs under Horizon Europe may increase the cost-efficiency of the programme’s open access policy. This needs to be closely monitored: even though current data indicate that hybrid options incur considerably higher average APCs than fully open access journals, future costs will depend heavily on shifts in publishing market power brought about by the transition of closed/hybrid journals to open access and by new publishing platforms such as Open Research Europe (https://open-research-europe.ec.europa.eu/).
Our study produced specific indicators to assess the full openness and ‘FAIR-ness’ of Horizon 2020 results:
- Licensing: 49% of Horizon 2020 publications were published using Creative Commons (CC) licences, which permit reuse (with various levels of restrictions), while 33% use publisher-specific licences that place restrictions on text and data mining (TDM). Another 18% of open access publications (mainly in institutional repositories) come with no licence, which effectively means there is no legal basis for using them for TDM purposes. This calls for further policy action, as real open access should not place any obstacles in the way of either human or machine readability. Concerning research data, things are more straightforward (no publishers in the mix): 65% of datasets are deposited with an open licence (CC licences).
- Accessibility and interoperability: Institutional repositories have responded satisfactorily to the challenge of providing FAIR access to their publications, amending internal processes and metadata to incorporate the necessary changes: 95% of deposited publications include some type of persistent identifier (PID) in their metadata, and a rate of 73% has been observed for accessibility (correctly identifying a full text from the metadata) and interoperability (being able to fetch it via a known protocol). Datasets in repositories, on the other hand, show a low compliance level: only approximately 39% of deposited Horizon 2020 datasets are findable (i.e., the metadata include a PID and a URL to the data file), and only around 32% are accessible (i.e., the data file can be fetched using a URL link in the metadata).
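The findability and accessibility criteria for datasets described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's actual implementation: the metadata field names `pid` and `data_url` are hypothetical, and the fetch step is passed in as a function so that the logic can be tried without network access. `http_fetch` shows one possible real check using the standard library.

```python
import urllib.request
from typing import Callable, Iterable, Mapping, Optional

Record = Mapping[str, Optional[str]]


def is_findable(record: Record) -> bool:
    """Findable: the metadata include both a PID and a URL to the data file."""
    return bool(record.get("pid")) and bool(record.get("data_url"))


def is_accessible(record: Record, fetch: Callable[[str], bool]) -> bool:
    """Accessible: the data file can be fetched using the URL in the metadata."""
    url = record.get("data_url")
    return bool(url) and fetch(url)


def http_fetch(url: str, timeout: float = 10.0) -> bool:
    """One possible fetch check: an HTTP HEAD request that returns a 2xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # Network errors, HTTP errors and malformed URLs all count as "not fetchable".
        return False


def count_fair(records: Iterable[Record], fetch: Callable[[str], bool]) -> dict:
    """Count findable and accessible records in a collection."""
    records = list(records)
    return {
        "total": len(records),
        "findable": sum(is_findable(r) for r in records),
        "accessible": sum(is_accessible(r, fetch) for r in records),
    }


# Invented records and a stub fetcher, for illustration only.
reachable = {"https://repo.example.org/files/a.csv"}
records = [
    {"pid": "10.1234/example.a", "data_url": "https://repo.example.org/files/a.csv"},
    {"pid": "10.1234/example.b", "data_url": "https://repo.example.org/files/b.csv"},
    {"pid": "10.1234/example.c", "data_url": None},
]
print(count_fair(records, fetch=lambda url: url in reachable))
```

In practice `http_fetch` would be passed as the `fetch` argument; some repositories do not answer HEAD requests, so a production check would likely need a fallback.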
Effectiveness of the Horizon 2020 open access policy
On average, the open access rate among Horizon 2020 publications has increased steadily over the duration of the programme, from just over 65% of peer-reviewed publications being open access in 2014, to 86% in 2019. The effectiveness of the policy, however, differed somewhat between Horizon 2020 programmes. The highest shares of open access publications were found in the European Research Council (ERC) and ‘Science with and for Society’ programmes, while the lowest shares were in ‘Euratom’, ‘Industrial Leadership’, and ‘Spreading Excellence and Widening Participation’. Evidence also confirms that open access under Horizon 2020 varied according to scientific fields and specific disciplines. The percentage of open access publications was highest within medical and health sciences, as well as natural sciences, but lower within the agricultural and veterinary sciences, engineering and technology, social sciences, as well as humanities and arts. In some cases, variation under Horizon 2020 also existed at the level of specific disciplines within particular scientific fields.
On the ORDP front, our findings indicate an uptake and compliance success rate of 95%. Variations exist in compliance between programmes, although in most cases the level remains well above 90%. The three pillars that produce the most datasets are Societal Challenges (in proportion to the number of projects, this pillar generates twice as many datasets as the others); Excellent Science; and Industrial Leadership.
Qualitative evidence also reveals that, in general, Horizon 2020 projects become increasingly compliant with open access requirements over their life cycle. This is mainly due to effective communication, feedback and support provided by project officers to beneficiaries, which helps them to meet the open access requirements by the time the project ends.
Study evidence shows that the key result and benefit of the Horizon 2020 open access policy is wider outreach and dissemination of research work across different fields and to the general public. Furthermore, the policy led to learning effects: fulfilling their open access obligations under Horizon 2020 led to increased awareness and knowledge among beneficiaries with regard to the concepts and principles that underpin Open Science, and improved their related skills. Lastly, at organisational and system level, the Horizon 2020 open access policy has produced spill-over effects by encouraging other European research funders and institutions to adopt similar open access policies and measures.
Monitoring open access
One of the objectives of the study was to identify the Horizon 2020 open access monitoring workflow, including the key steps, tools and actors involved in the monitoring process. The workflow is based on two essential instruments and data sources: automated monitoring and tracking of metadata on research outputs through the OpenAIRE platform, and the continuous self-reporting procedures followed by Horizon 2020 project beneficiaries using the SyGMa portal.
Our quantitative analysis and interviews with stakeholders identified gaps in the existing Horizon 2020 open access monitoring data, which pose further difficulties in assessing compliance. More specifically:
- Key metadata were not systematically provided by repositories (e.g., peer-review status of publications, publication release dates, submission history, publication versioning).
- Data displayed to grantees are in many cases of poor quality, mainly due to the lack of consistent and rigorous data entry practices among many publishers and repositories.
- Coverage of emerging repositories and publishers is only partial, particularly in specialised sectors/domains.
- The different versions of the same publication are not clearly distinguished.
- Open access publications appear in OpenAIRE and SyGMa with delays.
Self-reporting by beneficiaries also highlighted a number of issues relevant to compliance checking and the assessment of indicators, chiefly that (i) some publications are not reported at all, particularly because beneficiaries do not keep reporting after a project has ended; and (ii) the metadata entered by beneficiaries are of poor quality, which makes them unreliable and unusable. The latter includes a lack of systematic use of valid digital object identifiers (DOIs) and other valid PIDs; missing links between publications and datasets; poorly provided or unclear data on embargo periods for both publications and datasets; and missing information about the tools and instruments at the disposal of the beneficiaries that are necessary for validating the results. One of the main reasons for this is that researchers are very often not fully aware of the semantics and scope of many open access-related concepts, such as the differences between ‘gold’ and ‘green’ open access, embargo periods, DOIs and repository links. In many cases, it is impossible for project officers to check whether a deposited publication has been made open access within the maximum allowable time limit (6 to 12 months).
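The time-limit check mentioned above can be sketched in Python, assuming that both the publication date and the date on which the deposited copy became openly accessible are available in the metadata. The function names and the choice to return `None` when a date is missing are illustrative assumptions; the 6- and 12-month limits are passed in as a parameter.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the end of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def within_time_limit(
    published: Optional[date],
    made_open: Optional[date],
    max_months: int,
) -> Optional[bool]:
    """Return True/False if the check can be made, or None if a date is missing.

    None mirrors the situation described in the text, where incomplete
    metadata make the check impossible.
    """
    if published is None or made_open is None:
        return None
    return made_open <= add_months(published, max_months)


# Invented dates, for illustration only.
print(within_time_limit(date(2019, 8, 31), date(2020, 2, 29), max_months=6))
print(within_time_limit(date(2019, 8, 31), date(2020, 3, 1), max_months=6))
print(within_time_limit(date(2019, 8, 31), None, max_months=12))
```

Returning `None` rather than `False` keeps "non-compliant" separate from "cannot be verified", which matters when the underlying problem is missing metadata rather than late deposit.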
Gaps and challenges relating to the monitoring of (open access) research data resulting from Horizon 2020 projects largely resulted from a lack of data management skills and knowledge among beneficiaries. Beneficiaries are often not methodical or meticulous about precisely what type of data to open up (raw vs. annotated vs. processed); what accompanying documentation should be included; and what existing data protection regulations apply. Frequently, data management plans (DMPs) are very rudimentary because researchers do not understand some of the key underlying principles, such as FAIR. In addition, datasets may sometimes be very large and complex. Storing and maintaining them in an openly accessible form might require a great deal of storage space and/or qualified staff, which may impose significant financial burdens on the research teams.
In addition to the development of a comprehensive list of open access indicators for both publications and datasets, one of the key inputs required to re-engineer the existing open access monitoring framework was the identification of key principles and stakeholder expectations regarding the next-generation Horizon Europe open access monitoring framework. One key expectation is that the Horizon Europe monitoring framework should make it possible to check, in real time, the publications resulting from the programme (including, for example, filtering information by type of publication, discipline, etc.). The scale of the next-generation framework is also expected to be expanded, incorporating more diverse types of research outputs in addition to publications (e.g., software, prototypes, etc.). Its scope is also expected to expand beyond the direct outputs of the programme, to include medium-term and long-term indicators focusing on the uptake of open access outputs and their impacts on the creation of new research networks.
Based on an analysis of gaps in the previous monitoring framework, the study has prepared a number of recommendations that address various issues relating to gaps in open access data and in the monitoring process. These recommendations (listed below) concern improving the integration of OpenAIRE into the European Commission’s SyGMa reporting tool, the processes relating to open access self-reporting by beneficiaries, and the monitoring of open data.
- Update the OpenAIRE guidelines for repositories and increase the adoption of the OpenAIRE metadata standard among repositories.
- Streamline internal procedures within OpenAIRE Graph to reduce delays in transferring data to the SyGMa reporting tool.
- Organise training sessions for beneficiary principal investigators, focusing on the general principles underpinning open access in Horizon Europe, as well as the requirements and reporting process.
- Prepare a concise ‘one-stop source’ manual/guidelines for beneficiary principal investigators/project managers/support staff, explaining the key steps in the Horizon Europe open access reporting process.
- In the case of manual self-reporting by beneficiaries, implement technical safeguards at the data submission stage in the SyGMa reporting tool, to address the issue of beneficiaries incorrectly filling in metadata fields when self-reporting.
- Deliver regular reminders to the project beneficiaries for several years after the project has ended, calling on them to report the project outputs on the Participant Portal, to increase the level of post-project open access reporting.
- Improve the quality of open research data management in Horizon Europe projects, by encouraging the inclusion of skilled personnel and by providing guidance and common templates.
- Disseminate the existing DMP good practice examples to beneficiaries at the beginning of their projects.
- Develop clear and comprehensive guidelines describing what type of data should be opened up (raw vs. processed), and what documentation should accompany open access research datasets.
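As an illustration of the technical safeguards recommended above for self-reported metadata, the following Python sketch validates a record at the data submission stage. It is an assumption-laden example, not a description of the SyGMa reporting tool: the field names `doi` and `oa_route`, the accepted route values and the DOI pattern are all hypothetical choices. The pattern checks syntax only; it does not verify that a DOI is registered or resolves.

```python
import re
from typing import List, Mapping, Optional

# Simplified syntactic pattern: "10.", a 4-9 digit registrant code, "/", a non-empty suffix.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways in which a DOI is pasted into a form.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

# Hypothetical set of accepted values for the open access route field.
ALLOWED_ROUTES = {"gold", "green", "hybrid"}


def normalise_doi(raw: str) -> Optional[str]:
    """Strip resolver prefixes, lowercase, and return the DOI or None if malformed."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    value = value.strip().lower()
    return value if _DOI_PATTERN.match(value) else None


def validate_submission(record: Mapping[str, Optional[str]]) -> List[str]:
    """Return a list of human-readable errors; an empty list means the record passes."""
    errors = []
    if normalise_doi(record.get("doi") or "") is None:
        errors.append("doi: missing or not a syntactically valid DOI")
    if record.get("oa_route") not in ALLOWED_ROUTES:
        errors.append("oa_route: expected one of " + ", ".join(sorted(ALLOWED_ROUTES)))
    return errors


# Invented submissions, for illustration only.
print(validate_submission({"doi": "https://doi.org/10.1234/Example.5678", "oa_route": "green"}))
print(validate_submission({"doi": "not-a-doi", "oa_route": None}))
```

Rejecting malformed identifiers at entry time is cheaper than repairing them later, and normalising DOIs to a single form makes it easier to link self-reported records to those harvested automatically.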
Luxembourg: Publications Office of the European Union, 2021
© European Union, 2021