Over the past decade, as part of the overall movement towards Open Science, there has been growing policy interest and public investment in enhancing access to data and in data intensive research. This parallels the rapid development of information and communication technologies and digital tools, which are being adopted across the scientific enterprise as a whole. With this comes the promise of new scientific breakthroughs, greater transparency and reproducibility in scientific results, and increased innovation, with overall benefits for science and society. This promise is beginning to be realised but is also limited in many areas by the need for an appropriately skilled scientific workforce.
This report was commissioned by the OECD Global Science Forum to identify: the skills needs for data intensive science; the challenges for building sustainable capacity as these needs evolve; and the policy actions that can be taken by different actors to address these needs. The principal focus is on the requirements of public sector/academic research, while recognising that many of the skills required for data intensive research are transferable across different sectors, including business and industry. The focus is also on the specific requirements of post-graduate scientific research and related training rather than generic digital skills and the roles of undergraduate and school education. The role of education in relation to digital skills and competencies is addressed in many other reports from OECD and other organisations and this work is not repeated here.
Whilst this report was being finalised, the world underwent a dramatic transformation with the COVID-19 pandemic threatening the lives and livelihoods of millions of people. This has emphasised the increasing dependency of citizens, businesses, public authorities and scientific research on digital tools and data. Those with access to these tools and the skills and capacity to use them are generally better equipped to cope with the massive disruptions that the pandemic is creating. Data-intensive science is proving to be critical for mapping the course of the pandemic and informing policy-making. Integration of data and information from many different scientific domains and new software development are important for modelling and assessing the longer-term socio-economic effects. Digital tools are playing a major role in fundamental research and international collaboration to understand the functioning of the virus and develop diagnostics, therapies and vaccines. At the same time the capacity of some of these tools to collect, link and analyse personal data raises new ethical and legal concerns, with important implications for science and society.
The global research community has responded rapidly to COVID-19 to develop digital platforms that facilitate access to research methods and outputs, not only to other researchers but also to government, industry and the broader community. Inasmuch as these initiatives are new, they also demonstrate that in other contexts the infrastructure required to provide open access to critical research is not yet in place. The success of the scientific response to COVID-19 largely depends on the existence of resources in appropriately curated, interoperable, and preserved states. A skilled workforce is needed to create and utilise these resources. This requires both digitally skilled research support professionals and digitally skilled researchers. Several of the case studies included in this report, including the ELIXIR bioinformatics network, are at the forefront of the research response to COVID-19 and there are important lessons that can be learned from their experiences in strengthening digital skills and capacity.
The COVID-19 pandemic highlights the importance and potential of data intensive science. All countries need to make digital skills and capacity for science a priority and they need to work together internationally to achieve this. To this end, the recommendations in this report are even more pertinent now than they were when they were first drafted in late 2019.
As we move into a new era of data-intensive science, expectations are high. Big data analysis and access to new forms of data are already providing important and novel scientific insights and driving innovation. As the ability to combine and analyse data from different domains increases, complex societal challenges, including those embedded in the United Nations’ Sustainable Development Goals, are becoming more amenable to scientific analysis. Data-intensive science has the potential to generate the new knowledge that is necessary to inform transformations to more sustainable and prosperous futures.
Digitalisation is transforming all fields of science, as well as other fields of scholarly research. In addition to opening up important new directions and avenues for research, it presents an important opportunity to increase the transparency, rigour, and integrity of research. Digital technologies and access to data are driving change, but human digital capacity and skills are likely to be the critical determinant of scientific success in the future.
This report builds on recent work from OECD and other organisations that analyses the digital workforce capacity that will be required across different industrial sectors, and examines the specific requirements of data intensive science. This encompasses all fields of science, including social sciences and humanities.
The analysis, which includes thirteen in-depth case studies, explores what actions to date have led to improvements in digital workforce capacity for data intensive research, and what further actions are required. This includes an assessment of how the digital workforce requirements for science differ from other sectors of society and the economy, concluding that there are unique conditions in science that are reflected in specific skills requirements. Some of these requirements are generic for science as a whole and some are specific to disciplines or domains of research.
There is a need for both digitally skilled researchers, who have a common set of foundational digital skills coupled with domain-specific specialised skills, and a variety of professional research support staff, including data stewards and research software engineers. Increasingly research is conducted in teams and the distribution of competencies within a team is highly variable, so it is difficult to be prescriptive as to what should be expected of researchers and what can be best provided by support services. This will vary across different research domains. However, it is increasingly recognised that data intensive science requires not only technical skills but also people-focused skills, such as communication and team working. In many fields there is also a need for ethical and legal expertise, particularly when sensitive data is being used.
There are five key action areas that need to be addressed in parallel in order to build and maintain digital workforce capacity for science (see Figure ES1 below). Multiple actors, including governments, research funders, science associations, research institutions, and universities, need to work together across these areas.
Figure ES1. Five key action areas and goals for digital research workforce capacity development
Important actions that national governments can take include:
Actions that research agencies can take include:
There is also a range of broader policy actions that national governments and/or research funders can take with regard to education, open science, scientific integrity, and research evaluation and assessment. These actions provide an enabling environment for data intensive science and reinforce efforts to strengthen the digitally skilled research workforce.
Universities are the main centres of tertiary education, training, and public research in most countries and hence have a central role to play in building sustainable digital research capacity. Whilst universities have considerable autonomy, they are also responsive to the mandates and incentives that governments and research agencies provide. Universities can take a number of actions in each of the five key areas that need to be addressed to strengthen digital workforce capacity and skills for data-intensive science. These include:
A more detailed description of actions that universities can take is given in chapter 7. Some of these actions can be built on existing structures, e.g. university libraries can provide a focus for facilitating the development of data management skills and computing departments can help to propagate software and coding skills across the research endeavour. Other actions will require more systemic structural and cultural changes.
A number of other actors, including science associations and academies, research institutes, and research infrastructures, have an important role to play, particularly in relation to community building and training provision. More detailed recommendations for all actors are included in Sections 6 and 7 of this report. There is also an important role for private sector actors to play, both in the provision of training and in working together with public sector partners to define and address digital research capacity needs.
Many countries and institutions are already implementing some of these recommendations and there are considerable opportunities for mutual learning. A general recommendation for all actors is to engage in international collaboration in this area and share materials and experiences.