Data Needed to Identify Plan S Compliance
Delta Think, Inc., Jisc, Science Europe, cOAlition S – 6 February 2020
This report was commissioned by Jisc on behalf of Science Europe. It is the result of a project which examined:
- the data needed for authors to identify Plan S-compliant publication venues;
- open questions about data needed for compliance as of December 2019;
- the readiness of key data sources to provide the data; and
- a view of gaps in the short and medium term, and how to fill them.
The project’s scope covers the data needed by a third party to produce an author-facing tool that will allow an author to identify publication venues before submission. The tool itself is out of scope, and is being looked into via a separate project running parallel to this one. This report and its accompanying analysis are intended to feed into the tool’s Statement of Requirements (or similar). The Appendix – Project Terms of Reference recaps this report’s terms of reference. A Project Steering Group comprising representatives from Jisc and cOAlition S oversaw the project, and included Science Europe as an observer.
Plan S requirements indicate a clear aspiration, but not all are detailed enough to serve as a technical specification. In analysing the data requirements, we therefore encountered questions about some of the details behind them. We used our regular contact with the steering group throughout the project to clarify what approaches we should use, or to capture open questions. We had a limited number of hours available for the work, so we agreed a prioritised list of stakeholders to interview as part of our investigations. This report represents the results of our discussions, the process of analysis we undertook, and our response to the steering group’s request for our independent views. The work was undertaken in the last quarter of 2019, so its results represent a snapshot of activities at that time. Details about Transformative Journals and Information Power’s report about price transparency were published during the course of the project. The implementation of Plan S continues to evolve.
The report consists of five major sections. “Assumptions and Questions” summarises the results of our discussions with the Project Steering Group, covering recommendations and questions arising. The next two sections are technical: “Analysis of Data Needed” identifies common threads and generic structures; the “Data Specification” section goes into specific details of the data. The specification should be read in conjunction with the detailed spreadsheet accompanying this report, JISC Plan S Data Spec.xlsx. “Assessment of Sources” examines what data sources are available, based on the priorities we were given. “Conclusions and Recommendations” presents our requested views on next steps.
A summary of our findings is as follows.
The data about compliance should allow multiple levels of detail for a given publication venue. As well as indicating whether a venue is Plan S compliant overall, the data structures capture how it measures up for each of the four routes to Plan S compliance, and in turn how each specific requirement contributes to each route. For example, a journal may be compliant via the fully OA route because it is in the DOAJ, has appropriate editorial policies, offers the correct licenses, and so on. A tool built on this data structure then has flexibility in how much detail it presents to the end user.
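The layered structure described above can be sketched as follows. This is a minimal illustration, not the structure defined in the accompanying data specification: the class names, route names and requirement names are hypothetical, and the aggregation rules (a route is compliant when all of its requirements are met; a venue is compliant when at least one route is) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RouteAssessment:
    """Requirement-level results for one compliance route (hypothetical)."""
    # Requirement name -> True (met), False (not met), None (no data available).
    requirements: Dict[str, Optional[bool]] = field(default_factory=dict)

    def is_compliant(self) -> bool:
        # Assumed rule: every listed requirement must be known and met.
        return bool(self.requirements) and all(
            result is True for result in self.requirements.values()
        )

    def unmet(self) -> List[str]:
        # Requirements that failed or have no data, for a detailed view.
        return [name for name, result in self.requirements.items()
                if result is not True]


@dataclass
class VenueAssessment:
    """Route-level results for one publication venue (hypothetical)."""
    name: str
    routes: Dict[str, RouteAssessment] = field(default_factory=dict)

    def compliant_routes(self) -> List[str]:
        return [route for route, a in self.routes.items() if a.is_compliant()]

    def is_compliant(self) -> bool:
        # Assumed rule: one compliant route is enough for the venue overall.
        return bool(self.compliant_routes())


# Illustrative data only; the requirement names are placeholders.
journal = VenueAssessment(
    name="Example Journal",
    routes={
        "fully_oa": RouteAssessment({
            "listed_in_doaj": True,
            "editorial_policies": True,
            "licence": True,
        }),
        "transformative_journal": RouteAssessment({
            "on_approved_list": False,
        }),
    },
)

print(journal.is_compliant())        # overall answer
print(journal.compliant_routes())    # route-level detail
print(journal.routes["transformative_journal"].unmet())  # requirement-level detail
```

A tool could show only the overall result to most authors, and expose the route-level and requirement-level detail on request.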
No data sources currently include all the data needed to determine Plan S compliance. Some key requirements can therefore not be measured without further work, and some are ambiguous. We suggest that cOAlition S should take a phased approach to enforcing requirements where data sources are currently unworkable:
- No industry standards or sources exist for information about publishing statistics.
- No industry standards or sources exist for information about publishing prices and costs. The Plan S requirements are unclear about exactly what information is required. (Although we note the work by Information Power and the Fair Open Access Alliance.)
- Requirements specifying “in the process of being registered” in the DOAJ or OpenDOAR would not be workable in practice. The sources do not implement such a process, and handling rejections may prove complicated. Our discussions suggested that these requirements were anticipating a surge in demand, so might better be addressed by ensuring the data sources have sufficient interim resources to manage demand.
- Requirements stating “at no additional cost” do not specify the baseline against which the cost is calculated.
- Formats for metadata are not specified for several requirements where things like PIDs, “quality metadata” and “machine readable metadata” are mentioned. The data specification therefore simply flags absence or presence of metadata. However, without standards in place such metadata may offer limited value. We suggest that priority should be given to specifying a limited taxonomy for license information embedded in articles.
Given the tight implementation timescales desired by cOAlition S, we endorse the Compliance Task Force’s approach of nominating a few key data sources with a view to scaling them.
- The approach of multiple whitelists is an efficient way to analyse the key publication venues. If a publication is present in a whitelist, and passes various checks, it can be deemed compliant. Its absence implies non-compliance, without need for further data.
- Curation of the data is delegated to the whitelist operator, with cOAlition S trusting the operator’s judgement.
- We recommend that cOAlition S quickly clarifies its policies and priorities with whitelist operators, and works with them to make resources available to cover any gaps.
- In order to balance rigour against prohibitive costs, we recommend a mix of proactive publisher deposition of compliance data into the whitelist(s), complemented by random spot checking by the whitelist operator to verify accuracy.
- We recommend running a focus group involving whitelist operators and publishers to clarify the best balance between voluntary or mandated deposition of compliance data, responsiveness and rigour of data validation. The results could be used to set expectations and foster understanding between all stakeholders.
- We recommend that cOAlition S produces a draft timeline for phasing in requirements that are not prioritised for the tool’s initial launch, so all stakeholders have clear expectations and can make appropriate plans.
- We have focused on the data specification here. However, data ownership and governance must also be considered. For each route, we suggest that only one source should be deemed authoritative and offer a “single version of the truth”, allowing for unambiguous compliance assessment. In principle, the requirements for open licenses mean that any data collected could be transferred to alternative providers in the future.
- We recommend that cOAlition S decides on the following before inviting tenders for the compliance checking tool: which mandatory requirements are needed for launch; policy details about mandating compliance data deposition (or not) and data verification; rules for multi-author papers and handling policy exceptions.
- We anticipate the following details would be handled by the tool’s developer: the process for escalating and resolving questions about the data; details of engagement with data providers, end users and publishers (if applicable); data update frequency and processes; specific metadata taxonomies.
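The whitelist logic described in the list above (presence plus passing checks implies compliance; absence implies non-compliance) can be sketched as follows. This is an illustration under stated assumptions: the function name, the record fields, the identifiers and the individual checks are all hypothetical, and the licence values are placeholders rather than a statement of Plan S licence rules.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Check = Callable[[Record], bool]


def assess_against_whitelist(venue_id: str,
                             whitelist: Dict[str, Record],
                             checks: List[Check]) -> bool:
    """Return True only if the venue is listed and passes every check."""
    record = whitelist.get(venue_id)
    if record is None:
        # Absence from the whitelist implies non-compliance; no further data needed.
        return False
    return all(check(record) for check in checks)


# Hypothetical checks against hypothetical record fields.
def has_accepted_licence(record: Record) -> bool:
    return record.get("licence") in {"CC BY"}  # placeholder value set


def author_retains_copyright(record: Record) -> bool:
    return record.get("author_retains_copyright") is True


# Illustrative whitelist keyed by made-up identifiers.
whitelist = {
    "0000-0001": {"licence": "CC BY", "author_retains_copyright": True},
    "0000-0002": {"licence": "CC BY-NC-ND", "author_retains_copyright": True},
}
checks = [has_accepted_licence, author_retains_copyright]

listed_and_passing = assess_against_whitelist("0000-0001", whitelist, checks)
listed_but_failing = assess_against_whitelist("0000-0002", whitelist, checks)
not_listed = assess_against_whitelist("0000-0003", whitelist, checks)
```

Because curation is delegated to the whitelist operator, the tool in this sketch only reads the operator's records and never assesses a venue that is not listed.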
The key data sources (whitelists) can be analysed by Plan S compliance route. cOAlition S has made clear its need for speed of implementation, so we have prioritised the most mature data sources in our assessment. Timing is already tight for 2020 implementation, so we also recommend that cOAlition S quickly agrees budgets and expectations with the key curated sources (e.g. DOAJ, Sherpa, ESAC), so they can proceed with any necessary implementation.
- The DOAJ is the clear choice as a whitelist for fully OA journals. It is mature and robust, and the team are already working on plans to add details for Plan S. Further analysis is needed to estimate an anticipated spike in registrations, and agree how this is best addressed.
- Sherpa (RoMEO and OpenDOAR) is the clear choice for whitelists for the Subscription/Repository route. (Other sources exist, but have significantly less coverage.) cOAlition S would need to work with Jisc to agree priorities, address issues of perceived unresponsiveness, and make relevant data available under CC0 licences. (Note that very few of the data requirements for repositories are currently tracked by anyone. Data about Repositories was de-prioritised during the course of this project.)
- A centralised database of Transformative Agreements (TAs) needs to be built, to map agreements to institutions and individual journals. Note the difference between curation and collation. Our discussions suggested that individual consortia should curate their own agreements with their suppliers, and be responsible for ensuring up-to-date, accurate lists of applicable journals. (So, in essence, each consortium maintains its own whitelist.) A central database would then collate the locally curated data into a central resource. ESAC currently tracks only data for the agreements as a whole. A “Plan S compliant” indicator is not currently implemented. A database that resolves to the individual journal level for each TA and institution would require significant extra work. cOAlition S would need to work with a provider (e.g. ESAC, or the Netherlands’ SURFmarket) to specify the work needed, and agree resourcing.
- Likewise, a centralised database of Transformative Journals (TJs) needs to be built. Our discussions suggested that cOAlition S might curate an approved list. Adding a per-journal indicator to RoMEO might be a logical starting point from which to collate the results. cOAlition S would need to work with Jisc (or other third parties) to specify the work needed, and agree resourcing.
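The split between curation and collation described for Transformative Agreements can be sketched as follows. This is a minimal illustration only: the record layout, field names, identifiers and the `collate` function are hypothetical, and no existing ESAC or SURFmarket data format is implied. Each consortium supplies its own curated agreement records, and the central resource simply indexes them by institution and journal.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One locally curated agreement record (hypothetical layout).
Agreement = Dict[str, object]


def collate(agreements: Iterable[Agreement]) -> Dict[Tuple[str, str], Set[str]]:
    """Build a central index: (institution, journal) -> agreement IDs."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for agreement in agreements:
        for institution in agreement["institutions"]:
            for journal in agreement["journals"]:
                index[(institution, journal)].add(agreement["agreement_id"])
    return index


# Illustrative records as two consortia might curate them; all values are made up.
consortium_a: List[Agreement] = [
    {"agreement_id": "TA-001",
     "institutions": ["University A", "University B"],
     "journals": ["0000-0001", "0000-0002"]},
]
consortium_b: List[Agreement] = [
    {"agreement_id": "TA-002",
     "institutions": ["University B"],
     "journals": ["0000-0002"]},
]

central = collate(consortium_a + consortium_b)

# Lookup for one author: which agreements cover this institution and journal?
covered = central.get(("University B", "0000-0002"), set())
not_covered = central.get(("University C", "0000-0001"), set())
```

In this sketch the central resource never edits the journal lists; responsibility for their accuracy stays with the consortium that curates them.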