Structured PREreview of Leveraging metrics to drive data sharing at the Science journals

Published
DOI
10.5281/zenodo.20560857
License
CC BY 4.0
How would you rate the quality of this dataset?
Fair
This review is the result of a virtual, collaborative live review discussion organized and hosted by PREreview and Future of Research Communication and e-Scholarship (FORCE11). The discussion was joined by 11 people in total, including: two facilitators, five review authors, two discussion participants (Ava Chan and Jennifer Miller), and three listeners (one facilitator is also a review author). The authors of this review have dedicated additional asynchronous time over the course of two weeks to help compose this final report using the notes from the Live Review. We thank all participants who contributed to the discussion and made it possible for us to provide feedback on this dataset. This review represents the views of the authors and not those of Future of Research Communication and e-Scholarship (FORCE11).
This dataset provides article-level data for 2680 Science papers published between 2021 and 2024. It includes article metadata, indicators related to data and code generation/sharing, and information about preprint posting. The Dryad deposit also includes summary statistics from two other publishers for comparative purposes.
Overall, we found the dataset well-structured, reasonably well-documented, and shared in a non-proprietary CSV format. The inclusion of article-level records and comparative summary statistics increases its potential value for metascience and open science research. However, we also identified several concerns, including limited transparency surrounding the workflow used to generate key variables, potential biases and interpretive limitations in the dataset, and possible inconsistencies across publication cohorts. These issues are discussed further below.
Does this dataset follow the FAIR and CARE principles?
Partially
The dataset largely aligns with the FAIR principles. It is findable through a persistent DOI and a well-established repository, openly accessible in reusable CSV and Markdown formats, and accompanied by structured documentation. Interoperability could be improved by using standardized identifiers and controlled vocabularies, such as ISO country codes, OpenAlex concept URIs, and repository identifiers from registries like re3data. The CARE principles are not directly applicable because this dataset does not involve Indigenous or community-owned data. Nonetheless, ethical considerations remain important. The dataset is heavily skewed toward articles with first authors from the US and other developed countries (First_Author_Country variable). The documentation does not discuss how differences in infrastructure, policy environments, or resourcing may shape open science practices across regions, particularly in the Global South. A short guidance note acknowledging these limitations and cautioning against simplistic country-level comparisons would improve the dataset’s responsible reuse.
Does the dataset have enough metadata?
The dataset includes relatively strong metadata and documentation. The README explains the file structure, provides clear variable-by-variable definitions, and describes the general approach used to identify data- and code-sharing practices. This level of documentation makes the dataset easier to understand and reuse than many comparable deposits. The metadata could be further strengthened by providing (1) a more detailed explanation of how the `Data_Generated` variable was constructed, particularly given the large shift in values between 2021/2022 (Yes accounting for 74.9% and 78.3%) and 2023/2024 (Yes accounting for 96.2% and 97.4%); (2) a machine-readable schema or data dictionary (e.g., Frictionless, CSVW, or JSON-LD); (3) documentation or citation of the specific DataSeer algorithm used; and (4) provenance information for the OpenAlex snapshot, as OpenAlex classifications may change over time.
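As a rough illustration of point (2), a data dictionary in the style of a Frictionless Table Schema could be as small as the sketch below. Only `Data_Generated` and `First_Author_Country` are variable names taken from the dataset; the `DOI` field name, the allowed values, the sample rows, and the `enum_violations` helper are our assumptions for illustration, not the depositors' actual schema.

```python
import json

# Hypothetical data dictionary in the style of a Frictionless Table Schema.
# Field names other than Data_Generated and First_Author_Country are assumed.
SCHEMA = {
    "fields": [
        {"name": "DOI", "type": "string",
         "constraints": {"required": True, "unique": True}},
        {"name": "Data_Generated", "type": "string",
         "constraints": {"enum": ["Yes", "No"]}},
        {"name": "First_Author_Country", "type": "string"},
    ]
}


def enum_violations(rows, schema):
    """Return (row_index, field_name, value) for values outside a declared enum."""
    allowed = {
        field["name"]: set(field["constraints"]["enum"])
        for field in schema["fields"]
        if "enum" in field.get("constraints", {})
    }
    return [
        (index, name, row.get(name))
        for index, row in enumerate(rows)
        for name, values in allowed.items()
        if row.get(name) not in values
    ]


# Made-up rows with placeholder DOIs; the second row uses inconsistent coding.
rows = [
    {"DOI": "10.0000/example.1", "Data_Generated": "Yes", "First_Author_Country": "US"},
    {"DOI": "10.0000/example.2", "Data_Generated": "yes", "First_Author_Country": "BR"},
]
violations = enum_violations(rows, SCHEMA)

# The schema serialises to plain JSON, so it can sit next to the CSV in the deposit.
schema_json = json.dumps(SCHEMA, indent=2)
```

A schema like this would let reusers validate the CSV mechanically instead of inferring the coding rules from the README.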
Does this dataset include a way to list or track changes or versions? If so, does it seem accurate?
No
While Dryad supports deposit-level versioning and records upload timestamps, the dataset itself does not include explicit version tracking information. As the underlying DataSeer algorithm may have changed during the study period, we recommend including a changelog, a top-level version/date statement in the README, and/or documentation of the DataSeer pipeline version and processing date associated with each cohort.
Does this dataset show signs of alteration beyond instances of likely human error, such as censorship, deletion, or redaction, that are not accounted for otherwise?
No
The dataset does not show any obvious signs of alteration or human error. There is one acknowledged alteration: the DAS column has been replaced by a constant string (“Data and materials availability”) to remove personal emails. This is openly disclosed in the README. More broadly, however, the workflow lacks transparency, and the original list of articles cannot easily be reproduced. Providing the search strategy, together with the inclusions and exclusions at each stage, for the corpus of research articles published in Science between 2021 and 2024 would have been a useful addition.
Is the dataset well-suited to support its stated research purpose?
Partially
This dataset is used to support a blog post. The dataset provides sufficient information for the type of analysis presented in the blog post, although the level of documentation would not support its use for a full scientific study. For example, there are some observable shifts in year-on-year trends that may reflect confounding variables, and a full study would need to explain these.
Does this dataset support the researcher’s stated conclusions?
Partially
The headline figures (69% sharing data overall for articles in Science, 56% in a repository, 41% of code-generating papers sharing code) are recomputable from the dataset and appear supported. However, there are some presentational and methodological issues that mean the headlines give some appearance of being chosen to present a particularly optimistic view rather than one that is strictly neutral.
First, the abstract's framing of data-sharing is selective. The pairing "69% shared … (6% did not generate or share data)" leaves out the 25.3% of the corpus (679 papers) that generated data but did not share it. This is arguably the most policy-relevant group. The statement that "6% did not generate or share data" is technically correct, but the word "or" is doing a lot of work: overall, 31% of articles did not share data, and among studies that generated data, 679/(679+1638) = 29% withheld it entirely.
Second, there is also some asymmetry in the reporting in the blog post. For data, sharing rates are reported against the full corpus; for code, only against papers that generated code. The "Data and Code Sharing" graph grouped by publisher could easily give the false impression that all bars share the same denominator, but they do not.
Third, cross-publisher comparisons are potentially misleading. The T&F "sharing data overall" figure appears to be a repository-sharing metric rather than an overall sharing metric, making the side-by-side "69% / 74% / 24%" framing an unfair comparison. Also, the three datasets span different timeframes: Science 2021–2024, PLOS 2018–2025, and T&F 2020–2023 (82.8% from 2023 alone). Restricting PLOS to 2021–2024 already shifts repository sharing from 26% to 29% and code sharing from 29% to 33%. No validation between pipelines is presented, and this makes it difficult to tell whether the stated conclusions are as even-handed as they could be.
More discussion of these limitations would have benefitted the transparency of the work.
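To make the denominator point concrete, the short sketch below recomputes the two shares of non-sharing papers from the counts quoted above. The variable names and the `pct` helper are ours, and the count of 1638 is used exactly as it appears in the ratio above, not re-derived from the CSV.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts quoted in the review text above.
TOTAL_PAPERS = 2680            # all Science papers in the dataset
GENERATED_NOT_SHARED = 679     # generated data but did not share it
SHARED_IN_REVIEW_RATIO = 1638  # second term of the denominator quoted above

# Denominator 1: the full corpus.
withheld_of_corpus = pct(GENERATED_NOT_SHARED, TOTAL_PAPERS)

# Denominator 2: only the papers counted in the ratio quoted above.
withheld_of_generators = pct(
    GENERATED_NOT_SHARED, GENERATED_NOT_SHARED + SHARED_IN_REVIEW_RATIO
)
```

The same 679 papers account for 25.3% or 29.3% of the total depending on the denominator, which is why bars drawn against different denominators should not sit side by side without a label.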
Is the dataset granular enough to be a reliable standard of measurement?
Partially
The dataset is granular at the level of individual articles, rather than being aggregated in any way. This is sufficiently granular for general-level scientometric analyses. However, for more detailed analyses, especially those intended to support a research article, additional detail may have been needed, particularly in terms of variable categories per article (where the granularity is more limited).
Is the dataset relatively free of errors?
The dataset appears largely error-free, though not without a few loose threads. Core structural checks performed well: DOIs were unique and correctly formatted, required Yes/No variables used consistent coding, publication years were complete and plausible, and no stray whitespace issues appeared in key categorical fields. Some gaps remain. A small number of rows were missing publication dates, field classifications, or first-author country information. The README also contains minor typographic inconsistencies. More notably, the shift in the Data_Generated instrument between 2022 and 2023 introduces a measurement inconsistency across the corpus. That does not look like a conventional coding error, but it weakens comparability. No quantified estimate of NLP labelling error was provided either. A few internal logic checks also raise questions. A number of records tagged as “Online” for data or code location lacked corresponding URLs, repository details, accessions, or DOIs. One article (10.1126/science.abg9868) included an NCBI accession and repository entry while being labelled only as “Suppl Material.” These do not necessarily indicate errors, though they suggest incomplete linkage between metadata fields and should be clarified. Overall, the dataset is reasonably clean and internally consistent, but only partly free from error.
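The internal logic checks described above can be sketched with the Python standard library. This is a minimal illustration, not the procedure we ran: every column name (`DOI`, `Data_Location`, `Data_URL`, `Data_Repository`, `Data_Accession`) is assumed, and the rows are toy data with placeholder DOIs. Only the labels “Online” and “Suppl Material” come from the dataset.

```python
import csv
import io

# Toy rows standing in for the real CSV; every column name here is assumed.
SAMPLE = """DOI,Data_Location,Data_URL,Data_Repository,Data_Accession
10.0000/example.1,Online,https://example.org/data,ExampleRepo,
10.0000/example.2,Online,,,
10.0000/example.3,Suppl Material,,NCBI,ABC123
10.0000/example.1,Suppl Material,,,
"""

LINK_FIELDS = ("Data_URL", "Data_Repository", "Data_Accession")


def check_rows(rows):
    """Flag duplicate DOIs and location labels that disagree with link fields."""
    seen, issues = set(), []
    for row in rows:
        doi = row["DOI"]
        if doi in seen:
            issues.append((doi, "duplicate DOI"))
        seen.add(doi)
        # A record has a link if any URL, repository, or accession is filled in.
        has_link = any(row[field].strip() for field in LINK_FIELDS)
        if row["Data_Location"] == "Online" and not has_link:
            issues.append((doi, "Online without URL, repository, or accession"))
        if row["Data_Location"] == "Suppl Material" and has_link:
            issues.append((doi, "Suppl Material but repository details present"))
    return issues


issues = check_rows(csv.DictReader(io.StringIO(SAMPLE)))
```

Flagged rows are candidates for clarification rather than confirmed errors, in line with the caveat above.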
Is this dataset likely to be of interest to researchers in its field of study, to most researchers, or to the general public? How relevant will those audiences find it?
Somewhat relevant
The dataset is likely to be somewhat consequential within a specialised but active research community. It speaks directly to ongoing debates around open science practice, research data management, transparency, and policy compliance. Researchers in metascience, scholarly communication, library science, and research integrity would probably find it useful, particularly alongside publishers, funders, and policy organisations interested in monitoring data-sharing behaviour. Its strongest value lies in comparative and longitudinal analysis. The dataset creates room to examine how sharing practices shift over time, differ across publishers, or respond to policy interventions. That said, the current time span and limited explanation of policy breakpoints reduce the depth of those analyses. A longer series and clearer contextual framing would make the dataset more consequential. The broader public is unlikely to engage with the dataset directly. Its appeal is technical and domain-specific rather than general interest. Even so, for researchers working on open science and reproducibility, this is the kind of infrastructure-oriented evidence base that carries practical and policy relevance.
Is this dataset ready to be shared?
The dataset appears ready to be shared. It has already been deposited in Dryad with a DOI, and the files are available in open, non-proprietary formats that support accessibility and reuse. Obvious privacy concerns were also handled appropriately, including the removal of personal email addresses from the data availability statements. That said, “ready to share” and “fully prepared for seamless downstream reuse” are not quite the same thing. Some documentation inconsistencies, missing metadata fields, and unresolved questions around internal logic still limit frictionless reuse at scale. The dataset is suitable for public release and scholarly use, but users would still need to approach parts of it with caution and interpretive care.
What else, if anything, would it be helpful for the researcher to include with this dataset to make it easier to find, understand and reuse in ethical and responsible ways?
More transparency around the DataSeer pipeline is needed. The dataset currently functions somewhat like a black box because there is no published information on model versions, processing dates, or validation against a human-coded sample. Precision and recall estimates by indicator would help users judge reliability. The apparent discontinuity between cohorts processed under different classifiers should also be addressed, either by reprocessing earlier cohorts or by publishing a stability analysis across time periods. Cross-publisher comparisons need tighter contextual framing as well. A concise table explaining how “sharing” was defined across Science, PLOS, and Taylor & Francis would reduce the risk of overinterpreting differences that may stem from incompatible definitions rather than behaviour. Standardised identifiers would improve interoperability. ISO country codes, OpenAlex concept IDs, and recognised repository identifiers such as re3data or FAIRsharing would all strengthen reuse potential. A brief ethical guidance note would also be worthwhile, particularly cautioning against simplistic country-level rankings without accounting for infrastructure, policy context, or disciplinary norms.
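The per-indicator precision and recall suggested above could be estimated from a human-coded sample along the following lines. This is a minimal sketch under stated assumptions: the `precision_recall` function and the label lists are ours, the labels are made up, and nothing here reflects how DataSeer validates its pipeline.

```python
def precision_recall(predicted, human):
    """Precision and recall for "Yes" labels, treating human coding as truth."""
    pairs = list(zip(predicted, human))
    true_pos = sum(p == "Yes" and h == "Yes" for p, h in pairs)
    false_pos = sum(p == "Yes" and h == "No" for p, h in pairs)
    false_neg = sum(p == "No" and h == "Yes" for p, h in pairs)
    # Return None when a rate is undefined (no predicted or no actual positives).
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else None
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else None
    return precision, recall


# Made-up labels for five papers; not drawn from the dataset.
pipeline_labels = ["Yes", "Yes", "No", "Yes", "No"]
human_labels = ["Yes", "No", "No", "Yes", "Yes"]
precision, recall = precision_recall(pipeline_labels, human_labels)
```

Reporting these two numbers separately for each indicator and each publication cohort would also show whether the 2022 to 2023 shift coincides with a change in classifier behaviour.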

Competing interests

The authors declare that they have no competing interests.