The study represents a significant effort toward evaluating the potential of open bibliometric data. It compares open metadata from hoaddata (derived from Crossref and OpenAlex) with the proprietary bibliometric databases Scopus and Web of Science to examine the effect of transformative agreements on hybrid open access. The findings suggest that, over the study period (2019 to 2023), transformative agreements accounted for the majority of open access articles in hybrid journals, driven by a few large commercial publishers and European countries. The major findings were consistent across the three data sources, demonstrating the potential of open metadata for bibliometric analysis of hybrid open access. While the comparative methodology is commendable, several weaknesses limit the robustness of the conclusions. The strength and validity of the conclusions would increase significantly with clearer descriptions of data handling, more rigorous statistical analyses, further exploration of what drives the observed differences and correlations, and less emphasis on correlations between datasets.
The comparative methodology, using hoaddata alongside Scopus and Web of Science, is a significant strength. It directly addresses the reliability of open metadata for large-scale OA studies. The use of multiple authorship attribution methods (first vs. corresponding author) further strengthens the approach. However, the complexity of the data processing, the reliance on proxy measures (there is no direct invoicing data for transformative agreements), and the limitations of the data sources themselves (inaccuracies in OpenAlex, Unpaywall parsing errors) remain significant weaknesses. The study acknowledges these, but it lacks a more thorough sensitivity analysis assessing how these imperfections influence the final results.
Another strength of the study is its large-scale observational design. It uses a substantial dataset (over 13,000 hybrid journals), providing a statistically robust foundation. However, the reliance on passively collected data limits the ability to separate the causal impact of transformative agreements from factors inherent to journals already predisposed toward OA publishing, e.g., existing OA patterns within certain journals before the agreements took effect.
The conclusion that open metadata sources are useful, particularly when combined with proprietary bibliometric data sources for improved analysis, is well supported by the findings. However, the limitations need more emphasis; concluding that open metadata sources alone suffice is overstated. A more nuanced conclusion that acknowledges the uncertainties and offers suggestions grounded in the limitations of the results would inspire greater confidence in the recommendations. The conclusions could also be improved by discussing possible causal relationships behind the correlations, beyond the broad correlation summaries already provided. It would also help to discuss which kinds of studies and conclusions can reliably be drawn from open data, and where the limitations might still be a hindrance (for example, summaries across regions or countries may be reliable, while the data may lack sufficient precision at the institutional level).
The footnote to the Scopus blog is a broken link (https://blog.scopus.com/posts/scopus-filters-for-open-access-type-and-green-oa-full-text-access-option)
“By April 2025, the ESAC Transformative Agreement Registry…” It might be helpful to detail a few examples and walk newcomers through the nuances for social sciences, life sciences, math, CS, etc.
Please add the URL at the first mention of hoaddata in the Data and Methods section, although the data sources section is very useful.
Figure 1 is very helpful!
“13,000 hybrid journals” – can you comment more on which fields were covered?
“Scopus defined hybrid open access consistently as content available under CC licenses” – This is debatable: not everything that's CC is open access (some would dispute -ND licenses, for example) and there are many other open licenses. I suspect, on balance, that this leads to under-measurement of OA content.
“random sampling of 50 pairs revealed an error rate of 22% for Web of Science (11 mismatches) and 6% for Scopus (three mismatches).” – Do these estimates of accuracy need some error bars/CIs?
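As an illustration of why intervals matter here, a minimal sketch follows. It assumes the 50 pairs per database are a simple random sample and uses the Wilson score interval, one common choice for binomial proportions (an exact Clopper-Pearson interval would be a reasonable alternative). The function name `wilson_interval` is hypothetical and not part of the authors' code.

```python
import math


def wilson_interval(errors: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] for safety; the Wilson interval normally stays inside anyway.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Mismatch counts from the 50-pair validation samples quoted in the manuscript.
for source, errors in [("Web of Science", 11), ("Scopus", 3)]:
    low, high = wilson_interval(errors, 50)
    print(f"{source}: {errors / 50:.0%} (95% CI {low:.1%} to {high:.1%})")
```

Under these assumptions, the intervals come out at roughly 13% to 35% for Web of Science and 2% to 16% for Scopus, so the two ranges overlap slightly and the point estimates alone may overstate how precisely the error rates are known.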
“UpSet graph” – Please define this term and explain how to read the x-axis.
The ordering of the panels in Figures 2 and 3 is a little unconventional and confusing. Consider moving the hoaddata, Scopus, and Web of Science matrix to the top, and ordering the panels A, B, C from top to bottom.
Figure 6: The y-axis labels of the scatterplots are confusing. Consider using the same labeling as the Hugging Face dashboard (https://huggingface.co/spaces/najkoja/hoa_replication), which is much more intuitive.
“Open questions remain as to whether this uneven distribution reflects temporary implementation gaps, inherent inequities in the transformative agreement model, or deliberate avoidance of such agreements.” – This is a great point, and an opportunity for further study.
“OpenAlex's native ROR-ID integration offers a distinct advantage” – It's worth noting that this depends on metadata matching, which can introduce errors, rather than on ROR IDs supplied directly at source. Very few publishers currently provide ROR identifiers for author institutions to Crossref.
Paragraph beginning: “The database comparison revealed…” – This is a useful section explaining limitations, but there is something the authors don't discuss here: confusion about how open access is defined and whether a given license counts as open. There is no consensus in the community and no registry of open licenses, which is likely to lead to inconsistencies and errors.
“initiatives in support of negotiations with publishers, particularly through the ESAC initiative and Barcelona Declaration on Open Research Information, the situation is likely to improve” – Is this likely to have the largest effect in Europe? How does the author think it will transfer to other regions?
Martyn Rittman is an employee of Crossref.