PREreview of Comparing the outputs of intramural and extramural grants funded by National Institutes of Health

Published
DOI
10.5281/zenodo.17353888
License
CC BY 4.0

This study fills a significant gap in the literature by conducting a comprehensive, side-by-side comparison of the products of intramural and extramurally funded NIH research projects. The authors constructed a vast dataset of 98,648 projects (97,054 extramural and 1,594 intramural) from 2009 through 2019, linked to more than 621,000 publications. By applying modern bibliometric metrics (publication counts, the Relative Citation Ratio (RCR), and clinical translation indices such as APT scores and clinical citations), the study sought to determine which funding modality has the greater influence and cost-effectiveness in pursuing the NIH mission. One notable finding was that while extramural research was more cost-effective in producing academic publications and citations, intramural research was more efficient in producing clinically relevant outputs. This conclusion is significant given that intramural projects were not overwhelmingly focused on human research; they still showed a stronger association with clinical outcomes. One of the main virtues of the study is the enormous scale of the dataset (98,648 projects linked to over 621,000 publications), which makes the findings robust and influential. The use of updated bibliometric measures adds depth and credibility to the analysis, and the study's implementation of robust, contemporary metrics presents a valuable framework with potential applications for future research impact evaluations.

However, the 10-year window (2009-2019), limited by intramural data availability, might not be sufficient for truly measuring long-term impact. The reliance on a simplistic classification scheme based only on activity codes may introduce errors, while the exclusion of jointly funded publications could underestimate the impact of collaborative work. Additionally, the wide confidence intervals leave room for uncertainty. Nevertheless, by showing how different types of NIH funding shape not just scientific output but also clinical relevance, the study adds valuable nuance to a debate that has received surprisingly little attention in recent decades.

Major concerns and feedback:

  1. The description of the propensity score matching approach does not include enough detail to support reproducibility. Crucial information is missing, such as the actual matching algorithm used (e.g., nearest-neighbor or optimal matching), whether matching was performed with or without replacement, the caliper distance applied, and the plan for handling multiple-match or sparse-match cases.

  2. The authors are advised to describe the matching algorithm used, to define the caliper width (e.g., 0.2 of the standard deviation of the logit of the propensity score), and to specify the matching ratio (e.g., 1:1 or 1:k). They should also describe how ties were handled and report whether any projects were dropped for lack of suitable matches.

  3. The handling of propensity score matching is insufficiently described. For example, the authors have not clarified how multiple matches were dealt with, how projects were handled when no match was available, or how criteria were prioritized (tier 1 mandatory vs. tier 2 flexible). Similarly, the statistical power calculation for a 10-year dataset should be better justified. Regression methods, while mentioned, remain unclear and require further explanation.

  4. It is advisable to include the matching ratio used (e.g., 1:1 or 1:k) and explain whether matching was done with or without replacement, as this detail is critical for reproducibility.

  5. The choice of a 10-year window (2009-2019) is presented as a data availability constraint rather than a methodological choice. This short timeframe is a significant limitation for assessing "long-term impact," particularly for clinical translation, which can take decades. The conclusions about impact and cost-effectiveness may be premature.

  6. It is advisable for the authors to frame this more strongly as a key limitation in the discussion. They could also conduct a sensitivity analysis on a subset of projects from the earliest years (e.g., 2009-2011) to see if impact metrics differ for older projects, providing some insight into time-based trends.

  7. It is advisable to emphasize in the Discussion section that the restricted 10-year window represents a major limitation for assessing long-term impact, and that this constraint may influence interpretation of the findings.

  8. The analysis spans only a 10-year window (2009–2019), primarily because intramural data was only available from 2008. This may be too limited to truly capture long-term impact, especially given the long gestation period of clinical research outcomes. The authors should justify why this timeframe is sufficient or discuss how it constrains conclusions.
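To make concrete the matching specification requested in points 1-4 above, the sketch below shows what a fully specified procedure could look like: 1:1 nearest-neighbor propensity score matching without replacement, with a caliper of 0.2 standard deviations of the logit of the propensity score, and explicit reporting of unmatched projects. This is an illustration of the level of detail being requested, not the authors' actual method; the function and variable names are hypothetical, and the propensity scores are assumed to be already estimated.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def match_nearest_neighbor(ps_treated, ps_control, caliper_sd=0.2):
    """1:1 nearest-neighbor matching on the propensity score, without
    replacement, with a caliper of `caliper_sd` standard deviations of
    the logit of the propensity score. Returns matched index pairs and
    the indices of treated units dropped for lack of a suitable match."""
    lt, lc = logit(ps_treated), logit(ps_control)
    caliper = caliper_sd * np.std(np.concatenate([lt, lc]))
    available = set(range(len(lc)))
    pairs, dropped = [], []
    # Match the most extreme (hardest-to-match) treated units first; ties
    # in distance are broken by control index, so the result is deterministic.
    for i in sorted(range(len(lt)), key=lambda i: -abs(lt[i])):
        candidates = [(abs(lt[i] - lc[j]), j) for j in available
                      if abs(lt[i] - lc[j]) <= caliper]
        if candidates:
            _, j = min(candidates)
            pairs.append((i, j))
            available.discard(j)   # without replacement
        else:
            dropped.append(i)      # no suitable match: report, do not hide
    return pairs, dropped
```

A description at this level (algorithm, ratio, replacement, caliper, tie-breaking rule, and the count of dropped units) would let readers reproduce the matched sample.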

  1. The claim that "intramural work is more aligned with NIH’s mission" is an interpretive leap not fully supported by the data presented. The study shows intramural research is more clinically oriented, but the NIH mission is broad, encompassing fundamental discovery, training, and public health. This overstates the findings.

  2. We suggest moderating the language to reflect the results precisely. For example: "Our findings suggest that the intramural program demonstrates a particular strength in producing clinically relevant outputs, which is one critical component of the NIH's broader mission."

  3. It is advisable to include the moderated rephrasing example here directly, e.g., “Our findings suggest that the intramural program demonstrates a particular strength in producing clinically relevant outputs, which is one critical component of the NIH’s broader mission.” This helps make the feedback more actionable.

  4. The claim that intramural research is more aligned with NIH’s mission feels overstated. Given the smaller number of intramural projects and their wide confidence intervals, the evidence does not convincingly support such a broad assertion. A more cautious interpretation would strengthen the paper’s credibility.

  1. The exclusion of collaboratively funded publications is a major methodological decision that may systematically bias the results. It could disproportionately undervalue the contribution of one funding mechanism (likely intramural, which may rely more on collaborations) and misrepresent the collaborative nature of modern science.

  2. We suggest that, while re-inclusion may be complex, the authors discuss this limitation more thoroughly, explicitly stating how the exclusion might have skewed the comparisons of output and impact between the two funding types.

  3. Excluding jointly funded outputs might disproportionately affect intramural projects and should be discussed as a potential source of bias in the limitations section. Consider adding a brief sensitivity analysis including these publications if the data permit.

Minor concerns and feedback:

  1. The research's specific aims are poorly phrased in the last section of the introduction. We suggest the authors add a definite sentence or short paragraph that systematically enumerates the primary and secondary aims of the research (e.g., "The aims of this study are: 1) to compare bibliometric output, and 2) to assess cost-effectiveness.").

  2. The regression approaches, as stated, are not discussed clearly enough for a general audience. The authors should include in the methods section a brief, accessible description of the objective of each regression model applied, alongside the corresponding technical statistical definitions.

  3. There are certain inaccuracies and opportunities for greater clarity. The authors should: 1) correct the Figure 2 legend so that it matches the colors actually used (the caption refers to green and red, while the figure uses green and blue); and 2) revise the Y-axis title of Figure 1 to something more descriptive, such as "Intramural:Extramural Proportion Ratio," with an explanation that a value >1 indicates an intramural focus and a value <1 an extramural focus. This would make the metric immediately interpretable.

  4. The manuscript is brief and could be enhanced through a clearer summary of results, interpretation in the context of previous work, and fuller treatment of limitations and strengths. The authors should restructure the text with clear subheadings (e.g., "Main Results," "Comparison with Previous Studies," "Advantages and Disadvantages," "Implications") to increase readability and ensure each section receives adequate prominence.

  5. While ethically low-risk, the manuscript lacks an explicit statement that ethical approval was not required. Although the study uses only retrospective, publicly available data, the authors should add a short sentence to the methods section, e.g.: "This study drew only on publicly available, aggregated data concerning research grants and publications, and conducted no research involving human subjects; accordingly, ethical approval was not necessary."

  6. The much smaller number of intramural projects means that their results are inherently less precise. This limitation should be emphasized more directly in the Discussion, along with the risk of overinterpreting the relative performance of intramural projects.

  7. The exclusion of jointly funded publications could underestimate collaboration and impact, particularly for intramural research. This limitation is acknowledged but deserves more explicit discussion in terms of how it biases the results.

  8. The limitations are listed but not fully unpacked. The authors could strengthen this section by explicitly addressing how each limitation (e.g., exclusion of collaborations, smaller intramural sample) impacts robustness and by clarifying what steps were taken to reduce potential bias.

  9. The Discussion is concise but too compressed. Clearly separating the strengths from the limitations, adding short summaries after each results section, and slightly expanding the implications would make the manuscript easier to read and interpret.

Competing interests

Rosario Rogel-Salazar was a facilitator of this call and one of the organizers.

Use of Artificial Intelligence (AI)

The authors declare that they did not use generative AI to come up with new ideas for their review.