PREreview of Towards pharmacokinetic profile predictions for monoclonal antibodies using sequence based machine learning derived parameters and compartmental modeling

by Andoni I Asencor, Jonah Frazier, James Fraser, and Brian Cheung

Published: May 23, 2026
DOI: 10.5281/zenodo.20357886
License: CC BY 4.0

Jost and Cordes introduce a physics-informed neural network (PINN) that predicts the pharmacokinetic (PK) profiles of monoclonal antibodies (mAbs) purely from their amino acid sequences. The novelty of this PINN, as noted by the authors, lies in its ability to predict the PK of therapeutic mAbs in silico, prior to antibody production and subsequent in vitro and in vivo validation. There is a pressing need in the field of mAb-based pharmacotherapeutics to reduce reliance on in vivo validation, as highlighted by recent FDA priorities. Advances in this area are poised to decrease both the financial burden (the cost of in vivo mouse experiments) and the scientific burden (time spent) of devising such treatments. While the authors have presented an interesting method to address this need, we have concerns about the generalizability of this PINN across different antibodies and about the train/validation/test split, as well as questions regarding the architecture of the PINN.

We have the following major comments:

The authors state, “A comprehensive PK dataset was assembled, comprising 118 conventional monoclonal antibodies […] evaluated during preclinical investigations conducted at Sanofi.” The preprint would likely benefit from more details about the provenance of these mAbs (target antigens, methods of selection, CDR diversity relative to each other and to marketed mAbs, etc.). We are especially interested in whether these 118 mAbs are representative of marketed (sequence unblinded) mAbs and whether the PK predictions generalize to publicly available data outside of this test set. We assume the test set contains proprietary sequences, which makes this generalization an important test.
- In particular, it is unclear whether these 118 mAbs all target a single antigen, 118 unique antigens, or a selected array of antigens. If the scope is restricted to particular antigens, we have concerns about the generalizability of this PINN to other antigen targets, which could likely be addressed by including some on-the-market mAbs.

We wonder why the authors chose Catpred, since this package was developed for enzyme sequences rather than mAbs and relies on a general protein pLM. The authors address this in the discussion and say it might be better to use the mAb-trained pLM of Mazrooei et al. This prompts us to ask for further clarification regarding their choice of pLM. Additionally, it is unclear why the authors employed sequence attention features on top of the pLM. One reading is that the sequence attention features give the antibody sequences more weight when combined with the general protein pLM context. However, it is unclear to us whether this is indeed the case, or whether these features are redundant given that the pLM should capture much of the same information on its own. Their value could be shown with an ablation test.

In the discussion, the authors rightfully highlight the importance and exciting future application of this work in “identifying sequence fragments that exert significant influence on PK behavior across modalities”. However, they could have begun exactly this analysis by describing the sequence differences in the three mAbs that significantly failed CL prediction. As it stands, there is no commentary or hypothesis on why those three in particular escaped prediction. If there are IP issues, which may also apply to our first major comment, more generalized commentary could still have been made.

The data presentation is inconsistent between Figures 2 and 3. In Figure 2, circles are used for “Training,” triangles for “Validation,” and squares for “Test.” In Figure 3, triangles are used for “Training,” squares for “Validation,” and circles for “Test.” This inconsistent marker assignment could lead readers to misinterpret the findings.

We have the following minor comments:

Each figure in the paper has a descriptive title but could benefit from a more detailed caption that guides the reader through what is being shown, beyond the short statements written in the body of the paper.
- The figure legends could benefit from a larger font size to make them easier to interpret.
- The title for Figure 2 says “Figure 1.”
- In Figure 3, some of the outlier data points go beyond the final axis markings in both plots.

The paper’s verbal description of linear PK vs dose-dependent, TMDD-initiated nonlinear PK might be confusing to some readers. A small graph contrasting the concentration vs time profile of a small molecule with linear PK against the trajectory of a mAb with nonlinear PK could be very helpful.
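To illustrate the kind of comparison we have in mind, here is a minimal sketch of a one-compartment IV bolus model with and without a saturable clearance term. It is not the authors' model: the Michaelis-Menten term is a common simplification of TMDD, and the function names, parameter values (`k_el`, `vmax`, `km`, `volume`), and arbitrary units are all our own assumptions, chosen only for illustration. Under linear PK the dose-normalized AUC is the same at every dose, whereas with saturable clearance it rises with dose.

```python
def simulate_bolus(dose, k_el=0.05, vmax=0.0, km=1.0, volume=1.0,
                   t_end=200.0, dt=0.05):
    """Integrate dC/dt = -k_el*C - vmax*C/(km + C) after an IV bolus.

    vmax = 0 gives linear (first-order) PK; vmax > 0 adds a saturable,
    Michaelis-Menten-type clearance term as a stand-in for TMDD.
    All parameter values are illustrative assumptions.
    """
    def rate(c):
        return -k_el * c - vmax * c / (km + c)

    conc = dose / volume
    times, concs = [0.0], [conc]
    # Fixed-step fourth-order Runge-Kutta integration.
    for step in range(1, int(round(t_end / dt)) + 1):
        k1 = rate(conc)
        k2 = rate(conc + 0.5 * dt * k1)
        k3 = rate(conc + 0.5 * dt * k2)
        k4 = rate(conc + dt * k3)
        conc += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        times.append(step * dt)
        concs.append(conc)
    return times, concs


def dose_normalized_auc(dose, **params):
    """Trapezoidal AUC over the simulated window, divided by dose."""
    times, concs = simulate_bolus(dose, **params)
    auc = sum(0.5 * (concs[i] + concs[i + 1]) * (times[i + 1] - times[i])
              for i in range(len(times) - 1))
    return auc / dose


if __name__ == "__main__":
    for dose in (1.0, 10.0, 100.0):
        linear = dose_normalized_auc(dose)
        nonlinear = dose_normalized_auc(dose, vmax=0.5, km=1.0)
        print(f"dose={dose:6.1f}  AUC/dose linear={linear:.2f}  "
              f"nonlinear={nonlinear:.2f}")
```

Plotting the `times` and `concs` returned by `simulate_bolus` on a semi-log scale for several doses would give the side-by-side figure suggested above.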

In the introduction, the authors state “In contrast, in vivo testing is no high throughput screening such that in vitro / in vivo correlation (IVIVC) efforts together with in silico approaches have been developed to make in vivo CL predictions based on in vitro assay outcomes.” After reviewing the cited reference, we believe “no high throughput” is a confusing typo, and the “no” should be “not”.

In Figure 2, it is unclear to us whether the train, validation, and test splits as plotted include multiple concentration time points for each biologic. If the train, validation, and test splits are disjoint sets of mAb sequences, that would illustrate a strong form of generalization and support the capabilities of the proposed approach. Our current interpretation of Figure 2 is that the same mAb sequences are present in all three splits and that the splits are defined by time points. It could also be clearer to plot just the single C0 value for each of the 118 mAbs. Alternatively, the authors could describe and clearly label which time points are being shown if a more holistic representation is still desired, though we recommend the simpler single-value approach.
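For concreteness, a split into disjoint sets of mAb sequences could look like the sketch below. This is not the authors' procedure; the record layout `(mab_id, time, concentration)`, the function name `split_by_antibody`, the split fractions, and the toy data are all hypothetical. The key point is that whole antibodies, rather than individual time points, are assigned to each split.

```python
import random
from collections import defaultdict


def split_by_antibody(records, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole antibodies (not individual time points) to splits.

    records: iterable of (mab_id, time, concentration) tuples (assumed layout).
    Returns a dict mapping 'train', 'validation', 'test' to record lists.
    """
    by_mab = defaultdict(list)
    for record in records:
        by_mab[record[0]].append(record)

    # Shuffle antibody identifiers reproducibly, then cut into three groups.
    mab_ids = sorted(by_mab)
    random.Random(seed).shuffle(mab_ids)
    n_train = round(fractions[0] * len(mab_ids))
    n_val = round(fractions[1] * len(mab_ids))
    groups = {
        "train": mab_ids[:n_train],
        "validation": mab_ids[n_train:n_train + n_val],
        "test": mab_ids[n_train + n_val:],
    }
    # Every time point of an antibody follows that antibody into its split.
    return {name: [r for mab in ids_in_split for r in by_mab[mab]]
            for name, ids_in_split in groups.items()}


# Toy data: 20 hypothetical antibodies, 5 sampling times each.
toy_records = [(f"mAb_{i:03d}", t, 100.0 / (1 + t))
               for i in range(20) for t in (1, 24, 72, 168, 336)]
splits = split_by_antibody(toy_records)
ids = {name: {r[0] for r in rows} for name, rows in splits.items()}
```

With this scheme no antibody contributes time points to more than one split, so test performance reflects prediction for unseen sequences rather than interpolation between time points of sequences already seen in training.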

Finally, upon careful review of this preprint, we would like to propose a potential future direction for the PINN. Admittedly, this is a non-trivial extension that goes beyond the scope of the present preprint, but we believe the authors are within reach of expanding their framework to include a third parameter.

Beyond lab-created monoclonal antibodies, we envision significant translatability of the PINN to novel autoantibody characterization, whereby isolated autoantibodies from human biofluids could be analyzed through a modified version of this method to identify what the pathogenic target might be. Though not discussed in the preprint, the targets of these antibodies are likely also known by the authors and could be included in their neural network. In this scenario, the framework could be updated to include mAb amino acid sequence and target as the known inputs, and PK profiles as the output. Then, a further test of the framework would assess if mAb amino acid sequence and PK profile could then predict the mAb target.

As the field of autoimmunity currently stands, validation of novel autoantibodies often requires target-agnostic screening, whereby patient-derived peripheral blood mononuclear cells (PBMCs) can be loaded into an optofluidic screening instrument to identify antibody-secreting B cells. Further in vitro display technologies using bacteriophage or yeast then require a library of potential peptide sequences, which would ideally narrow down the target(s) of the mAbs. Cell-based overexpression assays (CBAs) are used as an additional in vitro validation of the display findings. Immunostaining in murine tissue then serves as a final validation of the CBA, with the hope that there is enough homology between the human and mouse antigen targets to illustrate a clear result without off-target effects.

The PINN presented in this preprint has the potential to be transformative as an in silico autoantibody identification pipeline. Using the mAb amino acid sequence and PK properties assessed in Tg32 mice could bypass the need for display technology, CBAs, immunostaining validation in mouse tissue, and large volumes of patient biofluids. Together, this could alleviate the need for complicated and expensive diagnostic technologies while simultaneously leading to faster patient diagnoses and improved patient outcomes. We kindly invite the authors to discuss this point further in their final manuscript.

Competing interests

The authors declare that they have no competing interests.

Use of Artificial Intelligence (AI)

The authors declare that they did not use generative AI to come up with new ideas for their review.
