PREreviews of “How Does Sampling Affect the AI Prediction Accuracy of Peptides’ Physicochemical Properties?”

How Does Sampling Affect the AI Prediction Accuracy of Peptides’ Physicochemical Properties?

by Meiru Yan, Ankeer Abuduhebaier, Haojin Zhou, and Jiaqi Wang

Posted: February 2, 2025
Server: bioRxiv
DOI: 10.1101/2025.01.29.635451

Abstract

Accurate AI prediction of peptide physicochemical properties is essential for advancing peptide-based biomedicine, biotechnology, and bioengineering. However, the performance of predictive AI models is significantly affected by the representativeness of the training data, which depends on the sample size and sampling methods employed. This study addresses the challenge of determining the optimal sample size and sampling methods to enhance the predictive accuracy and generalization capacity of AI models for estimating the aggregation propensity, hydrophilicity, and isoelectric point of tetrapeptides. Four sampling methods were evaluated: Latin Hypercube Sampling (LHS), Uniform Design Sampling (UDS), Simple Random Sampling (SRS), and Probability-Proportional-to-Size Sampling (PPS), across sample sizes ranging from 100 to 20,000. A sample size of approximately 12,000 (7.5% of the total tetrapeptide dataset) marks a key threshold for stable and consistent model performance. This study provides valuable insights into the interplay between sample size, sampling strategies, and model performance, offering a foundational framework for optimizing data collection and AI model training for the prediction of peptides’ physicochemical properties, especially for prediction in the complete sequence space of longer peptides with more than four amino acids.
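To make the abstract's numbers concrete, here is a minimal Python sketch of two of the four sampling methods it names (SRS and LHS) over the complete tetrapeptide space. Only the 20-residue alphabet, the sequence length of four, and the 12,000-sample threshold come from the abstract. The function names, the seeds, and the way LHS is discretized onto residue positions are illustrative assumptions, not the authors' implementation.

```python
import itertools
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def all_tetrapeptides():
    """Enumerate the complete tetrapeptide sequence space (20**4 sequences)."""
    return ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=4)]


def simple_random_sample(space, n, seed=0):
    """SRS: draw n distinct sequences uniformly, without replacement."""
    rng = random.Random(seed)
    return rng.sample(space, n)


def latin_hypercube_sample(n, seed=0):
    """LHS over the four residue positions (hypothetical discretization).

    Each position's [0, 1) range is split into n equal strata. One point is
    drawn per stratum, the strata are shuffled independently per position,
    and each point is mapped to an amino-acid index. Duplicate sequences
    are possible after this mapping.
    """
    rng = random.Random(seed)
    last = len(AMINO_ACIDS) - 1
    columns = []
    for _ in range(4):
        points = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(points)
        columns.append(points)
    return [
        # min() guards against a floating-point value rounding up to 1.0
        "".join(AMINO_ACIDS[min(int(u * len(AMINO_ACIDS)), last)] for u in row)
        for row in zip(*columns)
    ]


space = all_tetrapeptides()
threshold = 12_000                  # sample size reported in the abstract
fraction = threshold / len(space)   # share of the full sequence space
srs = simple_random_sample(space, threshold)
lhs = latin_hypercube_sample(threshold)
```

Here `len(space)` is 20^4 = 160,000, so `fraction` is 0.075, matching the 7.5% stated in the abstract. In this sketch the LHS draw balances residue frequencies at each position by construction, whereas SRS only does so on average.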

1 PREreview

PREreview by harshraj bhoite

Authored by harshraj bhoite

Peer Review of "How Does Sampling Affect the AI Prediction Accuracy of Peptides’ Physicochemical Properties?"
Short Summary of the Research’s Main Findings
This paper explores how different sampling strategies and sample sizes influence the predictive accuracy of AI models for peptide…
