How Does Sampling Affect the AI Prediction Accuracy of Peptides’ Physicochemical Properties?

Published
Server: bioRxiv
DOI: 10.1101/2025.01.29.635451

Accurate AI prediction of peptide physicochemical properties is essential for advancing peptide-based biomedicine, biotechnology, and bioengineering. However, the performance of predictive AI models depends strongly on how representative the training data are, which in turn depends on the sample size and the sampling method used. This study addresses the challenge of determining the optimal sample size and sampling method to improve the predictive accuracy and generalization of AI models that estimate the aggregation propensity, hydrophilicity, and isoelectric point of tetrapeptides. Four sampling methods were evaluated: Latin Hypercube Sampling (LHS), Uniform Design Sampling (UDS), Simple Random Sampling (SRS), and Probability-Proportional-to-Size Sampling (PPS), across sample sizes ranging from 100 to 20,000. A sample size of approximately 12,000 (7.5% of the 160,000 possible tetrapeptides) marks a key threshold for stable and consistent model performance. The study clarifies the interplay between sample size, sampling strategy, and model performance and offers a foundational framework for optimizing data collection and AI model training for predicting peptide physicochemical properties, especially across the complete sequence space of peptides longer than four amino acids.
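The abstract names four sampling strategies and a threshold of about 12,000 sequences. The sketch below is not the authors' code; it illustrates two of the strategies (SRS and LHS) on the full tetrapeptide space of 20^4 = 160,000 sequences built from the 20 standard amino acids, and confirms that 12,000 is 7.5% of that space. The function names are hypothetical, and the LHS encoding (one dimension per residue position, with a coordinate u mapped to amino acid index floor(u × 20)) is an assumption made for illustration. UDS and PPS are omitted because the abstract does not specify the uniform design table or the size measure they use.

```python
import itertools
import random

# The 20 standard proteinogenic amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LENGTH = 4


def full_sequence_space(length=PEPTIDE_LENGTH):
    """Enumerate every peptide of the given length (20**length sequences)."""
    return ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=length)]


def simple_random_sample(space, n, seed=0):
    """Simple Random Sampling (SRS): n distinct sequences drawn uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(space, n)


def latin_hypercube_sample(n, length=PEPTIDE_LENGTH, seed=0):
    """Latin Hypercube Sampling (LHS) on the unit hypercube [0, 1)^length.

    Illustrative assumption: each residue position is one dimension, and a
    coordinate u maps to amino acid index floor(u * 20). Distinct points can
    map to the same peptide, so duplicates are possible.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(length):
        # Place exactly one point in each stratum [i/n, (i+1)/n) ...
        strata = [(i + rng.random()) / n for i in range(n)]
        # ... then shuffle so strata are paired randomly across dimensions.
        rng.shuffle(strata)
        columns.append(strata)
    k = len(AMINO_ACIDS)
    peptides = []
    for j in range(n):
        # min() guards against float rounding producing u == 1.0.
        indices = [min(int(columns[d][j] * k), k - 1) for d in range(length)]
        peptides.append("".join(AMINO_ACIDS[i] for i in indices))
    return peptides


if __name__ == "__main__":
    space = full_sequence_space()
    total = len(space)  # 20**4 = 160,000 tetrapeptides
    threshold = 12_000
    print(total, threshold / total)  # 160000 0.075

    print(simple_random_sample(space, 100)[:5])
    print(latin_hypercube_sample(100)[:5])
```

With n = 100 and 20 residues, each amino acid appears exactly five times at every position in the LHS sample, because each block of five strata maps to one residue. SRS offers no such per-position balance, which is one reason stratified designs can cover the sequence space more evenly at small sample sizes.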