Write a PREreview

Distilling Protein Language Models with Complementary Regularizers

by Edward Wijaya

Posted: February 18, 2026
Server: bioRxiv
DOI: 10.64898/2026.02.17.706304

Large autoregressive protein language models generate novel sequences de novo , but their size limits throughput and precludes rapid domain adaptation on scarce proprietary data. We distill a 738M-parameter protein language model into compact students using two protein-specific enhancements, uncertainty-aware position weighting and calibration-aware label smoothing, that individually degrade quality yet combine for substantial improvement. We trace this complementary-regularizer effect to information theory: smoothing denoises teacher distributions while weighting amplifies the cleaned signal at biologically variable positions. Students achieve up to 5× inference speedup, preserve natural amino acid distributions, and require as little as 170 MB of GPU memory, enabling deployment on consumer-grade hardware. When fine-tuned on protein families with as few as 50 sequences, students generate more family-matching sequences than the teacher, achieving higher sample efficiency and Pfam hit rates despite their smaller capacity. These results establish distilled protein language models as superior starting points for domain adaptation on scarce data.

You can write a PREreview of Distilling Protein Language Models with Complementary Regularizers. A PREreview is a review of a preprint and can vary from a few sentences to a lengthy report, similar to a journal-organized peer-review report.

Before you start

We will ask you to log in with your ORCID iD. If you don’t have an iD, you can create one.