Protein Design with StructureGPT: a Deep Learning Model for Protein Structure-to-Sequence Translation
- Posted
- Server: bioRxiv
- DOI: 10.1101/2024.06.03.597105
Motivation: Protein design, crucial for understanding and engineering protein function, has long been limited by the difficulty of translating complex tertiary structures back into sequences. Existing computational tools have focused predominantly on sequence-to-structure prediction, with less attention given to the structure-to-sequence direction. We introduce StructureGPT, a deep learning model that uses natural language processing techniques to translate protein tertiary structures into their corresponding amino acid sequences. The model addresses gaps in protein engineering, particularly the improvement of solubility and stability, which are essential for pharmaceutical development and industrial applications.

Results: StructureGPT autoregressively generates amino acid sequences from detailed structural inputs, supporting the design of proteins with specific functionalities. By leveraging the linguistic parallels between protein structures and human language, the model not only predicts sequences with high accuracy but also suggests modifications that could improve protein properties. Its application to multiple protein design tasks demonstrates its utility in biomedical and biotechnological contexts.

Availability: The source code for StructureGPT is freely available at https://github.com/StructureGPT (DOI: 10.5281/zenodo.11065607).
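
To make the idea of autoregressive structure-to-sequence generation concrete, the following is a minimal toy sketch in Python. It is not the StructureGPT implementation or its API: `toy_logits` is a hypothetical stand-in for a trained model, the per-residue `structure_features` are made-up numbers, and the function names are assumptions chosen for illustration. The only point it shows is the decoding loop, in which each residue is chosen conditioned on the structural input and on the residues already generated.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def toy_logits(structure_features, prefix):
    """Hypothetical stand-in for a trained model (NOT StructureGPT).

    Returns one score per amino acid for the next position, based on the
    structural feature at that position and the previously emitted residue.
    """
    position = len(prefix)
    feature = structure_features[position]
    previous = AMINO_ACIDS.index(prefix[-1]) if prefix else 0
    return [math.sin(feature * (i + 1) + 0.1 * previous)
            for i in range(len(AMINO_ACIDS))]


def softmax(scores, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - highest) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_sequence(structure_features, temperature=1.0, rng=None):
    """Autoregressive decoding: emit one residue per structural position.

    With rng=None the most probable residue is taken at each step (greedy);
    otherwise a residue is sampled from the predicted distribution.
    """
    prefix = []
    for _ in structure_features:
        probs = softmax(toy_logits(structure_features, prefix), temperature)
        if rng is None:
            index = max(range(len(probs)), key=probs.__getitem__)
        else:
            index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        prefix.append(AMINO_ACIDS[index])
    return "".join(prefix)


if __name__ == "__main__":
    features = [0.3, 1.2, -0.7, 2.5]  # made-up per-residue structural values
    print(generate_sequence(features))                        # greedy decoding
    print(generate_sequence(features, rng=random.Random(0)))  # sampled decoding
```

Sampling with a temperature, rather than always taking the top-scoring residue, is one common way such a loop can propose alternative sequences for the same structure; whether StructureGPT does this is not stated in the abstract.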