PREreview of Zero-shot design of a de novo metalloenzyme
- Published
- DOI
- 10.5281/zenodo.20318681
- License
- CC BY 4.0
This manuscript presents dEVA, a novel multi-objective framework that combines contemporary deep learning tools with a genetic algorithm to iteratively optimize physicochemical features of designed proteins towards a user-defined goal. The authors demonstrate the capabilities of their approach by designing de novo metalloproteins and metalloenzymes without reliance on predefined motifs or evolutionary information. These de novo metalloenzymes were then validated and shown to have catalytically active zinc ions that function comparably to naturally occurring zinc-based hydrolases. In theory, dEVA can be extended to other protein design challenges by simply adapting the objective functions to align with the target protein's function. However, deep learning-based objective functions, such as those used in this manuscript, often depend on the existence of diverse, expertly annotated datasets, which may not be available for all design problems. This may limit the generalizability of this method to arbitrary protein design challenges, such as the design of de novo enzymes with new-to-nature activities.
Major points:
Given that dEVA is a primary contribution of this work, the main text would benefit from a more in-depth discussion of the dEVA algorithm. Consider moving Figure S1 to the main text, or creating another figure that captures the finer details of the dEVA framework, to aid reader comprehension.
Further clarity is needed on the order in which the LigandMPNN/Caliby and Metal3D/Metal3D-Cat tools were run during dEVA optimization, and on when each tool was used to design sequences or place metal ions versus to score designs.
For example, the supplementary material states that “4 mutations were introduced in each sequence,” following the mixing of parent sequences via fragment crossover. However, the main text says that “[a]t each iteration of the dEVA design protocol, LigandMPNN proposes mutations.” This is confusing because sampling new sequences or mutations with LigandMPNN is not the same as introducing random mutations.
If the LigandMPNN/Caliby score is computed before Metal3D/Metal3D-Cat updates the location of the metal ion, the score may differ from one computed afterward, which could influence how the top-performing candidates evolve over time.
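To make the ambiguity in the mutation step concrete, the sketch below contrasts the two readings raised above. It is our own illustration under stated assumptions, not the authors' implementation: the single-point crossover, the function names, and the `propose` callback (a hypothetical stand-in for a LigandMPNN call) are all assumed.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fragment_crossover(parent_a, parent_b, rng):
    """Assumed single-point crossover: prefix of one parent, suffix of the other."""
    assert len(parent_a) == len(parent_b)
    cut = rng.randrange(1, len(parent_a))  # cut point strictly inside the sequence
    return parent_a[:cut] + parent_b[cut:]


def random_mutations(seq, n_mut, rng):
    """Reading 1: substitute n_mut random positions with a different random residue."""
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n_mut):
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def proposed_mutations(seq, n_mut, propose, rng):
    """Reading 2: a model chooses the residue at n_mut positions.

    `propose(sequence, position)` is a hypothetical placeholder for a
    sequence-design model; it may return the residue already present,
    so fewer than n_mut positions may actually change.
    """
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n_mut):
        residues[pos] = propose("".join(residues), pos)
    return "".join(residues)
```

Under the first reading every child differs from the crossover product at exactly four positions, whereas under the second the number and identity of changes depend on the model. Stating which reading applies would resolve the discrepancy between the main text and the supplement.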
Early in the metalloprotein design work, the authors state that they chose to generate a Protpardelle library of non-all-helical structures. Why were all-helical structures avoided in the initial library screen? The rationale behind this decision was not clear, and a brief explanation would help clarify the initial design choices.
The authors show the distribution of enzymatic reactions used to train Metal3D-Cat in Figure 3I, which is fairly diverse. Training on this dataset raises the issue of not knowing which enzymatic reaction the resulting de novo metalloenzyme would perform. Since the authors' intended goal was to generate a structure with a defined function, would there be a way to train Metal3D-Cat to produce enzymes targeted towards a single specified reaction rather than a variety of potential reactions? If so, why was this not attempted?
The authors explain that a significant portion of enzymes have bi- and tri-nuclear active sites, while some hydrolytic enzymes function with just one zinc. However, they do not substantiate the rationale for choosing the seemingly more complicated task of generating a metalloenzyme with two zinc atoms. Why was the metalloenzyme designed with two coordinated zinc atoms instead of one?
The dEVA protocol and results are impressive and significant. However, the authors biased their model by filtering for “nature-inspired scaffolds” (TIM barrels) that replicate real-world enzymes. The resulting coordination site mimics those of known metallohydrolases, but with unique residues. We agree this is very different from the active-site “transplantation” approaches of earlier enzyme design and from the “diffuse around fixed known active site residues” strategy of more modern approaches. Still, the text leans heavily on the non-nature-inspired aspect in a way that seems at odds with how the design and filtering are geared towards “nature-inspired” scaffolds and geometries. A more nuanced discussion in the context of enzyme design (both de novo and nature-inspired) could reconcile this.
We were confused by the organization of Figure 3. Panels A–D cover validation of the negative control, panels E–F examine the Metal3D training data and the development of Metal3D-Clean, and panels H–J focus on the enzymes present in MAHOMES II. Altogether, these panels are a bit overwhelming. The figure could benefit from being split into two figures. The first might describe the negative control for probing the edges of the Pareto front. The second could tie together the approaches used to clean the Metal3D training data and the subsequent training processes for Metal3D-Clean and Metal3D-Cat.
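For readers less familiar with the term, the Pareto front mentioned here is the set of designs that no other design beats on every objective at once. A minimal sketch follows, assuming all objectives are maximized; the score tuples are made-up placeholders, not values from the manuscript.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the non-dominated score tuples, preserving input order."""
    return [s for s in scores if not any(dominates(other, s) for other in scores)]


# Hypothetical two-objective scores for four designs.
designs = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)]
front = pareto_front(designs)  # (0.4, 0.4) is dominated by (0.5, 0.5)
```

The designs at the "edges" of the front are those that are extreme in one objective, such as `(0.9, 0.2)` and `(0.2, 0.9)` in this toy example.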
It is well known that ProteinMPNN and LigandMPNN have biases towards particular residue types. Were known biases, such as LigandMPNN’s propensity for acidic residues, considered when selecting objective functions to be optimized by dEVA? Perhaps the propensity of LigandMPNN to predict acidic residues could give rise to naturally emerging metal-binding sites that could improve Metal3D’s confidence.
We thoroughly enjoyed seeing how dEVA enabled the design of de novo metalloproteins and metalloenzymes. Given this tool’s potential for generalizability, what other design challenges could this be applied to, and what would the associated tools and objective functions be for these design challenges? This would be a nice addition to the discussion section.
The authors focus on zinc-based metalloenzymes. A discussion of why zinc is an ideal metal for this application and how it compares to other metals in enzymes would help orient non-enzymologists to the overall narrative.
Minor Points:
In Figure 1C, consider highlighting the residues interacting with the metal with a different color. Otherwise, they are a bit difficult to pick out.
Multiple acronyms are used without being written out or explained (e.g., mRMSD in Figure 2A, A335/A365 in Figure 2C, MRE in Figure 2D, and E.C. in Figure 3I).
It was unclear which native protein contains the metal-binding site most similar to that of DesH2C2. In Figure 2A, the blow-up comparison shows the metal-binding site of 6VTX_A as the nearest neighbor, while Figure S5 indicates that the active 3SXX is the nearest neighbor. This discrepancy is confusing, and the authors should clarify why these figures compare the metal-binding site of DesH2C2 to two different “nearest neighbors.”
References for Figure 2 in the text are misaligned with the data. Thermal stability in the presence of zinc is attributed to 2C, even though the data are actually in 2D. Likewise, the Kd calculation is attributed to 2B when the data and competition assay are in 2C.
Figure 3E lacks a legend for the two bar colors.
The Figure S4 caption includes “a) AF2 structures (orange) predicted in single-sequence model.” However, there is no section “a” or orange structures in the panel.
Competing interests
The authors declare that they have no competing interests.
Use of Artificial Intelligence (AI)
The authors declare that they did not use generative AI to come up with new ideas for their review.