PREreview of Targeting protein-ligand neosurfaces using a generalizable deep learning approach

by Maple N Chen and Kevin Alexander Estrada Alamo

Published: May 14, 2024
DOI: 10.5281/zenodo.11188199
License: CC BY 4.0

Peer Review: Targeting protein-ligand neosurfaces using a generalizable deep learning approach

Maple N Chen and Kevin Alexander Estrada Alamo, University of California San Francisco

Summary

In this study, the authors introduce MaSIF-neosurf, which extends the functionality of their previously published model, MaSIF. MaSIF is a neural network that can (i) identify surface patches likely to form protein-protein binding interfaces and (ii) match those patches to peptide “seeds” from an extensive database that are likely to bind them. It does this by computing the chemical and geometric features of each protein surface patch and encoding them as a fingerprint, which can then be compared against the fingerprints of seeds in the database. Here, the authors show that by calculating the same chemical and geometric features for ligand atoms on a protein-ligand “neosurface”, MaSIF can analyze neosurfaces just as it analyzes surfaces composed solely of protein. MaSIF’s approach is unique because, by feeding biochemical features to a neural network, it bridges purely machine learning-based methods and purely physics-based methods. In this work, the authors show that MaSIF-neosurf can predict ligand-dependent binding on 14 ternary complexes. They then design de novo protein binders for three neosurfaces and characterize them through binding assays, biochemical and structural studies, and split complementation assays in cell-free and mammalian cell systems.

The major success of the paper was the introduction of a new method for designing binders to protein-ligand complexes, an outstanding problem in protein design. We see it as particularly useful for molecular glue development, synthetic biology, and protein design. The data presented were well controlled and well contextualized. The major weakness was the absence of yeast display data from the main text; including these data would help benchmark MaSIF-neosurf’s design capabilities and connect the paper’s design and characterization sections.

Major Points

A large component of this study involves the de novo design of binders to pre-existing protein-ligand complexes. The authors use MaSIF-neosurf to find seeds, or peptide fragments, that promote binding to their protein-ligand neosurfaces of interest. The authors then graft the seeds into protein scaffolds, creating about 2,000 designs for each of their three neosurface targets. The diversity of both seeds and grafted scaffolds is shown in Figure 2. In Figure 3, the authors characterize binding between the complex and the design for only one design per target. Although the authors describe using yeast surface display, between the steps shown in Figures 2 and 3, to narrow their candidate library from about 2,000 designs to one, we think it would be informative to present the yeast display data as a figure in the main text. Reporting the number of hits per target would indicate the success rate of the design pipeline and explain how the binders were chosen for characterization. Along with the yeast display data, it would be informative to show design models of all designs that bound on yeast, even if only one design per target is characterized through binding affinity measurements in later figures. Showing these designs would indicate whether MaSIF-neosurf can produce diverse successful binders or whether certain topologies are more readily converted into binders. One attractive option would be to highlight the designs that bound on yeast in Figure 2C, similar to how the biochemically characterized designs are marked with a star.

After determining the binding affinities of the designed binders described above, the authors seek to improve them through site-saturation mutagenesis (SSM). The results of the SSM study, depicted in Figure 4, indicate that the binders bind more tightly after mutagenesis. We are curious whether MaSIF-neosurf would detect the affinity-enhancing mutations identified by SSM. If MaSIF-neosurf scores the mutated binders more favorably than the original binders, then sequence sampling, or perhaps seed diversity, may be a limiting step in design. If not, the nature of the mutations could suggest ways to improve MaSIF-neosurf. Characterizing this aspect of MaSIF-neosurf could help identify the limiting step in designing high-affinity binders, reflecting both on MaSIF-neosurf itself and on current challenges in de novo design.

Minor Points

Figure 1

In Figure 1B, the authors show that MaSIF-neosurf could recover the binding partner for 70% of the 28 complexes. For the ~8 partners that could not be recovered, were there similarities in structure, ligand contact surface area, ligand identity, size, or other characteristics? An analysis of trends in recoverable versus non-recoverable binding partners would help describe the model’s limitations and successes.

In Figure 1B, the authors plot the number of the 28 complexes for which MaSIF-neosurf could recover the binding partner within the top k outputs. We found the axes confusing because it was unclear that all 28 complexes were plotted together and what “solved complexes” meant in the vertical axis label. We recommend replacing the vertical axis label with a phrase like “number of partners correctly identified in top k”. We also recommend describing the figure’s visualization in more detail in the caption or main text.
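To illustrate the quantity the suggested label describes, the sketch below computes a cumulative top-k recovery curve: for each cutoff k, it counts how many complexes have their true binding partner ranked within the top k, so every point summarizes all complexes at once. It assumes each complex can be reduced to the rank of its true partner in MaSIF-neosurf’s output (None if not recovered); the function names and example ranks are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def recovered_within_top_k(ranks: Sequence[Optional[int]], k: int) -> int:
    """Count complexes whose true partner appears at rank <= k (1-based).

    A rank of None means the partner was never recovered (assumed encoding).
    """
    return sum(1 for r in ranks if r is not None and r <= k)


def top_k_curve(ranks: Sequence[Optional[int]], max_k: int) -> List[Tuple[int, int]]:
    """Return (k, count) for k = 1..max_k; each count covers all complexes."""
    return [(k, recovered_within_top_k(ranks, k)) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    # Made-up ranks for five hypothetical complexes, not data from the paper.
    example_ranks = [1, 3, None, 2, 10]
    for k, n in top_k_curve(example_ranks, 5):
        print(f"top-{k}: {n}/{len(example_ranks)} partners correctly identified")
```

Because a complex counted at one cutoff stays counted at every larger cutoff, the curve is non-decreasing, and its value at a given k reads directly as the number of partners correctly identified in the top k.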

Figure 3

Figure 3D shows the results of the yeast display binding screens for the designs, including the selectivity of the designed binders for the intended ligand. Testing whether ligand analogs could also induce binding is valuable because it provides evidence of selectivity. A similar analysis of the mutated binders would be helpful, since they are the final designs. Such an analysis would show whether SSM altered the promiscuity of the designs, support the orthogonality of the final designs for further applications, and support the role of mutagenesis in improving designs in vitro. If the authors have not done this experiment, they could instead use computational methods, such as docking or rigid-body superposition based on their previous predictions, to assess the predicted promiscuity of the mutated designs.

Figure 4

In Figures S8 and 4C, the authors show the binding curves and associated Kd values, in the presence of ligand, for the first-generation and second-generation binders, respectively. Including the error associated with each Kd value would be beneficial because it would convey the variability across technical replicates of the assay.
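As a minimal sketch of the suggested reporting, assuming a Kd has been fit separately to each technical replicate, the per-replicate estimates can be summarized as mean ± sample standard deviation. The function name and replicate values below are hypothetical; reporting the standard error of a global fit would be an equally reasonable alternative.

```python
import statistics
from typing import Sequence, Tuple


def summarize_kd(replicate_kds_nM: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-replicate Kd fits in nM."""
    if len(replicate_kds_nM) < 2:
        # A spread cannot be estimated from a single replicate.
        raise ValueError("need at least two replicates to estimate error")
    return statistics.mean(replicate_kds_nM), statistics.stdev(replicate_kds_nM)


if __name__ == "__main__":
    # Hypothetical per-replicate Kd estimates, not values from the paper.
    mean_kd, sd_kd = summarize_kd([90.0, 100.0, 110.0])
    print(f"Kd = {mean_kd:.0f} ± {sd_kd:.0f} nM (mean ± SD, n = 3)")
    # prints "Kd = 100 ± 10 nM (mean ± SD, n = 3)"
```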

Currently, Figure 4A shows the results of the site-saturation mutagenesis (SSM) experiment, with the color bar labeled from low to high mutational sensitivity. In the first paragraphs of the section titled “Biochemical characterization and structural validation”, the authors state that they “computed the average enrichment score of each mutation when comparing binding versus non-binding populations on yeast display experiments”. Additionally, the authors used mutants from the SSM to improve the binding affinity of the designs. Because the scores therefore capture both impaired and improved binding, we suggest labeling the extremes of the color bar as loss-of-function and gain-of-function in ‘binding fitness’, or something similar.
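To illustrate why a signed “binding fitness” scale fits these data, the sketch below assumes one common convention for enrichment scores: the log2 ratio of a variant’s frequency in the binding versus non-binding yeast populations, averaged over replicates. The paper’s exact formula may differ; the function names, the pseudocount, and the example counts are hypothetical. Under this convention, positive scores mark variants enriched among binders (gain of binding fitness) and negative scores mark depleted variants (loss of binding fitness).

```python
import math
import statistics
from typing import Sequence, Tuple


def enrichment_score(count_bind: int, total_bind: int,
                     count_nonbind: int, total_nonbind: int,
                     pseudocount: float = 0.5) -> float:
    """log2 of a variant's frequency among binders over its frequency among non-binders.

    The pseudocount keeps the ratio finite when a variant is unobserved
    in one population.
    """
    freq_ratio = ((count_bind + pseudocount) * total_nonbind) / (
        (count_nonbind + pseudocount) * total_bind
    )
    return math.log2(freq_ratio)


def average_enrichment(replicates: Sequence[Tuple[int, int, int, int]]) -> float:
    """Mean enrichment score across replicate sorts (assumed averaging scheme)."""
    return statistics.mean(enrichment_score(*rep) for rep in replicates)


if __name__ == "__main__":
    # Hypothetical counts: (binding count, binding total, non-binding count, non-binding total)
    gain = average_enrichment([(40, 1000, 10, 1000), (38, 1000, 11, 1000)])
    loss = average_enrichment([(5, 1000, 50, 1000), (6, 1000, 45, 1000)])
    print(f"gain-of-binding variant: {gain:+.2f}; loss-of-binding variant: {loss:+.2f}")
```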

Figure 5

In Figure 5, the authors used the progesterone complex for the in vitro transcription CID, the Venetoclax complex for the GEM CID, and the Actinonin complex for the split NanoLuc CID. We are curious how the authors chose the ternary complex for each CID system. We presume the authors tried each ternary complex in each CID system; were there any unexpected challenges?

The final designs’ Kds for the Venetoclax, Progesterone, and Actinonin complexes are 96 nM, 18 nM, and 446 nM, respectively. The EC50s for the CID systems using these complexes are 0.31 nM, 1.2 μM, and 27 nM. We found the large discrepancies between the Kd and EC50 values concerning; could the authors discuss this further in the main text? We found the explanation for the Venetoclax CID system, which stated that the discrepancy was “likely due to the co-localization of the sensing modules in the cell membrane”, insufficiently detailed. This also connects to the previous point: if the authors tried each ternary complex in each system, the results could provide further insight into the discrepancies, such as system-specific effects. It may also be informative to show error bars for the EC50 values.
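To make the size of these discrepancies explicit, the short calculation below divides each EC50 by the corresponding Kd, assuming the EC50 values are listed in the same order as the Kd values (Venetoclax, Progesterone, Actinonin) and converting 1.2 μM to 1200 nM. With the quoted values, the EC50 is roughly 310-fold lower than the Kd for the Venetoclax system, about 67-fold higher for the Progesterone system, and about 16.5-fold lower for the Actinonin system. The dictionary and function names are illustrative only.

```python
from typing import Dict

# Values quoted in the review, all in nM (1.2 μM converted to 1200 nM).
REPORTED_NM: Dict[str, Dict[str, float]] = {
    "Venetoclax": {"kd": 96.0, "ec50": 0.31},
    "Progesterone": {"kd": 18.0, "ec50": 1200.0},
    "Actinonin": {"kd": 446.0, "ec50": 27.0},
}


def ec50_over_kd(kd_nM: float, ec50_nM: float) -> float:
    """Fold ratio EC50/Kd: >1 means the EC50 exceeds the Kd, <1 means it is lower."""
    return ec50_nM / kd_nM


if __name__ == "__main__":
    for ligand, values in REPORTED_NM.items():
        ratio = ec50_over_kd(values["kd"], values["ec50"])
        print(f"{ligand}: EC50/Kd = {ratio:.3g}")
```

Expressed this way, the EC50/Kd ratios of the three systems span more than four orders of magnitude.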

Supplement

In Supplemental Figures S10C and S11C, the authors plot the negative controls and the designs on separate graphs, which makes the figures harder to interpret. To improve legibility, we suggest overlaying DBPro1156_2 (with Pro) and DBAct553_2 (with Act) on the plots showing DBPro1156_2 (no Pro) and DBAct553_2 (no Act), respectively.

Competing interests

The authors declare that they have no competing interests.