PREreview of High-throughput computational discovery of inhibitory protein fragments with AlphaFold

by Priyanka Bajaj and 2 other authors

Published: May 7, 2024
DOI: 10.5281/zenodo.11130729
License: CC BY 4.0

Summary:

In a previous paper, the authors expressed protein fragments to identify sequences that would act as “dominant negative” inhibitors of the parent protein. However, screening peptides for inhibitory effects in cells by bulk competition and deep sequencing has limitations. Because of their small size, peptides can have multiple non-specific or off-target effects, such as binding to multiple targets, cytotoxicity, or non-specific binding to the target protein of interest. Validating that a peptide is a true inhibitor is critical; however, determining the mechanism of inhibition across many fragment sequences can be very time-consuming. Genetically, this could be done by testing fragments in backgrounds that overexpress the target protein, to help confirm that the effect of a given fragment is due to inhibition of the desired target, but for libraries of fragments a higher-throughput method would be desirable. In some cases (such as with GroEL and GroES), the authors inferred from the dependence on target protein concentration (i.e. a correlation between higher expression of the target and inhibition) that the inhibitory effects of fragments reflected specific interactions. For several other proteins, however, the mechanism of inhibition was not definitively shown to occur through native interactions, which leaves open the question of whether these fragments are true inhibitors. To increase confidence that inhibitory peptides work via the desired on-target mechanism, the authors have now developed FragFold, a computational tool built upon AlphaFold. FragFold predicts the structure of each fragment bound to the target protein; fragments identified experimentally as inhibitory have a high predicted number of contacts between fragment and target.

The major strength of this paper is the development of a method that could be used to identify regions of proteins that are involved in PPIs, based on evolutionarily related sequences that recapitulate the native binding interfaces. The major weakness of this paper is that the underlying method of MSA concatenation is not clearly explained (see Major point 1). Why is the discontinuous unpaired strategy optimal relative to other AlphaFold-multimer-like strategies? Overall, the paper demonstrates the power of AlphaFold to closely recapitulate the structures of experimentally determined fragment binding interfaces by working exclusively in sequence space in a high-throughput manner. Further, in the absence of experimental structures, the authors present plausible AlphaFold predictions of fragment-bound structures that are supported by biochemical and genetic data, which could further contribute to the utility of this method in studying known PPIs.

There are a few points we would like to bring to the attention of the authors to strengthen the manuscript further.

Major points:

The authors state that they generated multiple sequence alignments for both the fragments and the target protein prior to running AlphaFold2 to minimize computation time. Although it is not clearly explained, the authors state that they concatenated these two MSAs into a single MSA. We interpret this to mean that the input MSAs did not directly pair the fragment sequences with the target sequences; instead, the positions on either side of the fragment sequence and the target sequence were left blank, forcing AlphaFold to co-predict the structure of the fragment with the target protein by treating them as a single discontinuous sequence. While we were surprised by the simplicity of this method and its ability to remain in sequence space based on the evolutionary similarity of fragment and target protein sequences, we have several questions regarding this implementation. From the explanation provided with Figure 1A, the fragment sequence appears to be directly paired to the target sequence; however, in Figure 1B the method appears to work by using the evolutionary information of many orthologs and related proteins of both fragment and target to co-predict their structures using a discontinuous input sequence. Is this correct? Could the authors provide a clearer description of how they are concatenating the MSAs? We were also curious to know how different concatenation strategies affect the accuracy of predictions. For example, could the authors also try concatenating directly paired fragment-target sequences from the same species of origin (or even try this as a single continuous sequence)? If the sequences are continuous, does adding linker regions between the fragment and target alter the results? Does the order of concatenation affect the results (i.e. concatenating at the N- or C-terminus of the target)?
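
To make our interpretation concrete, the unpaired, discontinuous concatenation we have in mind can be sketched as follows. This is only an illustration of our reading and not the authors' implementation: the function name `concatenate_unpaired`, the fragment-then-target order, and the use of "-" for blank positions are all our assumptions, and the sketch does not address how the break between fragment and target is signalled to AlphaFold.

```python
GAP = "-"  # assumed symbol for a blank (unaligned) position


def concatenate_unpaired(fragment_msa, target_msa):
    """Join two MSAs without pairing their homologs (hypothetical helper).

    Each MSA is a list of aligned sequences of equal length, query first.
    Only the two query sequences share a row; every homolog row is padded
    with gaps over the columns that belong to the other protein.
    """
    if not fragment_msa or not target_msa:
        raise ValueError("both MSAs need at least the query sequence")
    frag_len = len(fragment_msa[0])
    targ_len = len(target_msa[0])
    if any(len(s) != frag_len for s in fragment_msa):
        raise ValueError("fragment MSA rows differ in length")
    if any(len(s) != targ_len for s in target_msa):
        raise ValueError("target MSA rows differ in length")

    # Row 0: query fragment followed by query target (order is an assumption).
    rows = [fragment_msa[0] + target_msa[0]]
    # Fragment homologs: target columns left blank.
    rows += [seq + GAP * targ_len for seq in fragment_msa[1:]]
    # Target homologs: fragment columns left blank.
    rows += [GAP * frag_len + seq for seq in target_msa[1:]]
    return rows


if __name__ == "__main__":
    toy_fragment = ["ACD", "AC-"]
    toy_target = ["KLMNP", "KLMQP", "RLMNP"]
    for row in concatenate_unpaired(toy_fragment, toy_target):
        print(row)
```

A paired alternative, by contrast, would place a fragment homolog and a target homolog from the same species in the same row. Stating which of the two layouts was used, and in which order, would answer most of the questions above.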

In Figure 2A, the peaks indicating inhibitory activity correlate positively with the observed peaks in calculated weighted N_contacts, which forms the basis for all inferences. However, in the initial segment of the protein (residues 0 to 100), there is a peak indicating binding predicted by the AlphaFold model but no corresponding inhibitory activity. Could the authors comment on this discrepancy?

The authors introduce (f_native,pairwise) and (f_native,binding) to quantify similarities between experimentally solved structures and AlphaFold2 models. We found the explanation of these metrics confusing: one refers to the fragment and the target site, but the other refers to the native binding site bound by the fragment. Is one referring to the contacts made in the experimental structure and the other to the contacts in the AlphaFold model? Further clarification of precisely what these metrics correspond to would be helpful for discerning the similarities and differences between the two.

In both Figures 2 and 3, the authors show structures of the experimentally solved complexes and the predicted AlphaFold models side by side. We were curious to know whether the AlphaFold models were able to recapitulate the sidechain conformations. In addition, we were curious to know whether the AlphaFold models recapitulated any key contacts made between the binding site and the fragment (e.g. salt bridges, electrostatic interactions between charged amino acids, pi-pi stacking, hydrogen bonds between sidechains).

The authors explain their use of tiling to generate inhibitory fragments and note that overlapping fragments generate greater predicted binding peaks. Have the authors attempted to use smaller fragments in their program (i.e. what is the smallest fragment size for which AlphaFold can still predict correct binding)? We are curious to know whether the results vary with fragment size. Further, can this method be extended to study multiple different fragments that bind simultaneously to different sites on the target protein?
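
As an illustration of the fragment-size comparison we are suggesting, overlapping tiles of any chosen length can be generated with a simple sliding window. This is a minimal sketch under our own assumptions: the function name `tile_fragments`, the `length` and `step` parameters, and the toy sequence are hypothetical and are not taken from the manuscript.

```python
def tile_fragments(sequence, length, step=1):
    """Return (start, fragment) pairs for overlapping tiles of a sequence.

    start is the 0-based index of the first residue of each tile.
    Tiles that would run past the end of the sequence are not emitted.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = len(sequence) - length
    return [
        (start, sequence[start:start + length])
        for start in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQRQISFVK"  # arbitrary toy sequence
    # Compare tilings at several fragment lengths.
    for frag_len in (4, 8, 12):
        tiles = tile_fragments(toy_sequence, frag_len, step=2)
        print(frag_len, len(tiles), tiles[0], tiles[-1])
```

Running the same prediction pipeline over tilings at several lengths would show how the predicted binding peaks depend on fragment size.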

Minor points:

The orange and yellow lines used to show contacts between the fragment and target are difficult to distinguish from each other. Consider a different set of colors?

Figure 4C and 4E: The use of black in the model makes it difficult to distinguish the sidechains and their interactions with the fragment.

Reviewed by CJ San Felipe, Priyanka Bajaj and James Fraser (UCSF)

Competing interests

The authors declare that they have no competing interests.