PROSS aims to provide a solution to experimentalists who work with challenging proteins. In a nutshell, the algorithm takes a protein sequence (and structure) as input and outputs several mutated sequences that are expected to be more stable.
For a detailed explanation of the method and example results, see our recently published paper in Mol Cell, July 2016.
When citing the PROSS server, please refer to the above publication.
This page discusses the following topics/questions:
What do we mean by stability?
PROSS aims to alleviate problems such as low expression levels, difficulties in expression in E. coli or other heterologous systems, low solubility, misfolding (the protein is soluble and folded but in an inactive conformation), aggregation, short half-life in vitro or in vivo, low Tm, dependence on an MBP tag, and others.
If your protein suffers from one or more of these problems, PROSS may help you a lot!
To submit a PROSS query you must provide a template X-ray structure of the protein target or a model (the latter only under very specific conditions).
For more details, see below and here.
Which proteins are suited for stabilization by PROSS?
PROSS must receive an X-ray structure as input.
If your protein does not have a solved X-ray structure, you may use the structure of a close homolog as input.
The sequence identity between the protein of interest and the input structure must be at least 40%.
As the sequence identity decreases, the results become less accurate.
For more details, see here.
What are the main advantages of PROSS over other currently available stabilization methods?
- PROSS requires only low-throughput validation: the final output is 7 designs.
Each design carries a certain number of substitutions, typically 2-12% of the protein's residues.
The experimental validation is straightforward: select 2-5 designs, order the full genes, clone them, then express and test the selected designs to see whether the original problem is alleviated.
- PROSS is easy to use. You can't make fundamental errors. You just need to know your protein.
What do the result files include?
When results are ready you will get an email with a zip attachment that includes the following files:
- A pdb file of the refined structure (WT sequence). The refined structure is the input structure after relaxation in PROSS; minor conformational changes compared to the original structure are expected.
- 7 stabilized protein models (also termed designs). For each model you will receive:
  - A pdb file of the model
  - The amino acid sequence of the model in fasta format
- A multiple sequence alignment (MSA) file that was generated by PROSS and used during the process
- A readme file with PyMOL commands for visualizing the mutated residues in PyMOL
To proceed to experimental validation you only need the sequence files (more details below).
The pdb files are for viewing the results and comparing to the WT (more details below).
The MSA was used during the PROSS calculations; it shows which homologs, and how many, contributed to the calculation.
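Because the sequence files are plain FASTA, a few lines of Python are enough to list a design's substitutions relative to WT and to catch a length mismatch before ordering. The sketch below is not part of PROSS; the function names `read_fasta` and `substitutions` are hypothetical, and the reported positions are 1-based indices into the sequence, which may differ from the residue numbers in the PDB file. The equal-length check is only a quick test; when the lengths differ, a proper pairwise alignment against WT is still needed.

```python
"""Compare a PROSS design with the WT sequence (sketch, not part of PROSS)."""


def read_fasta(text: str) -> str:
    """Return the sequence of the first record in a FASTA-formatted string."""
    lines = []
    in_record = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if in_record:
                break  # stop at the second record
            in_record = True
        elif line:
            lines.append(line)
    return "".join(lines).upper()


def substitutions(wt: str, design: str) -> list[tuple[int, str, str]]:
    """List (position, WT residue, design residue) for each difference.

    Positions are 1-based sequence indices, not PDB residue numbers.
    A length mismatch usually means a missing or extra residue, so it is
    reported as an error instead of being aligned around.
    """
    if len(wt) != len(design):
        raise ValueError(
            f"length mismatch: WT has {len(wt)}, design has {len(design)} residues"
        )
    return [(i, w, d) for i, (w, d) in enumerate(zip(wt, design), start=1) if w != d]


if __name__ == "__main__":
    # Toy sequences; in practice pass the file contents, e.g.
    # read_fasta(open("design_7.fasta").read()) with a hypothetical file name.
    wt = read_fasta(">WT\nMKTAYIAK\n")
    design = read_fasta(">design_7\nMKEAYIPK\n")
    subs = substitutions(wt, design)
    print(subs)  # [(3, 'T', 'E'), (7, 'A', 'P')]
    print(f"{len(subs)} substitutions = {100 * len(subs) / len(wt):.1f}% of residues")
```

The percentage printed at the end can be compared with the typical 2-12% range mentioned above.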
How can I view the results and what do I expect to see?
In principle, you can go ahead and order the DNA sequences (see recommendations below) without examining the results. However, we do recommend that you look at the models and check that they make sense to you and that no mutations are too close to active sites.
There are 2 ways to view the results:
- Our graphical browser: you will get a link to it in the results email. You can use it to move, rotate, or zoom in and out on the structure, and you can click on any mutation to zoom in on it. The browser may respond quite slowly, especially for large proteins.
- PyMOL visualization:
  - Open a PyMOL session and load the refined structure (WT), then load design_7.
  - Open the readme file, find the sentence "Commands to select all mutated positions" and copy the command for design_7. Note that the command begins with "select resi".
  - Paste the command into the PyMOL command line and press Enter. You should see the mutated residues selected on both structures.
  - Go to "sele" on the right side of the PyMOL window, click "S" (show) and then "sticks".
  - At this point you should see the WT structure in one color, design_7 in another color, and the mutated residues shown as sticks while all other residues are shown as lines. I like to show the mutated amino acids in a different color. To do this, use the same command as above but restrict it to a specific object, for example: select resi 64+65+81+91+119+130+131+136+139 and 3WCY_designed_7. Then go to "sele", click "C" (color) and choose a different color.
  - You can now use other PyMOL options to compare the WT and the design, such as showing cavities or hydrogen bonds.
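If you repeat these PyMOL steps for several designs, they can be written once as a PyMOL command script. The sketch below is a hypothetical helper, not a PROSS tool: it builds the `load`, `select`, `show` and `color` commands for a given list of mutated positions. The file names, the object name and the color are assumptions; take the residue numbers from the readme command for your design.

```python
"""Build a PyMOL command script that highlights PROSS mutations (sketch)."""


def pml_script(wt_pdb: str, design_pdb: str, design_obj: str,
               positions: list[int], color: str = "magenta") -> str:
    """Return PyMOL commands that load both structures and show mutations.

    positions are PDB residue numbers (resi), as in the readme commands.
    """
    resi = "+".join(str(p) for p in sorted(set(positions)))
    commands = [
        f"load {wt_pdb}",                    # WT object is named after the file
        f"load {design_pdb}, {design_obj}",  # design loaded under an explicit name
        f"select muts, resi {resi}",         # mutated positions on both objects
        "show sticks, muts",
        f"select muts_design, resi {resi} and {design_obj}",
        f"color {color}, muts_design",       # recolor mutations on the design only
    ]
    return "\n".join(commands)


if __name__ == "__main__":
    # Hypothetical file names; positions copied from the example command above.
    print(pml_script("refined_wt.pdb", "design_7.pdb", "3WCY_designed_7",
                     [64, 65, 81, 91, 119, 130, 131, 136, 139]))
```

Save the printed output as, for example, show_mutations.pml and run it from the PyMOL command line with `@show_mutations.pml`.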
When viewing the structures, expect to see some or all of the following features:
- Most mutations will be on the surface, and only a small fraction in the core.
- Surface charge and polarity typically increase. In addition, the charge distribution may change, and in some cases the protein pI may change dramatically. New salt bridges and surface hydrogen bonds may be observed.
- You may observe some mutations to proline in loops or at helix termini or kinks.
- Some mutations improve secondary-structure propensities; others may improve helix capping.
- Mutations in the core typically improve packing, eliminate unsatisfied hydrogen-bond donors or acceptors, form new hydrogen bonds, improve secondary-structure propensity or alleviate repulsion.
Note that design_7 is the most permissive design, and the other 6 designs are typically subsets of design_7. Therefore, it is enough to view design_7 to get a feeling for what PROSS is doing.
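Since the pI of a design can shift considerably, it is worth comparing it with WT before choosing buffers. Any pI calculator will do; the sketch below only illustrates the idea. It computes the net charge with the Henderson-Hasselbalch equation and finds the pH of zero net charge by bisection. The pKa values are one common set and are an assumption: different calculators use different values, so compare WT and designs with the same tool rather than across tools.

```python
"""Rough isoelectric-point estimate for comparing WT and designs (sketch)."""

# Assumed pKa values (one common set); other tools use different ones.
PKA_BASIC = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    seq = seq.upper()
    # Protonated fraction of each basic group (charge +1).
    positive = 1 / (1 + 10 ** (ph - PKA_BASIC["N_term"]))
    for aa in "KRH":
        positive += seq.count(aa) / (1 + 10 ** (ph - PKA_BASIC[aa]))
    # Deprotonated fraction of each acidic group (charge -1).
    negative = 1 / (1 + 10 ** (PKA_ACIDIC["C_term"] - ph))
    for aa in "DECY":
        negative += seq.count(aa) / (1 + 10 ** (PKA_ACIDIC[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-3) -> float:
    """Find the pH where the net charge is zero by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    wt = "MKTAYIAKQRQISFVKSHFSRQ"       # hypothetical WT sequence
    design = "MKTAYIAEQRQISFVESHFSRQ"   # hypothetical design with K->E changes
    print(f"WT pI {isoelectric_point(wt):.2f}, design pI {isoelectric_point(design):.2f}")
```

Bisection works here because the net charge decreases monotonically as the pH rises.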
How to select sequences for experimental validation?
PROSS automatically provides 7 stabilized models. Design_7 is the most permissive and contains the highest number of substitutions relative to WT; design_1 is the strictest and contains the lowest number.
In some cases the designs are very similar to each other; in others they are very different from one another.
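To see how similar the designs are, you can count each design's substitutions relative to WT and the number of positions at which each pair of designs differs. The sketch below does this for equal-length sequences taken from the design FASTA files. It also lists, for each design, any mutation that is absent from design_7; since the other designs are typically subsets of design_7, these sets will usually be empty. The function names and the dictionary layout are hypothetical.

```python
"""Summarize how PROSS designs differ from WT and from each other (sketch)."""
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return sum(x != y for x, y in zip(a, b))


def mutations(wt: str, design: str) -> set[tuple[int, str]]:
    """Return (1-based position, design residue) for each substitution."""
    if len(wt) != len(design):
        raise ValueError(f"length mismatch: {len(wt)} vs {len(design)}")
    return {(i, d) for i, (w, d) in enumerate(zip(wt, design), start=1) if w != d}


def summarize(wt: str, designs: dict[str, str]):
    """Substitution count per design and pairwise differences between designs."""
    counts = {name: hamming(wt, seq) for name, seq in designs.items()}
    pairwise = {
        (a, b): hamming(designs[a], designs[b])
        for a, b in combinations(sorted(designs), 2)
    }
    return counts, pairwise


def not_in_reference(wt: str, designs: dict[str, str], reference: str = "design_7"):
    """Mutations of each design that the reference design does not contain."""
    ref = mutations(wt, designs[reference])
    return {
        name: mutations(wt, seq) - ref
        for name, seq in designs.items()
        if name != reference
    }


if __name__ == "__main__":
    # Toy sequences for illustration; real ones come from the design FASTA files.
    wt = "AAAAAAAA"
    designs = {"design_1": "AEAAAAAA", "design_4": "AEAAKAAA", "design_7": "AEAAKAPA"}
    counts, pairwise = summarize(wt, designs)
    print(counts)  # {'design_1': 1, 'design_4': 2, 'design_7': 3}
    print(pairwise)
    print(not_in_reference(wt, designs))  # {'design_1': set(), 'design_4': set()}
```

Designs whose counts and pairwise differences are close to each other are the "very similar" ones that need not all be tested.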
There is no reason to invest experimental work in very similar designs. We therefore recommend selecting 3 designs that differ most from one another (in terms of the number of substitutions per design):
- In principle, you would like to include the most permissive design (design_7). It is the most aggressive one and therefore holds the highest potential for a significant effect on stability. If you consider it too aggressive, take design_6 instead.
- On the other hand, permissive designs are riskier, so you would also like to include a modest design (design_1, design_2 or design_3).
- For a third design, pick one in the middle.
- If you can't find 3 designs that are different enough from one another, consider testing only 2.
- If even design_7 is already very modest (<10 mutations), consider ordering only this one and testing it first. Decide on your next step based on the results.
- All mutations are predicted to be independently stabilizing, so if a specific mutation does not make sense to you at all, you can usually exclude it without worry. However, if it is close to other mutations, you may want to reconsider or exclude the nearby ones as well.
How to proceed towards experimental validation?
- Select 2-5 designs for experimental testing. How to select them out of the 7 designs PROSS provides is discussed above.
- For each selected design, align its amino acid sequence (provided to you in the results email) with the WT sequence. The main thing you want to verify is that no residue is missing. Crystal structures sometimes have missing density, which leads to gaps in the primary sequence. PROSS aims to detect these gaps and fill them in; however, in some cases this is impossible because the information on the RCSB website is incorrect.
The bottom line: YOU MUST VERIFY THE AMINO ACID SEQUENCES BEFORE ORDERING. ERRORS WILL BE PAINFUL (in terms of money and time).
- Back-translate the amino acid sequences to DNA sequences (codon-optimized for E. coli in the case of standard bacterial expression).
For this purpose I like to use the DNAWorks website, which allows you to optimize for different organisms and to exclude undesired sequences in advance. EMBOSS is another option.
- Order the full genes (we strongly recommend ordering the full gene rather than introducing mutations one by one). Companies providing this service include Gen9 (requires large quantities), IDT (oligo orders) and GenScript.
- Calculate the pI values of the ordered designs with a pI calculator. If they differ significantly from WT, consider changing the buffer pH.
- Express your protein and use an appropriate assay to examine whether the PROSS designs are more stable than wild type with respect to the problem that originally made you use PROSS.
A low number of sequences in the multiple sequence alignment
PROSS generates a multiple sequence alignment (MSA) that is then used in the calculations.
In standard cases the MSA contains hundreds to thousands of unique sequences, typically a few hundred.
If the MSA contains fewer than 50 sequences, you will be notified in the results email.
In such a case we suggest running additional queries with alternative MSA parameters to increase the number of sequences (read the help page for each parameter first). If other parameters do not yield better MSAs, you may go ahead and test the designs, but remember that they are considered less reliable. Specifically, if the MSA contains fewer than 15 sequences, we suggest avoiding experimental testing: PROSS is not the solution for you.
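To check the MSA size yourself, the sketch below counts the distinct ungapped sequences in the MSA file and applies the thresholds given above (fewer than 50: try other MSA parameters; fewer than 15: avoid experimental testing). It assumes the MSA file is in aligned FASTA format with "-" or "." as gap characters; that format is an assumption, so check the file you received. PROSS's own count may also be computed differently (for example, in whether the query sequence is included), so treat the result as approximate. The function names are hypothetical.

```python
"""Count unique sequences in an aligned FASTA MSA and apply size thresholds (sketch)."""


def unique_sequences(fasta_text: str) -> set[str]:
    """Collect the distinct ungapped sequences in an aligned FASTA string."""
    seqs, current = [], []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        seqs.append("".join(current))
    # Remove gap characters so sequences aligned differently compare equal.
    return {s.replace("-", "").replace(".", "").upper() for s in seqs}


def msa_advice(n_sequences: int) -> str:
    """Map an MSA size to the recommendation given in the text."""
    if n_sequences < 15:
        return "too few sequences: experimental testing is not recommended"
    if n_sequences < 50:
        return "small MSA: try alternative MSA parameters; designs are less reliable"
    return "OK: no low-MSA warning expected"


if __name__ == "__main__":
    toy_msa = ">query\nMK-TA\n>hom1\nMKT-A\n>hom2\nMKLA\n"
    n = len(unique_sequences(toy_msa))
    print(n, msa_advice(n))
```

Counting unique ungapped sequences, rather than records, avoids inflating the count with identical homologs.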
If at this point you are feeling lost, send us an email at email@example.com.