Upload your own Multiple Sequence Alignment file.
The query sequence to be used to derive your MSA:
If you submit a published structure (from the RCSB website), use the amino acid sequence as given in the FASTA sequence file provided there.
If you upload your own PDB+SEQUENCE files, use the query sequence from your SEQUENCE file.
DON'T EXTRACT THE QUERY SEQUENCE DIRECTLY FROM THE COORDINATES FILE.
File format should obey the following rules:
- The MSA file should be in FASTA format
- The first sequence should be the query sequence
- The query sequence name must contain the term THIS_IS_QUERY
- Make sure there aren't any special characters in the MSA (such as *, &, $, etc.)
- Point mutations between the query sequence in the MSA and the protein are not allowed
- Note that the query sequence in the MSA can be longer than the sequence in the coordinates file (i.e. PDB file), due to missing density, but it cannot be shorter
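The rules above can be checked locally before uploading. The following is a minimal sketch, not the server's own validation: the function names `read_fasta` and `check_msa` are hypothetical, and it assumes that "special characters" means anything other than letters and the gap symbol `-` in the sequence lines. It checks only the length rule against the structure sequence; detecting point mutations would require aligning the two sequences and is not covered here.

```python
import re

QUERY_TAG = "THIS_IS_QUERY"
# Assumption: only letters and the gap symbol '-' are valid in sequence lines.
ALLOWED = re.compile(r"[A-Za-z-]*")


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_msa(text, structure_seq=None):
    """Return a list of detected problems; an empty list means none were found."""
    try:
        records = read_fasta(text)
    except ValueError as err:
        return [str(err)]
    if not records:
        return ["no FASTA records found"]

    problems = []
    first_name, first_seq = records[0]
    if QUERY_TAG not in first_name:
        problems.append("first sequence name lacks THIS_IS_QUERY")
    for name, seq in records:
        if not ALLOWED.fullmatch(seq):
            problems.append(f"special characters in sequence '{name}'")
    if structure_seq is not None:
        # The query may be longer than the structure sequence, but not shorter.
        query = first_seq.replace("-", "")
        if len(query) < len(structure_seq):
            problems.append("query shorter than structure sequence")
    return problems
```

For example, `check_msa(open("my_msa.fasta").read())` (with a hypothetical file name) returns an empty list when none of these checks fail.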
When is it recommended to use my own MSA file?
In general, we recommend avoiding this option and letting the algorithm generate its own MSA file.
In rare cases you may want to use your own MSA, typically for one of three reasons:
- If there is a need to exclude specific sequences that might be included in our automatically generated MSA.
- If only a small number of homologues is expected to be found for the protein of interest, and adjusting the parameters that PROSS exposes does not solve the problem. PROSS uses blastp against the non-redundant database. If for any reason you think that using CSI/PSI-BLAST instead, or searching against a different database, would help, consider uploading your own MSA.
- PROSS clusters redundant homologues if they share more than 98% sequence identity. If you have a reason to think that this threshold is too strict, make your own multiple sequence alignment.
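If you build your own MSA with a different redundancy cutoff, a simple greedy filter over the aligned sequences is one way to do it. The sketch below is an illustration only and is not the clustering method PROSS uses: the function names are hypothetical, and identity is assumed to be the fraction of matching residues over columns where neither sequence has a gap. Because the query is the first record, it is always kept.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = same = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            same += 1
    return same / compared if compared else 0.0


def drop_redundant(records, threshold=0.98):
    """Keep a record only if its identity to every kept record is <= threshold.

    `records` is a list of (name, aligned_sequence) tuples with the query first.
    """
    kept = []
    for name, seq in records:
        if all(pairwise_identity(seq, kept_seq) <= threshold for _, kept_seq in kept):
            kept.append((name, seq))
    return kept
```

Lowering `threshold` removes more sequences. Whichever value you choose, the query must remain the first record in the file you upload.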