Sequence Harmony - input format
The Sequence Harmony program reads in a multiple sequence alignment, so the first step is to paste in
or upload a complete multiple alignment. The sequences must be in FASTA format.
NB. Make sure that all sequences belonging to the same subfamily are grouped together in the alignment.
Next, the identifier of the first sequence of the second subgroup must be indicated. This identifier must be exactly
the same as the one in the multiple alignment.
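As a rough illustration of why the grouping and the exact identifier matter, the sketch below splits an alignment into two subgroups at the indicated identifier. It is not the program's own code: the function names `parse_fasta` and `split_groups` are hypothetical, and it assumes that the identifier is the full header text after '>' and that matching is case-sensitive.

```python
def parse_fasta(text):
    """Parse FASTA text into an ordered list of (identifier, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the identifier is everything after '>' on the header line.
            records.append([line[1:].strip(), []])
        elif records:
            records[-1][1].append(line)
    return [(name, "".join(parts)) for name, parts in records]


def split_groups(records, second_group_id):
    """Split the alignment at the first sequence of the second subgroup."""
    names = [name for name, _ in records]
    if second_group_id not in names:
        raise ValueError(f"identifier {second_group_id!r} not found in alignment")
    index = names.index(second_group_id)
    if index == 0:
        raise ValueError("the first subgroup would be empty")
    # All sequences in a multiple alignment must have the same length.
    if len({len(seq) for _, seq in records}) != 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    return records[:index], records[index:]


# A short illustrative alignment with two sequences per subgroup.
alignment = """>A1
RELAAAKK
>A2
RELAAFKK
>B1
HNVAAYRK
>B2
HNVFFYRK
"""
group_a, group_b = split_groups(parse_fasta(alignment), "B1")
print([name for name, _ in group_a])  # ['A1', 'A2']
print([name for name, _ in group_b])  # ['B1', 'B2']
```

Because the split depends only on the order of the sequences, any sequence of the first subfamily placed after the indicated identifier would end up in the second subgroup.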
Finally, the identifier of a reference sequence may optionally be given, which makes it easy to retrieve all the
subtype-specific sites of the sequence of interest.
Optionally, one can also choose the offset of the reference sequence, that is, the starting position of the
reference sequence. This is useful when the given sequence is only part of the whole protein sequence.
Example
Let's take the following simple example alignment:
>A1
RELAAAKK
>A2
RELAAFKK
>A3
REAAAYRK
>A4
REAAAFRK
>B1
HNVAAYRK
>B2
HNVFFYRK
>B3
HNFFAFKK
>B4
HSFFFYRK
>B5
HSMFFFKK
>B6
HTMFFYKK
We would then enter 'B1' as the identifier for the second
subgroup. If sequence B2 corresponds to, e.g., a PDB
file where the first residue, 'H' (His), is number 160 (because we're looking at
a domain from a larger protein), we would enter 'B2' as
the reference sequence identifier, and '160' as the starting position. If
we're happy with numbering sites by their alignment position, neither a
reference sequence nor a starting position is needed.
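The effect of the starting position can be sketched as follows. This is an illustration under stated assumptions, not the program's own code: the function name `reference_numbering` is hypothetical, the starting position is taken to be the number of the first residue of the reference sequence, and gap characters ('-' or '.') are assumed not to use up a residue number.

```python
def reference_numbering(aligned_seq, start=1, gap_chars="-."):
    """Map 1-based alignment columns to reference residue numbers.

    Columns where the reference sequence has a gap map to None.
    """
    numbering = {}
    residue_number = start
    for column, char in enumerate(aligned_seq, start=1):
        if char in gap_chars:
            # Assumption: gaps do not advance the residue numbering.
            numbering[column] = None
        else:
            numbering[column] = residue_number
            residue_number += 1
    return numbering


# Reference sequence B2 from the example, numbered from 160.
numbering = reference_numbering("HNVFFYRK", start=160)
print(numbering[1], numbering[8])  # 160 167
```

With the default `start=1` and no gaps in the reference sequence, the residue numbers coincide with the alignment positions.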
(c) IBIVU 2024. If you are experiencing problems with the
site, please contact the webmaster.