Sequence Harmony - input format


The SequenceHarmony program reads in a multiple sequence alignment. Therefore the first step is to paste in
or upload a (complete) multiple alignment. The sequence format should be FASTA.
NB. Make sure that all sequences belonging to the same subfamily are grouped together in the alignment.

Next, the identifier of the first sequence of the second subgroup must be indicated. This identifier must match
the one in the multiple alignment exactly.
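
To illustrate how the alignment is divided at this identifier, here is a minimal Python sketch. It is not the server's own code: the function names parse_fasta and split_groups are hypothetical, and it assumes that the identifier is the first whitespace-delimited word after '>' and that matching is case-sensitive.

```python
def parse_fasta(text):
    """Return a list of (identifier, sequence) pairs in file order."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the identifier is the first word after '>'.
            header = line[1:].split()
            records.append([header[0] if header else "", []])
        elif records:
            # Sequence lines may be wrapped; collect them per record.
            records[-1][1].append(line)
    return [(name, "".join(parts)) for name, parts in records]


def split_groups(records, second_group_id):
    """Split the records at the first sequence of the second subgroup."""
    names = [name for name, _ in records]
    if second_group_id not in names:
        raise ValueError(f"identifier {second_group_id!r} not found in alignment")
    idx = names.index(second_group_id)
    if idx == 0:
        raise ValueError("the first subgroup would be empty")
    return records[:idx], records[idx:]


alignment = """\
>A1
RELAAAKK
>A2
RELAAFKK
>B1
HNVAAYRK
>B2
HNVFFYRK
"""

records = parse_fasta(alignment)
group_a, group_b = split_groups(records, "B1")
print([name for name, _ in group_a])
print([name for name, _ in group_b])
```

Because the split is purely positional, a sequence placed out of order ends up in the wrong subgroup, which is why the sequences of each subfamily have to be grouped together.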

Finally, the identifier of a reference sequence may optionally be given, making it easy to retrieve all the
subtype-specific sites of the sequence of interest.

Optionally, one can also choose the offset of the reference sequence, that is, the starting position of the
reference sequence. This is useful when the given sequence is only part of the whole protein sequence.
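
The effect of the offset can be sketched in Python as follows. This is only an illustration, not the server's implementation: the function reference_numbering is hypothetical, and it assumes that gap characters ('-' or '.') in the reference sequence receive no residue number. The sequence and the starting position 10 are made up for the sketch.

```python
def reference_numbering(aligned_ref, start=1, gap_chars="-."):
    """Map each alignment column (1-based) to a residue number in the
    reference sequence, or to None where the reference has a gap."""
    numbering = {}
    residue_number = start
    for column, char in enumerate(aligned_ref, start=1):
        if char in gap_chars:
            # Assumption: gaps in the reference are not numbered.
            numbering[column] = None
        else:
            numbering[column] = residue_number
            residue_number += 1
    return numbering


# Hypothetical reference row with one gap, numbered from position 10.
numbering = reference_numbering("HN-VFF", start=10)
print(numbering)
```

With start=1 and no gaps in the reference, the residue numbers coincide with the alignment positions.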

Example

Let's take the following simple example alignment:
>A1
RELAAAKK
>A2
RELAAFKK
>A3
REAAAYRK
>A4
REAAAFRK
>B1
HNVAAYRK
>B2
HNVFFYRK
>B3
HNFFAFKK
>B4
HSFFFYRK
>B5
HSMFFFKK
>B6
HTMFFYKK

We would then enter 'B1' as the identifier of the first sequence of the second subgroup. If sequence B2 corresponds to, e.g., a PDB file where the first 'F' (Phe) is number 160 (because we're looking at a domain from a larger protein), we would enter 'B2' as the reference sequence identifier, and '160' as the starting position. If we're happy with numbering sites by their alignment position, neither a reference sequence nor a starting position is needed.

(c) IBIVU 2025. If you are experiencing problems with the site, please contact the webmaster.