FASTA
The FASTA format is used to represent sequences. Every sequence starts
with a ">" symbol followed by an arbitrary string, which is called the header of
the sequence. The header should not be longer than one line. The lines that
follow contain the aligned sequence, until another line starting with the ">"
symbol indicates the beginning of a new sequence. Optionally, comments can be
included as lines starting with ";". Those lines are removed during
preprocessing. A valid FASTA sequence alignment can look like this:
>1wet
;comment
ATIKDVAKRANVSTTTVSHVINKTRFVAEETRNAVWAAIKELHYSPSAVARSLKVNHTKS
IGLLATSSEAAYFAEIIEAVEKNCFQKGYTLILGNAWNNLEKQRAYLSMMAQKRVDGLLV
MCSEYPEPLLAMLEEYRHIPMVVMDWGEAKADFTDAVIDNAFEGGYMAGRYLIERGHREI
GVIPGPLERNTGAGRLAGFMKAMEEAMIKVPESWIVQGDFEPESGYRAMQQILSQPHRPT
AVFCGGDIMAMGALCAADEMGLRVPQDVSLIGYDNVRNARYFTPALTTIHQPKDSLGETA
;comment
FNMLLDRIVNKREEPQSIEVHPRLIERRSVADGPFRDYR
>Hi_purR_purR
ATIKDVAKMAGVSTTTVSHVINKTRFVAKDTEEAVLSAIKQLNYSPSAVARSLKVNTTKS
IGMIVTTSEAPYFAEIIHSVEEHCYRQGYSLFCVTHKMDPEKVKNHLEMLAKKRVDGLLV
MCSEYTQDSLDLLSSFSTIPMVVMDWGPNAN--TDVIDDHSFDGGYLATKHLIECGHKKI
GIICGELNKTTARTRYEGFEKAMEEAKLTINPSWVLEGAFEPEDGYECMNRLLTQEKLPT
ALFCCNDVMALGAISALTEKGLRVPEDMSIIGYDDIHASRFYAPPLTTIHQSKLRLGRQA
INILLERITHKDEQYSRIDITPELIIRKSVKSI
>Vc_purR_purR
ATIKDVARLAGVSTTTVSHVINKTRFVAETTQEKVMEAVKQLNYAPSAVARSLKCNTTRT
IGMLVTQSTNLFFSEVIDGVESYCYRQGYTLILCNTGGIYEKQRDYIRMLAEKRVDGILV
MCSDLTQELQDMLDAHKDIPKVVMDWGPETS-HADKIIDNSEEGGYLATKYLTDRGHTEI
ACLSGHFVKAACQERIQGFRRAMAEAKLTVNEDWILEGNFECDTAVLAADKIIAMDKRPT
AVFCFNDTMALGLMSRLQQKGIRIPEDMSVIGYDNIELAEYFSPPLTTVHQPKRRVGKNA
FEILLERIKDKEHERRIFEMHPEIVERDTVKDLT
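
The rules above can be illustrated with a short Python sketch. It is not the
preprocessing code of this tool; the function name parse_fasta is hypothetical,
and skipping blank lines as well as ignoring sequence lines that appear before
the first header are assumptions made for the example.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None   # header of the record currently being read
    chunks = []     # sequence lines collected for that record

    for line in text.splitlines():
        line = line.strip()
        # Comment lines start with ";" and are dropped.
        # Assumption: blank lines are dropped as well.
        if not line or line.startswith(";"):
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated; gap characters ("-") are kept.
            # Assumption: lines before the first header are ignored.
            chunks.append(line)

    # Close the last record.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">seq1\n;comment\nATIK\nDVAK\n>seq2\nAT--K\n"
    for name, seq in parse_fasta(example):
        print(name, seq)
```

Collecting the lines in a list and joining them once per record avoids repeated
string concatenation for long sequences.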