multi-RELIEF Feature Weighting Method

Algorithm

Finding functional specific sites is a major problem in protein sequence analysis. Multi-RELIEF is a feature weighting algorithm for multiclass classification problems and can be therefore used to identify functional specific sites in a multiple sequence alignment.

Method

Based on RELIEF¹, a well known machine learning technique for feature weighting in two-class problems, multi-RELIEF can be used also in multiclass problems such as multiple sequence alignments. The principle behind the algorithm is to measure the quality of an attribute, e.g. a site, in merging neighboured sequences within the same class and separating neighboured sequences within different classes. The neighbour of a sequence is defined as any other sequence with minimal distance from it according to the 1-norm:

Example:

Sequence 1	A	A	B	C
Sequence 2	A	B	C	D
Sequence 3	A	A	B	D
Sequence 4	A	B	B	B

In the example, the nearest neighbour of Sequence 1 is Sequence 3 with distance 1. Sequences 2 and 4 are having distance 3 and 2 respectively. Using this definition RELIEF performs the following steps:

For each iteration of the algorithm an instance R is randomly selected and the two neighbours, hit(R) and miss(R), are calculated. hit(R) is the neighbour of R having the same class as R. miss(R) is the neighbour of R having the opposite class. The next step is to update the weight vector initialized to zero according to this formula,

where diff(a,b) is defined as 0 if a equals b and 1 otherwise. Normalization the differences with m ensures that the resulting weights range from -1 to 1 after m iterations of the algorithm. The multi-RELIEF algorithm extends this approach to multiclass problems by sub-sampling from pairs of classes. For every iteration of multi-RELIEF two classes are randomly selected and the RELIEF algorithm performed on them. The final weights for each attribute are calculated the following way:

If there had been positive weight assignments for attribute a:

compute average of all positive weight assignments

If there had been negative weight assignments for attribute a (no single positive weight assignment):

compute average of all negative weight assignments

If there had been just zero weight assignments set the weight of attribute a to zero:

Boosting multi-RELIEF's estimates with 3d contacts

In order to possibly improve the results of multi-RELIEF, a heuristic to boost the weights of a site using information about its neighbours has been developed. The idea is fairly simple but nevertheless effective:

Every sited gets updated by the weights of its neighbouring sites. Thus, sites with high ranked neighbours are boosted.

Performance of multi-RELIEF

The mulit-RELIEF algorithm shows good overal performance in comparison with other recent methods for selecting functional specific residues.