view article

Figure 1
Example of a completed dynamic programming H matrix for two short sequences ADLPQ and ALPQ aligning with the simple identity matrix, which scores +1 for a match of two amino acids and 0 for a mismatch. As explained in the text, for proteins, a more sophisticated scoring matrix is normally applied. The aim is to find the maximum score for the alignment of the two sequences allowing for deletions in either sequence. The H matrix is filled by starting at H0,0 and proceeding one row at a time. At each cell, equation (1)[link] is applied. In the 0th row and column, there is only one possible predecessor cell and that corresponds to aligning each residue with a gap. In all other cells, there are three possible predecessor cells. (i) A diagonal move corresponds to a substitution. (ii) A horizontal move corresponds to alignment of a residue in B with a gap. (iii) A vertical move corresponds to alignment of a residue in A with a gap. For example, at H4,3, the comparison of P at A4 with P at B3, the score for the cell is the maximum of 1 + 1 (diagonal move), 0 − 1 (horizontal move), and 1 − 1 (vertical move). The arrows show the cells that contributed to each cell. Some cells have more than one arrow pointing into them since there can be ties for which cell gives the maximum value in equation (1)[link]. The cell in the bottom right-hand corner (H5,4) contains the best score for the alignment of the two sequences. Following the bold arrows backwards to H0,0 gives an alignment with the best score of 3. In this example, there is only one possible alignment with this score since there is no ambiguity on the path leading to H5,4.

Journal logoSTRUCTURAL
BIOLOGY
ISSN: 2059-7983
Follow Acta Cryst. D
Sign up for e-alerts
Follow Acta Cryst. on Twitter
Follow us on facebook
Sign up for RSS feeds