Introduction to KSD


The Kinase Sequence Database (KSD) is a collection of protein kinase sequences grouped into families by homology of their catalytic domains. The aligned sequences are available in MS Excel format as well as in HTML.

After a recent update, the database features a total of 287 families, which contain 7128 protein kinases from 948 organisms.

Residues contacting ATP in the active site are marked in green. The "gatekeeper" position, most frequently used to introduce space-creating mutations, is highlighted in red.

Excel spreadsheets are available in two versions: for PCs (Microsoft Office 97/2000) and for Macs (Microsoft Office 98). The PC version offers an interactive display of residue numbers: to determine the absolute position of a residue in the complete sequence of the kinase of interest, click the control button in the corresponding row, and the ruler will be displayed in the top row of the table. We have not yet been able to offer the same functionality to Mac users because of limitations in the MS Office 98 implementation of Visual Basic, which is used to provide the interactive features.

On the other hand, the PC files are larger, so if you do not need convenient access to absolute residue numbers, you can shorten download times by using the Mac version. The latter has the offset from the start of each sequence printed at the beginning of each row, and a common ruler is displayed at the top of the table, so you can calculate the actual number manually.
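The manual calculation could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not part of the database itself: it assumes that gaps in an aligned row are written as "-", that the printed offset is the number of residues preceding the first residue shown in the row, and that ruler positions are counted from 1. The function name absolute_position and its parameters are hypothetical.

```python
from typing import Optional


def absolute_position(row: str, offset: int, column: int, gap: str = "-") -> Optional[int]:
    """Return the 1-based residue number in the full-length sequence.

    row    -- aligned sequence as it appears in one table row (assumed layout)
    offset -- residues preceding the first residue of the row (assumption)
    column -- 1-based position on the common ruler
    gap    -- character used for alignment gaps (assumed to be "-")
    """
    if not 1 <= column <= len(row):
        raise ValueError("column is outside the alignment")
    if row[column - 1] == gap:
        # This kinase has no residue at the requested alignment column.
        return None
    # Count real residues (non-gap characters) up to and including the column.
    residues_seen = sum(1 for ch in row[:column] if ch != gap)
    return offset + residues_seen
```

For example, with the made-up row "MK--LVA" and an offset of 266, absolute_position("MK--LVA", 266, 5) returns 269, because three residues (M, K, L) occur up to ruler position 5. If your files use a different gap symbol or offset convention, adjust the parameters accordingly.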

Note for PC users: when you open the files, Excel will ask whether to enable macros. Choose "Enable macros" to use the interactive numbering display.

Searching: You can find the kinase of interest by its name (e.g., c-Src) or by its GenBank or SwissProt accession number. Sequences can also be found by description and organism. Search results are presented as a list of links to the Excel files containing the hits, together with the names of the sequences that matched the query parameters. Both PC and Mac versions are shown.

Searching the description fields is perhaps the most effective way to find a sequence, because only a fraction of the proteins represented in the database (~400) have conventional names assigned.

Phylogeny: Trees pre-generated for each family are available in JPEG, PDF, and PostScript formats. JPEG is more convenient for quick viewing of small families (up to about 30 members). Beyond that size, labels become illegible, and you should use the PDF documents instead.



Note: If you find any inaccurate or outdated links, or experience problems using this site, please contact the webmaster.