The new kinesin phylogenetic tree is a re-evaluation of the kinesin microtubule motor protein family (Kim and Endow, 2000) (see also Miki et al., 2001; Lawrence et al., 2002), inspired by the recent completion of the genome sequences of humans and several model organisms. The kinesin motors hydrolyze ATP as they move along microtubules, transporting vesicles and organelles (Hirokawa, 1998) and performing essential roles in chromosome motility and spindle assembly and function (Inoué and Salmon, 1995; Endow, 1999; Sharp et al., 2000). The new kinesin tree includes 155 proteins from 11 species and focuses on the organisms Plasmodium falciparum, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens and Arabidopsis thaliana – a protist, a yeast, two invertebrates, a vertebrate and a higher plant. The focus on humans and selected model organisms provides a look at the evolutionary relationships of the kinesin proteins from several well-studied species. A notable feature of the new tree is the emergence of several new groups consisting only of Arabidopsis proteins, which suggests that the kinesin motors may have a broader range of functions in higher plants than in other organisms.
The tree was built by parsimony methods using PAUP v. 4.0b10 (Swofford, 2002) from a sequence alignment of kinesin motor domains made with CLUSTAL W (Thompson et al., 1994) and refined manually. Tree-building trials used heuristic search methods with random stepwise addition, 'tree-bisection-reconnection' (TBR) branch swapping and maximum parsimony settings. The tree includes founder proteins from previously identified kinesin groups and is arbitrarily rooted using ScSmy1 as the outgroup protein. ScSmy1 is not known to be the ancestral kinesin, but has a highly divergent motor domain sequence compared with that of other kinesin proteins. The numbers adjacent to nodes are the percentages of 1810 bootstrap trials performed in PAUP using full heuristic methods in which the proteins to the right grouped together. Bootstrap values of ≥90% provide confidence that the proteins in the group are orthologs or paralogs of one another, whereas lower values indicate less certainty. The horizontal branch lengths correspond to the number of changes needed to explain the differences in protein sequences, as indicated by the scale bar at the bottom. The tree is one of two optimal trees found in 600 tree-building trials and has an overall length of 17,867.
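The bootstrap percentages described above can be illustrated with a minimal Python sketch. It is not the PAUP procedure used for the tree; it only shows the two ideas involved: resampling alignment columns with replacement to make a replicate data set, and reporting the percentage of replicate trees in which a given set of proteins groups together. The function names, the toy sequences and the toy replicate clades are all hypothetical, and the tree-building step for each replicate is assumed to be done elsewhere.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def clade_support(clade, replicate_clades):
    """Percentage of replicate trees in which `clade` appears as a group."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


# Hypothetical toy alignment (assumption: not real kinesin sequences).
toy_alignment = {
    "seqA": "MKLVGATP",
    "seqB": "MKLVGSTP",
    "seqC": "MRIVGATP",
    "seqD": "MRIAGSTP",
}
replicate = bootstrap_columns(toy_alignment, random.Random(0))

# Hypothetical groups recovered from four replicate trees (assumption:
# each replicate tree was built separately and reduced to its set of clades).
toy_replicates = [
    {frozenset({"seqA", "seqB"}), frozenset({"seqC", "seqD"})},
    {frozenset({"seqA", "seqB"}), frozenset({"seqC", "seqD"})},
    {frozenset({"seqA", "seqC"}), frozenset({"seqB", "seqD"})},
    {frozenset({"seqA", "seqB"}), frozenset({"seqC", "seqD"})},
]
support = clade_support({"seqA", "seqB"}, toy_replicates)
print(f"{support:.0f}%")  # 3 of 4 replicates contain the group: 75%
```

In this toy case the seqA/seqB group would be labeled 75 at its node, below the ≥90% level taken here as strong support.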
Trees were also built in PAUP using a neighbor-joining algorithm. Distance methods are less stringent than parsimony methods, but have the advantage of being less demanding computationally for analysis of large data sets. The neighbor-joining trees showed the same groups as the maximum parsimony trees, but some groups did not include proteins previously classified as members of the group. For example, AtKCBP, a minus-end-directed kinesin motor with a C-terminal motor domain (Song et al., 1997), did not fall into the C-terminal motor group in the neighbor-joining trees but was included in the group in the maximum parsimony trees. We therefore show a maximum parsimony tree, rather than a neighbor-joining tree, as representative of the evolutionary relationships of the kinesin proteins in the alignment.
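For readers unfamiliar with distance methods, the following Python sketch shows the first step of the standard neighbor-joining algorithm: choosing the pair of taxa to join from a matrix of pairwise distances using the criterion Q(i,j) = (n − 2)d(i,j) − Σd(i,k) − Σd(j,k). It is an illustration only and makes no claim about the distance measure or settings used in PAUP for the trees described here; the taxon names and distances are hypothetical.

```python
from itertools import combinations


def symmetric(pairs):
    """Expand {(i, j): distance} into a symmetric dict-of-dicts."""
    dist = {}
    for (i, j), value in pairs.items():
        dist.setdefault(i, {})[j] = value
        dist.setdefault(j, {})[i] = value
    return dist


def nj_pick_pair(dist):
    """Return (Q, i, j) for the pair with the lowest neighbor-joining Q value."""
    names = sorted(dist)
    n = len(names)
    # Sum of distances from each taxon to all other taxa.
    row_sum = {i: sum(dist[i][k] for k in names if k != i) for i in names}
    best = None
    for i, j in combinations(names, 2):
        q = (n - 2) * dist[i][j] - row_sum[i] - row_sum[j]
        if best is None or q < best[0]:
            best = (q, i, j)
    return best


# Hypothetical pairwise distances among five taxa (assumption: toy values).
toy_dist = symmetric({
    ("t1", "t2"): 5, ("t1", "t3"): 9, ("t1", "t4"): 9, ("t1", "t5"): 8,
    ("t2", "t3"): 10, ("t2", "t4"): 10, ("t2", "t5"): 9,
    ("t3", "t4"): 8, ("t3", "t5"): 7,
    ("t4", "t5"): 3,
})
q_value, first, second = nj_pick_pair(toy_dist)
print(first, second, q_value)  # t1 t2 -50
```

Note that t4 and t5 have the smallest raw distance (3) but are not joined first: the criterion corrects each distance for how far the two taxa are from everything else. The algorithm then replaces the joined pair with a new node and repeats, which is why it is much less demanding computationally than a heuristic parsimony search.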
Five groups in the tree, the KRP85/95, ChrKin/KIF4, BimC, C-terminal motor and CENP-E kinesins, had bootstrap values of <90% by parsimony analysis. The bootstrap values by neighbor-joining analysis of two of these groups, KRP85/95 and ChrKin/KIF4, were 100% and 95%, respectively, which supports their classification as groups. The bootstrap values of the BimC group were only 85% and 86% by parsimony and distance analysis, respectively. Analysis of the new proteins in the group showed high sequence identity (>50%) between their motor domains and that of AnBimC, the founding member of the group (Table 1), which supports their classification as members of the group. A new P. falciparum kinesin, PfMAL3P6.13, has only 37.0% sequence identity to AnBimC in its motor domain, but is shown as a member of the group on the basis of the branching pattern. Its assignment to the BimC group will ultimately rely on functional properties and structural features of the protein, when this information is available. The C-terminal motor group, which contains all the known minus-end-directed kinesin motors, in contrast to the plus-end-directed kinesins outside the group, may lack a high bootstrap value because of divergence within the group. The proteins in this group have in common a C-terminal motor domain and a conserved neck region (Table 1). Two Arabidopsis proteins in the group, AtT12M4.14 and AtF15A18.10, have N-terminal instead of C-terminal motor domains, which either is a new feature of some proteins in the group or results from misassembly of the deposited sequences. The assignment of the two proteins to the C-terminal motor group is supported by high sequence identity to AtKatD and a conserved neck in AtF15A18.10 (Table 1).
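The percentage sequence identity values referred to above can be illustrated with a short Python sketch that compares two aligned sequences position by position. How the values in Table 1 were calculated is not stated here, so the convention used below (skipping any position with a gap in either sequence and dividing by the number of positions compared) is an assumption, as are the function name and the toy sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues between two aligned sequences.

    Assumption: positions with a gap in either sequence are excluded from
    both the match count and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments (assumption: not real motor domain sequences).
founder = "ACDEFGHIKL"
candidate = "ACDEFGHIKV"
print(f"{percent_identity(founder, candidate):.1f}%")  # 9 of 10 positions match: 90.0%
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give somewhat different values, so a threshold such as >50% is only meaningful when the same convention is applied to every pair.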
The new CENP-E group has the kinetochore motor HsCENP-E as its founding member. Two A. thaliana proteins, AtZCF125 and AtF14P13.22, are supported by a 95% bootstrap value as members of the CENP-E group by the neighbor-joining but not the maximum parsimony analysis. The Drosophila proteins DmCmeta and DmCana are thought to be related to HsCENP-E, on the basis of sequence identity and functional analysis (Yucel et al., 2000), but assignment of these two proteins to the CENP-E group is not currently supported by either neighbor-joining or maximum parsimony analysis, possibly owing to the lack of intermediate taxa. Classification of the Drosophila proteins as members of the group is based on functional similarity and sequence identity (in the case of DmCmeta) to CENP-E (Table 1).
The new tree includes 60 A. thaliana proteins, 21 of which fall into the C-terminal motor group. Arabidopsis is thought to lack axonemal and cytoplasmic dyneins (Lawrence et al., 2001) and minus-end-directed kinesins may perform functions carried out by dyneins in other organisms. Two new A. thaliana groups, referred to here as At1 and At2, as well as several smaller groups, emerged on the new tree with bootstrap values of 95-100%. At1 and At2 contain 8 and 5 proteins, respectively. The proteins in these groups have N-terminal motor domains with a high sequence identity to an early branching member of the group (Table 1).
Kinesins that group together are thought to perform similar functions, although this idea may change as more kinesin functions become known. The groups are color coded according to function (red represents chromosome/spindle motility and green represents vesicle/organelle transport). The MCAK/KIF2 group (orange) includes proteins thought to perform both functions. The At1 and At2 protein functions are not yet known; the groups are therefore not colored. The color coding of groups in the tree implies that previously unclassified proteins have roles similar to those of proteins with determined functions, but analysis of cellular functions was not undertaken.
Further methods used in our tree building, including manual refinement of the alignment and analysis of partial motor sequences, are detailed elsewhere (Goodson et al., 1994; Moore and Endow, 1996). The tree does not include all known kinesin proteins and is not meant to be exhaustive, but includes all the presently known kinesins (as of July 2003) with reliable sequences found in database searches for the model organisms P. falciparum, S. cerevisiae, C. elegans, D. melanogaster, H. sapiens and A. thaliana, whose genomes have now been completely sequenced. We could not control for misassembled sequences deposited into the public databases or entries from contaminating DNA in the sequenced genomes. Some of the proteins that were analyzed showed divergence of conserved sequence motifs (see Table 2). Alternative names for proteins and protein groups have been published previously (Miki et al., 2001; Vale and Fletterick, 1997) and can also be found at the Kinesin Home Page [www.proweb.org/kinesin (E. A. Greene, S. Henikoff and S. A. Endow, 1996)], which has links to protein and DNA databases together with additional information about the kinesin proteins.
This work was supported by grants from the NIH and HFSP to S.A.E.
Species abbreviations: An, Aspergillus nidulans (Emericella nidulans); At, Arabidopsis thaliana; Ce, Caenorhabditis elegans; Cf, Cylindrotheca fusiformis (diatom); Dm, Drosophila melanogaster; Hs, Homo sapiens; Lm, Leishmania major; Mm, Mus musculus; Pf, Plasmodium falciparum; Sc, Saccharomyces cerevisiae; and Spo, Schizosaccharomyces pombe.
Websites used in searches for kinesin proteins: GenBank, www.ncbi.nlm.nih.gov/Genbank/; P. falciparum, www.sanger.ac.uk; D. melanogaster, www.fruitfly.org and flybase.bio.indiana.edu; H. sapiens, www.genome.gov; A. thaliana, www.arabidopsis.org
© The Company of Biologists Limited 2004