Intracellular signal transduction occurs through cascades of reactions involving dozens of proteins that transmit signals from the cell surface, through a crowded cellular environment filled with organelles and a filamentous cytoskeleton, to specific targets. Numerous signaling molecules are immobilized or transiently bound to the cytoskeleton, yet most models for signaling pathways have no specific role for this mesh, which is often presumed to function primarily as a scaffold that determines cell mechanics but not information flow. We combined analytical tools with several recently established large-scale protein-protein interaction maps for Saccharomyces cerevisiae to quantitatively address the role of the cytoskeleton in intracellular signaling. The results demonstrate that the network of signaling proteins is intimately linked to the cytoskeleton, suggesting that this interconnected filamentous structure plays a crucial and distinct functional role in signal transduction.
The normal functioning of a cell requires constant interaction with its extracellular environment and with other cells, and these interactions lead to changes in cell physiology, cell shape and gene expression. Signals from neighboring cells and the extracellular matrix are perceived by membrane-bound receptors, resulting in changes in their biochemical or physical states that typically initiate a cascade of signaling events within the cell (Pawson, 1995; Rosales et al., 1995). Intracellular signal transduction might involve physical processes (such as diffusion), chemical changes (such as phosphorylation) of signaling intermediates or both. For most characterized signal transduction pathways, the initial signaling event and the end point are known, but intermediate events that transmit the signal are either partially or completely unknown. In order to fully understand intracellular signal transduction, it is essential to know the intermediate signaling molecules and to understand how information flows from one to the next. These issues are difficult to address experimentally because signaling molecules typically bind each other transiently and with relatively low affinities.
The cytoskeleton, an interconnected assembly of actin, intermediate filament and microtubule networks that extend throughout the entire cell, is involved in intracellular signal transduction (Rasmussen et al., 1990; Hameroff et al., 1992; Ingber, 1993a,b; Forgacs, 1995a,b; Burridge and Chrzanowska-Wodnicka, 1996; Janmey, 1998; Shafrir et al., 2000). Experimental evidence indicates that individual filaments of the cytoskeleton transmit mechanical perturbations, can be used as tracks to move organelles within the cell, and provide transient docking sites for proteins and lipids (Mochly-Rosen, 1995; Isenberg and Niggli, 1998; Janmey, 1998). However, most of the evidence regarding the role of the cytoskeleton in signal transduction originates from experiments that employed destructive perturbations to the cytoskeleton, such as those caused by drugs that depolymerize filaments. These manipulations cause a complete loss of one or more cytoskeletal elements, leading to global changes that complicate the interpretation of experiments.
Recent progress in proteomics offers the possibility to quantitatively address the role of the cytoskeleton in intracellular signaling. Analysis of protein interactions on the scale of entire proteomes by yeast-two-hybrid screening and protein purification has generated a huge amount of information regarding protein networks within the cell. So far, these large scale experimental approaches have been applied most extensively to the budding yeast, Saccharomyces cerevisiae (Fields and Song, 1989; Gavin et al., 2002; Ho et al., 2002; Ito et al., 2001; Ito et al., 2000; Bader et al., 2001; Maslov and Sneppen, 2002; Mewes et al., 2002; Tong et al., 2002; Uetz et al., 2000; Xenarios et al., 2000; Jansen et al., 2003). In this study, we developed several independent, quantitative methods to probe for correlations of functionally defined protein classes. Specifically, we tested the hypothesis that the network of interacting cytoskeletal proteins and the network of signaling proteins are integrated to a higher degree than other functionally defined classes of proteins. We found that the correlation of signaling proteins with cytoskeletal proteins is much stronger than with 15 other protein classes examined. These results strongly suggest that without the cytoskeleton, the intracellular signaling apparatus of the cell cannot properly function.
Materials and Methods
Two independently performed, comprehensive two-hybrid assay screens were reported and interaction maps summarizing their results were extensively characterized (Ito et al., 2001; Uetz et al., 2000). These databases primarily contain information regarding pair-wise protein-protein interactions, although they also contain interactions mediated by intermediate bridging proteins. The database of interacting proteins (DIP) (http://dip.doe-mbi.ucla.edu/) (Xenarios et al., 2000) and the Munich Information Center for Protein Sequences (MIPS) (http://mips.gsf.de/) (Mewes et al., 2002) give information based on two-hybrid screens, biochemical purification, and genetically-derived interactions. Here, we present a quantitative analysis based on the two-hybrid screen of Uetz et al. (Uetz et al., 2000) (referred to as 'U database'), which contains 4480 interactions between 2115 proteins and is the smallest interaction network, and DIP (Xenarios et al., 2000) (referred to as 'D'), which contains 20,098 interactions among 5798 proteins and provides one of the largest networks. In the interaction maps analyzed in the present work, proteins are represented as nodes (small circles) and the interactions are represented as lines linking the nodes. Within these networks, a connected 'cluster' is defined as the set of proteins for which a path between any two nodes (through the links) exists. We performed our analysis on the largest connected cluster of each interaction network. For the U database, the largest such cluster contained 1458 nodes (approximately 24% of all yeast proteins), whereas the largest cluster for the DIP database contained 4198 nodes (approximately 68% of all yeast proteins). We note that, although strong disparities exist between the various datasets, all datasets led to similar results.
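The extraction of the largest connected cluster described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function names (build_graph, connected_clusters) and the toy protein labels are hypothetical, and the interaction list is assumed to be available as pairs of protein identifiers.

```python
from collections import deque

def build_graph(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = {}
    for a, b in interactions:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b:  # a self-interaction adds no link between distinct nodes
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connected_clusters(graph):
    """Return all connected clusters as node sets, largest first."""
    unvisited = set(graph)
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster = {start}
        queue = deque([start])
        while queue:  # breadth-first walk over the links
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in cluster:
                    cluster.add(neighbor)
                    queue.append(neighbor)
        unvisited -= cluster
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)

# Toy interaction list with hypothetical protein labels.
toy_interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
toy_graph = build_graph(toy_interactions)
largest = connected_clusters(toy_graph)[0]
print(sorted(largest))
```

In this toy network the proteins P5 and P6 form a separate small cluster, analogous to the tubulin-associated proteins that fall outside the largest connected clusters of the U and D maps.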
To quantitatively study the clustering tendency of proteins in the various subclasses we employed several approaches. For global characterization of clustering we defined for each protein pair (i,j) in the interaction network the distance dij as the length of the shortest path connecting them, and analyzed the distance distribution P(dij) for all possible combinations of proteins. By this definition, dAB=1 for proteins A and B that interact directly (i.e. are connected by one link), dAB=2 for proteins A and B that both interact directly with C, but not with each other (and thus dAC=dCB=1), etc. This metric describes the distribution of path lengths between all pairs of interacting proteins in a given cluster.
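The distance distribution P(d) can be computed by a breadth-first search from every protein. The following Python sketch assumes the network is stored as an adjacency map (protein to set of neighbors) and that protein identifiers are comparable strings. The function names and the optional class arguments, which restrict the pairs to two protein classes (for example c and s), are illustrative choices rather than part of the original method description.

```python
from collections import Counter, deque

def shortest_path_lengths(graph, source):
    """Distances (in links) from source to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def distance_distribution(graph, class_a=None, class_b=None):
    """P(d) over distinct pairs (i, j) with i in class_a and j in class_b."""
    nodes = set(graph)
    class_a = nodes if class_a is None else nodes & set(class_a)
    class_b = nodes if class_b is None else nodes & set(class_b)
    both = class_a & class_b
    counts = Counter()
    for i in class_a:
        dist = shortest_path_lengths(graph, i)
        for j in class_b:
            if j == i or j not in dist:
                continue
            # A pair whose members both lie in both classes is met twice;
            # count it only once.
            if i in both and j in both and j < i:
                continue
            counts[dist[j]] += 1
    total = sum(counts.values())
    return {d: n / total for d, n in sorted(counts.items())}

def mean_distance(distribution):
    """Average of d over the distribution P(d)."""
    return sum(d * p for d, p in distribution.items())

# Toy chain A-B-C-D (hypothetical labels).
toy_chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
toy_p = distance_distribution(toy_chain)
toy_mean = mean_distance(toy_p)
```

For the four-protein chain, three pairs are at d=1, two at d=2 and one at d=3. Running one search per protein is adequate for networks of a few thousand nodes, such as the largest U and D clusters.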
To characterize the local structure of interaction networks, we introduced the local clustering index md(x/y), which counts all those proteins (denoted by y) that are at a distance d from a given protein (denoted by x). Here, x and y stand for the various protein classes: c, cytoskeletal protein; s, signaling protein; r, a protein that is not in class c or s. By its definition, md(x/y) contains information about the number of those y-type proteins that are d steps away from a given protein x, or equivalently that can be reached from x by d links. The primary 'd=1-neighbors' or 'nearest neighbors' of a given protein x are those proteins that directly interact with protein x. The nearest-neighbor clustering index, m1(c*/s), for a selected cytoskeletal protein c* is then calculated as the fraction of the nearest neighbors of c* that are signaling proteins. For a given protein, this metric thus gives the proportion of interactions to other proteins in a given class. Analogously, md(x/y) quantifies the composition of y-proteins at distance d from an x-protein. Thus, the analysis was carried out for pairs of proteins that directly interact (d=1), that interact via one bridging protein (d=2), and so on.
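A minimal Python sketch of the local clustering index follows. It takes the proportion reading given in the text, that is, the fraction of proteins at distance d from a given protein that belong to class y. The exact normalization used by the authors may differ, and the function names and toy labels are hypothetical.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Distances (in links) from source to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def clustering_index(graph, protein, class_y, d):
    """Fraction of proteins exactly d links from `protein` that are in class_y.

    Returns None when no protein lies at distance d.
    """
    dist = shortest_path_lengths(graph, protein)
    shell = [node for node, steps in dist.items() if steps == d]
    if not shell:
        return None
    return sum(1 for node in shell if node in class_y) / len(shell)

def mean_clustering_index(graph, class_x, class_y, d):
    """Average of the index over all class_x proteins that have a d-shell."""
    values = [clustering_index(graph, p, class_y, d)
              for p in class_x if p in graph]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None

# Toy network: c1 binds s1, s2 and r1; r1 also binds s3 (hypothetical labels).
toy = {"c1": {"s1", "s2", "r1"}, "s1": {"c1"}, "s2": {"c1"},
       "r1": {"c1", "s3"}, "s3": {"r1"}}
signaling = {"s1", "s2", "s3"}
m1 = clustering_index(toy, "c1", signaling, 1)  # two of three neighbors
m2 = clustering_index(toy, "c1", signaling, 2)  # the only d=2 protein is s3
```

With d=1 the shell is simply the set of direct interaction partners, so m1 is the nearest-neighbor clustering index of the selected protein.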
Results and Discussion
Definitions of signaling and cytoskeletal proteins
In order to construct the signaling (s) and cytoskeletal (c) protein sets, we categorized the gene products of S. cerevisiae as components of a signaling pathway, the cytoskeleton or neither of them (the random r set). The rules used to define these sets were based on experimentally determined, biochemical or genetic features of each protein, without a reference to the databases that constitute the available interaction maps. Because S. cerevisiae does not have intermediate filaments, the composition of the cytoskeleton was defined as actin, tubulin, proteins that bind actin or tubulin, proteins that bind a protein that binds actin or tubulin, and the septins, leading to the identification of 125 cytoskeletal proteins, which is 2.2% of the yeast proteome (see supplemental data for the entire list, http://jcs.biologists.org/supplemental/). This definition includes the filamentous septin, the actin and tubulin networks (including known cross-linkers, capping, severing, etc. proteins), and most proteins that localize to actin patches, which underlie the plasma membrane and are prominent components of the yeast cytoskeleton. The set of signaling proteins included all protein and lipid kinases, phosphatases, GTPases and their auxiliary factors, heterotrimeric G-protein-linked membrane receptors, nucleotide cyclases/phosphodiesterases, and biochemically or genetically characterized scaffolding proteins. This analysis identified 342 signaling proteins, 5.9% of the proteome (see supplemental data for the entire list). Twenty proteins were common to both sets. Importantly, the criteria used to define cytoskeletal and signaling proteins are conservative and independent of each other. Several metabolic kinases known to bind directly to the cytoskeleton (e.g. phosphofructokinase) were not included in the cytoskeleton protein set because they might obscure the more subtle interplay between the cytoskeleton and other signaling pathways. 
In addition, uncharacterized open reading frames with homology to known signal transduction proteins were excluded. These definitions, therefore, focused the analysis on proteins for which functional information is currently available.
In the currently available protein interaction databases, information was available for subsets of the proteins in the classes defined by us. In the database by Uetz et al. (Uetz et al., 2000) and in DIP (Xenarios et al., 2000), we identified 74 (U) and 92 (D) cytoskeletal proteins, and 141 (U) and 207 (D) signaling proteins in the largest interconnected clusters. Fifteen (U) and 18 (D) proteins were shared by the two classes in each database. Surprisingly, tubulin and tubulin-associated proteins were not present in the largest connected clusters for either the database by Uetz et al. or DIP; they formed separate connected clusters with a small number of proteins.
The largest connected cluster within the U database shows the c proteins in yellow, s proteins in green and proteins found in both classes in red (Fig. 1). Inspection of Fig. 1 qualitatively suggests correlations between cytoskeletal and signaling proteins because the majority of these two protein groups form relatively localized clusters within the network.
To quantify the clustering tendency of proteins in each class, we calculated the distance distribution P(d) (see Materials and Methods) for all protein pairs in the largest interconnected clusters (Fig. 2). Because the distance between two proteins was defined as the number of links required to travel from one protein to another (see Materials and Methods), the function P(d) for all proteins in a cluster reflects the degree to which the proteins within the cluster interact with each other. When calculated for the set of all proteins in the largest connected cluster in the database by Uetz et al., the peak of P(d) was approximately at d=6.8. As expected, the peak of the distance distributions for the c and s proteins was shifted to lower values, 5.4 and 6.0, respectively, indicating that proteins within these groups preferentially interact with each other. The corresponding values for all proteins, cytoskeletal proteins and signaling proteins derived from the DIP data set are 5.4, 4.0 and 4.3, respectively. Notice that, due to our definition of the cytoskeletal protein class, the maximum value of dcc, derived from an ideal interaction map, should be dcc=4, because for each protein in this class (except for septins) the maximal distance from actin is two. (Although the distance between septins and actin is not constrained, only three septins appear in the largest interconnected U and D clusters so their effect on the maximum value of dcc is negligible.) Not surprisingly, this (dcc=4) is not reflected by the two datasets that were used, because our procedure to classify the yeast proteins is independent of these interaction maps. It is, however, consistent with the built-in enhanced clustering of cytoskeletal proteins in that 〈dcc〉 is the smallest among the values listed in Fig. 2. Here, 〈d〉 denotes the average of d over the distribution P(d). For the case of the DIP network map of cytoskeletal proteins, where 〈dcc〉=4 (Fig. 2), the majority of c-c connections do indeed have d≈4. This observation suggests that P(d) accurately describes interactions within the networks and, as more information is obtained regarding interactions of cellular proteins, the methods we have devised should be of general use.
Using distance distribution analysis, we also determined how closely signaling proteins are linked to cytoskeletal proteins. As can be seen from Fig. 2, the peak value of P(dcs), the distance distribution for all pairs of c and s proteins, is also shifted to smaller d values, indicating that the two groups are more closely linked to each other within the network than would be expected for two random sets. Interestingly, the degree to which s proteins are linked to c proteins (as measured by 〈dcs〉) was approximately the same as for s proteins alone (Fig. 2). This result suggests that signaling proteins are intimately linked to the cytoskeleton.
The distance distribution, P(d) (Fig. 2), gives a global measure of clustering. To gain information about the local composition of the interaction networks, we calculated the local clustering index, md(x/y) (see Materials and Methods). This metric characterizes the proportion of proteins at distance d from a given protein in the x class that are members of the protein class y. In Fig. 3 we plot the average clustering index 〈md(x/y)〉=md(x/y)/N (with N being the total number of proteins in the network) for the various protein classes. This analysis indicates that, at short distances, signaling proteins and cytoskeletal proteins interact primarily with proteins of the same class. Notice that 〈md(c/c)〉 decays fast as a function of distance and at d≈4 practically reaches its asymptotic value, indicating again that the networks derived from the U and D databases are consistent with our independent definition of the set of cytoskeletal proteins.
In the absence of any clustering tendency of proteins from two different classes (x and y) the local clustering index 〈md(x/y)〉 should be independent of distance and should be equal to the average density of the y proteins in the network, 〈mrand(x/y)〉=Ny/N, where Ny denotes the total number of proteins that belong to class y. By contrast, if proteins belonging to the x and y classes have a tendency to cluster, then 〈md(x/y)〉 should be higher than Ny/N for small values of d, should decrease monotonically and should converge to a value smaller than Ny/N (possibly zero) for large d values. These expectations are indeed supported by the plots in Fig. 3. For example, using the DIP dataset, the proportion of s proteins connected by a single link to a c protein (red curve at d=1) is almost three times greater than the same quantity evaluated by replacing the c protein by a randomly selected protein (magenta curve at d=1). Furthermore, this proportion is about six times higher than the proportion of s proteins linked to the cytoskeleton by six or more bonds (red curve at d=6). Similar relationships are seen for the proportion of c proteins that are linked to s proteins by few bonds compared to many bonds (green curve), whereas analysis of random protein sets shows the predicted flat distribution.
Notice that, because the protein classes c and s contain different numbers of proteins and the local clustering index is affected by the proportion of proteins in each class within the entire network, it was necessary to plot rescaled values of the clustering indices, 〈md(x/y)〉/mrand. The values of the rescaled clustering indices are already smaller than one at d=8 (the largest distance shown in Fig. 3), indicating that at large distances, there is no preferential interaction between proteins within the c and s classes.
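The rescaling can be illustrated with the class sizes reported for the largest connected clusters (141 and 207 signaling proteins among 1458 and 4198 nodes in U and D, respectively). The sketch below is in Python and the function names are hypothetical. A rescaled value of one corresponds to the absence of any clustering tendency.

```python
def random_baseline(n_class_y, n_total):
    """Expected index without clustering: the density Ny/N of class y."""
    return n_class_y / n_total

def rescaled_index(mean_index, n_class_y, n_total):
    """Mean clustering index divided by its random expectation."""
    return mean_index / random_baseline(n_class_y, n_total)

# Class sizes in the largest connected clusters, as reported in the text.
baseline_s_U = random_baseline(141, 1458)  # signaling proteins, U database
baseline_s_D = random_baseline(207, 4198)  # signaling proteins, DIP

# An index equal to the class density rescales to one (no clustering).
no_clustering = rescaled_index(baseline_s_D, 207, 4198)
print(round(baseline_s_U, 4), round(baseline_s_D, 4), no_clustering)
```

Because the signaling class is about twice as dense in the U cluster as in the DIP cluster, raw index values from the two databases are not directly comparable, which is why the rescaled quantity is plotted.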
To further address linkage between signaling and cytoskeletal proteins by using the local clustering index, we compared the nearest-neighbor clustering indexes 〈m1(x/y)〉 that were calculated for all s and c proteins. To determine whether by this analysis s proteins are more closely linked to c proteins, it was necessary to compare m1 of these groups to m1 of randomly chosen proteins. The classes of randomly chosen proteins were termed the pseudo c and pseudo s classes and they contained as many randomly selected proteins as there are c and s proteins in the largest interconnected clusters of the employed protein interaction maps.
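The comparison with pseudo classes can be sketched as a simple resampling procedure. The Python example below draws pseudo classes of the same size as the true classes and averages the nearest-neighbor index over repeated draws. The number of draws, the fixed random seed, the same_class switch and all function names are assumptions made for illustration, because the text does not specify how many random selections were used.

```python
import random

def mean_m1(graph, class_x, class_y):
    """Average fraction of direct partners of class_x proteins in class_y."""
    values = []
    for protein in class_x:
        neighbors = graph.get(protein, set())
        if neighbors:
            hits = sum(1 for n in neighbors if n in class_y)
            values.append(hits / len(neighbors))
    return sum(values) / len(values) if values else None

def pseudo_mean_m1(graph, size_x, size_y, same_class=False, trials=200, seed=0):
    """Mean index for randomly drawn pseudo classes of the given sizes."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    nodes = sorted(graph)
    results = []
    for _ in range(trials):
        pseudo_x = set(rng.sample(nodes, size_x))
        pseudo_y = pseudo_x if same_class else set(rng.sample(nodes, size_y))
        value = mean_m1(graph, pseudo_x, pseudo_y)
        if value is not None:
            results.append(value)
    return sum(results) / len(results) if results else None

def clique(names):
    """Adjacency map in which every listed protein binds every other one."""
    return {a: {b for b in names if b != a} for a in names}

# Toy network: two fully connected groups joined by one link (a0-b0).
group_a = ["a0", "a1", "a2", "a3", "a4"]
group_b = ["b0", "b1", "b2", "b3", "b4"]
toy = {**clique(group_a), **clique(group_b)}
toy["a0"].add("b0")
toy["b0"].add("a0")

true_value = mean_m1(toy, set(group_a), set(group_a))
pseudo_value = pseudo_mean_m1(toy, 5, 5, same_class=True)
```

In the toy network the true class is a tightly connected group, so its index is close to one, whereas randomly drawn classes of the same size give a markedly lower average.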
In Fig. 4 we summarize the results of this comparison. For the c proteins, 〈m1(c/c)〉 is about an order of magnitude larger for the true cytoskeletal class than for its pseudo analogue, which might reflect our definition of the c class. However, the difference between the true and pseudo classes remains consistently large (around a factor of three) for all the other combinations of the x and y proteins, independently of the dataset used. These results indicate that, at least within the datasets used, the clustering tendency of the c and s proteins and the correlation of the two classes are inherent properties of these proteins.
The special role of the cytoskeleton in signaling networks
The results in Figs 2, 3, 4 suggest that the cytoskeleton and signaling networks are linked. However, this might fortuitously result from the limited nature of the interactions detected by the datasets used. To address this possibility, we studied the correlation between the class of signaling proteins and 15 other functional protein classes as defined by the MIPS database (Mewes et al., 2002). We calculated local clustering indices for signaling proteins with respect to each of the other 15 classes of proteins: 〈md(s/i)〉/mrand (i=0 to 14), where i denotes the number of the functional protein class (specified in the legend to Fig. 5). As shown in Fig. 5, the nearest-neighbor clustering index (m1) for s proteins to c proteins [2.83(U) and 6.68(D)] is almost twofold higher than to the next most closely linked class of proteins (class 2 in Fig. 5), which are involved in cell growth, cell division and DNA synthesis [1.54(U) and 3.9(D)]. These results confirm that the cytoskeleton plays a distinguished role in the organization of the signaling network of cells.
The cytoskeleton represents a global structure, spanning the entire cell. Thus, its association with various functional protein classes (in particular with the signaling network) could be expected. To see whether our analysis is consistent with this expectation, we repeated the above calculation for 〈md(c/i)〉/mrand, the local clustering index of the cytoskeletal proteins, and plotted the results in Fig. 6. Indeed, as the comparison of Figs 5 and 6 reveals, the association of the c proteins with the 15 functional protein classes defined in the MIPS database is quite uniform, suggesting that signaling proteins have no special role in the organization of the cytoskeleton. This is particularly well reflected by the values of m1. The nearest-neighbor clustering index for the c proteins to the s proteins [〈m1(c/s)〉] is much closer to the analogous quantity of the c proteins to the proteins in class 2 [〈m1(c/2)〉], than the corresponding quantities with c replaced by s: 〈m1(c/s)〉/〈m1(c/2)〉 is 44% (U) and 61% (D) smaller than 〈m1(s/c)〉/〈m1(s/2)〉.
The quantitative analysis presented here suggests that the topological properties of intracellular signaling pathways within the protein interaction network of S. cerevisiae are strongly dependent on the cytoskeleton. This linkage was even more evident when the analysis was restricted to those cytoskeletal and signaling proteins that are connected to each other exclusively through c or s proteins. The corresponding subnetwork derived from the U database is shown in Fig. 7. All proteins that directly connect the two classes are unusual in that they have the highest number of links (at least four). They are hubs and are distributed throughout the network, indicating that the cytoskeleton and the set of signaling molecules are linked in a global manner.
The protein interaction networks analyzed here are examples of scale-free networks (Barabasi and Albert, 1999; Jeong et al., 2001; Jeong et al., 2000) that are simultaneously tolerant to random errors and fragile against the removal of the most connected nodes or hubs (Albert et al., 2000). To investigate the significance of the hubs in the present context we removed all signaling proteins that link the signaling subnetwork to the cytoskeleton (23 of the 28 hubs). The resulting interaction map (with only those proteins shown that have at least one connection) is plotted in Fig. 8. The total collapse or fragmentation of the signaling network (as seen in Fig. 8) strongly suggests that without communication with the cytoskeleton the signaling apparatus of the cell cannot properly function.
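The effect of removing hubs can be examined by deleting the selected nodes and recomputing the sizes of the connected clusters. The Python sketch below uses a toy network with hypothetical labels and does not reproduce the subnetwork of Fig. 7. The drop_isolated option mirrors the convention of showing only proteins that retain at least one connection.

```python
from collections import deque

def remove_proteins(graph, removed):
    """Copy of the network without the given proteins and their links."""
    removed = set(removed)
    return {node: neighbors - removed
            for node, neighbors in graph.items() if node not in removed}

def cluster_sizes(graph, drop_isolated=False):
    """Sizes of all connected clusters, largest first."""
    unvisited = set(graph)
    sizes = []
    while unvisited:
        start = unvisited.pop()
        cluster = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in cluster:
                    cluster.add(neighbor)
                    queue.append(neighbor)
        unvisited -= cluster
        if not (drop_isolated and len(cluster) == 1):
            sizes.append(len(cluster))
    return sorted(sizes, reverse=True)

# Toy network: hub H links three branches (hypothetical labels).
toy = {"H": {"A", "B", "C"}, "A": {"H", "A2"}, "A2": {"A"},
       "B": {"H", "B2"}, "B2": {"B"}, "C": {"H"}}
before = cluster_sizes(toy)
after = cluster_sizes(remove_proteins(toy, {"H"}))
after_connected_only = cluster_sizes(remove_proteins(toy, {"H"}),
                                     drop_isolated=True)
```

In the toy example a single connected cluster of six proteins breaks into two pairs and one isolated protein once the hub is removed, which is the kind of fragmentation described for the signaling subnetwork.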
It is perhaps not surprising that a large number of the most connected hubs in the subnetwork were identified as being members of both the cytoskeleton and the signaling subsets. Some of these proteins, such as the yeast WASP homolog Las17p and the yeast PAK1 kinase homolog Cla4p, are well-characterized regulators of the cytoskeleton and coordinate cytoskeletal dynamics with changes in cell growth, division, and mating. Other hubs provide crucial (possibly the only) connections between two parts of the signaling network. For example, Akr1p, an ankyrin repeat-containing cytoskeletal protein, provides a pathway in this network to transmit a signal from Gcs1p and Ste3p to other components of the mating pathway (Ste4p, Ste5p and Ste18p).
The analysis presented here provides quantitative evidence for the long-standing hypothesis that the cytoskeleton participates in an important way in intracellular signal transduction. How might the cytoskeleton be used in signal transduction pathways? The results of the network analysis suggest that the cytoskeleton is involved in at least two ways. First, individual proteins of the cytoskeleton might participate directly in signal transduction by linking two or more signaling proteins. One implication of this role is that the cytoskeleton might provide alternative signal transduction routes so that there are multiple pathways to transduce a signal. Second, the cytoskeleton might provide a macromolecular scaffold, which spatially organizes components of a signal transduction cascade (Park et al., 2003). This would be analogous to the role of molecular scaffolds, such as the yeast Ste5 protein, that tether multiple components of a pathway to promote signal transduction between them. The analysis presented here suggests that, during eukaryotic evolution, signaling pathways have incorporated components and features of the cytoskeleton as their integral parts and this might be a general feature of eukaryotic intracellular signal transduction networks.
We thank Erfei Bi, Margaret Chou and László Barabási for assistance with protein classification and for helpful discussions. Research in the laboratories of the authors is supported by grants from the National Institutes of Health (C.G.B. and P.A.J.), the National Science Foundation and the National Aeronautics and Space Administration (G.F.) and KOSEF (H.J., Grant No. R08-2003-000-10285-0).
Supplemental data available online
- Accepted January 26, 2004.
- © The Company of Biologists Limited 2004