Eukaryotic gene expression can be viewed within a conceptual framework in which regulatory mechanisms are integrated at three hierarchical levels. The first is the sequence level, i.e. the linear organization of transcription units and regulatory sequences. Here, developmentally co-regulated genes seem to be organized in clusters in the genome, which constitute individual functional units. The second is the chromatin level, which allows switching between different functional states. Switching between a state that suppresses transcription and one that is permissive for gene activity probably occurs at the level of the gene cluster, involving changes in chromatin structure that are controlled by the interplay between histone modification, DNA methylation, and a variety of repressive and activating mechanisms. This regulatory level is combined with control mechanisms that switch individual genes in the cluster on and off, depending on the properties of the promoter. The third level is the nuclear level, which includes the dynamic 3D spatial organization of the genome inside the cell nucleus. The nucleus is structurally and functionally compartmentalized and epigenetic regulation of gene expression may involve repositioning of loci in the nucleus through changes in large-scale chromatin structure.
Introduction
The genome sequences of an increasing number of organisms are now known. Within the draft sequence of the human genome (McPherson et al., 2001; Venter et al., 2001), most protein-coding genes and a limited number of RNA genes have been identified: together close to 35,000 genes. This number will increase, because we probably underestimate the number of genes that encode RNAs, of which many may be involved in gene regulation (Marker et al., 2002). How the controlled expression of the tens of thousands of genes in a genome is orchestrated is a pressing question. This question is difficult to answer owing to several fundamental problems, including the following: (1) gene expression is controlled by regulatory systems that act at different hierarchical levels, and we are only beginning to appreciate how they are integrated; (2) gene regulation involves precisely controlled changes in chromatin structure, which are difficult to analyse in vivo; and (3) control of gene expression in a particular cell depends not only on DNA sequences but also on the history of that cell [e.g. during embryonic development, epigenetic memory mechanisms operate (Turner, 2002)]. A major challenge is to unravel the language and syntax of the system that is responsible for coordinated expression of so many genes. Here, we discuss several aspects of hierarchical gene control in eukaryotes.
To explore the problem of how the genome functions it is useful to discriminate three regulatory levels (Fig. 1). Level 1 is the sequence level, i.e. the 1D organization of functional sequence elements in the genome. This level includes all coding regions, the wide variety of regulatory sequences that bind sequence-specific protein factors, and sequence elements that may have a role in determining the 3D folding of the chromatin fibre. Level 2 is the chromatin level and reflects the different functional states chromatin can adopt in relation to genome function. Classically, one might think about chromatin structure in terms of euchromatin and heterochromatin. However, it is likely that the chromatin fibre can exist in many more structural and functional states, which differ in core histone composition (Meneghini et al., 2003) and in post-translational modifications of histones (the 'histone code') (Turner, 2002). Level 3 is the nuclear level, the 3D structure and functional compartmentalization of the genome inside the interphase nucleus. A classical example is the nucleolus, in which the rRNA-coding gene clusters of several chromosomes are brought together to create a subnuclear domain that is dedicated to rRNA synthesis and processing and to pre-ribosome assembly (Pederson and Politz, 2000). Another example is the clustering of heterochromatin at particular regions - for example, near the nuclear envelope.
The sequence level
The best-studied level of gene regulation is that of the individual gene, involving cis-acting elements, such as promoters, enhancers and silencers, in addition to trans-acting factors, including DNA-binding transcription factors, cofactors, chromatin-remodelling systems and RNA polymerases (Fig. 1). This complex machinery is tightly coupled to RNA processing and RNA transport (Maniatis and Reed, 2002). A fundamental question is whether the activity of genes is controlled individually or whether genes are clustered and major regulatory decisions are made at the cluster level. A limited number of gene clusters have been studied in detail - for example, the α-globin genes, β-globin genes, histone genes and Hox genes. Each cluster encodes a set of functionally related proteins, and the genes are expressed in a coordinated manner. In addition to the elements that control expression of the individual genes, specific regulatory elements, such as locus control regions (LCRs), have been found to regulate gene activity at the level of the cluster (Festenstein and Kioussis, 2000; Ho et al., 2002; Li et al., 2002). Although clusters of functionally related genes are relatively rare, growing evidence indicates that many genes are clustered not according to their function but according to when they are expressed - for example, in a specific differentiation state.
Expression profiles in Drosophila support the notion of gene clusters (Oliver et al., 2002; Spellman and Rubin, 2002). Under a wide variety of experimental conditions, groups of on average 20 adjacent genes (representing domains of 20 to 200 kb) display very similar expression profiles, which suggests that they are co-regulated. Remarkably, the genes in such clusters have no obvious functional relationship, except that they are expressed under the same conditions. About 20% of the genes analysed are present in co-regulated clusters. Analysing the expression of testis-specific genes in Drosophila, Boutanaev et al. have drawn similar conclusions about gene clustering (Boutanaev et al., 2002). Co-expressed muscle-specific genes of C. elegans are also arranged in clusters of two to five genes (Roy et al., 2002). These observations may be just the tip of the iceberg. It is likely that regulation at the level of gene clusters primarily represents switching of chromatin domains from a transcriptionally repressive (i.e. epigenetically silenced) state to a state that is permissive for transcription (see the section on the chromatin level, below). Such switching does not necessarily induce transcriptional activation of the genes in the cluster; rather, it prepares them for regulation at the level of individual genes. The expression-profiling approaches that have been used to identify gene clusters (Boutanaev et al., 2002; Oliver et al., 2002; Spellman and Rubin, 2002) detect only clusters in which all genes are transcriptionally activated (or inactivated) simultaneously, and gene clusters that behave this way may be relatively rare. These data nevertheless indicate that eukaryotic genomes are functionally compartmentalized and that clusters of genes can be co-activated and co-repressed.
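The comparison of expression profiles of adjacent genes described above can be illustrated with a simplified sketch. It is not the procedure used in the cited studies: it assumes that each gene is represented by a list of expression values measured under the same set of conditions, ordered by chromosomal position, and it flags any run of `window` adjacent genes whose mean pairwise Pearson correlation reaches `threshold`. The function names and both parameters are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def coexpressed_windows(profiles, window, threshold):
    """Return start indices of runs of `window` adjacent genes whose mean
    pairwise correlation across conditions is at least `threshold`.

    `profiles` must be ordered by the genes' positions on the chromosome.
    """
    hits = []
    for start in range(len(profiles) - window + 1):
        block = profiles[start:start + window]
        pairs = list(combinations(block, 2))
        mean_r = sum(pearson(a, b) for a, b in pairs) / len(pairs)
        if mean_r >= threshold:
            hits.append(start)
    return hits


# Five hypothetical genes in chromosomal order, each measured under four conditions.
profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [1, 3, 5, 7],
    [4, 1, 3, 2],
    [2, 2, 1, 5],
]
print(coexpressed_windows(profiles, window=3, threshold=0.9))  # [0]
```

A real analysis would also need a null model, for example the same statistic computed after randomly reordering the genes, to judge whether a high-correlation window is more than chance.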
How are gene clusters regulated? At least three classes of genomic element are likely to be involved in the control of cell-type-specific expression of gene clusters. First, a cluster-control element that switches the genomic domain between its active and inactive states should be present. Such regulatory elements may recruit histone-modifying enzymes, causing a change in the functional state of chromatin in the domain (see below). LCRs fulfil this role. These often-complex sequence elements can switch genomic domains containing one or more genes to a state that allows transcription to be controlled by their respective promoters. LCRs have been identified in a variety of loci (Li et al., 1999; Li et al., 2002). Activation of a genomic domain by such control elements is necessary for activation of individual genes in the cluster but may not be sufficient. The second class of element comprises the enhancers and promoters that determine the activity of individual genes within a cluster. Finally, the third class comprises the boundary elements (also called insulators) that separate gene clusters. These limit the range of action of long-distance regulatory elements in the cluster, such as LCRs, silencers and enhancers (Gerasimova and Corces, 2001; Labrador and Corces, 2002; Schedl and Broach, 2003; West et al., 2002).
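The hierarchical logic of these three classes of element, in which opening a domain is necessary but not sufficient for transcription of its genes, can be caricatured in a few lines of code. This is a toy model under stated assumptions: a domain is represented by a hypothetical `Domain` object whose two ends stand in for the flanking boundary elements, the cluster-control element (such as an LCR) is reduced to a single on/off flag, and each promoter is likewise reduced to an on/off flag on a hypothetical `Gene` object.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    promoter_active: bool  # regulation at the level of the individual gene


@dataclass
class Domain:
    """A genomic domain; its ends stand in for flanking boundary elements."""
    control_element_active: bool  # e.g. an LCR has opened the domain
    genes: list = field(default_factory=list)

    def transcribed_genes(self):
        # Opening the domain is necessary but not sufficient:
        # each gene also needs its own promoter to be active.
        if not self.control_element_active:
            return []
        return [g.name for g in self.genes if g.promoter_active]


# Hypothetical gene names, for illustration only.
genes = [Gene("gene_a", True), Gene("gene_b", False), Gene("gene_c", True)]
print(Domain(False, genes).transcribed_genes())  # []
print(Domain(True, genes).transcribed_genes())   # ['gene_a', 'gene_c']
```

In this caricature a closed domain silences every gene regardless of promoter state, whereas an open domain merely hands control over to the individual promoters.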
The best-studied gene cluster is the β-globin gene cluster in man, mouse and chick (Levings and Bungert, 2002). Although this cluster is exceptional, because its genes are transcribed at extremely high rates in erythroid cells, it nevertheless has several of the elements mentioned above. An LCR upstream of the cluster switches the locus from a closed and inactive chromatin configuration to a more open one, allowing transcriptional activation (Festenstein and Kioussis, 2000; Li et al., 2002). In addition, each of the genes has its own promoter, which directs developmentally regulated expression of the gene. Furthermore, boundary elements that flank the gene cluster have been identified (Farrell et al., 2002). The example of the β-globin cluster shows that cells regulate genes at different hierarchical levels. Not all gene clusters are controlled in the same way, however. As with the β-globin cluster, expression of the α-globin genes is developmentally controlled. In contrast to the β-globin genes, however, the α-globin cluster lies in a typical euchromatic environment, i.e. one that is gene rich and GC rich, early replicating and nuclease-sensitive even in non-haematopoietic cells (Brown et al., 2001). Moreover, in human lymphocytes, which do not express haemoglobin, the β-globin locus is associated with pericentromeric heterochromatin, whereas the α-globin locus is not (Brown et al., 2001). The α-globin cluster does not appear to switch between a heterochromatic and a euchromatic state, but rather has a permanently open conformation. Possibly, α-globin genes are regulated exclusively at the individual gene level, helped by a locus-specific upstream regulatory element (Vyas et al., 1992). The cell thus evidently uses different control mechanisms for different genomic domains.
One approach to identifying regulatory elements that control gene expression at the level of gene clusters is to compare genomic sequences of different organisms, on the assumption that functional gene clusters and related regulatory elements tend to be evolutionarily conserved. Comparison of the human and mouse sequences has already revealed a variety of conserved non-coding sequences that are candidates for such regulatory elements (Hardison, 2000; Mural et al., 2002; Shabalina et al., 2001; Ureta-Vidal et al., 2003). An alternative route is the identification of clusters of transcription-factor-binding sites, which are typically a few hundred base pairs in size (Berman et al., 2002). Finally, boundary elements cannot yet be identified on the basis of nucleotide sequence alone; once they can, their distribution in the genome will reveal much about the one-dimensional functional organization of eukaryotic genomes.
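The idea of searching for clusters of transcription-factor-binding sites can be sketched as a sliding-window scan over the positions of predicted sites. This is a simplified illustration, not the method of Berman et al.: it assumes that binding-site positions (in base pairs) have already been predicted, and the 300 bp window and the minimum of three sites are hypothetical parameters chosen only to match the scale of a few hundred base pairs.

```python
def binding_site_clusters(positions, window=300, min_sites=3):
    """Return merged (start, end) intervals in which at least `min_sites`
    predicted binding sites lie within `window` bp of the first site."""
    sites = sorted(positions)
    intervals = []
    j = 0
    for i in range(len(sites)):
        j = max(j, i)
        # Extend j while the next site is still within the window from site i.
        while j + 1 < len(sites) and sites[j + 1] - sites[i] <= window:
            j += 1
        if j - i + 1 >= min_sites:
            intervals.append((sites[i], sites[j]))
    # Merge overlapping windows so each dense stretch is reported once.
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical predicted site positions along a stretch of sequence.
sites = [100, 150, 220, 900, 2000, 2050, 2100, 2180]
print(binding_site_clusters(sites))  # [(100, 220), (2000, 2180)]
```

The isolated site at 900 bp is not reported because it has no neighbours within the window; only dense stretches of sites survive the scan.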
Versteeg and co-workers (Caron et al., 2001) have shown that highly active genes tend to be organized in large clusters (many megabases in size) on various chromosomes. Moreover, housekeeping genes have been found to form several large clusters in the mammalian genome (Lercher et al., 2002). Surralles et al. (Surralles et al., 2002) have shown that transcription-coupled DNA repair (TCR) is very prominent in specific chromosomal domains. Since TCR is tightly associated with active genes, it is likely that such domains overlap those containing highly active genes. Given the size of these clusters (many megabases per cluster, representing hundreds of genes), they are unlikely to correlate with individual regulatory units. Rather, this level of clustering might be related to nuclear compartmentalization - for instance, the creation of nuclear domains that are particularly suited for high rates of gene expression and RNA processing (see below).
The observations discussed above show that genes are arranged far from randomly on the linear genome. We are beginning to see the functional implications of this. A better understanding of this level of genome function is essential if we are to understand the orchestration of gene expression in eukaryotes.
The chromatin level
In eukaryotic cells nuclear DNA is packed as chromatin (Ridgway and Almouzni, 2001). The basic chromatin unit is the nucleosome, consisting of about 150 base pairs of DNA tightly wrapped about 1.7 times around a protein octamer. The octamer comprises two copies of each of the highly evolutionarily conserved core histones H2A, H2B, H3 and H4. Adjacent nucleosomes are connected by 30-40 base pairs of linker DNA that is more accessible than DNA in direct contact with the core histones. Each chromosome consists of a single huge nucleosomal fibre, whose 3D folding is dynamic rather than static. For instance, the nucleosomal fibre is densely packed in metaphase chromosomes, whereas in interphase it is on average about five times less compact (Manders et al., 2003). Mounting evidence indicates that the folding of the nucleosomal array in the interphase nucleus is an important element in genome function. Chromatin can occur in various structural and functional states. Transitions between chromatin states are tightly linked to changes in gene activity (Eberharter and Becker, 2002) and are causally related to changes in covalent modification of core histone proteins (Strahl and Allis, 2000).
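The figures given above allow a rough estimate of how many nucleosomes a genomic domain contains. The sketch assumes a perfectly regular array with about 150 bp of core DNA plus 30-40 bp of linker DNA per nucleosome; real linker lengths vary, so the result is an order-of-magnitude estimate only. The function name `nucleosome_count` is hypothetical.

```python
def nucleosome_count(domain_bp, core_bp=150, linker_bp=35):
    """Approximate number of whole nucleosomes in a domain of `domain_bp`
    base pairs, assuming a regular array with fixed core and linker lengths."""
    repeat_bp = core_bp + linker_bp  # nucleosome repeat length
    return domain_bp // repeat_bp


# The 20-200 kb co-regulated domains described for Drosophila,
# bracketed by the 30 bp and 40 bp linker lengths.
for size in (20_000, 200_000):
    low = nucleosome_count(size, linker_bp=40)
    high = nucleosome_count(size, linker_bp=30)
    print(f"{size // 1000} kb: about {low}-{high} nucleosomes")
```

On these assumptions even a 20 kb domain spans roughly a hundred nucleosomes, and a 200 kb domain roughly a thousand.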
A considerable number of post-translational modifications of core histones have been identified. Thirteen lysine residues in the N-terminal tails of the four histone proteins can be reversibly acetylated, three serine residues in histones H3 and H4 can be phosphorylated, six lysine side-chains and three arginine residues on histones H3 and H4 can be methylated, and two residues in the C-terminal domains of histones H2A and H2B can be ubiquitylated (Jenuwein and Allis, 2001; Lachner et al., 2003). Most of these modifications are carried out by enzymes that are recruited by sequence-specific DNA-binding proteins. Histone modifications specify functional states of chromatin in a combinatorial manner (Turner, 2002). For instance, hyperacetylation is related to transcriptionally active and relatively open chromatin (Eberharter and Becker, 2002), whereas methylation of Lys9, Lys27 and Lys36 of histone H3 is linked to transcriptional silencing and chromatin compaction, often called heterochromatinization (Lachner and Jenuwein, 2002). To make things more complicated, methylation of histone H3 at Lys4 is associated with transcriptional activity (Santos-Rosa et al., 2002; Zegerman et al., 2002). Together, this set of post-translational modifications represents the histone code (Jenuwein and Allis, 2001; Strahl and Allis, 2000; Turner, 2002), which adds an extra level of information to the genome that is linked to DNA sequence only indirectly. It acts as an epigenetic memory of what happened to the cell lineage in the past, which is important in embryonic development and cell differentiation (Turner, 2002).
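Even a crude count shows why a combinatorial histone code can carry a great deal of information. The sketch adds up the modifiable residues listed above and, under the deliberately simplistic assumption that each residue is either modified or unmodified (ignoring, for instance, that a methylated lysine can carry more than one methyl group), counts the possible combinations. It then shows a toy read-out that maps a few marks onto chromatin states using only the associations stated in the text; the mark labels and the `read_state` function are hypothetical, and real read-out by chromatin proteins is far more context-dependent.

```python
# Number of modifiable residues listed in the text, by modification type.
SITES = {
    "acetylation (Lys)": 13,
    "phosphorylation (Ser)": 3,
    "methylation (Lys)": 6,
    "methylation (Arg)": 3,
    "ubiquitylation": 2,
}

total_sites = sum(SITES.values())
# Simplifying assumption: each residue is either modified or unmodified.
binary_states = 2 ** total_sites

# Hypothetical mark labels; the associations follow the text.
ACTIVE_MARKS = {"hyperacetylation", "H3K4me"}
SILENT_MARKS = {"H3K9me", "H3K27me"}


def read_state(marks):
    """Toy read-out of a set of histone marks as a chromatin state."""
    active = bool(marks & ACTIVE_MARKS)
    silent = bool(marks & SILENT_MARKS)
    if active and silent:
        return "conflicting"
    if active:
        return "permissive"
    if silent:
        return "silenced"
    return "unassigned"


print(total_sites, binary_states)          # 27 134217728
print(read_state({"H3K9me", "H3K27me"}))   # silenced
print(read_state({"hyperacetylation"}))    # permissive
```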
Histone acetylation and phosphorylation are rapidly reversible. The in vivo acetylation level of nucleosomes, and therefore overall gene activity, has been shown to represent a steady state, depending on the relative activities of histone acetyltransferase (HAT) and histone deacetylase (HDAC) enzymes (Im et al., 2002; Katan-Khaykovich and Struhl, 2002). Both types of enzyme can be recruited to specific sites in the genome by sequence-specific transcription factors (Ng and Bird, 2000; Roth et al., 2001). Interestingly, histone lysine methylation is probably not reversible, since no demethylase has been found so far, and the modification is chemically stable (Bannister et al., 2002). Therefore, it seems that silencing by histone methylation is reversed only by the slow process of exchange of histone proteins in nucleosomes (Kimura and Cook, 2001) or DNA replication. Also, specific factors that locally speed up the release of methylated histones and allow their replacement with other histone molecules might exist (Ahmad and Henikoff, 2002).
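The statement that acetylation represents a steady state between HAT and HDAC activities can be made concrete with a minimal kinetic model. Assuming, purely for illustration, that acetylation and deacetylation are first-order processes with rate constants k_hat and k_hdac acting on the fraction A of acetylated sites, dA/dt = k_hat(1 - A) - k_hdac A, and the steady state is A* = k_hat / (k_hat + k_hdac). The sketch computes this value and checks it by simple numerical integration; the function names are hypothetical and the rate constants are arbitrary numbers, not measured values.

```python
def steady_state_acetylation(k_hat, k_hdac):
    """Steady-state fraction of acetylated sites in a first-order HAT/HDAC model."""
    return k_hat / (k_hat + k_hdac)


def simulate_acetylation(k_hat, k_hdac, a0=0.0, dt=0.01, steps=5000):
    """Integrate dA/dt = k_hat*(1 - A) - k_hdac*A with the explicit Euler method,
    starting from an acetylated fraction a0."""
    a = a0
    for _ in range(steps):
        a += dt * (k_hat * (1.0 - a) - k_hdac * a)
    return a


# Raising HDAC activity relative to HAT activity lowers the steady state.
print(steady_state_acetylation(2.0, 1.0))
print(steady_state_acetylation(2.0, 2.0))  # 0.5
print(simulate_acetylation(2.0, 1.0))      # converges on the analytical value
```

In this model the system relaxes towards its steady state on a timescale of about 1/(k_hat + k_hdac), which is one way of expressing "rapidly reversible". A mark with no removal term, such as histone lysine methylation in the absence of a demethylase, would instead persist until the histones themselves are exchanged or diluted by replication.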
Another chromatin modification that is closely related to gene silencing is DNA methylation by DNA methyltransferases (Geiman and Robertson, 2002). Coupling of DNA methylation, histone deacetylation and methylation of histone H3 Lys9 results in heterochromatinization and gene silencing (Grandjean et al., 2001; Gregory et al., 2001b; Jackson et al., 2002; Johnson et al., 2002; Kouzarides, 2002; Satoh et al., 2002; Soppe et al., 2002; Tamaru and Selker, 2001). Chemical modifications of histones and DNA are probably different aspects of one and the same regulatory system.
Regulation of gene expression at the histone code level creates a regulatory system that switches chromatin domains between different functional states. Do other regulatory mechanisms exist? One group of candidates is the multisubunit ATP-dependent chromatin-remodelling machines (Langst and Becker, 2001; Peterson, 2002). Members of this enzyme family use the energy of ATP hydrolysis to rearrange nucleosomes in the chromatin fibre. The role of remodelling at the promoter level, involving a small number of nucleosomes, has been investigated in detail (Neely and Workman, 2002). There is evidence that remodelling enzymes are also involved in the unfolding of large chromatin domains (Peterson, 2003). Interestingly, linker histones, which are often associated with condensed inactive chromatin, are potent inhibitors of chromatin remodelling, which creates a functional link between the chromatin state and the remodelling machinery (Horn et al., 2002). Remodelling enzymes also play a role in chromatin condensation. Mutations in the Drosophila ISWI gene, which encodes a member of the SWI2/SNF2 superfamily of ATP-dependent remodelling enzymes, produce a less compact male X chromosome (Corona et al., 2002). ISWI-mediated remodelling is probably responsible for regular spacing of nucleosomes on the chromatin fibre, allowing generation of more tightly packed chromatin. These observations show that switching chromatin between active and inactive states can be controlled by different mechanisms - changes in histone modification or remodelling. The two systems, however, appear to be tightly interlinked (e.g. Corona et al., 2002; Neely and Workman, 2002).
All this adds to the classical regulatory mechanisms provided by transcription factors. There is considerable crosstalk between these two regulatory levels because, at least in part, they use the same set of cis- and trans-acting components. For instance, HAT activity is often recruited through transcription factors that bind to promoters and/or enhancers, often in combination with ATP-dependent chromatin-remodelling enzymes (Gregory et al., 2001a; Mizuguchi et al., 2001). In addition, many other genomic regulatory elements act by recruiting enzymes involved in chromatin modification. For instance, LCRs bind HATs (Ho et al., 2002; Levings and Bungert, 2002), and silencing by Polycomb group (PcG) proteins involves HDAC activity and histone H3 Lys9 and Lys27 methylation (Cao et al., 2002; Muller et al., 2002; Sewalt et al., 2002; Tie et al., 2001). These histone modifications not only occur locally but can spread along the chromatin fibre, thereby inducing a change in the functional state of a complete chromatin domain containing one or more genes (Fig. 1). This property, which is discussed in the next section, is likely to be a key element in the regulation of complex genomes.
The relationship between the sequence level and the chromatin level
It is a formidable task for a cell to coordinate the expression of thousands of genes. In practice, for each genome a limited number of useful gene expression repertoires exist, each corresponding to a specific differentiation state. Although precise numbers are lacking, it is reasonable to assume that in each particular stably differentiated cell type, only a relatively small fraction (10-30%) of genes can be transcribed at any given time. The eukaryotic genome seems to be hard wired so that only a limited number of stable states can exist, each characterized by a specific repertoire of expressible genes.
How does the histone code allow the cell to orchestrate gene activity? Histone-modifying enzyme activities are recruited by sequence-specific transcription factors to the correct genomic loci (Jenuwein and Allis, 2001; Turner, 2002). Somehow, the histone modification state, and consequently the functional state of chromatin, seems to spread in cis along the chromatin fibre (Fig. 1). Elegant studies of Noma et al. (Noma et al., 2001b) on the 20 kb silent-mating-type locus of fission yeast suggest that both the transcriptionally activated state (characterized by histone acetylation and histone H3 Lys4 methylation) and the silenced state (characterized by histone H3 Lys9 methylation) spread in this way. Inactivation of the locus is initiated by a centromere-like repeat and spreads in both directions until boundary elements are reached (Hall et al., 2002). The hyperacetylated state of chromatin can also spread along the chromatin fibre (Forsberg and Bresnick, 2001; Ho et al., 2002). The molecular mechanism of spreading in cis is not known, but may involve tracking of histone-modifying enzymes along the chromatin fibre. Evidence indicates that transcription by RNA polymerase I is involved in spreading of the heterochromatic state of the rRNA gene cluster along the chromatin fibre (Buck et al., 2002). Similarly, intergenic transcription by RNA polymerase II, as observed in the β-globin and Hox gene clusters, is probably involved in changing the histone modification state, and therefore the functional state, of these gene clusters (Plant et al., 2001; Rank et al., 2002). LCRs are candidates for genomic sequence elements that trigger switching of the functional state of chromatin domains and may be starting points for intergenic transcription (Gribnau et al., 2000; Ho et al., 2002; Plant et al., 2001).
Boundary elements (insulators) constitute a heterogeneous class of genomic element that is currently defined only operationally. They may act as boundaries of chromatin domains and, at least in some cases, limit spreading along the chromatin fibre (Cuvier et al., 2002; Labrador and Corces, 2002; Noma et al., 2001) (Fig. 1). They block in cis the activating and inactivating effects of long-range regulatory elements, such as enhancers and LCRs (West et al., 2002), as well as silencing induced by PcG proteins and heterochromatin protein 1 (HP1) (Van der Vlag et al., 2000; West et al., 2002). Boundary elements delimit functional genomic domains, such as the β-globin gene cluster and the fission yeast mating-type cluster (West et al., 2002). Moreover, they are able to shield transgenes from inhibitory position effects (Recillas-Targa et al., 2002).
There is thus increasing evidence that the linear genome is functionally compartmentalized and that genomic domains containing one or more genes are regulated independently. Domain activation and inactivation are achieved through recruitment of the relevant complement of histone-modifying enzymes and a switch in functional state of a domain probably involves spreading of the histone modification state along the chromatin fibre until a boundary element is encountered.
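The model summarized above, in which a chromatin state spreads from a nucleation site until it meets a boundary element, can be caricatured as a one-dimensional array of nucleosomes. This is a toy sketch under strong assumptions: spreading is treated as deterministic and complete, each nucleosome carries a single state label, and a boundary is simply an array position that the state can neither occupy nor cross. Because the molecular mechanism of spreading is unknown, nothing here should be read as a mechanism; the function `spread_state` and its parameters are hypothetical.

```python
def spread_state(states, nucleation, boundaries, new_state):
    """Propagate `new_state` outwards from index `nucleation` in both
    directions, stopping before any index listed in `boundaries`."""
    result = list(states)
    result[nucleation] = new_state
    for step in (-1, 1):  # spread leftwards, then rightwards
        i = nucleation + step
        while 0 <= i < len(result) and i not in boundaries:
            result[i] = new_state
            i += step
    return result


# Ten nucleosomes, boundary elements at positions 2 and 7,
# a silenced state ("S") nucleated at position 4.
fibre = ["open"] * 10
print(spread_state(fibre, nucleation=4, boundaries={2, 7}, new_state="S"))
# positions 3-6 become "S"; positions 0-2 and 7-9 are untouched
```

Removing the boundary at position 7 lets the silenced state run to the end of the array, which mirrors the role of boundary elements in shielding neighbouring domains.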
The nuclear level
So far, we have treated the genome as a 'simple', linear coding system in which regulatory elements act only in cis. In the cell nucleus, however, chromatin is in a highly folded state that brings together loci that are far apart on the linear genome and may even be located on different chromosomes. It is likely that regulation of genome function at the nuclear level can occur in trans. Transvection is an example. Here, regulatory elements that control expression of one allele (e.g. an enhancer) functionally, and probably physically, interact with the promoter of the allele on the homologous chromosome (Duncan, 2002; Pirrotta, 1999; Wu and Morris, 1999). Homologous pairing may be involved, providing a good example of in trans interactions (Fung et al., 1998; Sass and Henikoff, 1999). Our knowledge of molecular mechanisms of genome function at this level is still very limited.
The interphase nucleus is functionally compartmentalized (Carmo-Fonseca, 2002; Francastel et al., 2000). Many different domains have been identified, each having a specific function and macromolecular composition (Spector, 2001). There is considerable evidence that the architecture of the nucleus is closely related to genome function. The position of a gene inside the nucleus is important: some areas are repressive, whereas others promote or even boost transcription. For instance, targeting of a gene to the periphery of the yeast cell nucleus causes silencing (Andrulis et al., 1998). Repositioning of genes to pericentromeric chromatin during lymphocyte differentiation in mice correlates with epigenetic inactivation of these loci, suggesting that contact with centromeric heterochromatin results in spreading of the inactivated state in trans (Brown et al., 1997; Fisher and Merkenschlager, 2002).
What DNA sequence elements and proteins control higher-order chromatin structure in relation to gene activity? Analysis of position effects in transgene expression has provided some insights (Mishra and Karch, 1999; Udvardy, 1999). Sequence elements such as boundary elements (Udvardy, 1999), LCRs (Li et al., 2002) and possibly certain types of enhancer (Francastel et al., 1999) can overcome, in part, silencing owing to association with heterochromatin in cis and/or in trans by enforcing a functional state of a chromatin domain that allows its genes to be transcribed (Dillon and Sabbattini, 2000).
A general problem in investigating the relationship between nuclear organization and function is that we lack tools to manipulate nuclear structure. For instance, we cannot reposition at will individual genomic loci, induce changes in local chromatin structure, or interfere with the positioning or assembly of subnuclear domains and subsequently analyse their effects on gene expression. Therefore, most results so far have allowed us to observe correlations rather than causal relationships. This is a major problem in this field. Evidently, we first must identify the molecular components that determine the different aspects of nuclear structure, including chromatin structure and subnuclear domains, before we can interfere with their function and analyse effects on gene expression. Several attempts to tackle this problem are underway. The approaches taken include interfering with the function of specific proteins and biochemical isolation of specific nuclear domains and subsequent identification of all their constituent proteins. These experiments are revealing several aspects of nuclear architecture that can be related to gene expression. Some of them are addressed below.
Chromosome territories
Individual chromosomes are readily visible only during metaphase. In contrast, classical light and electron microscopy of interphase nuclei do not reveal the position, size and shape of chromosomes. In situ hybridization (ISH), using chromosome-specific probes, has shown in many different cell types and organisms that each interphase chromosome occupies a unique, relatively compact, volume in the nucleus (Cremer and Cremer, 2001). There is only limited intermingling of chromatin from different chromosomes (Visser et al., 2000).
What is the molecular basis for the formation of chromosome territories? The simplest answer is that chromosomes in interphase have decondensed only to a limited extent after mitosis. This would preclude significant intermingling of chromatin from different chromosomes (Visser et al., 2000). In such a scenario, only chromosomal stretches that are strongly decondensed can loop out and intrude into other chromosome territories (Mahy et al., 2002a; Volpi et al., 2000). The observation that the bulk of interphase chromatin is only five times less condensed than in metaphase chromosomes is consistent with this notion (Manders et al., 2003).
Despite the relative compactness of their chromatin, chromosome territories are open structures. Confocal light microscopy and electron microscopy after thin sectioning showed that there is a considerable volume of interchromatin space inside chromosome territories (Cmarko et al., 1999; Verschure et al., 2002; Verschure et al., 1999; Visser et al., 2000). This is not a fixation artefact, because living cells in which chromatin is labelled by incorporation of GFP-histone H2B into chromatin show a spatial distribution of chromatin similar to that in fixed cells (Hendzel et al., 2001; Politz et al., 1999; Verschure et al., 1999). The interchromatin space constitutes a highly convoluted set of channels, which allow diffusion of macromolecular components through the nucleus, including hnRNP particles (Politz et al., 1999). This configuration is dynamic, since interphase chromatin is quite mobile and shows constrained diffusion over distances of up to a few tenths of a micrometre (Gasser, 2002). Transcription sites are concentrated near the surface of compact chromatin domains in the interchromatin space (Cmarko et al., 1999; Fakan, 1994; Verschure et al., 1999). They probably represent chromatin loops that extend into interchromatin space, emanating from the compact chromatin domain (Fig. 1). Interestingly, Cmarko et al. (Cmarko et al., 2003) have presented evidence that PcG-silenced loci are located predominantly in the same perichromatin area. Transcriptionally active and silenced loci thus appear to be physically close together in the same subnuclear compartment, i.e. the perichromatin domain.
The structure of interphase chromosomes in Arabidopsis has been analysed in detail. Here, centromeric heterochromatin acts as an organizing centre on which the gene-rich chromosomal arms fold back to form multiple loops, forming a rosette-like structure (Fransz et al., 2002). The much larger mammalian chromosomes have a more complex structure and contain more non-coding DNA. They might have a similar structural organization, possibly having more than one heterochromatin-organizing centre per chromosome and forming multiple rosette structures.
Large-scale chromatin structure and gene regulation
Transcriptional activation correlates with chromatin decondensation (Fig. 1). This is particularly striking when an array of genes is activated. The chromosomal fibre unfolds extensively, occasionally looping out away from the body of the chromosome territory (Mahy et al., 2002b; Tumbar et al., 1999; Volpi et al., 2000; Ye et al., 2001). Several studies show that chromatin decondensation alone is not sufficient for transcriptional activation, however (Nye et al., 2002; Tumbar et al., 1999; Volpi et al., 2000). This is consistent with the notion that regulation at the chromatin domain level prepares DNA for regulation at the level of individual genes, but does not necessarily transcriptionally activate them.
Although the relationship between decondensation of chromatin and transcriptional (pre)activation is well established, our understanding of the process at the molecular level is limited. What is chromatin decondensation? Obviously, the average distance between the nucleosomes in a 3D chromatin domain increases. Does this mean that non-covalent interactions between nucleosomes in the compact structure are disrupted? Possibly, specific proteins that are involved in locking chromatin in its condensed state are released. This may be true for HP1 during the transition from heterochromatin to euchromatin. A pertinent question is what the default state of the nucleosomal fibre is under the conditions that exist in the interphase nucleus. Does chromatin (DNA plus core histones) alone form an open structure that requires special factors to compact, or does it self-assemble into a compact structure that requires special molecules and/or chemical modifications to open up? It is tempting to speculate that the answer lies in between, i.e. chromatin needs special modifications/proteins to generate its transcriptionally permissive state and others to stabilize its inactive state. Nucleosomal arrays would thus exist in a labile intermediate state if no other proteins are present and the histones are unmodified. Indeed, chromatin is forced to condense by specific histone modifications (e.g. methylation of lysine 9 of histone H3) and subsequent binding of proteins, such as HP1 (Grewal and Elgin, 2002; Turner, 2002). Other chromatin-binding proteins and histone modifications, however, may do the opposite, i.e. make chromatin structure more open - for example, histone acetylation, methylation of histone H3 on lysine 4 and ATP-dependent chromatin remodelling (Memedula and Belmont, 2003).
Spatial arrangement of chromosome territories in the nucleus
There seems to be no specific order in the arrangement of chromosomes inside the eukaryotic nucleus, except that gene-dense chromosomes more often reside in the centre, whereas gene-poor chromosomes reside more frequently at the periphery (Cremer et al., 2001). This radial arrangement is evolutionarily conserved (Tanabe et al., 2002). Because radial position correlates with gene density, transcription itself might be responsible for the observed radial distribution. It is not clear whether chromosome territories constitute a type of compartmentalization that is essential for proper genome function. Chromosomes can exchange large fragments without impairing cellular function, provided that no loss or gain of function is induced at the chromosomal breakpoints; this suggests that the radial organization of chromosomes is not essential.
The link between the nuclear level and the chromatin level
How is the 3D organization of the nucleus, and in particular of its chromatin, related to the 1D structure of the genome? And to what extent is the spreading of active and inactive chromatin states, which is important in regulation in cis, also important at the 3D level, i.e. in trans? There are indications that spreading of chromatin states in trans does occur. For instance, pairing of the Drosophila brown locus with a mutant allele that contains a large block of heterochromatin results in inactivation of the wild-type allele, which probably involves its heterochromatinization. The inactive chromatin state may thus be transmitted in trans (Csink and Henikoff, 1996; Dreesen et al., 1991; Sass and Henikoff, 1999), which adds a further level of complexity to the dynamic organization of the eukaryotic genome. Is the molecular mechanism of spreading in trans different from that of spreading in cis? And what limits spreading in trans in the 3D space of the nucleus? Could insulators play a role, or must other mechanisms be invoked?
Another question concerns how functional domains of the linear genome, i.e. gene clusters (see above), are organized in 3D. Such domains may reflect chromatin loops attached at their base to some kind of scaffold. It has been suggested that a fibrous intranuclear network, the nuclear matrix, fulfils this role; however, this concept is controversial (Nickerson, 2001; Pederson, 2000). An alternative view is advocated by Corces and co-workers (Labrador and Corces, 2002), who have suggested that boundary elements delimit functional chromatin domains in the nucleus by forming higher-order 3D structures. Recently, Blanton et al. (Blanton et al., 2003) produced evidence that different insulator-binding proteins interact, suggesting that insulator-bounded domains form loops in the nucleus. Finally, chromatin itself might form a compact structure in the nucleus from which chromatin loops emanate. Such a model is consistent with the observation that transcription takes place at the surface of relatively condensed chromatin domains (Cmarko et al., 1999; Fakan, 1994; Verschure et al., 1999). It is conceivable that intergenic, non-coding chromatin has this structural role.
Recently, Versteeg and co-workers showed that genes that are highly expressed in a large number of different human cell types are strongly clustered on a number of chromosomes (Caron et al., 2001). Such clusters are large, comprising hundreds of genes spread over many megabases. The clusters of highly expressed genes contain not only housekeeping genes (Lercher et al., 2002) but also loci that are expressed only in specific cell types and that, when expressed, display high transcriptional activity. Some of these clusters may correspond to chromatin that forms giant loops bulging out of chromosome territories (Mahy et al., 2002a; Volpi et al., 2000). For instance, the MHC locus on human chromosome 6 represents a large cluster of highly transcribed genes that forms a loop upon transcriptional activation (Volpi et al., 2000). This suggests that, inside the cell nucleus, genes that are transcribed at high rates form special domains or compartments, creating a distinct link between the 1D organization of the genome and the 3D structure of the nucleus.
Perspective
Whatever the precise link between nuclear architecture and the regulation of gene expression may be, it is an important but still poorly understood aspect of genome structure and function (O'Brien et al., 2003). Understanding its molecular basis will be essential if we are to understand how gene expression is orchestrated in eukaryotes.
What, then, are the major questions that we must answer to begin to understand how the genome functions in the confined space of the cell nucleus? The most prominent concerns the relationship between 3D chromatin structure and function. We lack information about the static and dynamic 3D arrangement of nucleosomes in different types of chromatin. It is attractive to start from the idea that chromatin can adopt a limited set of spatial configurations reflecting different functional states, and that gene regulation is due in part to controlled switching between these defined states. This switching appears to be closely related to histone modification and is ultimately encoded in the DNA sequence itself.
Chromatin structure must be studied in the living cell, because it probably changes dramatically when the cell is disrupted. A breakthrough was the development, by Belmont and co-workers, of methods for GFP tagging of specific genomic sites (Belmont, 2001). As the spatial resolution of light microscopy approaches the size of a few nucleosomes, there is hope that detailed information about chromatin structure can be obtained in the next few years (Esa et al., 2000; Kano et al., 2002).
We also need quantitative models of genome function that link transitions between defined structural and functional chromatin states to protein binding and histone modification. Such models should guide the design of rational experiments, whose results can in turn be fed back into the model. Finally, we urgently need tools to manipulate nuclear organization. These should allow us to rearrange chromatin and chromosomes, to interfere with the assembly or function of nuclear domains, and then to establish the effects on gene expression. Clearly, we first have to learn more about which proteins, and possibly RNA molecules, are essential for nuclear organization and chromatin structure. These can then become targets for experiments that interfere with nuclear structure and function.
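The kind of quantitative model called for above could start very small. The Python sketch below assumes, purely for illustration, that a chromatin domain switches between just two states (condensed and open) as a continuous-time Markov process, and that histone marks act only by changing the switching rates. The function names, the linear coupling to "acetylation" and "H3K9 methylation" levels, the base rates and the gain factor of 9 are all hypothetical choices, not measured parameters or an established model.

```python
"""Hypothetical two-state model of chromatin switching (illustration only).

Assumptions (not from the source): a domain is either 'condensed' or 'open',
switching is a continuous-time Markov process, and histone marks act only by
scaling the opening and closing rates.
"""
import math


def rates_from_marks(acetylation: float, h3k9_methylation: float,
                     base_open: float = 1.0, base_close: float = 1.0,
                     gain: float = 9.0) -> tuple[float, float]:
    """Return (k_open, k_close) for mark levels given as fractions in [0, 1].

    Hypothetical linear coupling: the activating mark speeds opening and the
    repressive mark speeds closing. 'gain' is an arbitrary illustrative factor.
    """
    for level in (acetylation, h3k9_methylation):
        if not 0.0 <= level <= 1.0:
            raise ValueError("mark levels must lie in [0, 1]")
    k_open = base_open * (1.0 + gain * acetylation)
    k_close = base_close * (1.0 + gain * h3k9_methylation)
    return k_open, k_close


def steady_state_open(k_open: float, k_close: float) -> float:
    """Long-run fraction of time the domain spends in the open state."""
    if k_open < 0 or k_close < 0 or k_open + k_close == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_open / (k_open + k_close)


def open_probability(t: float, p0: float, k_open: float, k_close: float) -> float:
    """Probability of being open at time t, starting from probability p0.

    Solves dp/dt = k_open * (1 - p) - k_close * p, which relaxes
    exponentially to the steady state with rate k_open + k_close.
    """
    p_ss = steady_state_open(k_open, k_close)
    return p_ss + (p0 - p_ss) * math.exp(-(k_open + k_close) * t)


if __name__ == "__main__":
    # Compare an unmodified, an 'activated' and a 'repressed' domain.
    for label, ac, me in [("unmodified", 0.0, 0.0),
                          ("acetylated", 1.0, 0.0),
                          ("H3K9-methylated", 0.0, 1.0)]:
        k_o, k_c = rates_from_marks(ac, me)
        print(f"{label:16s} open fraction = {steady_state_open(k_o, k_c):.3f}")
```

With both marks at zero the sketch sits at an open fraction of 0.5, loosely echoing the labile intermediate state discussed earlier; this follows only from the arbitrary equal base rates, not from any data. A realistic model would replace the hypothetical rate functions with measured binding and modification kinetics, and would likely need more than two states.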
Acknowledgements
The authors acknowledge financial support from several grants from the biological branch (ALW) of the Dutch research council NWO for research on nuclear structure and function.