Early genomics of learning and memory: a review

Authors


S. Cavallaro, MD, PhD, Istituto di Scienze Neurologiche, CNR, Viale Regina Margherita 6, 95123 Catania, Italy. E-mail: s.cavallaro@isn.cnr.it

Abstract

The characterization of the molecular mechanisms whereby our brain codes, stores and retrieves memories remains a fundamental puzzle in neuroscience. Despite the knowledge that memory storage involves gene induction, the identification and characterization of the effector genes has remained elusive. The completion of the Human Genome Project and a variety of new technologies are revolutionizing the way these mechanisms can be explored. This review will examine how a genomic approach can be used to dissect and analyze the complex dynamic interactions involved in gene regulation during learning and memory. This innovative approach is providing information on a new class of genes associated with learning and memory in health and disease and is elucidating new molecular targets and pathways whose pharmacological modulation may allow new therapeutic approaches for improving cognition.

One of the most ambitious goals of modern neuroscience is to identify the mechanisms whereby our brain codes, stores and retrieves memories. Two general forms of memory can be classified by their duration: short-term memory (STM), which is rapidly formed and can outlast training for minutes or hours, and long-term memory (LTM), which lasts from hours to days, weeks or even years. STM involves post-translational modifications of preexisting molecules that alter the efficiency of synaptic transmission. In contrast, LTM can be blocked by inhibitors of transcription or translation, indicating that it is dependent on de novo gene expression (Davis & Squire 1984; Stork & Welzl 1999). Proteins newly synthesized during memory consolidation may contribute to restructuring processes at the synapse and thereby alter the efficiency of synaptic transmission beyond the duration of STM. Revealing the dependence of LTM on protein synthesis, however, provides no information about the identity and specificity of the required proteins.

Because the quantity of a particular protein is often reflected by the abundance of its mRNA, a variety of methods have been used to search for genes that are differentially expressed during LTM, but these have described only a limited number of them (Stork & Welzl 1999). Further screening was therefore needed to uncover how many and which genes are involved in memory and how they interact functionally to effect memory storage. To survey the gene-based molecular mechanisms that underlie LTM, genome-wide expression analysis is being used in a variety of behavioral conditions (Cavallaro et al. 2001; Cavallaro et al. 2002; D'Agata & Cavallaro 2003; Dubnau et al. 2003; Leil et al. 2003; Luo et al. 2001; Rampon et al. 2000). This genomic approach requires the development of various laboratory protocols, as well as the development of database and software tools for efficient data collection and analysis. A basic understanding of these computational tools is therefore required for optimal experimental design and meaningful data analysis. For this reason, in the present review, we will outline the main procedures of microarray analysis (Fig. 1) before describing its application in the learning and memory field. For a more general description of microarray technology, the reader is referred to other reviews (Heller 2002; Hess et al. 2001; Noordewier & Warren 2001; Quackenbush 2001).

Figure 1.

Experimental design, data analysis and follow-up experiments in DNA microarray experiments. DNA microarray is a very powerful technology, but it can be useless if the experimental design is not properly identified and executed with precision and accuracy. Because of variability across different samples, it is important to perform microarray experiments on replicate samples. These replicates can be either ‘technical’, where the same sample pair is repeated, or ‘biological’, where the experiment is repeated and an equivalent sample used. Sophisticated computational tools are then needed for the management and analysis of data that result from these experiments. Microarray data are commonly normalized to correct differences in intensity levels. Once normalized, a series of restrictions (gene filtering) can be applied to the data obtained. These restrictions include factors such as quality control, expression level constraints, sample-to-sample fold comparison and statistical group comparisons. More complex computational methods are needed to investigate and interpret microarray data and to group genes based on their expression levels or gene ontologies. Follow-up experiments can finally be used to confirm/modulate the expression/activity of candidate genes or their encoded proteins.

Expression profiling by DNA microarray technology

Basic principles

A DNA microarray is a grid of DNA spots, called probes, each containing a unique DNA sequence (Fig. 2). Spots contain either DNA oligomers or a longer DNA sequence designed to be complementary to a particular mRNA of interest. When a microarray is hybridized to fluorescence-tagged complementary DNAs or RNAs derived from messenger or total RNA, each spot is a target for the mRNA encoded by a gene. A laser can then excite the bound cDNAs or cRNAs and a scanner collects fluorescence intensities from each spot on the slide. The intensity of the fluorescence at each array element is proportional to the expression level of that gene in the sample. The choice of having oligomers or longer cDNA sequences yields two different microarray technologies: oligonucleotide and cDNA microarrays, respectively. What makes microarrays the most promising technology for genome-wide expression analysis is the number of DNA probes that can be placed on a single array. Already there are microarrays with probes for every gene in yeast, and others with over 40 000 human genes. This allows researchers to observe the response of whole genomes to various stimuli instead of one gene at a time.

Figure 2.

DNA microarray methodology. Total or messenger RNA is extracted, reverse transcribed, labeled with different fluorochromes (e.g. Cy3 and Cy5) and hybridized to microarrays. At the end of the hybridization, a laser scanner collects the image produced by the dye. Intensity values from each spot are calculated and then analyzed by specific software. Data can be represented graphically by a scatter plot, with the values of sample one plotted on the x-axis and the values of sample two plotted on the y-axis. Data obtained under different conditions (e.g. different time points) can be analyzed with different algorithms such as hierarchical or k-means clustering.

Computational analysis of microarray data

Microarray analysis results in large amounts of data that are difficult to interpret without computational methods. The simplest analysis involves two samples, representing a test condition and a control condition, and yields a list of paired expression values, one pair for each gene. As illustrated in Fig. 2, these pairs can be represented graphically by a scatter plot, with the values of sample one plotted on the x-axis and the values of sample two plotted on the y-axis. The resulting correlation plot provides a visual image of the relationship between the two expression profiles. In this plot, genes with similar expression levels in the two samples should have points on the identity line (y = x), and genes that are expressed differentially lie at some distance from this line. Microarrays, however, do not measure expression levels directly, but rather intensity levels, as represented by the amount of fluorescent dye recorded by a scanner. Many other factors, such as the overall mRNA concentration of the two samples, saturation effects in the hybridization or quenching of the fluorescent dyes, can affect these intensity values. In order to correct these differences in intensity levels, the raw data can be normalized, for example, by using a normalization constant derived from housekeeping or spiked control genes.
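The normalization step described above can be illustrated with a minimal sketch. It assumes that the normalization constant is simply the ratio of the mean housekeeping-gene intensities in the two samples; the gene names and intensity values are hypothetical and are not taken from any study discussed here.

```python
from statistics import mean

def housekeeping_scale_factor(control, test, housekeeping):
    """Ratio of the mean housekeeping intensity in control vs. test.

    Assumption: housekeeping genes are expressed equally in both samples,
    so any difference in their mean intensity is technical, not biological.
    """
    control_mean = mean(control[g] for g in housekeeping)
    test_mean = mean(test[g] for g in housekeeping)
    return control_mean / test_mean

def normalize(sample, factor):
    """Rescale every intensity in a sample by the normalization constant."""
    return {gene: value * factor for gene, value in sample.items()}

# Hypothetical raw intensities; the test sample was scanned twice as bright.
control = {"actb": 1000.0, "gapdh": 2000.0, "geneA": 500.0, "geneB": 300.0}
test = {"actb": 2000.0, "gapdh": 4000.0, "geneA": 1000.0, "geneB": 2400.0}

factor = housekeeping_scale_factor(control, test, ["actb", "gapdh"])
normalized_test = normalize(test, factor)

# After normalization, geneA lies on the identity line and geneB does not.
fold_change = {g: normalized_test[g] / control[g] for g in control}
```

Without the correction, geneA would appear twofold induced purely because of the brighter scan; after it, only geneB departs from the identity line.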

Once normalized, a series of restrictions (or filters) can be applied to the data obtained. These restrictions include factors such as quality control, expression level constraints, sample-to-sample fold comparison and statistical group comparisons. The simplest way to identify interesting genes in DNA-microarray experiments is to search for those that are consistently either up- or downregulated. Relative differences in expression levels (fold changes) have typically been employed in group comparisons of gene expression. This approach, however, is somewhat arbitrary and inherently subject to high error rates because information on sample variance is not exploited. If array experiments are replicated to an extent that permits direct estimates of the variance of each individual transcript, parametric or non-parametric statistics can be applied. In these cases, however, a large number of false-positive results is expected by chance when one relies on the nominal P-value. For instance, when testing 10 000 transcripts we would expect to misidentify about 500 genes as significant (P < 0.05), even when there is no real difference in gene expression. Multiple testing corrections are therefore needed to adjust the individual P-values to account for this effect.
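The arithmetic behind the 500 expected false positives, and two widely used corrections, can be sketched as follows. The text does not name a specific correction; the Bonferroni and Benjamini-Hochberg procedures are used here only as common examples, and the P-values are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Number of tests expected to pass by chance when no gene truly differs."""
    return n_tests * alpha

def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P-value by the number of tests."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjustment (controls the false discovery rate)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that adjusted
    # values never increase as the rank decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# 10 000 transcripts tested at P < 0.05 gives about 500 chance findings.
chance_hits = expected_false_positives(10_000, 0.05)

# Hypothetical nominal P-values for four transcripts.
nominal = [0.01, 0.04, 0.03, 0.005]
p_bonferroni = bonferroni(nominal)
p_bh = benjamini_hochberg(nominal)
```

Bonferroni is the more conservative of the two; Benjamini-Hochberg tolerates a controlled fraction of false discoveries and is therefore often preferred when thousands of transcripts are screened.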

More complex computational methods are needed to monitor several gene expression profiles, such as those arising from time-course studies, and various clustering techniques have been applied to the identification of patterns in gene-expression data. Cluster analysis is a commonly used method to investigate and interpret gene expression data sets. By grouping together genes that have similar expression profiles, cluster analysis can be used for extraction of regulatory motifs, inference of functional annotation and classification of cell types or tissue samples.

Cluster analysis

The term clustering stands for a method that makes it possible to partition a set of objects (genes) into subgroups with similar features, called clusters. These partitions have to satisfy two requirements: homogeneity within each cluster (the objects that belong to the same cluster have to be as similar as possible) and heterogeneity among clusters (the objects that belong to different clusters have to be as different as possible).

A clustering method generally consists of two distinct components: a distance measure (or similarity coefficient) that indicates how similar two gene expression patterns are (or more generally, two clusters), and a clustering algorithm, which sorts the data and groups genes together on the basis of their separation in expression space.

Measure of distance or similarity coefficient

Many of the advanced analysis techniques are based upon measures of gene similarity. Similarity between genes is usually based on the correlation between the expression profiles of the genes. For expression data, we can solve the problem of ‘similarity’ mathematically by defining an ‘expression vector’ for each gene that represents its location in ‘expression space’. In this way, expression data can be represented in n-dimensional expression space, where n is the number of experiments and where each gene-expression vector is represented as a single point in that data space.

In any clustering algorithm, the calculation of a ‘distance’ between any two objects is fundamental for placing them into groups. There are various methods for measuring distance, typically falling into two general classes: metric and semimetric (or similarity coefficient). Each of these takes two expression patterns and produces a number representing how similar the two genes are.

A detailed mathematical description of the distance metrics used in clustering analysis can be found in other reviews (Quackenbush 2001). The most common metric distance is the Euclidean distance, which is simply the geometric distance in the multidimensional space. The most commonly used semimetric distance measure in the analysis of gene expression data is Pearson's correlation coefficient r.
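The difference between the two measures can be made concrete with a short sketch. Converting the correlation coefficient into a distance as 1 - r is a common convention assumed here, not something specified in the text, and the three expression vectors are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Geometric distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_r(a, b):
    """Pearson's correlation coefficient between two expression vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def pearson_distance(a, b):
    """Assumed convention: distance = 1 - r (0 = same shape, 2 = opposite)."""
    return 1.0 - pearson_r(a, b)

# Hypothetical expression vectors across four experiments.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same trend as gene_a, tenfold higher
gene_c = [4.0, 3.0, 2.0, 1.0]      # similar magnitude, opposite trend

d_euclid_ab = euclidean_distance(gene_a, gene_b)
d_euclid_ac = euclidean_distance(gene_a, gene_c)
d_pearson_ab = pearson_distance(gene_a, gene_b)
d_pearson_ac = pearson_distance(gene_a, gene_c)
```

In this example the Euclidean distance places gene_a closer to gene_c (similar magnitude), whereas the correlation-based distance places it closer to gene_b (same profile shape), which is why the choice of measure can change cluster membership.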

Clustering algorithms

Once a means of measuring the distance between genes is available, clustering algorithms sort the data and group genes together on the basis of their separation in expression space. Various clustering techniques have been applied to the identification of patterns in gene-expression data. Most cluster analysis techniques are hierarchical: the resultant classification has an increasing number of nested classes and the result resembles a phylogenetic classification. Non-hierarchical clustering techniques also exist, such as k-means clustering, which simply partitions objects into different clusters without trying to specify the relationship between individual elements. Examples of hierarchical and k-means clustering are reproduced in Fig. 2.
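The two families of algorithms can be contrasted with a minimal sketch. It assumes Euclidean distance, average linkage for the hierarchical variant and fixed (non-random) starting centroids for k-means; these are illustrative simplifications, and the four expression profiles are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(profiles, n_clusters):
    """Hierarchical (bottom-up) clustering with average linkage.

    Starts with one cluster per gene and repeatedly merges the two
    closest clusters until n_clusters remain.
    """
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

def k_means(profiles, initial_centroids, n_iter=10):
    """Non-hierarchical partitioning into len(initial_centroids) clusters."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each gene joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: euclidean(p, centroids[k]))
                  for p in profiles]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical profiles over three time points.
profiles = [
    [1.0, 2.0, 3.0],  # gene 0: rising
    [1.1, 2.1, 2.9],  # gene 1: rising
    [3.0, 2.0, 1.0],  # gene 2: falling
    [2.9, 2.2, 1.1],  # gene 3: falling
]
hierarchical_clusters = agglomerative(profiles, 2)
kmeans_labels = k_means(profiles, [profiles[0], profiles[2]])
```

The hierarchical routine records a sequence of merges (which could be drawn as a dendrogram), whereas k-means returns only a flat assignment of each gene to a cluster.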

Although cluster analysis techniques are extremely powerful, great care must be taken in applying this family of techniques. Even though the methods used are objective in the sense that the algorithms are well defined and reproducible, they are still subjective in the sense that selecting different algorithms, different normalizations, or different distance metrics will place different objects into different clusters. Furthermore, clustering unrelated data would still produce clusters, although they might not be biologically meaningful.

Numerical, semantic and mixed clustering. Cluster analysis is a methodology to identify groups of genes that share common expression characteristics and behaviors. It has been frequently exploited in the analysis of genome-wide expression data, as the experimental observation that a set of genes is coexpressed implies that the genes share a biological function and are under common regulatory control.

Clustering is frequently used to group genes on the basis of similar expression profiles alone, without considering other well-known properties of the genes. Genes with different expression profiles, however, may also have similar functions, and classical clustering methodologies do not reveal this.

In order to extract knowledge from gene expression information, cluster analysis can be organized in three different approaches: numerical, semantic and mixed clustering.

The numerical clustering method is applied to the levels of gene expression. It tends to group genes with a similar expression profile in the same clusters, and places genes that have different profiles but similar semantic features in different clusters. These considerations suggest that simple numerical clustering algorithms are inadequate to infer the roles of genes and proteins.

In order to discover more complex relationships among gene sequences, semantic clustering is used. The term semantic clustering indicates methods of clustering based on semantic characteristics, such as gene ontologies. When categorical domains are ordered, they can be converted into numerical values so that similar categories are assigned nearby values. After that, methods of classical numerical clustering can be applied to that data set. When categorical domains are not ordered, however, this approach does not necessarily produce meaningful results. In this case, the k-modes algorithm can be used to remove this limitation.
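For unordered categories, the k-modes algorithm replaces the two numerical ingredients of k-means: distances become counts of mismatching attributes, and cluster centers become per-attribute modes. The sketch below shows only these two building blocks, not a full clustering loop, and the annotation labels are hypothetical.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def cluster_mode(records):
    """Cluster 'center' for categorical data: most frequent value per attribute."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*records)]

# Hypothetical annotations: (biological process, molecular function, component).
gene1 = ("signaling", "kinase", "membrane")
gene2 = ("signaling", "kinase", "cytoplasm")
gene3 = ("apoptosis", "protease", "cytoplasm")
gene4 = ("signaling", "kinase", "membrane")

d_12 = matching_dissimilarity(gene1, gene2)
d_13 = matching_dissimilarity(gene1, gene3)
mode_124 = cluster_mode([gene1, gene2, gene4])
```

Because no ordering of the categories is required, the result does not depend on how the categories happen to be numbered.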

Finally, more useful analyses can be performed using mixed clustering. In this case, each gene can be represented by a vector in the (n + m)-dimensional space, where n is the number of gene expression levels and m is the number of semantic features transformed into numerical values. Each gene-expression-semantic vector is then represented as a single point in data space, and any measure of distance can be adopted to calculate the distance between any two genes. Mixed clustering tends to group genes with similar expression profiles as well as genes with similar semantic features.
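A mixed vector can be sketched as a simple concatenation. The weighting parameter and the rescaling of the semantic codes to a range comparable with the expression values are assumptions introduced here for illustration; the text does not specify how the two parts are balanced, and all values are hypothetical.

```python
import math

def mixed_vector(expression, semantic_codes, semantic_weight=1.0):
    """Concatenate n expression levels with m numeric semantic features.

    semantic_weight is a hypothetical knob that controls how much the
    semantic part contributes to the distance between genes.
    """
    return list(expression) + [semantic_weight * code for code in semantic_codes]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# n = 4 expression levels and m = 3 numeric semantic codes per gene.
gene_x = mixed_vector([1.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3])
gene_y = mixed_vector([1.0, 2.0, 3.0, 4.0], [0.9, 0.2, 0.3])  # same profile, one code differs
gene_z = mixed_vector([4.0, 3.0, 2.0, 1.0], [0.1, 0.2, 0.3])  # same codes, opposite profile

d_xy = euclidean(gene_x, gene_y)
d_xz = euclidean(gene_x, gene_z)
```

Any of the clustering algorithms discussed earlier can then be run on these (n + m)-dimensional points without modification.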

In order to perform semantic and mixed clustering we have developed new informatics applications (Fig. 3). Functional information is automatically retrieved by means of the software application genelink (Fig. 3a). This application is designed to retrieve genomics and proteomics information from external worldwide databases (NCBI GenBank and LocusLink). Before performing the semantic clustering, the functional annotations have to be converted into numerical values so that functionally similar features are assigned nearby values. For each gene ontology (GO) number, for example, the software application gene ontology system (GOS) assigns a new GO number in order to identify the hierarchical relationships among GOs (Fig. 3b). The basic idea is that two related GOs must be coded with two close GO numbers. After this renumbering process, methods of classical numerical clustering can be applied to the original data set (Fig. 3c) to perform numerical (Fig. 3d), semantic (Fig. 3e), or mixed (Fig. 3f) clustering. In this way, cluster analysis can effectively extract functional information from gene expression data.
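The idea of coding related GOs with close numbers can be illustrated with a depth-first numbering of a small tree. This is only a sketch of the general principle and does not reproduce the GOS application; the term names and the tree are hypothetical, and the real gene ontology is a graph in which a term can have several parents, which this simplified tree ignores.

```python
def renumber_depth_first(tree, root):
    """Assign consecutive numbers to terms in depth-first (preorder) order.

    tree maps each term to the list of its child terms.
    """
    new_numbers = {}
    stack = [root]
    while stack:
        term = stack.pop()
        new_numbers[term] = len(new_numbers) + 1
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(tree.get(term, [])))
    return new_numbers

# Hypothetical ontology fragment.
tree = {
    "root": ["signaling", "metabolism"],
    "signaling": ["kinase cascade", "calcium signaling"],
    "metabolism": ["lipid metabolism"],
}
numbers = renumber_depth_first(tree, "root")
```

Terms in the same branch ("kinase cascade" and "calcium signaling") receive adjacent numbers, whereas terms in different branches end up farther apart, so a numerical distance on the new codes loosely reflects the hierarchy.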

Figure 3.

Numerical, semantic and mixed clustering. Semantic clustering indicates a method of clustering genes based on their semantic characteristics. In this figure, gene ontology (GO) terms are used as examples of semantic features. Three independent ontologies are currently accessible on the World Wide Web (http://www.geneontology.org): biological process (GO1), molecular function (GO2) and cellular component (GO3). We have developed an informatics application, which we call genelink, that automatically retrieves functional annotations from external databases (NCBI GenBank and LocusLink) (panel a). Before performing the semantic clustering, these functional annotations have to be converted into numerical values so that functionally similar features are assigned nearby values. For each GO number, the software application gene ontology system (GOS) assigns a new number according to a depth-first visit of the tree that represents the hierarchical relationships among GOs (panel b). After this renumbering process, methods of classical numerical clustering can be applied to the normalized data set. Panels c–f show a gene expression matrix (left) and a GO map (right). The GO map is composed of three grids, one for each type of gene ontology: biological process, cellular component and molecular function. Each element of a grid is colored on the basis of the GO number obtained by GO renumbering (some genes have more than one GO). Panel c represents unclustered gene expression data of 10 genes in four different conditions together with their GO map. Panel d shows clustered gene expression data of the 10 genes in four different conditions (GOs are not grouped). In panel e, gene ontologies are clustered (gene expression data are not grouped). Panel f shows a mixed clustering of gene expression and GO data together (balanced clusters can be seen in both universes).

The use of a genomic approach for studying learning and memory

The following part of this review will focus on the use of a genomic approach to dissect and analyze gene-based mechanisms underlying learning and memory. We will highlight gene expression microarray analysis performed in different behavioral paradigms (eye-blink conditioning, water- and T-maze learning and passive avoidance conditioning). Because of space limitations, we will try to give a broad view of the results obtained and refrain from discussing each of the genes implicated by microarray analysis.

Eye-blink conditioning

To begin a comprehensive survey of the molecular mechanisms that underlie LTM, we used cDNA microarray technology to perform genome-wide expression analysis after classical conditioning of the rabbit's nictitating membrane response (NMR), a uniquely well-controlled associative learning paradigm (Fig. 4a) (Cavallaro et al. 2001). Classical conditioning of the rabbit NMR involves the presentation of an innocuous stimulus, such as a tone, followed by a noxious stimulus, such as an air puff to the eye or electrical stimulation around it (Gormezano et al. 1962). Extensive lesion and recording data have implicated the cortex of the cerebellum, and in particular lobule HVI, in classical conditioning of the rabbit NMR (Berthier & Moore 1986; Gould & Steinmetz 1996; Gruart & Yeo 1995; Schreurs et al. 1991; Yeo et al. 1985). Although the hippocampus may not be necessary for NMR conditioning, recording data do show consistent hippocampal changes that are specific to eye-blink conditioning (Coulter et al. 1989; Sanchez-Andres & Alkon 1991). In addition, imaging studies have implicated both structures in human eye-blink conditioning (Blaxton et al. 1996; Logan & Grafton 1995; Molchan et al. 1994; Schreurs et al. 1997).

Figure 4.

Microarray analyses of eye-blink-conditioned rabbits. (a) Mean percent conditioned responses in paired, unpaired and sit-control rabbits as a function of three training sessions. To relate changes in gene expression to a learning task, we used pairings of a tone and periorbital electrical stimulation in a standard delay-conditioning procedure, training rabbits to asymptotic levels of conditioning over 3 consecutive days. Paired rabbits (n = 12) acquired conditioned responses to the tone and reached a mean terminal level of 94.7% conditioned responses, whereas the unpaired control rabbits (n = 12) responded to the tone at mean levels of less than 1.3% across the 3 days of stimulus presentations and sit-control rabbits (n = 5) had spontaneous blink rates of less than 1% (P < 0.001). Without further training or testing, rabbits show a level of 80% conditioned responses as long as 1 month after the 3 days of the stimulus pairings used in the present experiments. Consequently, harvesting cerebellar and hippocampal tissue 24 h after 3 days of pairings ensured that rabbits were still at an asymptotic level of conditioning. (b–c) Scatter plot of gene expression levels for paired and unpaired animals in (b) cerebellar lobule HVI and (c) hippocampus. Messenger RNA levels from cerebellar lobule HVI and hippocampus of unpaired and paired rabbits (n = 7 per group) were simultaneously analyzed with high-density cDNA microarrays containing more than 8700 mouse cDNA clones with a length of 500–5000 bp and an average length of about 1 kb. The estimated percentage of homology between mouse clones and rabbit genes is 88.98 ± 3.7 (mean ± SD). The cross-species similarity and a complete list of the differentially expressed genes are available online at http://www.web.tiscali.it/sebastiano_cavallaro. (d) Differentially expressed genes with a known function are ordered into functional groups.

Messenger RNA levels from cerebellar lobule HVI and hippocampus of unpaired and paired rabbits were simultaneously analyzed with high-density cDNA microarrays containing more than 8700 cDNAs (Cavallaro et al. 2001). When gene expression patterns were compared, mRNA levels of 79 and 17 genes differed more than twofold in lobule HVI and hippocampus, respectively (Figs 4b,c). These genes were operationally defined as ‘memory related genes’ (MRGs). Approximately 50% of the MRGs differentially expressed in the hippocampus were also differentially expressed in the HVI lobule, suggesting common mechanisms of memory storage in the two areas.
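A twofold criterion of this kind can be sketched as a symmetric filter on the log ratio, so that induction and repression are treated alike. The gene names and intensity values are hypothetical and do not come from the study; the exact filtering procedure used there is not reproduced here.

```python
import math

def differs_more_than_twofold(paired, unpaired, threshold=2.0):
    """True if expression changes by more than `threshold`-fold in either direction."""
    log_ratio = math.log2(paired / unpaired)
    return abs(log_ratio) > math.log2(threshold)

# Hypothetical normalized intensities: (paired, unpaired).
levels = {
    "gene1": (100.0, 450.0),  # lower in paired animals (downregulated)
    "gene2": (300.0, 310.0),  # essentially unchanged
    "gene3": (900.0, 300.0),  # higher in paired animals (upregulated)
}

selected = sorted(gene for gene, (paired, unpaired) in levels.items()
                  if differs_more_than_twofold(paired, unpaired))
```

Working on the log scale makes a twofold decrease (ratio 0.5) and a twofold increase (ratio 2) equidistant from no change, which a raw ratio does not.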

A majority of MRGs were downregulated, whereas only two genes that differed by a factor greater than 2 were upregulated in lobule HVI of paired animals (Fig. 4b). Because LTM can be blocked by transcription and protein synthesis inhibitors, most previous reports have focused on the identification of proteins whose expression is upregulated (Davis & Squire 1984). The preponderant reduction of gene expression during LTM therefore would not have been predicted and provides new and unexpected insights into the molecular mechanisms that underlie it. The specific role of the downregulation of MRGs following learning remains a matter of speculation. Downregulation of a gene may be the end point in a dynamic gene expression process that begins with upregulation during acquisition of the learned response. Alternatively, memory storage may require a balance of upregulation of some genes and downregulation of genes that exert inhibitory constraints on memory formation (Alberini et al. 1994). These latter genes might be termed memory suppressor genes (Abel & Kandel 1998).

A majority of the MRGs implicated have no currently recognized function and are not yet named. Complete nucleotide sequence determination, conceptual translation, expression monitoring and biochemical analysis are currently underway (D'Agata et al. 2003) and should provide a detailed functional understanding of these genes. Seventeen genes have significant similarity to known genes and can be grouped into different functional classes (Fig. 4d).

Our microarray analysis of eye-blink-conditioned rabbits (Cavallaro et al. 2001) was the first reported in the literature to demonstrate the feasibility and utility of a cDNA microarray system as a means of dissecting the molecular mechanisms of associative memory. Further studies, however, were required at different time points and under different behavioral conditions to better understand the role of the implicated genes. To perform such studies, we and others have moved to rats or mice, two species that are better suited to genomic studies than the rabbit in terms of sequenced genes and available microarrays.

T-Maze learning

Microarrays have been used to analyze hippocampal gene expression in rats following training in a multiunit T-maze (Luo et al. 2001). In this study, the expression of 28 genes (18 known genes and 10 ESTs) was found to be increased in maze-trained animals compared with yoked control rats that were trained in a straight runway. Some of the known genes are involved in Ca2+ signaling, Ras activation, kinase cascades and extracellular matrix function. None of them, however, overlaps with genes identified by microarray analysis in the other experimental paradigms examined in this review. Although the aversive foot shock was applied in equal duration and frequency to both the trained and control rats, changes in gene expression could be ascribed to other differences between the two groups, such as locomotor activity. In addition, because the animals were pretrained on day 1 and T-maze-trained on days 2 and 3, the time-dependent patterns of gene regulation during acquisition and consolidation of memory are unknown.

Water-maze learning

To detect learning-related changes, microarray analysis has been used by two laboratories to characterize gene expression profiles in animals trained in the Morris water maze (Cavallaro et al. 2002; Leil et al. 2002; Leil et al. 2003). In this learning paradigm, a rodent learns to locate a submerged island in a large pool by creating a spatial map using extra-pool cues. Leil et al. (2002) used cDNA microarrays containing approximately 9000 clones to detect hippocampal gene expression changes between F1 hybrid mouse strains that perform well on the Morris water maze and inbred strains that perform poorly. Although this study was performed in a brain region intimately involved in spatial learning, genes differentially expressed in mouse strains may subserve other behavioral processes or functions. Indeed, in a later study, the same authors (Leil et al. 2003) used microarray analysis to characterize the differential expression of genes (n = 3) in the hippocampus of F1 hybrid mice after 2 days of water-maze training. Although the mouse strains used were the same, no overlap was found between the genes revealed in the two studies. In addition, no overlap was found between genes differentially expressed in water-maze training (Leil et al. 2003) and those identified in the other behavioral paradigms examined in this review. This is probably due to a number of factors, including the different genes on the arrays, the different species or strains employed, and the brain regions and time points studied. In addition, because mice are very reactive to placement in water, gene expression differences may be due to stress responses rather than learning and memory.

To analyze the time-dependent patterns of gene regulation during water-maze training, we measured hippocampal gene expression profiles in naïve, swimming control and water-maze-trained rats (Fig. 5a,b), using microarrays containing more than 1200 genes relevant to neurobiology (Cavallaro et al. 2002). When gene expression profiles in naïve and swimming control animals 1, 6 and 24 h after swimming sessions were compared, 345 genes were found to be differentially expressed more than twofold in at least two of the four conditions (Fig. 5c). These genes, operationally defined as ‘physical activity related genes’ (PARGs), indicate that physical activity and the mild stress associated with behavioral training have a significant impact on hippocampal gene expression.

Figure 5.

Water-maze learning. (a) Escape latencies of rats swimming to a submerged platform in the water maze during four consecutive trials. In order to reduce stress on the experimental day, the first day was dedicated to swimming training in the absence of an island. Each rat was placed in the pool for 2 min and was returned to its home cage. On the next day, half of the rats were placed again in the pool for a 2.5-min swimming session and were used as swimming controls. The other half were given four consecutive trials to locate the platform, each trial lasting up to 2 min. Rats were required to spend 30 seconds of an intertrial interval on the platform. The rats' escape latency was measured using a HVS2020 video tracking system (HVS Image Ltd, Hampton, UK). (b) Probe trial. To verify that the trained rats in fact learned the spatial location of the island, a group of six rats was trained to find the island and tested 24 h later on a quadrant analysis test. The trained rats swam significantly longer in the quadrant (red) where the island was located. (c–e) Venn diagrams of differentially expressed hippocampal genes. Hippocampal gene expression profiles in naïve, swimming control and water-maze-trained rats were measured using microarrays containing 1263 genes relevant to neurobiology (Affymetrix GeneChip Rat Neurobiology U34 array). Genes differentially expressed in naïve and swimming control animals 1, 6 and 24 h after training were operationally defined as ‘physical activity related genes’ (PARGs), whereas genes differentially expressed in water-maze-trained animals compared with swimming controls were operationally defined as ‘memory related genes’ (MRGs) (c). Among these, 55 genes were upregulated (d), whereas 91 genes were downregulated (e) in at least one of three time points examined. (f) Hierarchical clustering of MRGs. A hierarchical clustering algorithm (Pearson correlation, separation ratio 0.5, minimum distance 0.001) was used to order MRGs in a dendrogram in which the pattern and length of the branches reflect the relatedness of the samples. Data are presented in a matrix format: each row represents a single gene and each column an experimental condition. The averaged normalized intensity from two replicates is represented by the color of the corresponding cell in the matrix. Blue, yellow and red cells, respectively, represent transcript levels below, equal to or above the median abundance across all conditions. Color intensity reflects the magnitude of the deviation from the median (see scale at the bottom). A complete list of the differentially expressed genes is available online at http://www.web.tiscali.it/sebastiano_cavallaro.

When gene expression levels in swimming control animals were compared with water-maze trained animals 1, 6, or 24 h after training, 140 MRGs were found (Fig. 5c). The majority of these MRGs (110 of 140) were also PARGs, i.e. influenced by physical activity. Among MRGs, 55 genes were upregulated in the hippocampus of water-maze-trained animals, whereas 91 genes were downregulated (Figs 5d,e).

Most of the MRGs, i.e. the genes differentially expressed between the swimming and spatial learning groups, were also affected by swimming alone, but with entirely different temporal patterns of expression (Fig. 5f). Although learning and physical activity involve common groups of genes, learning and memory can be distinguished by unique patterns of gene expression across time.

All of the MRGs identified during water-maze learning have a recognized function and can be classified into six major groups based on their translated product: (i) cell signaling, (ii) synaptic proteins, (iii) cell–cell interaction and cytoskeletal proteins, (iv) apoptosis, (v) enzymes and (vi) transcription or translation regulation. Some of these genes have previously been related to synaptic plasticity, memory or cognitive disorders. For a complete description of the MRGs implicated by microarray technology, the reader is referred to our previous study (Cavallaro et al. 2002). In the following paragraph we discuss only one of the MRGs, FGF-18, which has been tested further for its memory regulatory function.

FGF-18 is a novel member of the FGF family that was shown to stimulate neurite outgrowth (Ohbayashi et al. 1998). Although the function of this peptide is still unknown, the other members of its family are important signaling molecules in several inductive and patterning processes and act as brain organizer-derived signals during formation of the early vertebrate nervous system. Water-maze training, but not physical activity, induced the expression of FGF-18. To explore the role of FGF-18 in spatial learning, we tested the effects of a single exogenous dose of FGF-18. Rats were trained in a Morris water maze for two trials and then injected intracerebroventricularly with 0.94 pmoles of FGF-18 or vehicle. As shown in Fig. 6, FGF-18 treatment improved spatial learning, reducing the escape latency by 49% without significantly changing motor activity.

Figure 6.

Effects of a single exogenous administration of FGF-18 on water-maze learning. Thirteen male Wistar rats (250–300 g) were implanted stereotaxically with stainless steel guide cannulae in the right and left lateral ventricles. On day 1, 1 week after surgery, animals were subjected to a 2-min swimming training session. A water-maze training session was then performed on days 2 and 3 and consisted of finding a submerged platform to escape from the water. Each animal was given two trials per session. The escape latency and distance to find the platform were monitored as described above. Ten minutes after the second trial on day 2, an intracerebroventricular administration of drug or vehicle was performed in both lateral ventricles. Six animals received 0.94 pmoles of FGF-18 and the other seven received a control injection of vehicle (saline). *day 1 vs. day 2, P < 0.05; †control vs. FGF-18, P < 0.05.

The data obtained in the hippocampus of water-maze-trained rats (Cavallaro et al. 2002) represent the first temporal gene expression comparison reported for the long-term retention of learning and memory and further demonstrate the utility of a genomic approach as a means of dissecting the molecular basis of associative memory. This approach provides information on the gene expression changes that occur during physical activity, stress, learning and memory, allowing the identification of molecular targets and pathways whose modulation may generate new therapeutic approaches for facilitating learning and memory.

Passive avoidance learning

We have recently extended our genome-wide screenings to an additional behavioral animal model, a step-through passive avoidance test, known to require hippocampus-dependent learning and to depend upon transcription (Stubley-Weatherly et al. 1996). In these experiments (D'Agata & Cavallaro 2003), conditioned animals (CA) were trained to avoid moving from the lighted to the darkened section of a conditioning chamber by delivering a foot shock when they entered the darkened section. Control rats included untrained (naïve) animals, and animals exposed to the unconditioned (USTA) or the conditioned (CSTA) stimulus. To verify that the trained rats had in fact learned the passive avoidance task, learning was assessed in a comparable group of animals by evaluating the step-through latency in a retention test. Twenty-four hours after the one-trial training period, only CA had learned to associate stepping through to the darkened chamber with the foot shock (Fig. 7a).

Figure 7.

Passive avoidance learning. (a) Passive avoidance retention test. Conditioned animals (CA) were trained to avoid moving from the lighted to the darkened section of a conditioning chamber by delivering a foot shock when they entered the darkened section. Control rats included untrained (naïve) animals, and animals exposed to the conditioned (CSTA) or the unconditioned (USTA) stimulus. Twenty-four hours after the training trial, half of the animals (n = 4 per group) performed the retention test to verify that the trained rats had in fact learned the passive avoidance task. The animals were placed in the safe compartment with the door closed. After 2 min of acclimation, the light was turned on, the door was opened and the animal was allowed to enter the dark compartment. The latency to enter the dark compartment was recorded and used as the measure of retention. Rats that avoided the dark compartment for more than 300 s were considered to have a memory of the training experience. During the retention trial, CA had a longer mean step-through latency than naïve, CSTA and USTA (*P < 0.001). (b) Venn diagrams of differentially expressed hippocampal genes. Hippocampal gene expression profiles in CA, USTA, CSTA and naïve animals were measured 6 h after training using microarrays containing 1263 genes relevant to neurobiology (Affymetrix GeneChip Rat Neurobiology U34 array). Genes differentially expressed between naïve and CSTA were defined as ‘conditioned stimulus related genes’ (CSRGs); genes differentially expressed between naïve and USTA were defined as ‘unconditioned stimulus related genes’ (USRGs); genes differentially expressed between naïve and CA were defined as ‘memory related genes’ (MRGs). (c) Hierarchical clustering of MRGs. A hierarchical clustering algorithm (Pearson correlation, separation ratio 0.2, minimum distance 0.001) was used to order MRGs in a dendrogram in which the pattern and length of the branches reflect the relatedness of the samples. 
Data are presented in a matrix format: each row represents a single gene and each column an experimental condition. The averaged normalized intensity from four replicates is represented by the color of the corresponding cell in the matrix. Green, black and red cells represent transcript levels below, equal to or above the median abundance across all conditions, respectively. Color intensity reflects the magnitude of the deviation from the median (see scale at the bottom). The graphs on the left of the dendrogram represent the averaged natural log of normalized data ± SEM of the genes in nine major clusters. The gene expression ratio between naïve and CA, together with statistically significant changes (P < 0.05; *naïve vs. CSTA, †naïve vs. USTA and ‡naïve vs. CA), is shown on the right of the matrix. The functional classification of MRGs is represented in a column on the right of the figure, where each functional class or subclass is color coded. The name and GenBank accession number of MRGs uniquely regulated in CA are indicated in italic, whereas MRGs previously found to be differentially expressed in the hippocampus of water-maze-trained rats (Cavallaro et al. 2002) are indicated in bold. A complete list of the differentially expressed genes is available online at http://www.web.tiscali.it/sebastiano_cavallaro.

Hippocampal gene expression profiles in CA, USTA, CSTA and naïve animals were measured 6 h after training using microarrays containing 1263 genes relevant to neurobiology (D'Agata & Cavallaro 2003). When gene expression profiles of naïve animals were compared with those of CSTA or USTA, 46 and 60 genes, respectively, were found to be differentially expressed (Fig. 7b). These results further demonstrate that the physical activity and mild stress associated with behavioral training have a significant impact on hippocampal gene expression.

When gene expression levels in naïve animals were compared with CA, 38 MRGs were found (Fig. 7b). Among these, 21 genes were downregulated and 17 genes were upregulated. Some of these MRGs (21/38) were also differentially expressed in CSTA (16) and USTA (16) (Fig. 7b).
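The overlaps summarized in a three-set Venn diagram such as Fig. 7b can be derived from the gene lists with ordinary set operations. The sketch below is illustrative only: the gene identifiers are hypothetical placeholders, not genes from the study, and the function name is our own.

```python
def venn3_regions(a: set, b: set, c: set) -> dict:
    """Return the sizes of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_b_c": len(a & b & c),
    }


# Hypothetical gene identifiers, used only to illustrate the bookkeeping.
csrgs = {"g1", "g2", "g3", "g4"}  # naive vs. CSTA
usrgs = {"g3", "g4", "g5", "g6"}  # naive vs. USTA
mrgs = {"g4", "g6", "g7"}         # naive vs. CA

regions = venn3_regions(csrgs, usrgs, mrgs)

# "c_only" counts the genes changed only when the two stimuli were associated.
print(regions["c_only"])  # 1
print(regions["a_b_c"])   # 1
```

If the counts reported above are read as 16 MRGs shared with CSTA, 16 shared with USTA and 21 shared with at least one of the two, the same bookkeeping implies 16 + 16 − 21 = 11 MRGs shared with both control groups and 38 − 21 = 17 MRGs specific to CA.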

A hierarchical clustering method was used to group MRGs on the basis of similarity in their expression patterns (Fig. 7c). The most evident trait of the clustered data was that MRGs showed entirely different patterns of expression in CA vs. CSTA or USTA. Genes segregating into nine major branches of the dendrogram were assigned to nine clusters (Fig. 7c). Clusters 1–4 comprise the genes that were downregulated in CA, whereas clusters 5–9 comprise those that were upregulated. Some of the MRGs, i.e. the genes differentially expressed between naïve and CA, were also affected by exposing the rats to the conditioned or the unconditioned stimulus alone, whereas others were uniquely induced when the two were associated and the animals were conditioned (Fig. 7c, clusters 2 and 8). Expression changes of MRGs in CSTA or USTA had different magnitudes or, more often, opposite trends compared with CA (Fig. 7c, clusters 1, 2, 3, 5, 8 and 9). As we previously observed in water-maze-trained animals, learning, physical activity and mild stress associated with behavioral training involve common groups of genes. Learning and memory, however, could be distinguished by unique patterns of gene expression, as shown in the clustered data.
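To make the clustering step concrete, the following minimal sketch groups expression profiles by Pearson correlation distance (1 − r) using average-linkage agglomerative clustering. It is not the algorithm used in the original study, whose software parameters (separation ratio, minimum distance) are not reproduced here; average linkage is substituted for simplicity, and the gene names and intensities are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("a flat profile has no defined correlation")
    return cov / (norm_x * norm_y)


def cluster(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average correlation distance over all cross-cluster gene pairs.
        dists = [1.0 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        index_pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(index_pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical normalized intensities; columns: naive, CSTA, USTA, CA.
profiles = {
    "gene_a": [1.0, 1.1, 0.9, 2.0],  # higher in CA
    "gene_b": [1.0, 1.2, 1.0, 2.4],  # higher in CA
    "gene_c": [1.0, 0.9, 1.1, 0.4],  # lower in CA
    "gene_d": [1.0, 1.0, 1.2, 0.5],  # lower in CA
}

groups = cluster(profiles, n_clusters=2)
print(groups)  # [['gene_a', 'gene_b'], ['gene_c', 'gene_d']]
```

Because the distance depends on the shape of a profile rather than its absolute level, genes that rise together in CA fall into one group and genes that fall together into another, which is the kind of separation the dendrogram branches express.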

All of the MRGs identified have a recognized function and can be classified into different functional classes based on their translated product (Fig. 7c). Some of these genes have previously been related to synaptic plasticity, memory or cognitive disorders. Six of the 38 MRGs found in the hippocampus of rats after passive avoidance training (Fig. 7c, shown in bold) were also differentially expressed in the same brain area following water-maze learning (Fig. 5f), suggesting common mechanisms of memory storage in different behavioral paradigms. For a complete description of the MRGs implicated by microarray technology during passive avoidance conditioning, the reader is referred to our previous study (D'Agata & Cavallaro 2003).

Conclusions

The characterization of expression patterns associated with the long-term retention of learning and memory in health (Cavallaro et al. 2001; 2002; D'Agata & Cavallaro 2003; Dubnau et al. 2003; Luo et al. 2001; Leil et al. 2003; Rampon et al. 2000) and disease (Blalock et al. 2003; D'Agata et al. 2002; Hata et al. 2001; Ho et al. 2001; Leil et al. 2002; Loring et al. 2001; Pasinetti & Ho 2001; Pasinetti 2001; Tudor et al. 2002; Yao et al. 2003) has only just begun. Gene expression profiles unlock virtually unexplored frontiers, and we will learn as we explore them.

These ‘early’ studies are limited by the experimental design (animal strain, behavioral condition and time), technology (microarray platforms and number of genes) and computational analysis (normalization, filtering and statistical analysis) used. The value of these experiments will progressively increase as more is learned about the function of each gene and as software applications, such as the one presented in this paper, enable us to identify the complex correlations between the genomic profiles obtained by microarray experiments and functional information (Fig. 8).

Figure 8.

Correlations between gene expression profiles and functional information. Complex correlations between gene expression profiles and functional information are needed to unravel the role of genes in the pathophysiology of learning and memory. Examples of functional information are listed on the right side of the figure and include gene annotations and phenotypic characteristics (any identifiable or observable structural or functional characteristic of an organism).

Although sure to be just the tip of the iceberg, the results already obtained point toward genes or sets of genes that may play critical roles in learning and memory. The discovery of these genes represents the key to developing novel and efficacious therapies to improve learning and memory, under normal conditions as well as in disorders that affect cognitive functioning, such as Alzheimer's disease.

Acknowledgments

We gratefully acknowledge Alfia Corsino, Maria Patrizia D'Angelo and Francesco Marino for their administrative and technical support. This work was partly sponsored by grants from the Italian Ministry of Health and the Italian Ministry of Education, University and Research to SC.
