Physiol. Genomics Genetics/Genomics of Vascular Disease Workshop
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


Physiol. Genomics 34: 34-41, 2008. First published April 22, 2008; doi:10.1152/physiolgenomics.00008.2008
1094-8341/08 $8.00
This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Supplemental Figures and Tables
Right arrow All Versions of this Article:
34/1/34    most recent
00008.2008v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Gustin, M.-P.
Right arrow Articles by Cerutti, C.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Gustin, M.-P.
Right arrow Articles by Cerutti, C.
Received 11 January 2008; accepted in final form 21 April 2008.
Physiological Genomics 34:34-41 (2008)
1094-8341/08 $8.00 © 2008 American Physiological Society

Call For Papers: Computational Modeling of Physiological Systems

Functional meta-analysis of double connectivity in gene coexpression networks in mammals

Marie-Paule Gustin1,2, Christian Z. Paultre2, Jacques Randon1, Giampiero Bricca1 and Catherine Cerutti1

1 EA4173, Inserm ESPRI, ERI 22, Biostatistiques, Université de Lyon, Université Lyon 1, Lyon, France
2 Institut de Pharmacie (ISPB), Biostatistiques, Université de Lyon, Université Lyon 1, Lyon, France

ABSTRACT

In functional genomics, the high-throughput methods such as microarrays 1) allow analysis of the relationships between genes considering them as elements of a network and 2) lead to biological interpretations thanks to Gene Ontology. But up to now it has not been possible to find relationships between the functions and the connectivity of the genes in coexpression networks. To achieve this aim, we have defined a double connectivity for each gene by the numbers of its significant negative and positive correlations with the other genes within a given biological condition, or group. Here, based on the analysis of 1,260 DNA microarrays, we show that this double connectivity clearly separates two types of genes, those with a predominantly strong negative connectivity, hub– genes, and those with a predominantly strong positive connectivity, hub+ genes. Interestingly, the hub+ genes concerned transcription factors more often than by chance and, similarly, for the hub– genes concerning miRNA predicted targets. Furthermore, a meta-analysis of GO annotations carried out on 67 groups in humans and rats shows that these two types of genes correspond to a functional biological duality. The hub– genes were mainly involved in basic functions common to all eukaryote cells, whereas the hub+ genes were mainly involved in specialized functions related to cell differentiation and communication. The separation and the biological role of these hub– and hub+ genes provide a powerful new tool for a better understanding of the control and regulation of the key genes involved in cellular differentiation and physiopathological conditions.

microarray; Pearson correlation; systems biology; transcription regulation

MATHEMATICAL NETWORK MODELING of complex biological systems, observed at the neuronal (37), protein (6, 39), metabolic (29, 32), or gene expression (5, 12, 13, 19, 36) level, has become an important step in understanding the cell's functional organization (for review, see Refs. 3, 25). In a general way, the connectivity of a node within a network allows the degree of involvement of the node in the network to be appreciated. The most highly connected nodes are called "hub." It has been already shown that they may play an important role in the network (6). Currently, there is a growing interest in defining gene networks from microarray expression data (1, 2). Numerous sophisticated machine-learning algorithms coupled to various statistical methods have been proposed to achieve this aim (13, 16, 26, 40). Genes are usually said to be coexpressed or connected when their expression levels are linearly linked within a group. The connectivity of each gene for this group is then defined by its number of connections. The Pearson correlation remains the basic method for assessing pairwise expression gene linear links. It is widely used both for clustering gene expression profiles (18) and for calculating connectivity in gene coexpression networks (40). Surprisingly, little use has been made of the algebraic (negative or positive) sign of this coefficient. Most of the studies deal with the absolute value of the correlation coefficient, which allows the pairwise gene links to be represented in a binary way. In this study, we define a double connectivity, that is either positive or negative, for each gene in a group and for highly connected hubs.

Sharing genomic knowledge and assessing gene functions were such complex tasks that by the late nineties it became necessary to collate a database of gene properties expressed in a standardized manner (31). Functional gene annotations using Gene Ontology (GO) (33, 34) summarizes knowledge about the role of each gene in a dynamic controlled vocabulary and is very helpful for carrying out functional interpretations, particularly when interpreting the functions of hub genes (17, 22, 30).

In addition, the existence of publicly available microarray datasets allows meta-analyses (23, 30), or analyses over pooled data (35). Such analyses are essential for comparing the way genes operate in various observation conditions. However, due to the high cost of microarray experiments, few studies have been performed in samples with more than five observations, which is the minimum size for computing our correlation.

In this work, we made two main hypotheses: 1) the double connectivity discloses the existence of hub– or hub+ genes, having a predominantly strong negative or positive connectivity, 2) these hub– and hub+ genes have different functions. Our objectives were then 1) to analyze double connectivity of genes from a large number of public microarray data sets obtained in various tissues and biological conditions, and 2) to give a general interpretation using functional annotations of genes with GO terms.

MATERIALS AND METHODS

Data selection.
We first analyzed microarray data obtained in our laboratory from left ventricular hypertrophy in three models of hypertensive rats. (15). Then, microarray data sets were obtained from public repositories for microarray gene expression data, ArrayExpress (European Bioinformatics Institute, Cambridge, UK) (28) and Gene Expression Omnibus (National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD) (4). Data were obtained from various mammalian tissues, excluding cultured cells. We selected the data from Affymetrix GeneChips (Affymetrix, Santa Clara, CA) (RAE230A in rats; HG-U133A, HG-U133B, and HG-U133 Plus 2.0 in humans; and U74A in mice). We selected, from 35 datasets, 130 groups that were made up of at least five individuals showing the same controlled biological conditions (e.g., experimental, clinical, etc.), 48 groups in rats, 81 in humans, and only 1 in mice because of the inadequate sizes of the group samples available in this species (Supplementary Table S1).1 Various tissues had been analyzed in these groups: namely skeletal muscle, heart, kidney, bladder, prostate, breast, reproduction organs, lung, bone marrow, or brain in normal or pathological states. In humans, more than half of the groups were composed of cancer tissues. For a large majority of the groups, the microarray data had been preprocessed using Affymetrix software tools (MicroArray Suite 5.0 or GeneChip Operating Software) (see supplementary information p. 8). No prior filtering of the data was found to be necessary for the purposes of this study.

Double connectivity computation.
The computations were made separately for each group. The microarray gene expression data were represented in an N x n matrix, where N is the number of probe sets (taken as roughly equivalent to genes) and n the number of individuals in each group. We took log2-transformed gene expression values. The distribution of the log2-transformed expression data was found to be normal for >90% of the genes in >90% of the 130 groups using a one-tailed Shapiro-Wilk test at the significance level of 1%. We included all the probes of the microarrays in the computations irrespective of their expression level, their degree of normality, and their accuracy (values called "absent"). We also included in the computation the expression values that were below the detection threshold because although they give correlations that may be of interest, they could not be highlighted when missing some values. Thus, we estimated the linear link between two genes within a group by the Pearson correlation coefficient.

Within a group, two genes were considered to be connected if the absolute value of the Pearson correlation coefficient r between their log2-transformed expression values was significantly greater than the threshold value rth. We chose the threshold rth from the usual two-tailed Student's t-test at the significance level of 5% with (n – 2) degrees of freedom. The sign of the correlation coefficient gave the sign of the link. The number of negative (k–) or positive (k+) links of one gene with all the other genes of the microarray defined its double connectivity (k–, k+) in the coexpression network. For each gene, double connectivity can be viewed as computed by multiple testing. But since our purpose was not to determine the family of genes connected to one gene, we did not need to control the false positive rate using an adjusted P value. We only aimed at sorting the genes according to their double connectivity, and it was then sufficient to compute the connectivity in a similar way for all the genes.

When the group size was <20, a jackknifed Pearson correlation was computed to reduce the effect of extreme values. To test whether double connectivity values could be obtained by chance, we performed the double connectivity computation after randomization of the data by bootstrap. For each gene within a group, we conducted a resampling with replacement of its expression values from the observed values in the group. Then we calculated the double connectivity of each gene for the group. We repeated this procedure 1,000 times and computed the average of the negative and positive connectivities.

Defining hub– and hub+ genes.
Hub genes are highly connected genes, and the sign of one hub gene is given by the sign of its strongest connectivity, negative or positive. Mathematically, hub– genes were defined by: 1) k– > k+; and 2) k– > kth with kth = <k–> + 2SDk–, where <k–> indicates the average of the distribution of k– and SDk– the standard deviation of k– for the group. Similarly, hub+ genes were such as: 1) k+ > k–; and 2) k+ > kth. A study of the robustness of the hubs determination is given in the supplementary information (see Supplementary Fig. S1).

Involvement of hub genes in transcriptional regulation.
We investigated the relationship between hub genes and regulation of gene expression by transcription factors or microRNAs (miRNAs) in humans. 1) We obtained transcription factors from the KEGG website (http:/www.genome.ad.jp/, last updated on 10/31/2007). We collected 1,148 gene symbols of transcription factors, which matched with 1,096 probes of the HG-U133A GeneChip. 2) We obtained miRNA target predictions from the TargetScan site (http://www.targetscan.org/, Whitehead Institute for Biomedical Research, Cambridge, MA) (20). This database collects miRNAs families that were conserved during evolution across human, mouse, rat, dog, and chicken and predicts their conserved target sites. We selected the 454 human miRNAs having the most reliable associations between one miRNA and one predicted target (context score percentile >50). We obtained 11,670 miRNA target genes on the HG-U133A GeneChip.

We tested whether the proportions of hub– or hub+ genes within the transcription factors or the predicted targets of each miRNA differed from those obtained by chance using a two-sided Fisher exact test at the significance level of 5%. We used the Benjamini and Hochberg adjusted P values for multiple testing (9). It enabled us to detect whether there were significant enrichments or impoverishments of transcription factors in hub– genes, as well as to detect the miRNAs that significantly targeted more or less hub– genes than by chance. The same analysis was carried out for the hub+ genes. We performed all these computations over the 31 human data groups exhibiting a clear separation of hub– and hub+ genes, taking into account only the probes associated with a gene symbol.

We performed the same analysis in experimentally validated miRNA targets, obtained from the public database TarBase v4.0 (http://www.diana.pcbi.upenn.edu/tarbase.html). These miRNA targets corresponded to 1,048 genes-probes on the HG-U133A GeneChip.

Functional meta-analysis.
To give a functional interpretation to the hub– or hub+ potentiality of genes, we performed a functional meta-analysis using GO annotations. For that, we examined the public data from microarrays for which we had enough groups: HG-U133A in the human (43 groups) and RAE230A in the rat (48 groups). We retained those groups in which the separation of hub– and hub+ genes was evident (31 out of 43 groups in the human and 36 out of 48 in the rat). These groups are marked with OK in Supplementary Table S2.

Defining top– genes and top+ genes in the meta-analysis from mean standard double connectivity.
For the functional meta-analysis, carried out in the human and in the rat, we used the annotations of genes in the GO category "biological process" that were supplied by Affymetrix on 11/5/2007. We found 16,848 genes annotated in humans and 9,107 genes annotated in rats in this GO category. To accommodate the different group sizes, the negative and positive connectivities of each gene were standardized each to have a mean of 0 and an SD of 1. Then, over all the selected groups of humans or rats, we computed the mean standard double connectivity (kms–, kms+) for each gene. We retained the top 1,000 annotated genes that exhibited the best mean negative or positive standard connectivity in humans, and the top 540 annotated genes in rats, i.e., ~ 6% of the number of the annotated genes for both. We named them top– or top+ genes. Homology between human and rat top genes was tested using the HomoloGene database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=homologene).

GO terms significantly enriched in top– or top+ genes.
We performed preliminary studies using bioinformatics Web tools such as GOstat (7) (http://gostat.wehi.edu.au/) and eGOn (8) (http://www.genetools.microarray.ntnu.no/egon/index.php). To analyze a large number of gene lists, we developed our own software in C language. We counted the number of top– genes and top+ genes annotated by each GO term. We selected the GO terms that significantly annotated more numerous top– (or top+) than the other genes, using a two-sided Fisher exact test at the significance level of 10%. We used the Benjamini and Yekutieli adjusted P values for multiple testing in case of dependent data (10). We called them GO terms enriched in top– (or top+) genes. So, four cases could be expected for a GO term, whether it was enriched or depleted in top– genes and whether it was enriched or depleted in top+ genes.

RESULTS AND DISCUSSION

Double connectivity and hub genes.
For the different species and tissues, each scatter plot of the genes as a function of their double connectivity (k–, k+) shows a very characteristic shape. It is formed by two separated "wings," symmetrical with respect to the principal diagonal (Fig. 1A and Supplementary Fig. S2). Figure 1A shows the wings distinguishing three types of genes, namely those with a strong negative connectivity (hub–: green dots), those with a strong positive connectivity (hub+: red dots) and the others (gray dots). Over 70% of the 130 groups tested exhibit this wing-shape gene scatter plot. The remaining 30% exhibit the same symmetry, but poorly separated wings. We studied the distributions of the gene expression variation coefficient according to the quality of wing separation (Supplementary Table S3). In cases of clear wing separation, variation coefficients are higher than in cases of bad wing separation. When the dispersion is weak, the range of regulation is small. Many physiological regulations cannot then be evidenced by correlation, which results in a poor separation of hub– and hub+ genes when plotted.


Figure 1
View larger version (30K):
[in this window]
[in a new window]

 
Fig. 1. Scatter plots of the genes as a function of their double connectivity (k–, k+) in 3 different species and tissues (A): 1) human skeletal muscle (group 49), 2) rat left ventricle (group 1), and 3) mouse liver (group 130), showing the hub– genes in green, the hub+ genes in red, and the neither hub– nor hub+ genes in gray. Mean dual connectivity obtained after 1,000 resamples by bootstrap is shown in black. B: showing examples of the 30 best negatively (green) and positively (red) genes correlated with a given gene (bold black dot): one hub– gene (PSMD12: proteasome 26S subunit, non-ATPase, 12), one hub+ gene (GPR52: G protein-coupled receptor 52), and one nonhub gene (CSK: c-src tyrosine kinase) (group 49).

 
As shown in Fig. 1A, randomizing the data by bootstrap 1) considerably reduced the range of the connectivities, 2) suppressed the wing shape of the scatter plot, and 3) maintained the symmetry with respect to the principal diagonal. These observations indicate that 1) the symmetry around the diagonal is due to a mathematical property, the transitivity, of the correlation, and that 2) the separation (wing shape) of hub– and hub+ genes is due to a biological property of regulation, shown by the marked imbalance between the negative and positive correlations of some genes.

Focusing on a given gene of interest G, Fig. 1B shows, in each example, that the genes most strongly correlated to G exhibit a double connectivity with specific features. When their correlation with G is positive, they make a condensed neighborhood around G on the scatter plot. In contrast, in case of negative correlations with G, they make a condensed neighborhood that is symmetrical with the former. When G is a hub, the best correlations are seen with two sets of genes that are hubs as well. This neighboring feature is partly related to the correlation's transitivity. Indeed, if two genes are perfectly positively correlated (correlation coefficient = 1), they necessarily have similar double connectivity values. Conversely, if two genes are perfectly negatively correlated (correlation coefficient = –1), the negative connectivity of one gene is equal to the positive connectivity of the other gene.

Double connectivity and expression level.
The importance of a gene is usually examined from its RNA expression level, yet its involvement in the gene network is rarely taken into account. The computation of the Pearson correlation between the negative or positive connectivity and the mean expression level did not reveal any clear, consistent correlation across the different groups (Supplementary Table S4). This major result clearly confirms the absence of a clear linear link between the gene expression value and the unsigned connectivity that has already been shown (14). It indicates that the double connectivity does not supply the same information as the expression level, and so offers new ways to analyze and interpret transcriptional data.

General statistics of hub genes properties.
The properties of hub– and hub+ genes are illustrated for one group of rats in Fig. 2 and for each group in Supplementary Table S2. According to our definition criteria, the hub– genes on average represented ~4% of the genes on the microarray for all the groups. There were fewer hub– genes than hub+ genes (~11%). Hub– genes were expressed at significantly higher levels in 91% of the groups.


Figure 2
View larger version (31K):
[in this window]
[in a new window]

 
Fig. 2. Contour plots of the 3-dimensional scatter plots of frequency and mean expression level of the genes as a function of their double connectivity (k–, k+) (in rat left ventricle: group 1). Color quantifies the intensity of gene frequency and mean expression level.

 
Up to now, the concept of hub genes or hub proteins has never been associated with the sign of the pairwise interactions. Using either unsigned gene coexpressions or unsigned protein interactions, several authors pointed out the importance of hub proteins and showed the modular structure of the whole network (6, 13, 16). In addition, the authors stressed the slower probable evolution of hub proteins than less connected ones (6, 13). In this work we separated hub genes into two types. The hub– genes, with the most numerous negative correlations, could mainly be involved in repressive processes, whereas the hub+ genes could mainly be involved in activator processes. In addition, we showed that 1) the genes, correlated or anticorrelated with a given hub gene, were hub genes as well, and that 2) these two sets of hub genes were strongly anticorrelated to each other (data not shown). Therefore, taking the sign of the correlations into account uncovers the clear balance between coactivated or coactivator genes, and coinhibited or coinhibitor genes, that is involved in a general regulation process.

Transcription factors and microRNAs.
Our main finding was evidence of the statistical tendency for hub+ genes to include transcription factors and for hub– genes to include miRNA targets. Table 1 shows that, in 55% of the groups, hub+ genes were found more often in the transcription factors than would happen by chance, whereas hub– genes were underrepresented in 74% of the groups (see Supplementary Table S5). Conversely, more hub– genes than by chance were found in miRNA predicted targets, for 51% (233 of 454) of miRNAs, on average over all the human groups, independently of their pathological state (see Table 1), whereas significantly less hub+ genes than by chance were observed in miRNA targets, for 53% (242 out of 454) of miRNAs (see Supplementary Table S6 for details). These results are also confirmed for experimentally validated targets (Supplementary Table S7).


View this table:
[in this window]
[in a new window]

 
Table 1. Significant enrichment or impoverishment of hub genes in different gene sets in humans: transcription factors, miRNA host genes or miRNA predicted targets

 
Hub genes and top genes.
As the double connectivity of genes varied a lot between groups, the status of hub was variable (Fig. 3A). Only 25% of the genes were hub genes in more than three different groups. The genes that were frequently hub, however, were hub with the same sign. This variability of hub status complicated the meta-analysis, and we preferred to use mean standard double connectivity when defining top– and top+ genes. Figure 3B shows the clear separation between top– and top+ genes. These top genes were hub genes with the same sign in an average of ~7 groups. Thus, the top genes corresponded to those hub genes particularly highly connected in some groups.


Figure 3
View larger version (16K):
[in this window]
[in a new window]

 
Fig. 3. Scatter plot of genes annotated by biological process GO terms as a function of: the number of groups in which each gene was found hub– (nHub–) or hub+ (nHub+) (A), and their mean standard double connectivity (kms–, kms+) (B). Top– genes are plotted in green, top+ genes are in red, and nontop genes are in gray. Only the 31 human data groups for which there was a good separation of hub– or hub+ genes using the HG-U133A GeneChip, were taken into account.

 
Functional meta-analysis.
The functional meta-analysis used data from a range of tissues studied in various conditions. A stability in the functions of the top– and top+ genes in both humans and rats clearly emerges from this diversity. Strikingly, the GO terms corresponding to similar biological functions appeared to be significantly enriched in top genes with the same sign. Table 2 gives some examples obtained from human data: the GO terms concerning "protein degradation" were exclusively enriched in top– genes (asterisk), whereas those concerning "cell communication and social life" were mainly enriched in top+ genes (dagger). So, we grouped all the enriched GO terms that shared similar biological functions into GO groups, detailed in Supplementary Table S8. These we arranged into eight broader functional groups that are enumerated in Table 3. GO terms concerning nontranscriptional processes, protein processes, cell cycle, energy conversion, and cytoskeleton were mainly enriched in top– genes (italics). In contrast, GO terms related to ion transport, cell communication and signal transduction were mainly enriched in top+ genes (underlining).


View this table:
[in this window]
[in a new window]

 
Table 2. Example of GO terms significantly enriched in top genes in humans

 

View this table:
[in this window]
[in a new window]

 
Table 3. Functional groups of GO terms significantly enriched in top genes in humans

 
These points are illustrated in the supplementary lists of the top– and top+ genes that are annotated by some of the GO terms in Table 2 (Supplementary Tables S9–S11). The top– genes annotated by the GO terms concerning "protein degradation," whatever their enrichment, are largely related to ubiquitin-proteasome and proteolysis systems (~100 distinct genes). Inversely, the top+ genes referred instead to proteins with various peptidase or ligase activities that were often specialized and related to cell cycle or signal transduction (see Supplementary Table S9). In the group "cell communication," the GO term (GO:0007264) called "small GTPase-mediated signal transduction" annotated mainly top– genes such as Rab and Rho GTPases, or members of the RAS oncogene family, involved in membrane budding and trafficking (Supplementary Table S10). The GO term (GO:0007186) called "G protein-coupled receptor protein signaling pathway" annotated numerous top+ genes (89 genes), which are associated to various receptors concerned with the activities of the cellular specialization (Supplementary Table S11).

All the above results came from statistical analyses comparing observed proportions of genes to proportions obtained by chance. Therefore, they revealed only general tendencies, in terms of enrichment or impoverishment. These tendencies did not exclude opposite or complementary features. In summary, top– genes enriched the GO terms that essentially characterize processes relative to essential functions common to all cells (protein machinery, cell cycle, and energy conversion), whereas top+ genes enriched the GO terms that essentially characterize specific processes of pluricellular organisms (social life, cell communication, development, and specialized activities specific to cellular differentiation). The results obtained for the other GO categories ("cellular component" and "molecular function") confirmed the coherence of the above results (data not shown).

An identical study was performed in the rat (Supplementary Tables S12 and S13). While 85% of the probe sets from the rat RAE230A GeneChip referred to genes homologous to those of the human HG-U133A GeneChip, only 37% of the top– genes and 12% of the top+ genes identified in rats were homologous to human top– genes and top+ genes, respectively. This result extends again 1) the variability of the top status by species and 2) the clear distinction between top– and top+ genes, since the former appear to be more conserved across the two mammalian species studied. Although a much larger number of GO terms were found enriched in top– or top+ genes in humans, these GO terms shared the same functional biological criteria, and 88% of those found in rats were present in humans. These results confirmed the functional duality of the most negatively and positively connected genes, therefore providing strong evidence for a general organizing principle that might be linked to evolutionary processes.

In fact, the genes involved in the general cell functioning are likely to be more conserved across species than the genes involved in more specialized cellular functions. Such an observation has already been made for the ubiquitin-proteasome system, which represents the major pathway for intracellular protein degradation in eukaryotes. In particular, the 26S proteasome, responsible for the proteolysis process, has been shown to be highly conserved from yeast to mammals (21, 38). We found that numerous genes of this system were top– genes, suggesting that top– genes are likely to be conserved genes. This point is coherent with the fact that the top– genes were predominantly involved in general cellular functions. Finally, based on these observations, we could ask whether miRNA predicted targets are mainly concerned with general cellular functions.

Perspectives
The double connectivity, computed within a coexpression network for a group of biological conditions, provides a sense of the balance between the positive and negative regulations of the genes. As in the case of gene expression level, the connectivity is quite variable according to the tissue and the conditions of observation. However, these two parameters bring information of a different kind. Because the general knowledge of gene expression regulation is still very incomplete, it is impossible to have gold standard data that would conclusively validate the information shown by double connectivity. It should be borne in mind that, as with all methods based on network modeling, any results obtained from double connectivity require corroboration in subsequent analyses of biological networks to be definitively validated (25). The highly significant GO enrichments indicated by our meta-analysis are a powerful incentive to improve the interpreting methods that combine gene expression and connectivity. To achieve this aim, there are some conditions to be fulfilled: 1) sufficient size of groups, 2) well-controlled experimental conditions to allow differential analyses, and 3) a focus on fields already well studied and widely referenced. Under these conditions, the Pearson coefficient will have more statistical meaning, thereby reinforcing the accuracy of the analysis of differential connectivity. Our method works as a filter to select genes of interest, and the analyses of differential double gene connectivity must be combined with other methods such as those using differential gene expression. Furthermore, the interest in genes that are strongly connected but weakly expressed could provide evidence of regulations that have often been overlooked. From this perspective, it would be interesting to make use of new microarray technology that permits detection of low expression level (11). Such studies would improve the analysis and further interpretation of negative transcriptional regulation.

It is particularly exciting to see that hub genes are present in different proportions according to their sign, when they are transcription factors or miRNA targets. Concerning the transcription factors, the enrichment in hub+ genes in a large number of human data groups reveals a predominant involvement of transcription factors in activator processes. Furthermore, the enrichment of miRNA targets in hub– genes suggests that some hub– genes are involved in transcriptional inhibitory processes, which are induced by miRNAs (24). So our results reinforce the challenging recent findings about posttranscriptional gene expression regulation by noncoding RNAs (20, 27). They thus provide a framework to analyze in detail the involvement of genes in transcriptional regulation processes, as much activator as inhibitor processes.

Conclusion
Here we have introduced for the first time the concept of double connectivity using the sign of the correlation. It turns out to be a powerful tool for analyzing both the transcriptional regulatory properties of the genes and the functional organization of the coexpression network. This double connectivity reveals, in mammals, a very basic gene regulation rule between genes involved in specialized function and genes involved in general cellular function. This simple and general rule links a mathematical topological property (double connectivity) to a biological functional property of the genes (general or specialized cellular function). It might result from general mechanisms of the eukaryote evolution involving transcription factors and miRNAs. Thus, the analysis of double connectivity decisively enriches the analysis and interpretation of microarray data and should be useful for interpreting future data in a new way.

GRANTS

The Affymetrix experiments (6 groups out of 130) carried out in our laboratory were supported by grants from La Fondation de France and grants from Le Ministère de la Recherche.

ACKNOWLEDGMENTS

We thank Robert Grant for the translation and editing of the paper, Dr. Michel Beylot for validating our functional biological criteria, Prof. Alain Puisieux, Prof. Charles Dumontet, Dr. Lilian McGregor, and Dr. John McGregor for their permanent interest in our work. We are also grateful to all the researchers that made their microarray data publicly available, to bioinformatics web resource developers, and to the GO consortium. Finally, we thank the two anonymous reviewers for their pertinent comments, which aided us to improve the manuscript.

FOOTNOTES

Address for reprint requests and other correspondence: M.-P. Gustin, Université Lyon 1, Faculté de Pharmacie, EA4173 - Inserm ERI 22, 8 Ave. Rockefeller, 69373 Lyon Cedex 08, France (e-mail: gustin{at}sante.univ-lyon1.fr).

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1 The online version of this article contains supplementary material, which includes the list of selected data, the detailed results, the list of the GO functional groups carried out during the meta-analysis, and some lists of top genes according to their function. Back

REFERENCES

  1. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Genet 7: 55–65, 2006.[CrossRef][Web of Science]
  2. Bansal M, Belcastro V, Ambesi-Impiombato A, di Bernardo D. How to infer gene networks from expression profiles. Mol Syst Biol 3: 78, 2007.[Medline]
  3. Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet 5: 101–113, 2004.[CrossRef][Web of Science][Medline]
  4. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R. NCBI GEO: mining tens of millions of expression profiles–database and tools update. Nucleic Acids Res 35: D760–765, 2007.[Abstract/Free Full Text]
  5. Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera RD, Califano A. Reverse engineering of regulatory networks in human B cells. Nat Genet 37: 382–390, 2005.[CrossRef][Web of Science][Medline]
  6. Batada NN, Hurst LD, Tyers M. Evolutionary and physiological importance of hub proteins. PLoS Comput Biol 2: e88, 2006.[CrossRef][Medline]
  7. Beissbarth T, Speed TP. GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 20: 1464–1465, 2004.[Abstract/Free Full Text]
  8. Beisvag V, Jünge FK, Bergum H, Jølsum L, Lydersen S, Günther CC, Ramampiaro H, Langaas M, Sandvik AK, Laegreid A. GeneTools—application for functional annotation and statistical hypotheses testing. BMC Bioinformatics 7: 470, 2006.[CrossRef][Medline]
  9. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc B 57: 289–300, 2005.
  10. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Statist 29: 1165–1188, 2001.[CrossRef]
  11. Bosotti R, Locatelli G, Healy S, Scacheri E, Sarori L, Mercurio C, Calogero R, Isacchi A. Cross platform microarray analysis for robust identification of differentially expressed genes. BMC Bioinformatics 7, Suppl 1: S5, 2007.
  12. Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance network. Proc Natl Acad Sci USA 97: 12182–12186, 2000.[Abstract/Free Full Text]
  13. Carlson MRJ, Zhang B, Fang Z, Mishel PS, Horvath S, Nelson SF. Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression networks. BMC Genomics 7: 40, 2006.[CrossRef][Medline]
  14. Carter SL, Brechbühler CM, Griffin M, Bond AT. Gene co-expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics 20: 2242–2250, 2004.[Abstract/Free Full Text]
  15. Cerutti C, Kurdi M, Bricca G, Hodroj W, Paultre CZ, Randon J, Gustin MP. Transcriptional alterations in the left ventricle of three hypertensive rat models. Physiol Genomics 27: 295–308, 2006.[Abstract/Free Full Text]
  16. Dong J, Horvath S. Understanding network concepts in modules. BMC Systems Biology 1: 24, 2007.[CrossRef][Medline]
  17. Dopazo J. Functional interpretation of microarray experiments. OMICS 10: 398–410, 2006.[CrossRef][Web of Science][Medline]
  18. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95: 14863–14868, 1998.[Abstract/Free Full Text]
  19. Goutsias J, Lee NH. Computational and experimental approaches for modelling gene regulatory networks. Curr Pharm Des 13: 1415–1436, 2007.[CrossRef][Web of Science][Medline]
  20. Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27: 91–105, 2007.[CrossRef][Web of Science][Medline]
  21. Hanna J, Finley D. A proteasome for all occasions. FEBS Lett 581: 2854–2861, 2007.[CrossRef][Web of Science][Medline]
  22. Ihmels JH, Bergmann S. Challenges and prospects in the analysis of large-scale gene expression data. Brief Bioinform 5: 313–327, 2004.[Abstract/Free Full Text]
  23. Lee HK, Hsu AK, Sadjak J, Qin J, Pavlidis P. Coexpression analysis of human genes across many microarray data sets. Genome Res 14: 1085–1094, 2004.[Abstract/Free Full Text]
  24. Lim NP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, Castle J, Bartel DP, Linsley PS, Johnson JM. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433: 769–773, 2005.[CrossRef][Medline]
  25. Ma'ayan A, Blitzer RD, Iyengar R. Towards predictive models of mammalian cells. Annu Rev Biophys Biomol Struct 34: 319–349, 2005.[CrossRef][Web of Science][Medline]
  26. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7, Suppl 1: S7, 2006.
  27. Mattick JS, Makunin V. Non-coding RNA. Hum Mol Genet 15: R17–R29, 2006.[Abstract/Free Full Text]
  28. Parkinson H, Kapushesky M, Shojatalab M, Abeygunawardena N, Coulson R, Farne A, Holloway E, Kolesnykov N, Lilja P, Lukk M, Mani R, Rayner T, Sharma A, William E, Sarkans U, Brazma A. ArrayExpress–a public database of microarray experiments and gene expression profiles. Nucleic Acids Res 35: D747–D750, 2007.[Abstract/Free Full Text]
  29. Patil KR, Nielsen J. Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proc Natl Acad Sci USA 102: 2685–2689, 2005.[Abstract/Free Full Text]
  30. Purmann A, Toedling J, Schueler M, Carninci P, Lehrach H, Hayashizaki Y, Huber W, Sperling S. Genomic organization of transcriptomes in mammals: coregulation and cofunctionality. Genomics 89: 580–587, 2007.[CrossRef][Web of Science][Medline]
  31. Schulze-Kremer S. Ontologies for molecular biology. Pac Symp Biocomput 695–706, 1998.
  32. Stelling J, Klamt S, Bettenbrock K, Schuster S, Gilles ED. Metabolic network structure determines key aspects of functionality and regulation. Nature 420: 190–193, 2002.[CrossRef][Medline]
  33. The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nat Genet 25: 25–29, 2000.[CrossRef][Web of Science][Medline]
  34. The Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258–D261, 2004.[Abstract/Free Full Text]
  35. Ucar D, Neuhaus I, Ross-MacDonald P, Tilford C, Parthasarathy S, Siemers N, Ji RR. Construction of a reference gene association network from multiple profiling data: application to data analysis. Bioinformatics 23: 2716–2724, 2007.[Abstract/Free Full Text]
  36. Van Noort V, Snel B, Huynen MA. The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO Rep 5: 280–284, 2004.[CrossRef][Web of Science][Medline]
  37. Watts DJ, Strogatz SH. Collective dynamics of 'small-world' networks. Nature 393: 440–442, 1998.[CrossRef][Medline]
  38. Wolf DH, Hilt W. The proteasome: a proteolytic nanomachine of cell regulation and waste disposal. Biochim Biophys Acta 1695: 19–31, 2004.[Medline]
  39. Yook SH, Oltvai ZN, Barabasi AL. Functional and topological characterization of protein interaction networks. Proteomics 4: 928–942, 2004.[CrossRef][Web of Science][Medline]
  40. Zhang B, Horvath S. A General Framework for Weighted Gene Co-Expression Network Analysis. Stat Appl Genet Mol Biol 4: 17, 2005.



This article has been cited by other articles:


Home page
BioinformaticsHome page
J. D. Wren
A global meta-analysis of microarray expression data to predict unknown gene functions and estimate the literature-data divide
Bioinformatics, July 1, 2009; 25(13): 1694 - 1701.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Supplemental Figures and Tables
Right arrow All Versions of this Article:
34/1/34    most recent
00008.2008v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Gustin, M.-P.
Right arrow Articles by Cerutti, C.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Gustin, M.-P.
Right arrow Articles by Cerutti, C.


HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Visit Other APS Journals Online
Copyright © 2008 by the American Physiological Society.