Physiol. Genomics 33: 78-90, 2008. First published December 27, 2007; doi:10.1152/physiolgenomics.00167.2007
1094-8341/08 $8.00
Received 25 July 2007; accepted in final form 20 December 2007.
Physiological Genomics 33:78-90 (2008)
1094-8341/08 $8.00 © 2008 American Physiological Society

A framework to identify physiological responses in microarray-based gene expression studies: selection and interpretation of biologically relevant genes

Wendy Rodenburg 1,2,3,*, A. Geert Heidema 4,5,6,*, Jolanda M. A. Boer 4, Ingeborg M. J. Bovee-Oudenhoven 1,3, Edith J. M. Feskens 6, Edwin C. M. Mariman 5 and Jaap Keijer 1,2

1 TI Food and Nutrition, Wageningen, The Netherlands
2 RIKILT-Institute of Food Safety, Wageningen, The Netherlands
3 NIZO Food Research, Ede, The Netherlands
4 National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
5 Department of Human Biology, Maastricht University, Maastricht, The Netherlands
6 Division of Human Nutrition, Wageningen University and Research Centre, Wageningen, The Netherlands


    ABSTRACT
In whole genome microarray studies major gene expression changes are easily identified, but it is a challenge to capture small, but biologically important, changes. Pathway-based programs can capture small effects but may have the disadvantage of being restricted to functionally annotated genes. A structured approach toward the identification of major and small changes for interpretation of biological effects is needed. We present a structured approach, a framework, that addresses different considerations in 1) the identification of informative genes in microarray data sets and 2) the interpretation of their biological relevance. The steps of this framework include gene ranking, gene selection, gene grouping, and biological interpretation. Random forests (RF), which takes gene-gene interactions into account, is examined to rank and select genes. For human, mouse, and rat whole genome arrays, less than half of the probes on the array are annotated. Consequently, pathway analysis tools ignore half of the information present in the microarray data set. The framework described takes all genes into account. RF is a useful tool to rank genes by taking interactions into account. Applying a permutation approach, we were able to define an objective threshold for gene selection. RF combined with self-organizing maps identified genes with coordinated but small gene expression responses that were not fully annotated but corresponded to the same biological process. The presented approach provides a flexible framework for biological interpretation of microarray data sets. It includes all genes in the data set, takes gene-gene interactions into account, and provides an objective threshold for gene selection.

gene selection; t-test; random forest; biological processes; transcriptomics


    INTRODUCTION
TRANSCRIPTOME ANALYSIS using whole genome microarrays is an elegant and widely used approach for identification of the molecular mechanisms underlying diet-induced cellular or physiological changes. Both major effects as well as a wide overview of more subtle changes can be obtained. While the major differences are important for classification and identification of individual response genes, the smaller changes are an integral part of the physiological response and are essential for the identification of the physiological processes that are affected by the challenge or intervention. This is especially true in nutrition, where dietary interventions result in modest, but biologically important gene expression changes (1, 12, 33). In the medical field it is also increasingly recognized that the more subtle changes contribute importantly to the outcome (30, 31, 38).

To translate microarray data into functional physiological information, a set of genes with the maximum amount of information and a minimum of noise is needed. Although a large number of methods exist to select genes from microarray data sets, most methods aim to identify the smallest possible set of genes that still can discriminate, for example, to classify malignancies, predict therapeutic outcomes, or diagnose physiological responses (7, 38). These methods may not always be appropriate to select a larger set of genes for biological interpretation that includes the smaller changes. These smaller changes are part of the response to medication or disease, which occurs through the interactions of multiple genes, via signaling pathways or other functional relationships. Small changes, variability among individuals, and the often small sample sizes on one hand and the large number of genes tested on the other make it difficult to distinguish true differences from noise (30, 50). Careful planning and execution of microarray experiments nowadays offers technically high-quality data, with a minimum of noise. However, the combination of small gene expression changes and the needed selection of the largest informative set of genes demands sophisticated selection methods. A structured framework that incorporates the different considerations in the identification of informative genes and the interpretation of their biological relevance is needed. Here we describe the steps of such a framework and address the following considerations: gene ranking, gene selection, gene grouping, and biological interpretation.

Gene Ranking
To identify genes of relevance within the total data set, genes are ranked by a measure of importance. As such, fold change has often been used. However, fold change is not a reliable measure because it does not take variability in the data into account (2, 48). Therefore, other measures that do take variability into consideration should be used. The most commonly used approach for gene selection in two-class microarray studies that takes variability into account is the conventional t-test, while ANOVA is used for multiclass studies. Genes are tested independently, and a P value is assigned to each gene, which can be used to rank genes by their importance. However, by ranking genes by a univariate test statistic such as the t-statistic, all genes in the data set are assumed to be independent and gene-gene interactions are not taken into account. In biological responses, gene-gene interactions will take place because these responses often result from coregulation of genes (4, 40). Consequently, by testing each gene independently, weak to small genetic effects that only in interaction make an important distinction between different study groups will not be detected by using a univariate test.
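The contrast between ranking by fold change and ranking by a variability-aware statistic can be illustrated with a minimal sketch. The gene names and expression values below are hypothetical, and the pooled-variance Student's t-statistic is assumed as the "conventional t-test"; this is not the GeneMaths XT implementation.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Two-sample Student's t-statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical log2 expression values: (control group, treated group)
genes = {
    "gene_noisy": ([5.0, 9.0, 4.0, 10.0], [8.0, 12.0, 6.0, 10.0]),
    "gene_stable": ([7.0, 7.1, 6.9, 7.0], [7.5, 7.6, 7.4, 7.5]),
}

# Difference of group means on the log scale is the log fold change
log_fc = {g: mean(trt) - mean(ctl) for g, (ctl, trt) in genes.items()}
t_abs = {g: abs(t_statistic(trt, ctl)) for g, (ctl, trt) in genes.items()}

rank_by_fc = sorted(genes, key=lambda g: abs(log_fc[g]), reverse=True)
rank_by_t = sorted(genes, key=lambda g: t_abs[g], reverse=True)
```

In this toy example the noisy gene has the larger fold change, but the stable gene has the larger t-statistic, so the two rankings disagree.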

Gene Selection
For functional interpretation the total ranked gene set can be used, but this will include noise, and selection of the most important genes is needed. The difficulty in gene selection is how to define the threshold. The threshold for selecting the differentially expressed genes influences the functional interpretation. Selection of genes is to some extent subjective, because there are no clear thresholds for existing methods. For the t-test, the threshold choice is flexible and the significance level is chosen by the researcher (3, 8). However, a threshold should preferably be defined in an objective way. Procedures can be applied to correct for multiple testing, such as the family-wise error rate (FWER) or the false discovery rate (FDR) (20, 46). However, these procedures can be overly stringent, resulting in identification of only the most important changes and possibly discarding other relevant genes (31).
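As an illustration of how a multiple-testing correction tightens the selection, the Benjamini and Hochberg FDR procedure can be sketched as follows. The P values are hypothetical, and the function is a plain reimplementation for illustration rather than the routine of any particular package.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order of p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical per-gene P values from a t-test
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)

n_unadjusted = sum(1 for pi in p if pi < 0.05)        # selection on raw P values
selected = [i for i, qi in enumerate(q) if qi < 0.05]  # selection after FDR correction
```

Here four genes pass the unadjusted threshold but only two survive the correction, which mirrors the concern that such procedures may discard relevant genes with smaller effects.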

Gene Grouping
Each probe on a microarray corresponds to a specific nucleotide sequence, which represents a specific gene. Most genes known to be involved in a functional category are annotated in annotation databases, such as the Gene Ontology (GO) database (17), Kyoto Encyclopedia of Genes and Genomes (KEGG) (22), or Entrez Gene (23). Whole genome microarrays contain annotated genes as well as nonannotated genes. Although the extent to which spots on whole genome microarrays are annotated has not exactly been established, many known genes are not annotated in functional analysis tools, for example, GO annotated, and are thus lost for biological interpretation when a pathway program uses the GO database as source (15, 23). However, the nonannotated genes may provide important new targets. Clues on the function of these genes can be obtained by establishing similarities in expression behavior to known genes. Genes with similar gene expression can be identified with self-organizing maps (SOM) and hierarchical clustering (35, 44, 47). SOM has the advantage that it provides an ordering of clusters, whereby each cluster consists of a group of genes with similar gene expression profiles. Grouping based on similarity in expression behavior is also useful for functional interpretation of known genes.

Biological Interpretation
Biological interpretation is the final step in this framework. A useful way to interpret microarray data is pathway analysis. In pathway analysis the effects of treatment on biological processes or coregulated gene sets are studied, rather than effects on individual genes (23, 49). A commonly used approach is to import a list of genes that meets the threshold criteria into a pathway program, such as the freely available ErmineJ (24), GenMAPP, DAVID/EASE, SAFE (3), or PLAGE (45) or commercially available programs like Metacore (16) or Ingenuity. These programs search through public or private databases to link related genes that are grouped in biological processes.

Recently, new methods have been developed for functional interpretation that circumvent the need to preselect genes (37). One of these methods is gene set enrichment analysis (GSEA) (43). This method enables detection of important pathways where all genes in a predefined set (for instance a GO category) change in a coordinated manner (29, 30). This is highly relevant for studies where subtle, but coordinated changes in expression can be expected. However, GSEA may have the disadvantage that it is restricted to, and therefore only informs about, functionally annotated genes. Thus not all information that is available in the data set is used. Nevertheless, the application of GSEA has shown that small effects can be captured when coordinate gene expression changes are taken into account (43).

In this study we describe a framework for functional interpretation of microarray-based expression studies using two real gene expression data sets. For gene ranking and selection, we have examined the usefulness of random forests (RF) (6). RF is one of the statistical methods that have been developed to select genes from large data sets containing many variables in small sample sizes. RF and other supervised methods like support vector machines (SVM) and discriminant analysis (DA) have mainly been used to select genes that provide the best classification performance for diagnostic purposes (see, e.g., Refs. 21, 39). In microarray studies, RF was shown to outperform other classification methods, especially when the number of classes is moderate (13, 25). RF could also be a suitable tool to rank and select a larger subset of genes for further interpretation, because it has many advantages (13). One major advantage of RF is that it provides an importance measure for each gene, which can be used to rank the genes. Furthermore, the advantage of this importance measure is that it takes gene-gene interactions in the ranking of genes into account. In this way, RF is able to capture not only the main effects in a data set but also the variables with weak to small genetic effects that mainly contribute by interactions with other genes. Interaction between genes increases the importance of the individual interacting genes, making them more likely to be given high importance relative to other genes. Genes with a higher importance index (Im) are more associated with differences resulting from the treatment. A simulation study has been performed, showing that RF outperforms a univariate method (27). This study showed that the more interactions are present, the better RF performs compared with a univariate method. Because RF takes gene-gene interactions into account in the ranking of genes, this method was applied within this framework as a tool to rank genes at the first step. 
However, RF does not provide a threshold to define which genes should be selected for further interpretation. Therefore, after applying RF to rank genes by their Im, we examined an approach to define a threshold for the genes ranked by RF to select biologically important genes in an objective way. After selection, genes were clustered by SOM, which clusters genes with similar gene expression in ordered profile groups. The advantage of combining results obtained with SOM and information obtained at previous steps is that insight can be obtained as to whether genes within the same profile contribute by their main effect and/or whether interaction effects are present and whether profiles containing relevant biological information are obtained. Finally, for each gene expression data set, the selected genes obtained by RF were incorporated in pathway programs (Metacore and ErmineJ) and compared with the results obtained with GSEA. Together this provides a stepwise framework focusing on the different considerations in the identification of informative genes and the interpretation of their biological relevance.


    METHODS
Data Sets
To illustrate and examine the framework considerations, we used two whole genome gene expression data sets obtained from the same dietary study. The animal welfare committee of Wageningen University approved the experimental protocol. In this study, two groups of Wistar rats were fed different diets for 2 wk. One group of rats received a control diet (n = 12) and the other an experimental diet (n = 12). The experimental diet was identical to the control diet but additionally contained fructo-oligosaccharides. Detailed analysis of the effects of the diet is the subject of another paper. The two data sets were obtained from two different tissues, colon and cecum. RNA from colon mucosa and cecum mucosa was isolated, reverse transcribed into cDNA, labeled, and individually hybridized to Agilent-Whole Rat Genome Microarrays (G4131A). Labeling was performed by incorporating Cy5 for individual samples and Cy3 for pooled RNA. Hybridization and washing were carried out according to Agilent protocols. A total of 24 arrays for colon were analyzed; one array did not pass the quality controls based on MA-plot and signal intensity distribution (2, 41). Therefore, the colon data set contained 23 arrays in total. The cecum data set contained 22 arrays in total, since two cecum RNA samples were excluded because of poor quality of RNA. We preprocessed the microarray data sets as described previously (34). Only genes with an average signal 1.5 times above the background were taken into account for further data analysis, equal to 28,180 genes for colon and 21,049 genes for cecum. Gene expression values were log-transformed before statistical analyses were performed. The data have been deposited in NCBI's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) and are accessible through GEO Series accession numbers GSE5943 and GSE8587.

Statistical Analyses
t-Test.
t-Tests to obtain t-statistics and corresponding P values for the differences in mean gene expression between the two treatment groups were performed with the program GeneMaths XT (Applied Math, Sint-Martens-Latem, Belgium). Within the same program FDR analyses according to the Benjamini and Hochberg procedure (20) were performed.

Random forests.
In RF a group of tree-based models (the forest) can be used to rank genes with an important contribution to the treatment variable. Each tree starts with the total data set, which is recursively split into smaller and more homogeneous groups to fit models for predicting the different treatment groups from the selected genes. Within the forest, different trees are obtained by bootstrap sampling and random subset selection. In more detail, each tree is constructed from a bootstrap sample of the total data set. A bootstrap sample is obtained by sampling observations (e.g., rats) from the original data set with replacement. The bootstrap sample contains the same number of observations as the original data set, but some observations are sampled more than once, while others are left out. The sampled observations are used to construct the tree, whereas a class prediction is obtained for each left-out observation, based on its gene expression values. A prediction for the forest is obtained by aggregating the predictions over all trees for which the observation was left out. The prediction error of the forest is then the proportion of misclassified samples, indicating the performance of the forest to correctly predict the class labels of the different observations. For each split in a tree, the gene that gives the best split is not selected from the total set of genes but from a random subset of genes. The number of randomly selected genes that are searched for the best split is referred to as mtry. RF performance is usually not sensitive to this parameter, and it is suggested to use $\sqrt{p}$, where p is the number of genes in the data set, as a default value for mtry (6, 26). Comparing the default value and values lower and higher than the default for both colon and cecum, we obtained similar prediction errors for different mtry values (data not shown). Therefore, default values for mtry ($\sqrt{p}$) were chosen for both colon (167 genes) and cecum (145 genes) to perform the RF analyses.
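The bootstrap sampling step and the default mtry can be sketched in a few lines. The function names and the seed are hypothetical; rounding $\sqrt{p}$ down is an assumption that agrees with the reported values of 167 and 145 genes.

```python
import math
import random

def bootstrap_split(n_obs, rng):
    """Draw a bootstrap sample of size n_obs; return (in-bag, left-out) observation indices."""
    in_bag = [rng.randrange(n_obs) for _ in range(n_obs)]  # sampling with replacement
    left_out = sorted(set(range(n_obs)) - set(in_bag))     # observations not drawn
    return in_bag, left_out

def default_mtry(n_genes):
    """Square root of the number of genes, rounded down (assumed default)."""
    return math.isqrt(n_genes)

rng = random.Random(1)                       # arbitrary seed for the illustration
in_bag, left_out = bootstrap_split(23, rng)  # 23 arrays in the colon data set

mtry_colon = default_mtry(28180)  # genes retained for colon
mtry_cecum = default_mtry(21049)  # genes retained for cecum
```

The in-bag indices are used to grow one tree, and the left-out indices are the observations for which that tree contributes a class prediction.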

More important genes will discriminate better between the treatment groups and will therefore be present in most of the trees and more often selected at a split close to the total sample. On the other hand, less important genes will be less present in the different trees and selected at splits farther from the total sample. Importance of genes is defined by a measure referred to as the importance index, Im. For each gene, this Im is obtained by comparing the predictive performance of the forest for all genes with the predictive performance of the forest in which the values of the gene are randomly permuted in the trees for the left-out observations. Larger differences in the predictive performance give a larger Im, indicating more important genes. By permuting the values for one gene, not only is the effect of this gene taken into account, but also all possible interactions of this gene with other genes. Interactions between genes increase the Im for each of the genes that are part of the interaction. In this way, RF takes interactions between genes into account. Several measures of importance are available (5, 26). To perform the RF analyses we used the scaled mean decrease in classification accuracy. Genes are ranked according to their importance. To obtain stable estimates of the Im, large numbers of trees in the forest are needed (26, 27). Also, to capture as many important interactions as possible, huge numbers of trees are required. RF does not overfit; therefore we performed the analyses with a large number of trees (40,000). We used all genes in the data set in the analysis, and Im was used as measure to rank the genes.
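The idea behind a permutation-based importance measure can be sketched as follows. This is a simplification: a fixed, hypothetical prediction rule stands in for the fitted forest, the shuffling is applied to one small toy data set rather than per tree to the left-out observations, and no scaling is applied. The rule depends only on the interaction between two hypothetical genes, g1 and g2, so neither gene has a main effect.

```python
import random
from statistics import mean

def accuracy(predict, rows, labels):
    """Fraction of observations whose predicted class matches the label."""
    return mean(1.0 if predict(r) == y else 0.0 for r, y in zip(rows, labels))

def permutation_importance(predict, rows, labels, gene, rng, n_repeats=200):
    """Mean drop in accuracy when the values of one gene are shuffled across observations."""
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[gene] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{gene: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return mean(drops)

def predict(row):
    """Hypothetical stand-in for a fitted forest: class set by the g1 x g2 interaction only."""
    return int((row["g1"] > 0) != (row["g2"] > 0))

# Toy data: all sign combinations of three genes; g3 is unrelated to the class
rows = [{"g1": a, "g2": b, "g3": c} for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
labels = [predict(r) for r in rows]

rng = random.Random(0)  # arbitrary seed
importance = {g: permutation_importance(predict, rows, labels, g, rng)
              for g in ("g1", "g2", "g3")}
```

Shuffling g1 or g2 degrades the predictions although neither gene is associated with the class on its own, whereas shuffling g3 changes nothing. A univariate test on g1 or g2 would see no difference between the classes.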

To obtain a threshold for selection of genes for subsequent interpretation, the permutation test (9, 28) was applied. We used 100 permutation data sets, in which the group labels are randomly permuted. For each permutation data set, RF analysis was performed with the same parameter settings as for the observed data set. Next, for each permutation data set Im values for the genes were obtained and genes were ranked. The distribution of the Im values derived from the permutation data sets indicates how the Im values of the genes behave in the absence of a true association with the treatment. To define the threshold for selecting genes, two approaches were taken. The first approach was to determine the value of Im where the Im of the observed data set was equal to, or lower than, the Im for at least 1 of the 100 permutation data sets. This corresponds to a significance level of P < 0.01. The second approach to define the threshold, which is explained and illustrated at the GeneSrF website (18), was to determine the number of genes with Im larger than the mean value of Im for the first ranked gene obtained from the 100 permutation data sets. However, this second approach yielded only a small number of genes, 11 for colon and 20 for cecum, with highly stringent P values of $7 \times 10^{-7}$ for colon and $9 \times 10^{-6}$ for cecum. Since this limited number of genes does not provide sufficient information for pathway analysis, we only used the first approach.
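One possible reading of the first approach is a rank-by-rank comparison of the observed Im curve with the permuted Im curves, sketched below. The rank-wise comparison, the function name, and all Im values are assumptions made for illustration; two permuted sets are used instead of 100 to keep the example short.

```python
def permutation_threshold(observed_im, permuted_im_sets):
    """Return (number of genes above threshold, Im value at which the threshold is reached).

    Genes are ranked by decreasing Im; the threshold is the first rank at which the
    observed Im is equal to or lower than the Im of at least one permuted data set.
    """
    observed = sorted(observed_im, reverse=True)
    permuted = [sorted(p, reverse=True) for p in permuted_im_sets]
    for rank, value in enumerate(observed):
        null_max = max(p[rank] for p in permuted)
        if value <= null_max:
            return rank, value
    return len(observed), None  # observed curve never meets the permuted curves

# Hypothetical Im values for five genes
observed_im = [5.0, 3.0, 1.2, 0.4, 0.1]
permuted_im_sets = [
    [1.0, 0.8, 0.5, 0.3, 0.1],
    [1.5, 0.9, 0.6, 0.5, 0.2],
]

n_selected, im_threshold = permutation_threshold(observed_im, permuted_im_sets)
```

In this toy case the three highest-ranked genes lie above every permuted curve and would be selected.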

To examine whether RF provides reproducible results over different analyses, we performed several analyses (runs), each time using the same parameter settings but a different seed value. The seed value controls the random number generator, and different seed values generate different forests. The results can be repeated if the same seed value is used. We examined the reproducibility of RF by comparing the Im of the genes for different runs. Each run can return slightly different results because in RF each tree is constructed on a bootstrap sample of the observations (rats), and at each split of the tree the best discriminating gene is selected from a random subset of genes (mtry).

The permutation test that was used to determine the threshold of the Im was also used to obtain the significance of the prediction error of the RF model. For each permuted data set, a prediction error was obtained by RF. The proportion of permutation data sets with a prediction error equal to or lower than the prediction error of the RF model of the observed data set provided the significance of the model.
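The significance of the model, as described above, is a simple proportion. A minimal sketch with hypothetical prediction errors:

```python
def permutation_p_value(observed_error, permuted_errors):
    """Proportion of permuted data sets with a prediction error <= the observed error."""
    hits = sum(1 for e in permuted_errors if e <= observed_error)
    return hits / len(permuted_errors)

# Hypothetical prediction errors for 100 permutation data sets
permuted_errors = [0.30 + 0.004 * i for i in range(100)]
observed_error = 0.30  # hypothetical error of the forest on the observed data

p_model = permutation_p_value(observed_error, permuted_errors)
```

With 100 permutation data sets the smallest nonzero proportion is 0.01, which bounds the resolution of the resulting P value.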

Software for RF is freely available, including R-packages (10, 26, 36, 42) and the original Fortran code (5). For analyses with RF we have applied the R-package randomForest to obtain the Im for the different genes.

Gene Grouping: SOM
For the gene sets selected with the obtained RF threshold (935 genes in colon, 165 genes in cecum), SOM analyses were performed, in which genes with similar expression are grouped into gene expression profiles. We chose the number of profiles based on the number of genes per profile we expected to be biologically related, and it was therefore set at a mean of 10 genes per profile. This corresponds to 90 SOM profiles for colon and 16 for cecum. To distinguish between genes that mainly contribute by their interaction effect or their main effect, genes selected by RF were compared with the same number of genes ranked by t-test. We explored whether profiles consisting of genes only selected by RF were present, which indicate profiles consisting of gene-gene interaction effects.

To perform SOM analysis, both commercial (e.g., GeneMaths XT) and free open-source software packages [e.g., Orange machine learning software (11) at http://www.ailab.si/orange] are available. In this study we used the GeneMaths XT (Applied Math) software package to obtain the SOM profiles.

Biological Interpretation: Pathway Analysis
For the genes selected by RF, we performed pathway analyses for biological interpretation. The pathway results obtained for genes selected by RF were compared with pathway results obtained for the same number of genes selected by t-test, to ensure comparability. For pathway analysis we used the freely available software ErmineJ (24) and the commercial program Metacore (16). ErmineJ is a web-based application for identification of GO processes on input gene sets. Metacore is a package of GeneGo (St. Joseph, MI).

In ErmineJ we used overrepresentation analysis (ORA); in Metacore GO processes were used for pathway analysis. For both ErmineJ ORA analysis and Metacore GO processes, gene sets consisting of 5–250 genes were tested. In both analyses, gene lists selected by RF or t-test were classified into GO processes. These processes were ranked according to their P value, which represents the probability that a particular process is selected by chance. Each pathway program uses different statistical tests to calculate these probabilities; this issue is beyond the scope of this paper and is discussed by others (14, 19, 23). For both programs we selected pathways with two selection criteria: 1) the pathways should have a P < 0.001, and 2) the pathways should include at least three selected genes.
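As an illustration of overrepresentation analysis, one common choice is the one-sided hypergeometric test, sketched below together with the two selection criteria. This test is an assumption for illustration only; the tests actually used by ErmineJ and Metacore may differ, and the counts in the example are hypothetical.

```python
from math import comb

def ora_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_universe, n_in_set, n_selected)."""
    upper = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

def passes_criteria(p_value, n_overlap, alpha=0.001, min_genes=3):
    """Apply the two selection criteria: P < 0.001 and at least three selected genes."""
    return p_value < alpha and n_overlap >= min_genes

# Hypothetical counts: 20 annotated genes, a gene set of 5, 4 selected genes, 3 in the set
p_small = ora_p_value(n_universe=20, n_in_set=5, n_selected=4, n_overlap=3)
keep = passes_criteria(p_small, n_overlap=3)
```

In this small example the overlap of three genes gives a P value of about 0.03, so the gene set meets the minimum-gene criterion but not the P < 0.001 criterion.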

We also analyzed which biological pathways were enriched with GSEA (43). In GSEA, enrichment of genes in a gene set is based on the ranking of the genes within the whole data set (37). We included functional c2 gene sets originating from KEGG, GenMAPP, and BioCarta that contained 5–500 genes. Gene sets with an FDR q value <0.25 were retained and ranked on normalized enrichment score (NES) and nominal P value.


    RESULTS
Whole Genome Arrays Are Not Fully Examined in Pathway Analysis Programs
Whole genome microarray analysis combined with pathway analysis is an attractive approach to identification of the effects of an intervention, but the analysis is limited to those genes that are annotated in the database used by the program. To assess completeness of annotation of whole genome arrays in pathway programs, we examined first the extent to which genes were incorporated in the analysis in three different pathway programs, Metacore (GeneGo), ErmineJ, and GSEA. This was performed for the two most widely used array platforms, Agilent and Affymetrix, and for three different species, human, mouse, and rat. Only 23–48% of the probes on whole genome microarrays are translated to functional categories by these programs (Table 1). ErmineJ was not included because it does not provide the number of incorporated genes. Annotation in this program is based on the specific GO term(s) linked to the gene, which for the Agilent 44K rat array applies to 7,437 genes (18%). Altogether, analysis only based on functional annotation and co-occurrence in gene sets leaves out at least half of the microarray data, and thereby potential new targets.


Table 1. Percentage of probes from whole genome microarrays identified by pathway programs Metacore and GSEA

 
Information Content of Gene Expression Data Sets
In both gene expression data sets (p = 28,180 for colon, p = 21,049 for cecum) the extent of differential gene expression induced by the dietary treatment was small: in colon, 179 genes were differentially expressed with a change of >1.5-fold, while in cecum the number of differentially expressed genes was 164. Based on fold change the two data sets are similar in number of expressed genes and magnitude of differential expression (fold change). However, the data sets differed in the significance of expression, with the colon data set containing substantially more significantly differentially expressed genes (Table 2). With a t-test threshold of P < 0.001, 803 genes were differentially expressed in the colon data set, while 123 genes were differentially expressed in cecum. Application of FDR using a threshold of q < 0.01 resulted in selection of 231 genes in colon and 19 genes in cecum. RF models were found to be significant in both colon (P < 0.02) and cecum (P < 0.01), indicating that gene expression differences were present.


Table 2. Characteristics of colon and cecum data sets

 
Gene Ranking: Taking Gene-Gene Interactions into Account
Genes were ranked according to their Im obtained by RF. To obtain insight into the ranking of genes by RF, we compared the results from RF with the ranking of genes by the commonly used t-test. For the genes present in the data set the absolute values for the t-statistics are plotted against the Im of RF (see Fig. 1). In both data sets, the Im obtained from RF follows a trend similar to that of the t-statistics. A number of genes are ranked highly by both RF and the t-test (Fig. 1, box A), indicating strong gene effects related to the treatment. Genes ranked high by RF but not by the t-test (Fig. 1, box B) are indicative of weak gene effects that are likely to be related to the treatment in interaction with other genes.


Fig. 1. Plot of absolute value of t-statistics against importance index (Im) for colon (top) and cecum (bottom) data sets. Box A, genes highly ranked by both random forests (RF) and t-test. Box B, genes highly ranked exclusively by RF.

 
Gene Selection: Defining an Objective Threshold
We aimed to define an objective threshold for Im by using a permutation approach (see METHODS). This permutation test provides an indication of where noise starts to interfere with real gene effects. For both colon and cecum the highest-ranked genes from the observed data set had Im values higher than the ranked Im values obtained from the permuted data sets (see Fig. 2). To define the threshold, we determined the Im value down to which genes in the observed data set have higher Im values than the genes in the permuted data sets. The point at which the Im values of the observed data set equaled those of at least 1 of the 100 permuted data sets was chosen as threshold, which is equal to a significance level for the Im of P < 0.01.


Fig. 2. Genes, of 100 random sets (black lines) and real sets with different seed values (colored lines), ranked by Im values for colon (A) and cecum (B) data sets.

 
We performed 15 runs (each with a different seed value) resulting in very similar thresholds (results not shown). For colon a mean threshold of Im = 0.906 and for cecum a mean threshold of Im = 1.753 were obtained. For each run, the genes with Im values above the threshold were determined. Genes with higher Im values were always selected over the different runs. However, genes with ranking close to the threshold (lower Im values) were not selected over all runs; thus the selection of these genes varied between different runs. We chose to include all genes that were selected in at least one run and not only the overlapping genes, because the number of genes that were additionally selected over increasing numbers of runs decreased rapidly (Table 3; Fig. 3, A and B, for colon and cecum, respectively). This likely indicates that additionally selected genes are truly affected by the treatment and not randomly selected noise. After 10 runs for colon and 11 runs for cecum, the proportion of genes additionally selected became and remained <2%. Therefore, more runs were not performed. Combining the results of different runs resulted in the selection of 935 genes above the threshold for colon and 165 genes above the threshold for cecum. These genes were selected as the set of genes being related to the treatment.
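Combining the selections of several runs and tracking how many genes each run adds can be sketched as follows. The gene identifiers are hypothetical, and expressing the newly added genes as a fraction of the cumulative total is an assumption about how the proportion was computed.

```python
def cumulative_selection(runs):
    """For each run, return (genes selected so far, fraction newly added by this run)."""
    seen = set()
    summary = []
    for selected in runs:
        new = set(selected) - seen  # genes not selected in any earlier run
        seen |= new
        summary.append((len(seen), len(new) / len(seen)))
    return summary

# Hypothetical genes above the Im threshold in three runs with different seed values
runs = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "d", "e"},
]

summary = cumulative_selection(runs)
always_selected = set.intersection(*runs)  # genes selected in every run
all_selected = set.union(*runs)            # genes selected in at least one run
```

The union corresponds to the set carried forward in this framework, while the intersection contains the genes that were selected regardless of the seed value.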


Table 3. Selection of genes by RF threshold

 

Fig. 3. Genes selected by RF thresholds Im > 0.906 for colon and Im > 1.753 for cecum. The total number of selected genes is plotted against the number of runs.

 
Comparison of Gene Selection by RF, t-Test, and Fold Change
Genes selected based on the RF threshold (935 genes in colon and 165 genes in cecum) were compared with an equal number of genes selected by t-test. For t-test this resulted in inclusion of genes with P < 0.0014 (q < 0.04) for colon and P < 0.0018 (q < 0.23) for cecum. In colon 679 genes (72.6%) and in cecum 112 genes (67.9%) overlapped between RF and t-test. As shown in the volcano plots (Fig. 4), gene sets selected by RF include the most significant genes based on t-test, as was also seen in Fig. 1. Furthermore, the volcano plots show that RF and t-test also differ in selection of genes. Several genes with high fold change, which would not have been selected based on t-test alone, are also selected by RF.
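The comparison described here, taking as many t-test genes as RF selected and counting the overlap, can be sketched as below. The function names and the dictionary-based input are illustrative assumptions; the P values themselves would come from whatever t-test implementation is in use.

```python
def top_n_by_p_value(p_values, n):
    """Return the n genes with the smallest P values.

    p_values: dict mapping gene identifier to its t-test P value.
    """
    return set(sorted(p_values, key=p_values.get)[:n])


def overlap_summary(rf_genes, p_values):
    """Compare an RF selection with an equally sized t-test selection."""
    rf_genes = set(rf_genes)
    if not rf_genes:
        raise ValueError("rf_genes must not be empty")
    t_genes = top_n_by_p_value(p_values, len(rf_genes))
    shared = rf_genes & t_genes
    return {
        "shared": len(shared),
        "rf_only": len(rf_genes - t_genes),
        "t_test_only": len(t_genes - rf_genes),
        "percent_shared": 100.0 * len(shared) / len(rf_genes),
    }
```

The reported percentages follow from the same arithmetic: 679 of 935 genes is 72.6% and 112 of 165 genes is 67.9%.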


Fig. 4. Volcano plots for colon (A) and cecum (B). Fold change is plotted against P value. Genes selected by RF are shown in black (935 for colon, 165 for cecum).

 
For both data sets, the set of genes selected by RF was used for subsequent gene grouping and biological interpretation.

Gene Grouping: Obtaining Gene Expression Profiles by SOM
For grouping of the genes selected by RF, we applied SOM analysis to find groups of highly correlated genes. While SOM is mostly used to identify patterns in time or as a result of multiple treatments (44), it will also identify patterns of coordinate changes over a number of animals. In Fig. 5, A and B, the groups of genes with similar expression are shown for colon and cecum, respectively. For both colon and cecum, profiles are present that consist mainly of genes that are selected exclusively by RF (light gray in Fig. 5). SOM analyses for genes selected by the t-test did not result in profiles consisting of genes exclusively selected by t-test (data not shown). Apparently, RF selects genes with main effects similarly to the t-test, but additionally selects genes (not selected by t-test) that can be grouped in profiles, which are likely to be related to the treatment by gene-gene interaction effects.
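To make the grouping step concrete, the sketch below trains a very small self-organizing map on gene profiles (one vector of expression values across animals per gene) and assigns each gene to its closest map unit. It is a generic textbook SOM written for illustration, not the software used in this study; the function names, the linear decay of learning rate and neighborhood width, and all default parameters are assumptions.

```python
import math
import random


def best_unit(x, weights):
    """Index of the map unit whose weight vector is closest to profile x."""
    return min(
        range(len(weights)),
        key=lambda u: sum((xi - w) ** 2 for xi, w in zip(x, weights[u])),
    )


def train_som(profiles, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Train a rectangular SOM; return one weight vector per map unit."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise every unit with a randomly chosen gene profile.
    weights = [list(rng.choice(profiles)) for _ in grid]
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        decay = 1.0 - t / n_iter
        lr = lr0 * decay                  # learning rate shrinks over time
        sigma = max(sigma0 * decay, 0.5)  # so does the neighborhood width
        x = rng.choice(profiles)
        bmu_r, bmu_c = grid[best_unit(x, weights)]
        for unit, (r, c) in enumerate(grid):
            grid_d2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull the unit toward x, more strongly when it is near the winner.
            weights[unit] = [w + lr * h * (xi - w) for w, xi in zip(weights[unit], x)]
    return weights


def assign_profiles(profiles, weights):
    """Map each gene profile to the index of its closest SOM unit."""
    return [best_unit(x, weights) for x in profiles]
```

Because neighboring units are updated together, units that are close on the grid end up with similar profiles, which is the ordering property of SOM. In practice each gene profile would usually be standardized before training so that genes are grouped by pattern and not by absolute expression level.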


Fig. 5. Self-organizing maps (SOM) profiles for colon (A) and cecum (B). The total number of SOM profiles was arbitrarily set to 90 for colon and 16 for cecum, corresponding to an average of ~10 genes per profile. The size of the circles corresponds to the number of genes included in the group (range of genes per profile: colon 1–19, cecum 2–27). Within each profile, genes that overlap between RF and t-test are shown in dark gray, and genes exclusively selected by RF are shown in light gray. Genes in profiles 1, 2, and 3 were analyzed in more detail.

 
We examined whether the genes exclusively selected by RF and highly enriched within one profile shared similar biological functions. Therefore we selected profiles consisting mainly of genes selected exclusively by RF. For colon two profiles and for cecum one profile were selected (Fig. 5, white boxes). The first colon profile (profile 1) consisted of nine genes: four expressed sequence tags (ESTs) with unknown function and five genes that were annotated but not classified to a known GO process. After literature and database search these five genes could not be linked to a single biological process (Table 4). The second colon profile (profile 2) consisted of 13 genes, of which 12 were selected only by RF. Five genes were annotated in a GO process (bold gene names in Table 4), of which four are part of the same GO process: cellular component organization and biogenesis. The remaining eight genes consisted of two ESTs and six genes that are presently poorly understood, because further database and literature mining did not reveal a relation to a known biological process. One of these six (palladin) was recognized to play a role in maintaining normal actin cytoskeleton architecture (32), indicating a possible role in the same biological process as the four annotated genes within this SOM profile.


Table 4. Genes mainly selected exclusively by RF, grouped in SOM profiles (white boxes in Fig. 5)

 
The cecum profile of exclusively RF-selected genes (profile 3) contained 13 genes, comprising 10 unique genes. Three of the 10 genes were annotated by GO, of which 2 are part of the GO process immune response. Further database and literature mining revealed that six of the seven other genes had a function related to immune response (Table 4). This confirms the notion that genes with a similar expression profile selected from a microarray data set exclusively by RF may be enriched in the same biological process. It further indicates that this is a strategy to search for the biological function of genes and to reveal new biological processes related to treatment.

Biological Interpretation: Pathway Analysis to Obtain Biological Processes
To examine whether pathway programs are able to identify differences between RF-selected genes and t-test-selected genes, we applied pathway analysis for the set of genes selected by RF and compared this with the same number of genes selected by t-test (935 genes for the colon data set and 165 for the cecum data set). To ensure that we covered different pathway analysis methods, we used two pathway programs, Metacore and ErmineJ. For both colon (Table 5) and cecum (Table 6) the comparison between RF- and t-test-based selection showed highly comparable results per pathway program. However, the ranking of processes was somewhat different, and each selection method (RF or t-test) identified some unique processes.
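For readers unfamiliar with how a preselected gene list is scored against annotated processes, one common approach is over-representation analysis with a hypergeometric test. The sketch below shows that generic calculation only; it is not a description of the specific algorithms inside Metacore or ErmineJ, and the function and parameter names are illustrative.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_process, n_selected, n_overlap):
    """Probability of seeing at least n_overlap process genes by chance.

    n_universe: number of annotated genes on the array.
    n_in_process: number of those genes annotated to the process.
    n_selected: number of annotated genes in the selected list.
    n_overlap: number of selected genes annotated to the process.
    """
    upper = min(n_in_process, n_selected)
    total = comb(n_universe, n_selected)
    # Upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_in_process, k) * comb(n_universe - n_in_process, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Only annotated genes enter such a calculation, which is why a selection method that picks up poorly annotated genes can yield profiles that pathway programs do not report.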


Table 5. Biological processes in colon data set selected by Metacore, ErmineJ, and GSEA

 

Table 6. Biological processes in cecum data set selected by Metacore, ErmineJ, and GSEA

 
GSEA does not require preselection of genes, although information may be lost because of incomplete annotation. GSEA is especially suited to identifying processes based on interaction. To see whether similar or complementary information is obtained, we analyzed the complete colon and cecum data sets with GSEA. We focused on pathway-related GSEA gene sets, obtained from GO, GenMapp, and Biocarta, to allow for comparison. Only a few gene sets were found to be significantly enriched (FDR < 0.25 according to GSEA): 12 in colon and 6 in cecum. The small number of processes identified by GSEA analysis suggests that information is lost. The program does give some overlapping pathways in colon, but in cecum other processes are selected. In both cases no overlap with processes only selected with RF was found.
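The reason GSEA needs no preselection is that it walks down the complete ranked gene list and accumulates a running sum. The sketch below computes that running-sum enrichment score for a single gene set in simplified form; it omits the permutation-based significance and FDR estimation that the GSEA software performs, and the function name and arguments are illustrative assumptions.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: all genes, ordered by the ranking metric.
    scores: ranking metric for each gene, in the same order.
    gene_set: collection of genes belonging to the set of interest.
    p: weight exponent; p=0 gives the unweighted form.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_miss == 0 or hit_weight == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up at genes in the set, step down at all other genes.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best
```

A set whose members cluster at the top of the list scores close to +1 and a set clustered at the bottom scores close to -1, even when no single member would pass a gene-level threshold.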


    DISCUSSION
We described a framework for physiological interpretation of gene expression data. This framework (see Fig. 6, top) consists of the following steps: genes are first ranked, the relevant genes are selected, and the selected genes are grouped according to their expression profile and then biologically interpreted. The considerations underlying the different steps are illustrated with two real gene expression data sets. We show several features of RF that should be part of any data analysis framework. These are 1) all genes in the data set are included in the analysis, 2) interaction between genes is taken into account, and 3) a well-defined gene set can be selected by using an objective threshold.


Fig. 6. A framework for identification of physiological responses in microarray based gene expression studies. The framework is composed of the following steps: gene ranking, gene selection, gene grouping, and biological interpretation. Essential features of the data analysis framework are that 1) all genes (annotated and nonannotated) in the data set are included in the analysis, 2) interaction between genes is taken into account and, 3) an objective threshold is used for selection of a well-defined gene set. RF has these features. Gene grouping can provide information on new targets and add information above pathway analysis. Despite loss of information due to incomplete annotation of the complete data set, gene set enrichment analysis can provide additional information on related genes with small differences.

 
For human, mouse, and rat whole genome arrays, the number of annotated genes is less than half of the genes present on the array. Consequently, analysis based only on functional annotation and co-occurrence in gene sets filters out half of the information present in the microarray data set. Well-studied biological processes are better represented in pathway databases (23). Therefore, conclusions obtained from data analysis based only on pathway programs are biased toward the well-annotated biological processes. By including all genes from a whole genome data set, it is possible to find genes or processes that are less well defined in databases but could be attractive new targets for drug development or nutritional intervention. For both colon and cecum, genes exclusively selected by RF within one SOM profile belonged to the same biological process: cellular component organization and biogenesis (colon) and immune response (cecum). Because only a few genes within these profiles were GO annotated, these processes were not selected by the different pathway programs. By literature and database search we could clearly identify these genes as part of these processes.

A major strength of whole genome microarray studies is that the expression levels of all genes are displayed, allowing for identification of gene-gene interactions. RF was chosen to rank genes because its measure of importance takes possible interactions between genes into account. Compared with the results obtained by t-test, RF selected genes with main effects but additionally was able to capture weak effects. This is an advantage in studies where gene expression changes are small and not significant individually but may be highly relevant when they occur together in a group of genes. For example, it enables identification of possible side effects in drug studies or expected subtle differences in nutritional studies. In our study, application of RF in combination with SOM indeed showed enriched profiles containing mainly genes selected exclusively by RF and not by t-test. Genes within these profiles are therefore likely to contribute through gene-gene interactions.
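A toy calculation can show why a gene-by-gene test may miss an effect that depends on two genes together. The numbers below are hypothetical and chosen only for illustration; the example uses Welch's t statistic on each gene and on their product, and it does not implement RF or its importance measure.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent groups."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical centered expression values (gene 1, gene 2) per animal.
# In controls the two genes move together; in treated animals they move apart.
control = [(1.0, 1.1), (-1.0, -0.9), (1.2, 0.9), (-1.2, -1.1)]
treated = [(1.0, -1.1), (-1.0, 0.9), (1.2, -0.9), (-1.2, 1.1)]

# Each gene on its own has the same mean in both groups.
t_gene1 = welch_t([g1 for g1, _ in control], [g1 for g1, _ in treated])
t_gene2 = welch_t([g2 for _, g2 in control], [g2 for _, g2 in treated])

# The joint behavior (product of the two genes) separates the groups clearly.
t_joint = welch_t(
    [g1 * g2 for g1, g2 in control],
    [g1 * g2 for g1, g2 in treated],
)
```

Here both single-gene statistics are zero while the joint statistic is large. A tree-based method can split on one gene and then on the other, which is one way such combined patterns become visible in a ranking.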

By applying a permutation test, we defined a threshold for RF to select genes in an objective way. Comparison of different runs showed that the most important genes were consistently selected. However, selection of genes ranked closely above the threshold varied between different runs. We chose to include genes that were additionally selected over different runs in the total selected gene set. By including these additionally selected genes, there is a chance that more false positives were included in the selection. Had we chosen to select only the set of genes that overlapped in all runs, we might have discarded truly relevant genes (false negatives). We reasoned that the increased information available for pathway analysis outweighed the potential disadvantage of including some noise, especially since in dietary studies gene expression changes of interest are usually small. Furthermore, the results show that the number of additionally selected genes decreased rapidly for each additional run. Because there was large overlap, it is less likely that many of the additionally selected genes were noise. Thus, within this framework, RF is a useful tool to select a well-defined set of genes for further interpretation.

SOM was applied to find groups of genes with similar gene expression profiles. Other approaches to find gene groups, such as hierarchical clustering, can be used with the same objective (35). However, SOM has the advantage, compared with other clustering methods, that it provides an ordering of the profiles. While individual genes may have small gene expression differences, groups of similarly behaving genes can be biologically significant. When SOM analysis is applied to whole genome data sets, unrelated data will also produce clusters that lack any physiological relevance (35). This can be overcome by selecting a subset of genes and examining whether biologically valid clusters are obtained. The number of clusters is specified by the user. Specifying larger or smaller numbers of profiles within a certain range does not impact the interpretation of the results, since SOM provides an ordering in the profiles. For both colon and cecum, genes selected by RF and analyzed by SOM provided profiles consisting of genes with similar biological function. In the colon data set, a SOM profile consisted of genes belonging to the same GO process and genes with poorly identified functions. This could be a starting point to identify the possible biological function of the nonidentified genes. Using SOM within this framework can provide information on genes with unknown function and help to identify biological processes not captured by pathway analyses. Therefore SOM is a useful tool for identification of biological processes in addition to pathway analysis.

The pathway analysis based on the subset of genes obtained by RF and t-test shows overlap for the selected processes; however, different processes were additionally obtained by RF.

Remarkably, GSEA only returned a few gene sets connected to public databases that were significantly enriched in colon or in cecum. The small number of processes identified by GSEA analysis suggests that information is lost. On the other hand, GSEA did provide biological processes not found in the other pathway programs. Although only a few processes were found by GSEA, these are worth exploring because they may consist of related genes with small differences. Thus, in the context of the framework discussed in this paper, GSEA may additionally be applied.

The advantage of this framework is that different methods can be applied at different steps, depending on the aim and preferences of the researcher. For example, other methods that take interactions into account could be used instead of RF. A next step is to extensively compare different methods that take gene-gene interaction into account to select biologically relevant genes. There are several advantages to the use of RF within this framework to rank and select genes. In a previous simulation study, Lunetta et al. (27) showed that the more interactions that are present in the data set, the more RF outperforms a univariate test statistic in prioritizing the important variables. In our study we used two real data sets with subtle gene expression changes and showed that RF in combination with SOM can be used to extract a biologically meaningful group of genes, such as the set of immune response genes in the cecum data set that would be discarded with univariate tests such as the t-test. As mentioned above, RF returns an importance measure for each gene (Im) in which gene-gene interactions are taken into account. On the basis of this Im, we showed an approach that can be used to define an objective threshold for selection of genes.

Besides two-class problems, RF can also be applied to multiclass problems. Furthermore, free software is available for RF whereby only a few parameters need to be defined (26). Also, users can easily obtain a gene list for further interpretation without the need to understand the finer details of the method thoroughly. Therefore, within this framework RF is a suitable and practical tool to rank and select genes. Combined with gene grouping by SOM and pathway programs, this framework is helpful to obtain insight into the biological processes. These physiological effects are the main focus for further confirmatory and mechanistic studies.

In conclusion, in this study we have examined the application of a framework in which all genes in a microarray data set are analyzed. Within this framework, application of RF has the advantage that it takes gene-gene interactions into account in the ranking of genes. Also, selection of genes by an objective threshold provides a well-defined set of genes for further interpretation. Groups of genes within this set are identified by SOM analysis. In combination with pathway analyses, the framework provides valuable information on biological processes involved in the treatment.


    GRANTS
This work was supported by the Centre for Human Nutrigenomics (G. Heidema), TI Food and Nutrition (W. Rodenburg), and the Ministry of Agriculture, Food Quality and Nature Management of The Netherlands. J. Keijer is a member of Mitofood.


    ACKNOWLEDGMENTS
 
The authors thank Dr. E. M. van Schothorst from RIKILT Institute of Food Safety, Dr. P. Wang from Maastricht University, and Dr. D. Molenaar from NIZO Food Research for helpful comments and suggestions. We also thank Dr. T. Travis for allowing the use of the computer cluster at the Rowett Research Institute for the random forests analyses.


    FOOTNOTES
 
Address for reprint requests and other correspondence: J. Keijer, Food Bioactives Group, RIKILT-Institute of Food Safety, PO Box 230, 6700 AE, Wageningen, The Netherlands (e-mail: jaap.keijer{at}wur.nl).

Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).

* W. Rodenburg and A. G. Heidema contributed equally to this paper.


    REFERENCES

  1. Afman L, Muller M. Nutrigenomics: from molecular nutrition to prevention of disease. J Am Diet Assoc 106: 569–576, 2006.
  2. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 7: 55–65, 2006.
  3. Barry WT, Nobel AB, Wright FA. Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 21: 1943–1949, 2005.
  4. Brazhnik P, de la Fuente A, Mendes P. Gene networks: how to put the function in genomics. Trends Biotechnol 20: 467–472, 2002.
  5. Breiman L. Fortran Code for Random Forests. www.stat.berkeley.edu/user/breiman/randomforests/.
  6. Breiman L. Random forests. Mach Learn 45: 5–32, 2001.
  7. Breitling R, Armengaud P, Amtmann A, Herzyk P. Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett 573: 83–92, 2004.
  8. Chen JJ, Wang SJ, Tsai CA, Lin CJ. Selection of differentially expressed genes in microarray data analysis. Pharmacogenomics J 7: 212–220, 2006.
  9. Cox DR, Hinkley DV. Theoretical Statistics. London: Chapman and Hall, 1974.
  10. CRAN. http://cran.r-project.org/.
  11. Curk T, Demsar J, Xu Q, Leban G, Petrovic U, Bratko I, Shaulsky G, Zupan B. Microarray data mining with visual programming. Bioinformatics 21: 396–398, 2005.
  12. de Boer VC, van Schothorst EM, Dihal AA, van der Woude H, Arts IC, Rietjens IM, Hollman PC, Keijer J. Chronic quercetin exposure affects fatty acid catabolism in rat lung. Cell Mol Life Sci 63: 2847–2858, 2006.
  13. Diaz-Uriarte R, Alvarez de Andres S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7: 3, 2006.
  14. Dopazo J. Functional interpretation of microarray experiments. OMICS 10: 398–410, 2006.
  15. Draghici S, Sellamuthu S, Khatri P. Babel's tower revisited: a universal resource for cross-referencing across annotation databases. Bioinformatics 22: 2934–2939, 2006.
  16. Ekins S, Nikolsky Y, Bugrim A, Kirillov E, Nikolskaya T. Pathway mapping tools for analysis of high content data. Methods Mol Biol 356: 319–350, 2007.
  17. Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258–D261, 2004.
  18. GeneSrF. Gene selection with random forests. CNIO Bioinformatics Unit. http://genesrf.bioinfo.cnio.es.
  19. Goeman JJ, Buhlmann P. Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 23: 980–987, 2007.
  20. Hochberg Y, Benjamini Y. More powerful procedures for multiple significance testing. Stat Med 9: 811–818, 1990.
  21. Huang X, Pan W, Grindle S, Han X, Chen Y, Park SJ, Miller LW, Hall J. A comparative study of discriminating human heart failure etiology using gene expression profiles. BMC Bioinformatics 6: 205, 2005.
  22. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 28: 27–30, 2000.
  23. Khatri P, Draghici S. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21: 3587–3595, 2005.
  24. Lee HK, Braynen W, Keshav K, Pavlidis P. ErmineJ: tool for functional analysis of gene expression data sets. BMC Bioinformatics 6: 269, 2005.
  25. Lee JW, Lee JB, Park M, Song SH. An extensive comparison of recent classification tools applied to microarray data. Comput Stat Data Anal 48: 869–885, 2005.
  26. Liaw A, Wiener M. Classification and regression by randomForest. R News 2: 18–22, 2002.
  27. Lunetta KL, Hayward LB, Segal J, Van Eerdewegh P. Screening large-scale association study data: exploiting interactions using random forests. BMC Genet 5: 32, 2004.
  28. Lyons-Weiler J, Pelikan R, Zeh HJ, Whitcomb DC, Malehorn DE, Bigbee WL, Hauskrecht M. Assessing the statistical significance of the achieved classification error of classifiers constructed using serum peptide profiles, and a prescription for random sampling repeated studies for massive high-throughput genomic and proteomics studies. Cancer Informatics 1: 53–77, 2005.
  29. Majumder PK, Febbo PG, Bikoff R, Berger R, Xue Q, McMahon LM, Manola J, Brugarolas J, McDonnell TJ, Golub TR, Loda M, Lane HA, Sellers WR. mTOR inhibition reverses Akt-dependent prostate intraepithelial neoplasia through regulation of apoptotic and HIF-1-dependent pathways. Nat Med 10: 594–601, 2004.
  30. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34: 267–273, 2003.
  31. Norris AW, Kahn CR. Analysis of gene expression in pathophysiological states: balancing false discovery and false negative rates. Proc Natl Acad Sci USA 103: 649–653, 2006.
  32. Parast MM, Otey CA. Characterization of palladin, a novel protein localized to stress fibers and cell adhesions. J Cell Biol 150: 643–656, 2000.
  33. Patsouris D, Reddy JK, Muller M, Kersten S. Peroxisome proliferator-activated receptor alpha mediates the effects of high-fat diet on hepatic gene expression. Endocrinology 147: 1508–1516, 2006.
  34. Pellis L, Franssen-van Hal NL, Burema J, Keijer J. The intraclass correlation coefficient applied for evaluation of data correction, labeling methods, and rectal biopsy sampling in DNA microarray experiments. Physiol Genomics 16: 99–106, 2003.
  35. Quackenbush J. Computational analysis of microarray data. Nat Rev Genet 2: 418–427, 2001.
  36. R Development Core Team. R: A Language and Environment for Statistical Computing. http://www.R-project.org, 2004.
  37. Rubin E. Circumventing the cut-off for enrichment analysis. Brief Bioinform 7: 202–203, 2006.
  38. Segal E, Friedman N, Kaminski N, Regev A, Koller D. From signatures to models: understanding cancer using microarrays. Nat Genet 37, Suppl: S38–S45, 2005.
  39. Shi T, Seligson D, Belldegrun AS, Palotie A, Horvath S. Tumor classification by tissue microarray profiling: random forest clustering applied to renal cell carcinoma. Mod Pathol 18: 547–557, 2005.
  40. Slonim DK. From patterns to pathways: gene expression data analysis comes of age. Nat Genet 32, Suppl: 502–508, 2002.
  41. Smyth GK, Yang YH, Speed T. Statistical issues in cDNA microarray data analysis. Methods Mol Biol 224: 111–136, 2003.
  42. Strobl C, Boulesteix AL, Zeileis A, Hothorn T. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics 8: 25, 2007.
  43. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102: 15545–15550, 2005.
  44. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96: 2907–2912, 1999.
  45. Tomfohr J, Lu J, Kepler TB. Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics 6: 225, 2005.
  46. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98: 5116–5121, 2001.
  47. Valafar F. Pattern recognition techniques in microarray data analysis: a survey. Ann NY Acad Sci 980: 41–64, 2002.
  48. Verducci JS, Melfi VF, Lin S, Wang Z, Roy S, Sen CK. Microarray analysis of gene expression: considerations in data mining and statistical treatment. Physiol Genomics 25: 355–363, 2006.
  49. Werner T. Regulatory networks: linking microarray data to systems biology. Mech Ageing Dev 128: 168–172, 2007.
  50. Yoon S, Yang Y, Choi J, Seong J. Large scale data mining approach for gene-specific standardization of microarray gene expression data. Bioinformatics 22: 2898–2904, 2006.



