DNA-binding transcription factors bind to promoters that carry their binding sites. Transcription factors therefore function as nodes in gene regulatory networks. In the present work we used a bioinformatic approach to search for transcription factors that might function as nodes in gene regulatory networks during the differentiation of the small intestinal epithelial cell. In addition we have searched for connections between transcription factors and the villus metabolome. Transcriptome data were generated from mouse small intestinal villus, crypt, and fetal intestinal epithelial cells. Metabolome data were generated from crypt and villus cells. Our results show that genes that are upregulated during fetal to adult and crypt to villus differentiation have an overrepresentation of potential hepatocyte nuclear factor (HNF)-4 binding sites in their promoters. Moreover, metabolome analyses by magic angle spinning 1H nuclear magnetic resonance spectroscopy showed that the villus epithelial cells contain higher concentrations of lipid carbon chains than the crypt cells. These findings suggest a model where the HNF-4 transcription factor influences the villus metabolome by regulating genes that are involved in lipid metabolism. Our approach also identifies transcription factors of importance for crypt functions such as DNA replication (E2F) and stem cell maintenance (c-Myc).
- crypt-villus axis
- gene regulation
- hepatocyte nuclear factor-4
In a diagram of a gene regulatory network, the transcription factors form nodes with many connections drawn as lines and extending to the genes that the transcription factors regulate (for reviews see Refs. 6, 34). Genome-wide chromatin immunoprecipitation experiments (27) have previously shown that the hepatocyte nuclear factors-1, 4, and 6 (HNF-1, HNF-4, and HNF-6) form important nodes in hepatic gene regulatory networks. Thus HNF-1, HNF-4, and HNF-6 were shown to bind to at least 1.6, 12, and 1.7% of the assayed promoters, respectively, in hepatocytes (27). HNF-4 stands out as being particularly important because it binds to almost 10 times as many promoters in the hepatocyte as HNF-1 and HNF-6 do. HNF-4 controls genes involved in hepatic lipid metabolism (47), thereby influencing the hepatocyte metabolome.
During vertebrate embryonic development, the liver develops as an outgrowth from the anterior primitive endoderm, which also gives rise to the adult small intestinal epithelium (for a review see Ref. 33). This embryonic relationship is reflected in the adult organs, where many genes, such as those involved in lipoprotein synthesis, are expressed in both the liver and the small intestine. The small intestinal epithelium can be divided into two parts: the villus and the crypt compartments (see Fig. 1). The epithelium covers the underlying connective tissue (called the lamina propria) to form finger-like protrusions, the villi, which point outward to the gut lumen. At the base of the villi, the epithelium continues to line the flask-shaped crypts that penetrate into the connective tissue. The cellular dynamics of the epithelium originate from the positioning of one to four stem cells, which are situated a few cell positions above the bottom of the crypts. The stem cells give rise to a layer of committed so-called transient amplifying cells, which are positioned at the middle and upper parts of the crypts. These transient amplifying cells undergo a few cell divisions as they migrate toward the crypt openings. At the crypt-villus transition zone, proliferation ceases and the cells differentiate (for reviews see Refs. 30, 31, 36). The absorptive enterocyte is by far the most abundant cell type in the small intestinal epithelium, and the fully differentiated enterocyte is a cell type that in many ways functionally resembles the hepatocyte. It is therefore relevant to ask to what extent the HNF transcription factors might be important for the generation of the villus-specific gene expression. Two decades of work focusing on a few selected genes lend support to the idea that members of the HNF-1 and HNF-4 transcription factor families might indeed be of importance for villus-specific gene expression (for a review see Ref. 48).
The order of magnitude of the number of target genes for HNF-1 and HNF-4 in the differentiated enterocyte is, however, not known at present. It is also not known whether other transcription factors might similarly drive a high number of differentiation-induced genes in the villus enterocyte and thereby be just as important for the villus gene expression.
It was the purpose of the present work to determine on a genome-wide scale which transcription factor binding sites are the most common in the promoters for differentiation-induced genes during fetal-to-adult and crypt-to-villus differentiation of small intestinal epithelial cells. Another purpose was to investigate whether connections between the enterocyte metabolome and the investigated transcription factors might exist.
Transcriptome data were collected from embryonic mouse endoderm, adult mouse crypt, and adult mouse villus epithelium by high-density oligonucleotide array analysis. Metabolome data were collected from adult mouse crypt and villus epithelium by magic angle spinning 1H nuclear magnetic resonance (NMR) spectroscopy, a technique that can provide detailed molecular information about a wide range of metabolites in small amounts of intact tissue (45, 46). To identify overrepresentation of potential transcription factor binding sites in the promoters controlling genes with a differentiation-dependent expression, a bioinformatic algorithm was applied.
Our results point to HNF-4 as a critical regulator of the villus-specific gene expression because potential HNF-4 binding sites are found in a high fraction of the promoters that control upregulated genes during development and during crypt-to-villus differentiation. Analysis of the villus metabolome revealed the presence of higher concentrations of lipid carbon chains in the villi than in the crypts. This finding led us to formulate a model in which HNF-4 indirectly controls the concentration of lipid carbon chains in the villi by regulating genes involved in lipid metabolism. Finally, our results also provide information about transcription factors that regulate crypt-specific functions. We have identified both a c-Myc crypt transcription factor node, which is presumably associated with epithelial stem cell maintenance, and an E2F gene regulatory node, which is presumably associated with crypt cell proliferation.
Isolation of mouse intestinal tissues.
The protocol involving experimental animals conformed to the rules concerning review and approval by the committee for experimental animals under the Danish Ministry of Justice. C57BL/6 mice were kept on a standard rodent diet and fed ad libitum. Animals were killed by cervical dislocation. Rapid access to the abdominal cavity was achieved by use of surgical scissors, the ileum was dissected out, and a 10-cm segment was cut free and immediately placed in ice-cold PBS. The intestinal segments were flushed with ice-cold HBSS [3.3 mM Na2HPO4, 4.1 mM NaHCO3, 136.8 mM NaCl, 0.44 mM KH2PO4, 5.3 mM KCl, 5.5 mM D(+)-glucose] adjusted to pH 7.2. DTT was added to 0.5 mM just before use. Isolation of crypts and villi was performed according to the procedure by Flint et al. (15) with some modifications: A plastic rod (diameter 3 mm and length 115 mm) was gently introduced ∼5 mm into the lumen of the intestinal segment, which was fixed to the rod using 3-0 suture. The rest of the intestinal segment was inverted onto the remaining free part of the plastic rod and fixed at the other end with 3-0 suture. The inverted intestine on the plastic rod was incubated overnight (4°C, 15 h) in chelating buffer (27 mM Na-citrate, 5 mM Na2HPO4, 96 mM NaCl, 8 mM KH2PO4, 1.5 mM KCl, 55 mM d-sorbitol, 44 mM, 0.5 mM DTT) adjusted to pH 7.2. All of the following manipulations were performed at 4°C. The plastic rod with the inverted intestine was placed in fresh chelating buffer in a 15-ml plastic centrifuge tube with a screw cap. The tube was fixed with a clamp that was inserted into a motor for a Potter-Elvehjem homogenizer. The motor was adjusted to a speed of 1–2 rpm, allowing the tube to be continuously inverted. Initially, the chelating buffer was collected every 30 min, and the released villi were inspected by phase contrast microscopy.
The first fractions, which were dominated by intact villi, were pooled, washed once in PBS, pelleted (800 g, 5 min), snap-frozen in liquid N2, and stored as the villus fraction. The rotation and collection of fractions were continued for 8–10 h until very few cells were released into the new fractions. Crypts were subsequently released by tapping the centrifuge tube hard into a lab dish three to four times. The released cells were harvested by centrifugation and washed once in PBS, and the pellets were stored frozen in liquid N2.
Mouse ileal segments were placed in 4% paraformaldehyde (4°C, 16 h) and subsequently in 60% ethanol (4°C) until embedding. The tissue segments were embedded in paraffin, sectioned, and stained with hematoxylin and eosin according to standard histological procedures. Rehydrated paraffin sections were boiled for 10 min in 10 mM Na-citrate, pH 6.0. The heating was turned off, and the buffer was allowed to reach room temperature. After the antigen retrieval procedure, the sections were incubated for 30 min in blocking buffer (50 mM Tris·HCl pH 7.4, 150 mM NaCl, 0.5% ovalbumin, 0.1% gelatine, 0.2% teleostean gelatine, 0.05% Tween 20) and incubated with a 1:50 dilution of a polyclonal anti-HNF-4 antibody (SC-8987, Santa Cruz Biotechnology) in blocking buffer overnight. The sections were washed three times for 10 min each in blocking buffer and incubated for 30 min at room temperature with a 1:100 dilution of an Alexa-488-conjugated goat anti-rabbit antibody (Invitrogen). After three washes in PBS, the sections were mounted for fluorescence microscopy.
Villus epithelial cells were isolated from the ileum of five C57BL/6 mice as described above. The villus cells were pooled, pelleted (1,000 g, 5 min), and resuspended in 10 ml of minimal essential medium. The resuspended cells were allowed to equilibrate to room temperature for 10 min. We added 280 μl of 37% formaldehyde, and fixation was allowed to proceed for 30 min at room temperature with gentle shaking. The fixation was stopped by the addition of 540 μl of 2.5 M glycine. After the harvest (4,000 g, 10 min) of fixed villus cells, sonication and immunoprecipitation with the HNF-4 antibody (SC-8987, Santa Cruz Biotechnology) were performed exactly as described previously (28). The amount of immunoprecipitated promoter DNA was measured by quantitative real-time PCR. The primers were designed to amplify 130- to 150-bp regions including the predicted HNF-4 binding site in the Apoa4, Numb, Anpep, and Mep1a promoters, respectively. In addition, primers were designed for a region in the Cd24a promoter, which does not have a predicted HNF-4 binding site. The primer sequences, the sequences of the amplified regions, and the predicted HNF-4 binding sites for the promoters can be found in Supplementary Table 1 (the online version of this article contains supplemental data). All amplified promoter regions were sequenced to verify their identity. For quantitative real-time PCR, the LightCycler FastStart DNA Masterplus SYBR green I system (Roche) was applied. Reactions were assembled in LightCycler capillary tubes (Roche), and 5 μl of purified immunoprecipitated DNA were used as template. Melting curves were routinely inspected to rule out the presence of unrelated amplified DNA in the real-time PCR reaction.
Cloning and analysis of the Mep1a promoter.
The region from position −668 to +11 (from the February 2006 assembly of the mouse genome) surrounding the Mep1a gene was amplified using 0.5 μg of mouse (C57BL/6) tail DNA as template in a standard PCR reaction. The primers used were 5′-TTGGCTAGCACCCTTTCCCTGCTTTGTTT-3′ and 5′-TGCAAGCTTCCTATTGGACCTTGCTCTCA-3′ carrying 5′-extensions with NheI and HindIII restriction sites (underlined). The sequence-verified promoter fragment was cloned into the pGL3-basic vector (Promega Biotech) in front of the firefly luciferase gene using NheI and HindIII as cloning sites. To analyze the responsiveness of the Mep1a promoter to HNF-4a, the Mep1a promoter/luciferase construct was cotransfected with the CMVLacZ internal control vector, with or without the rat HNF-4a expression vector, into HeLa cells. As a positive control for HNF-4 responsiveness, the human intestinal alkaline phosphatase promoter was used (28). The culture of HeLa cells, cotransfection with the rat HNF-4a expression vector, and measurements of luciferase and β-galactosidase were performed exactly as described previously (28).
RNA extraction, hybridization probe preparation, and GeneChip hybridization.
Total RNA was isolated using the RNeasy kit (Qiagen, Hilden, Germany). Frozen intestinal tissue pellets were lysed directly in lysis buffer, and the RNA isolated with the Qiagen column was digested on-column by DNase I according to the manufacturer’s protocol (Qiagen). First-strand cDNA was synthesized from 5 μg of total RNA by incubation (42°C, 1 h) in a 20-μl reaction volume containing 2.5 mM T7-(dT)24 primer, 50 mM Tris·HCl pH 8.3, 75 mM KCl, 3 mM MgCl2, 10 mM DTT, 500 mM dNTP, and 10 units/ml Superscript II reverse transcriptase (Invitrogen, Carlsbad, CA). Second-strand cDNA was synthesized directly by adding 91 μl of RNase-free water, 30 μl of 5× second-strand reaction buffer (Invitrogen), 3 μl 10 mM dNTP, 1 μl Escherichia coli DNA ligase (10 U/μl), 4 μl E. coli DNA polymerase I (10 U/ml), and 1 μl E. coli RNase H (2 U/ml) followed by incubation (2 h, 16°C). The ends of the double-stranded cDNA were polished using T4 DNA polymerase (20 units, 5 min, 16°C). The cDNA was purified and concentrated by phenol-chloroform extraction and ethanol precipitation. Generation of biotin-labeled RNA was accomplished by in vitro transcription with T7 RNA polymerase using the BioArray High Yield RNA transcript labeling kit (Enzo LifeSciences). Biotin-labeled cRNA was subsequently purified from the transcription reaction using the RNeasy system (Qiagen). Hybridization of biotin-labeled cRNA to MOE430A 2.0 GeneChips, washing, staining, and scanning were performed according to the protocols published by the manufacturer (Affymetrix). Six MOE430A 2.0 GeneChip hybridizations were performed with crypt- and villus-derived RNA (three with villus probes and three with crypt probes). To achieve sufficient amounts of RNA for GeneChip analysis, endoderms and mesenchymes were isolated from 79 embryos and grouped into four separate pools for RNA extraction. Four independent GeneChip hybridization experiments were performed with both the endodermal probes and the mesenchymal probes.
Expression level comparisons.
Summarization of probe-level data from the scanned GeneChips into single normalized gene expression measures for each probe set was performed by the robust multiarray analysis (RMA) procedure (20). The calculations were performed using the implementation of RMA provided by the open source Bioconductor project (http://www.bioconductor.org) (16). The difference between the mean crypt and mean villus expression measure was calculated for each probe set, and the significance was evaluated by an unpaired Student’s t-test using standard statistical calculations (4). Similar comparisons were performed for the endoderm and villus expression measures. The calculated P values were stored in a table together with the mean expression measure values for each probe set. To classify genes according to the abundance of their transcripts in intestinal cells, an expression measure of 8 was chosen as the upper limit for low-abundance transcripts; an expression measure of 10 was chosen as the lower limit for high-abundance transcripts. With these limits, ∼8% of the probe sets had expression measures corresponding to high copy number transcripts, 17% had expression measures between 8 and 10 corresponding to transcripts with intermediate copy numbers, and 75% had expression measures corresponding to low copy number transcripts. Clearly, a large fraction of the probe sets with expression measures below 8 will represent transcripts that are not expressed at all in the small intestinal epithelium. Our experience from performing RT-PCR on mRNA extracted from the mouse small intestinal epithelium suggests that an RMA-calculated expression measure of 5 in most cases represents a gene that cannot be amplified by RT-PCR from the RNA sample that was used for the GeneChip analysis.
The calculated expression measures have been deposited in the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under the series accession number GSE3216.
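The per-probe-set comparison and the abundance classification described above can be sketched as follows. This is a minimal illustration, not the authors' code: the expression values are hypothetical, measures of exactly 8 or 10 are assigned to the medium class as an assumption, and the conversion of the t statistic to a P value (which requires the t distribution from a statistics library) is left out.

```python
from math import sqrt
from statistics import mean, variance

LOW_MAX = 8.0    # upper limit for low-abundance transcripts (from the text)
HIGH_MIN = 10.0  # lower limit for high-abundance transcripts (from the text)


def abundance_class(expression_measure):
    """Assign an RMA expression measure to one of three abundance classes.

    Assumption: values of exactly 8 or 10 fall into the medium class.
    """
    if expression_measure < LOW_MAX:
        return "low"
    if expression_measure > HIGH_MIN:
        return "high"
    return "medium"


def student_t(group_a, group_b):
    """Unpaired Student t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical RMA measures for one probe set on three villus and three crypt arrays.
villus = [10.9, 11.1, 11.0]
crypt = [7.4, 7.6, 7.5]

t_stat, dof = student_t(villus, crypt)
villus_class = abundance_class(mean(villus))
crypt_class = abundance_class(mean(crypt))
```

With three arrays per group the test has four degrees of freedom, so the P value for each probe set would be read from the t distribution with df = 4.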
Functional interpretation of gene expression changes during enterocyte differentiation.
From the table with the results of the comparisons between villus and crypt expression measures (see section above), two lists of probe set IDs were generated according to the criteria: 1) mean villus expression measure fourfold higher than the mean crypt expression measure and P < 0.01 for the unpaired Student’s t-test and 2) mean crypt expression measure fourfold higher than the mean villus expression measure and P < 0.01 for the unpaired Student’s t-test. The probe set IDs were loaded into the program GoSurfer (51), and overrepresented (P < 0.01) gene ontology terms for biological processes were visualized using the graphical output from the program. Similar calculations were performed for the villus and endoderm comparisons.
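The two selection criteria can be illustrated with a short sketch. It assumes that the expression measures are on a log2 scale (as RMA output is), so that a fourfold difference corresponds to a difference of 2 in the expression measure; the probe set IDs, the table layout, and the use of "at least fourfold" are hypothetical choices for illustration.

```python
from math import log2

LOG2_FOLD = log2(4.0)  # fourfold difference; assumes log2-scale expression measures
P_CUTOFF = 0.01        # significance limit from the text


def select_probe_sets(rows):
    """Split probe sets into villus-enriched and crypt-enriched lists.

    Each row is (probe_id, mean_villus, mean_crypt, p_value).
    """
    villus_up, crypt_up = [], []
    for probe_id, mean_villus, mean_crypt, p_value in rows:
        if p_value >= P_CUTOFF:
            continue  # difference not significant
        difference = mean_villus - mean_crypt
        if difference >= LOG2_FOLD:
            villus_up.append(probe_id)
        elif difference <= -LOG2_FOLD:
            crypt_up.append(probe_id)
    return villus_up, crypt_up


# Hypothetical table rows: (probe set ID, mean villus, mean crypt, P value).
table = [
    ("probe_A", 11.2, 8.9, 0.002),   # villus-enriched, significant
    ("probe_B", 7.1, 9.6, 0.004),    # crypt-enriched, significant
    ("probe_C", 10.5, 8.0, 0.030),   # large difference but P too high
    ("probe_D", 9.0, 8.2, 0.001),    # significant but less than fourfold
]

villus_enriched, crypt_enriched = select_probe_sets(table)
```

The resulting ID lists correspond to the two lists that would be loaded into a gene ontology tool.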
Identification of overrepresented promoter cis-elements.
The table described above containing the mean expression measures and the P values from the unpaired Student’s t-test was imported into a MySQL database server running in a 64-bit Mandrake Linux 10 environment on a personal computer equipped with an AMD 64 Athlon processor (Advanced Micro Devices, Sunnyvale, CA). Lists of genes (Supplementary Tables 2–7) with a specified significant (P < 0.05) difference in expression measure (for example, >10 in mean crypt expression measure and <8 in the mean villus expression measure) were generated using standard structured query language statements. To identify potential transcription factor binding sites that occur more frequently than expected by chance (i.e., they are overrepresented) in the promoters regulating the genes that change abundance classes, we used an algorithm developed by Elkon and colleagues (14). We developed our own implementation of the algorithm in a program called PRIMO (promoter integration in microarray result organization), which is significantly faster and provides more detailed data output. In brief, the program uses a simple position weight matrix (PWM)-scoring algorithm exactly as previously described (14) to scan a target set of promoters one nucleotide at a time and on both strands in windows corresponding to the length of the transcription factor binding site described by the PWM. The target set promoters are a part of a larger promoter set of 1.1-kb sequences extracted from the mouse genome sequence (May 2004, build 33). Each promoter in the promoter set represents the mouse genome sequence from 100 bp downstream to 1,000 bp upstream of the nucleotide that aligns with the 5′-end of a transcript from the mouse reference sequence (RefSeq) (32) collection of mouse curated transcripts. In total, 16,095 promoters were extracted using the UCSC table browser (21).
Overrepresentation of promoters with hits for a given PWM in the target set in relation to the occurrence of promoters with hits in the larger promoter set was calculated by the Fisher exact test for proportions (4). For the analysis reported here, a list (Supplementary Table 8) with 65 PWMs derived from the Transfac database (50) was used. Accordingly, the P values reported from the PRIMO analysis have been corrected for performing 65 tests by the Bonferroni method (4). PWMs with overrepresentation of hits in the promoters for both up- and downregulated genes (crypt vs. villus or endoderm vs. villus) were not reported. The PRIMO source code is available upon request, and a demo version of PRIMO is available at http://gastro.imbg.ku.dk/primoweb.
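The general idea of the scan and the overrepresentation test can be sketched as below. This is not the PRIMO source code: the additive scoring scheme, the hit threshold, the toy PWM, and the sequences are assumptions made for illustration, and the one-sided hypergeometric form of the Fisher exact test is one plausible reading of how a target set drawn from a larger promoter set would be tested.

```python
from math import comb

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_hit(promoter, pwm, threshold):
    """True if any window on either strand scores at or above the threshold.

    pwm is a list with one {base: weight} dict per motif position (hypothetical
    format); the window slides one nucleotide at a time.
    """
    width = len(pwm)
    for strand in (promoter, reverse_complement(promoter)):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            score = sum(pwm[i].get(base, 0.0) for i, base in enumerate(window))
            if score >= threshold:
                return True
    return False


def fisher_overrepresentation(target_hits, target_size, total_hits, total_size):
    """One-sided Fisher exact P value (hypergeometric upper tail).

    Probability of seeing at least target_hits promoters with hits in a target
    set of target_size promoters drawn from total_size promoters, of which
    total_hits have hits.
    """
    upper = min(target_size, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total_size - total_hits, target_size - k)
        for k in range(target_hits, upper + 1)
    )
    return tail / comb(total_size, target_size)


def bonferroni(p_value, n_tests=65):
    """Bonferroni correction for the 65 PWMs tested in the text."""
    return min(1.0, p_value * n_tests)


# Toy PWM that scores the motif AAC (hypothetical weights).
toy_pwm = [{"A": 1.0}, {"A": 1.0}, {"C": 1.0}]
hit_on_reverse_strand = has_hit("GGGTTGGG", toy_pwm, threshold=3.0)
no_hit = has_hit("GGGGGGGG", toy_pwm, threshold=3.0)

# Toy counts: 3 of 3 target promoters have hits; 4 of 10 promoters overall do.
p_raw = fisher_overrepresentation(3, 3, 4, 10)
p_corrected = bonferroni(p_raw)
```

In the toy example the raw P value is 4/120, which no longer counts as significant after multiplication by 65; this illustrates why only strongly overrepresented PWMs survive the correction.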
Magic-angle spinning 1H NMR spectroscopy.
Fifteen samples, corresponding to eight samples of intestinal crypt cells and seven samples of intestinal villus cells, were used for 1H NMR spectroscopy. Approximately 15 mg each of crypt and villus cells were packed into separate 4-mm-diameter zirconia rotors with spherical inserts and Kel-F caps. Approximately 20 μl of D2O were added to the rotor to provide a field lock. All NMR experiments were carried out on a Bruker DRX-600 spectrometer (Bruker Biospin, Rheinstetten, Germany), at 283K, operating at a 1H frequency of 600.13 MHz. Samples were spun at 5 kHz at the magic angle. A total of 15 min was allowed for temperature equilibration before NMR acquisition. A standard Bruker high-resolution magic-angle spinning probe with a magic-angle gradient was employed, and the 90° pulse length was adjusted individually for each sample, having a value between 9.6 and 10 μs. A total of 128 transients were collected into 16,000 data points for each spectrum with a spectral width of 20 parts per million (ppm) and a recycle delay of 2.0 s.
Standard 1H NMR spectra were acquired for each tissue using the water-suppressed NOESY1DPR (90-t1-90-tm-90-acq) (26). The interpulse delay (t1) was 3 μs, and the mixing time (tm) was 100 ms. A weak irradiation was applied on the water resonance during both the mixing time and the recycle delay.
NMR data analysis.
1H NMR spectra were phased and baseline-corrected using XWINNMR 3.5 (Bruker). The spectra were referenced to the anomeric proton α-glucose resonance at δ 5.22 (where δ = resonance interval). The continuous spectra over the range δ 0.5–8.0 were digitized into discrete resonance intervals using a MATLAB script developed in-house (Dr. O. Cloarec, Imperial College London). The region δ 4.7–5.1 was removed to avoid the effects of imperfect water suppression. In total, the digitization procedure generated 30,280 chemical shift intervals, each defining a variable. Each of these variables is referenced by its δ value (in ppm) and holds the value of the resonance signal measured. Normalization to the total sum of the spectrum was carried out on the data before data analyses. Orthogonal-partial least-squares discriminant analysis (O-PLS-DA) (41) of the NMR spectra was carried out in a MATLAB 7.0 environment with a MATLAB script developed in-house (Dr. O. Cloarec) (9). All variables were mean centered and scaled to unit variance before O-PLS-DA. The O-PLS-DA model was constructed using the NMR data as the X-variables and the different cell types as the Y-variables (9). One orthogonal component was calculated for the model to remove the irrelevant variations in the NMR data, and one PLS component was calculated for the model. The quality of the model was described by the cross-validation parameters (R2 = 0.69 and Q2 = 0.51), indicating the total explained variation and the predictability of the Y-matrix, respectively. To visualize metabolites that discriminate crypts and villi, the average villus to crypt difference for each variable was calculated and plotted as a function of the chemical shift. In this plot, villus-enriched metabolites are represented by peaks with positive values on the ordinate (and thus pointing upward), whereas the reverse is true for crypt-enriched metabolites.
To allow an estimation of the significance of the peaks in the plot, each peak is color-coded according to a scale from 0 to 1 representing the weight of the contribution of each resonance signal at a given chemical shift region to the O-PLS-DA model for the first PLS component. Thus peaks in yellow to red colors represent the metabolites that are most important for the discrimination between crypts and villi.
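The preprocessing steps described above (normalization to the total sum, mean centering and unit-variance scaling, and the average villus-to-crypt difference per variable) can be sketched in a few lines. The O-PLS-DA model itself is not reproduced here, and the three-variable spectra are hypothetical stand-ins for the 30,280 chemical shift intervals.

```python
from statistics import mean, stdev


def normalize_to_total(spectrum):
    """Scale a spectrum so that its intensities sum to 1."""
    total = sum(spectrum)
    return [value / total for value in spectrum]


def autoscale(samples):
    """Mean-center each variable (column) and scale it to unit variance."""
    columns = list(zip(*samples))
    centers = [mean(column) for column in columns]
    spreads = [stdev(column) for column in columns]
    return [
        [(value - c) / s for value, c, s in zip(row, centers, spreads)]
        for row in samples
    ]


def villus_minus_crypt(villus_samples, crypt_samples):
    """Average villus-to-crypt difference for each chemical shift variable."""
    return [
        mean(v) - mean(c)
        for v, c in zip(zip(*villus_samples), zip(*crypt_samples))
    ]


# Hypothetical raw intensities at three chemical shift intervals.
villus_raw = [[2.0, 6.0, 2.0], [3.0, 6.0, 1.0]]
crypt_raw = [[4.0, 4.0, 2.0], [5.0, 3.0, 2.0]]

villus_norm = [normalize_to_total(s) for s in villus_raw]
crypt_norm = [normalize_to_total(s) for s in crypt_raw]

difference = villus_minus_crypt(villus_norm, crypt_norm)
scaled = autoscale(villus_norm + crypt_norm)
```

Positive entries in `difference` correspond to peaks pointing upward in the plot (villus-enriched), and negative entries to crypt-enriched signals.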
Gene expression and NMR data were combined in a single model by classical PLS regression (for a review see Ref. 1) using the software SIMCA-P 10.0 (Umetrics, Umeå, Sweden). The gene expression data were used as the independent variables defining the X-matrix, and the NMR data were used as the dependent variables defining the Y-matrix. A total of two components were calculated, and the model explained 83% of the variance in the dataset with a predictability of 0.82. The PLS regression model predicts the dependent variables (δ) from the set of independent variables (the gene expression measures). Each dependent variable (e.g., δ = 1.29 ppm) is predicted by multiple regression: $\delta_{n\,\mathrm{ppm}} = a_1 \times \mathrm{expression}_{\text{gene-1}} + a_2 \times \mathrm{expression}_{\text{gene-2}} + \dots + a_n \times \mathrm{expression}_{\text{gene-}n}$, where $a_1$ to $a_n$ are regression coefficients that are calculated from the parameters derived from the PLS model. The genes with the highest positive regression coefficients have the highest positive influence on the dependent variable (δ). To find genes that are positively correlated with the increased villus lipid resonances, the genes with the highest positive regression coefficient for the lipid resonance at 1.29 ppm were accordingly extracted (Supplementary Table 9) and used in a subsequent PRIMO analysis for promoter cis-element overrepresentation analysis (see above).
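The use of the regression coefficients can be illustrated with a small sketch that follows the multiple regression formula in the text. The gene names, coefficients, and expression measures are hypothetical; the fitting of the PLS model itself (performed with SIMCA-P in the study) is not reproduced.

```python
def predict_resonance(coefficients, expression):
    """Predict one resonance as the sum of coefficient x expression measure."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def top_positive_genes(coefficients, n):
    """Return the n genes with the highest positive regression coefficients."""
    positive = [(gene, a) for gene, a in coefficients.items() if a > 0]
    ranked = sorted(positive, key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in ranked[:n]]


# Hypothetical regression coefficients for the lipid resonance at 1.29 ppm.
coefficients_1_29 = {"gene_1": 0.5, "gene_2": -0.2, "gene_3": 0.1}
# Hypothetical expression measures for one sample.
expression = {"gene_1": 10.0, "gene_2": 5.0, "gene_3": 8.0}

predicted = predict_resonance(coefficients_1_29, expression)
lipid_associated = top_positive_genes(coefficients_1_29, n=2)
```

The ranked gene list plays the role of the list that is passed on to the promoter cis-element overrepresentation analysis.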
The overall experimental strategy is depicted in Fig. 1. The starting point was mouse embryonic endoderm, adult crypt, and villus epithelium. Transcriptome and metabolome data were subsequently collected by high-throughput procedures and finally analyzed biostatistically and bioinformatically to yield information about the biological processes and metabolites that are upregulated during the differentiation of immature intestinal epithelial cells. Information about transcription factors that might be important in mediating these differences was also obtained. Validations were carried out at the single gene level by immunocytochemistry, chromatin immunoprecipitation, and transfection experiments to support the high-throughput studies. A model that integrates the findings was finally generated.
Generation of quantitative genome-wide endoderm, crypt, and villus gene expression data.
Gene expression data were obtained by Affymetrix high-density oligonucleotide array analysis. To allow easy and meaningful mining of our expression data, we constructed a public resource in the form of two databases with web access: one database for the crypt-villus gene expression data (MouseCVDB: http://gastro.imbg.ku.dk/mousecv/) (Fig. 2) and one database (FETALINTDB: http://gastro.imbg.ku.dk/fetalint/) for the endoderm gene expression data that are presented together with gene expression data from its mesenchymal counterpart.
To evaluate the overall quality of the hybridization results, we took advantage of our published crypt-villus in situ hybridization database (29) that stores information on previously reported intestinal in situ hybridization experiments. The expression patterns of genes represented in both databases were compared. Probe sets representing 56 genes in the crypt-villus in situ hybridization database are present on the MOE430A 2.0 GeneChip array used. In summary, 47% of the probe sets representing transcripts considered to be crypt-specific by in situ hybridization and 69% of the probe sets representing transcripts considered to be villus-specific by in situ hybridization showed the expected tendency in the differences in their mean crypt and villus expression measures calculated from the GeneChip hybridizations. Moreover, the majority of the probe sets that did not show the expected difference in their mean crypt and villus expression measures had small expression measures that were not significantly different. The signals from these probe sets are presumably below the detection threshold for the GeneChip hybridization procedure.
Overall functional interpretation of gene expression changes during endoderm-villus and crypt-villus enterocyte differentiation.
For an initial characterization of the gene expression data, genes that displayed a fourfold difference (P < 0.01) in expression levels, either between the endoderm and the villus epithelium or between the crypt and the villus epithelium, were identified and subjected to an analysis of gene ontology annotations for biological processes. Most differences were found between endoderm and villus. We found 1,122 probe sets to have a fourfold higher villus expression measure than endodermal expression measure; we found 1,715 probe sets to have a fourfold higher endodermal expression measure than villus expression measure. When the two lists of probe sets were analyzed for overrepresentation of specific gene ontology terms, we found that genes annotated with the gene ontology terms for the biological processes related to immune response, molecular transport, carbohydrate metabolism, and lipid metabolism were upregulated in the adult villus epithelium compared with the endoderm. In contrast, genes annotated with the gene ontology terms related to the biological processes DNA repair, organelle biogenesis, cell cycle regulation, and protein, DNA, and RNA metabolism were downregulated in the adult villus epithelium compared with the endoderm (Fig. 3A). Far fewer probe sets displayed a fourfold difference in gene expression measures when we compared hybridization probes generated from either adult crypt or adult villus RNA (143 and 250, respectively). Of note, the gene ontology terms related to lipid metabolism were overrepresented in the villus-expressed genes, whereas the gene ontology terms related to the cell cycle and DNA metabolism were overrepresented in the annotations of the crypt-expressed genes (Fig. 3B).
Combination of cis-element overrepresentation and gene expression analysis.
It has previously been demonstrated that a eukaryotic cell contains at least three classes of transcripts that differ in their abundance in the cell (8, 43), and we recently showed that this is also the case for the mouse small intestinal epithelium (38). For the bioinformatic analysis of overrepresentation of potential transcription factor binding sites in the promoters for differentially expressed genes, we chose to focus our analysis on promoters for genes encoding transcripts that change expression level from one abundance class to another during development from endoderm to villus or during crypt to villus differentiation. Three different abundance classes, corresponding to low expression, medium expression, and high expression, were defined on the basis of gene expression measures (see methods for details). We therefore concentrated on the corresponding 12 relevant gene expression patterns, and we constructed six lists of promoters controlling genes that change expression from one abundance class to another during endoderm to villus development and six lists of genes that change expression pattern from one abundance class to another during crypt to villus differentiation. The genes chosen for the lists should show a shift in mRNA abundance, and the difference in expression should also be significant using an unpaired t-test (P < 0.05). We subsequently analyzed these promoter lists for overrepresentation of potential transcription factor binding sites using a search algorithm based on PWMs for transcription factor binding sites. The list of PWMs contained 65 PWMs (Supplementary Table 8) for vertebrate transcription factors and was derived from the Transfac database. Although the search algorithm was similar to a previously published algorithm (14), we used an in-house implementation that was slightly different and considerably faster.
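The count of 12 expression patterns follows from three abundance classes and two comparisons: each comparison has six ordered class shifts (three upward and three downward). A minimal sketch, assuming the class limits of 8 and 10 given in the methods and using hypothetical expression values, is:

```python
from itertools import permutations

CLASSES = ("low", "medium", "high")
COMPARISONS = ("endoderm to villus", "crypt to villus")

# Every ordered pair of different classes, for each of the two comparisons.
patterns = [
    (comparison, start, end)
    for comparison in COMPARISONS
    for start, end in permutations(CLASSES, 2)
]


def abundance_class(measure):
    """Classify an expression measure (limits 8 and 10 taken from the methods)."""
    if measure < 8.0:
        return "low"
    if measure > 10.0:
        return "high"
    return "medium"


def expression_pattern(start_measure, villus_measure, p_value, cutoff=0.05):
    """Return (start_class, villus_class) for a significant class shift, else None."""
    start = abundance_class(start_measure)
    end = abundance_class(villus_measure)
    if p_value >= cutoff or start == end:
        return None
    return (start, end)


# Hypothetical gene: low in the crypt, high in the villus, P = 0.003.
shift = expression_pattern(7.2, 10.8, 0.003)
```

Each gene with a non-empty pattern would contribute its promoter to the list for that pattern.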
The most significant finding was the overrepresentation of potential HNF-4 binding sites in the promoters of genes that were upregulated to a high expression level in the villi compared with the endoderm or the crypts from adult mice (Figs. 4 and 5, Supplementary Tables 2–4). Some interesting features also arose from the analysis of the genes with lower expression in the villi compared with the endoderm or the crypts (Figs. 6 and 7, Supplementary Tables 5–7). First, the PWM with the accession number M0050 (describing potential binding sites for the E2F transcription factor) had an overrepresentation of hits in four of the six expression patterns for genes downregulated during differentiation. Second, the PWMs describing potential binding sites for the Myc transcription factor had an overrepresentation of hits in the promoters of the genes that changed expression from a medium level of expression in the crypts or in the endoderm to a low level of expression in the villi. Third, PWMs describing potential binding sites for the transcription factors nuclear factor (NF)-Y, cAMP responsive element binding (CREB), and YY1 had an overrepresentation of hits in the promoters of genes that decreased expression from a medium or high level of expression in the endoderm to a lower level of expression in the villi. Finally, PWMs describing potential binding sites for STAT, ELK, and ETS transcription factors also had an overrepresentation of hits in the promoters of genes downregulated during endoderm to villus development.
HNF-4 binds to target genes in the villus epithelium.
Immunocytochemical analysis (Fig. 8) with an HNF-4 antibody showed that the HNF-4 protein is absent from the epithelial cells located in the lower third of the crypts but expressed in the nuclei of cells located from the upper two-thirds of the crypt to the tips of the villi. Villus epithelial cells were subsequently isolated and macromolecules cross-linked with formaldehyde. After sonication, DNA cross-linked to HNF-4 was precipitated using the same HNF-4 antibody that was used for the immunocytochemical analysis; the precipitations of specific promoter regions were analyzed by real-time quantitative PCR. Four promoters (Apoa4, apolipoprotein A4; Anpep, aminopeptidase N; Numb, numb gene homolog; Mep1a, meprin 1a) were selected from the list of genes (Supplementary Table 10) that both are upregulated during crypt to villus differentiation and contain potential HNF-4 binding sites as predicted by our search algorithm. The Cd24a gene, which is downregulated during crypt-villus differentiation and which does not contain a predicted potential HNF-4 site in its promoter region, was selected as a negative control promoter. The Apoa4 and Mep1a promoter fragments were enriched in the HNF-4 immunoprecipitated cross-linked chromatin, both compared with the negative control Cd24a promoter and compared with the amounts precipitated without the primary HNF-4 antibody (Fig. 9). The Anpep and Numb promoters were not significantly enriched compared either with the negative Cd24a control promoter or when the primary HNF-4 antibody was omitted. The Cd24a negative control promoter itself was also not enriched in the HNF-4-immunoprecipitated chromatin compared with the control situation without the primary HNF-4 antibody. The Apoa4 promoter is already known to be regulated by HNF-4 (3), whereas the Mep1a promoter has not previously been reported as an HNF-4 target promoter. 
We therefore also tested whether the Mep1a promoter was responsive to cotransfection with an expression vector for HNF-4. We used cotransfection in HeLa cells; we have previously shown that, in this system, cotransfection of an expression vector for HNF-4 activates the human intestinal alkaline phosphatase promoter (ALPI) 1.5- to 2-fold and that this activation depends on the presence of an HNF-4 binding site in the ALPI promoter (28). As shown in Fig. 10, the Mep1a promoter is stimulated significantly (1.8-fold) by HNF-4 cotransfection in HeLa cells; furthermore, the activation is comparable to the activation of the positive control ALPI promoter.
Villus and crypt epithelial cells differ in their content of lipid metabolites.
Our analysis thus far implicated HNF-4 as a villus gene regulatory node with many connected genes in the villus enterocyte. In the liver, HNF-4 is involved in lipid metabolism (47); we therefore investigated crypts and villi for their content of lipid metabolites. Eight crypt and seven villus preparations were made for magic angle spinning 1H NMR spectroscopic analysis. Protons in a magnetic field will, at the correct resonance frequency, absorb energy from electromagnetic radiation. This absorption of energy can be measured in an NMR spectrometer, and the signal strength is proportional to the concentration of resonating protons in the sample. The resonance frequency depends on the chemical environment in which the protons are situated. The shift in resonance frequency for protons in a specific molecular environment compared with protons in the environment of a reference compound is referred to as the chemical shift (δ) and is measured in ppm. Figure 11A shows 1H NMR spectra generated with a crypt and a villus sample, respectively. For illustration purposes, two peaks are pointed out. One signal, at 1.29 ppm, is higher in the villus sample than in the crypt sample, whereas another peak, at 3.21 ppm, is higher in the crypt sample than in the villus sample. The signal at 1.29 ppm comes from the methylene protons in the chemical environment —(CH2)n— and is a signal typically obtained from lipid carbon chains such as fatty acid chains. The signal obtained at 3.21 ppm was generated by protons in the three methyl groups of choline, and choline-containing metabolites are responsible for generating this peak, which is higher in the crypt samples than in the villus samples. In Fig. 11B, the spectra from all samples are integrated into a single figure that shows the average difference in the resonance signal strength between the villus and crypt samples as a function of the chemical shift. 
Peaks representing NMR signals from metabolites with highest concentration in villi point upward (peaks with positive values), whereas NMR signals from crypt-enriched metabolites point downward (peaks with negative values). O-PLS regression was used to construct a multivariate model for classification of crypt and villus samples based on the NMR spectra. Figure 11C displays a score plot of this O-PLS model. The model separates crypt and villus in the first dimension because the crypt samples are plotted to the left on the x-axis, whereas the villus samples are plotted to the right. The regression weights from the O-PLS model were used to give an estimation of the validity of each resonance peak displayed in Fig. 11B. Thus the peaks with yellow to red colors contribute the most to discriminating villi from crypts in the O-PLS model; they therefore reflect the most villus- or crypt-enriched metabolites, respectively. The most valid resonance signals that characterize villi (positive, yellow to red peaks) are almost all related to lipid carbon chains. Thus the molecular structures —CH═CH—, ═CH—CH2—CH═, —CO—CH2—CH2—, —CH2—CH═, and —(CH2)n—CH3 can all be found in either saturated or unsaturated fatty acids present in membrane lipids, triglycerides, or lipoproteins. Apart from lipids, more lactate is present in the villi than in the crypts. The metabolites that characterize crypts are glucose, glycogen, and choline-containing compounds. In conclusion, the results suggest that lipids related to saturated and unsaturated fatty acid chains are present in higher concentrations in villi than in crypts.
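The sign convention of the average difference spectrum in Fig. 11B can be illustrated with a minimal sketch: at each chemical shift, the mean crypt signal is subtracted from the mean villus signal. The sketch assumes that all spectra share the same chemical shift grid and have been normalized beforehand; the two-point spectra below are hypothetical values and not measured data.

```python
from statistics import mean
from typing import List, Sequence


def difference_spectrum(
    villus_spectra: Sequence[Sequence[float]],
    crypt_spectra: Sequence[Sequence[float]],
) -> List[float]:
    """Mean villus signal minus mean crypt signal at each chemical shift."""
    n_points = len(villus_spectra[0])
    for spectrum in list(villus_spectra) + list(crypt_spectra):
        if len(spectrum) != n_points:
            raise ValueError("all spectra must share one chemical shift grid")
    return [
        mean(s[i] for s in villus_spectra) - mean(s[i] for s in crypt_spectra)
        for i in range(n_points)
    ]


# Hypothetical intensities at two chemical shifts (1.29 and 3.21 ppm).
shifts_ppm = [1.29, 3.21]
villus = [[4.0, 1.0], [6.0, 1.0]]
crypt = [[2.0, 3.0], [2.0, 5.0]]
diff = difference_spectrum(villus, crypt)
```

With these toy values the first entry is positive (villus enriched, peak pointing upward) and the second is negative (crypt enriched, peak pointing downward).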
Bioinformatic support for a connection between genes having potential HNF-4 binding sites in their promoters and lipid metabolites in the villus enterocyte.
The PLS multivariate analysis procedure can also be used to model metabolite data as a function of the gene expression data and thereby uncover functionality of genes. For such an analysis, the gene expression data were used as X-variables and the NMR data as Y-variables in ordinary PLS regression (for a review see Ref. 1). To find genes that are positively correlated with the increased lipid resonances, the genes with the highest positive regression coefficients with respect to the lipid resonance at 1.29 ppm (see Fig. 11) were extracted. We selected 235 probe sets in this way. Of the corresponding genes (Supplementary Table 9), 113 had a promoter represented in our database and were selected for a cis-element overrepresentation analysis. The PWM M00411, representing binding sites for HNF-4, was the only one of the 65 matrices that had a significant overrepresentation of hits in the promoters (36 promoters with hits and 77 without; P = 0.004 after Bonferroni correction). Thus, in villus cells, there is a correlation between the presence of villus-enriched lipids and the expression of genes that have potential HNF-4 binding sites in their promoters. Clearly some of these genes might display a correlation in their expression pattern with the concentration of lipids simply by chance, even without being involved in the metabolism of lipids. Thus an independent approach was taken to obtain additional support for a connection between potential HNF-4 binding sites and lipid metabolism in the villus enterocyte. Two lists of genes that were upregulated in the villi compared with either the crypts or the endoderm and which are annotated with the gene ontology term “lipid metabolism” were generated by the GoSurfer program (see Fig. 3). The promoters for these genes were subsequently analyzed for overrepresentation of potential HNF-4 binding sites. 
In both cases, a significant overrepresentation of HNF-4 binding sites was detected (P = 2 × 10−3 for the crypt-villus gene list and P = 1 × 10−5 for the endoderm-villus gene list). Thus genes that are upregulated in the villi during enterocyte differentiation and that are annotated with the term “lipid metabolism” have an overrepresentation of potential HNF-4 binding sites in their promoters.
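The logic of an overrepresentation test with Bonferroni correction can be sketched as follows. The one-sided hypergeometric test used here is an assumption made for illustration and is not necessarily the statistic computed by our search algorithm; the background counts (total promoters and background promoters with a hit) are hypothetical. Only the 36 of 113 promoters with hits and the 65 matrices are taken from the text.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n promoters are drawn from N, of which K carry a hit."""
    top = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return favourable / total


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# From the text: 36 of 113 promoters had hits; 65 PWMs were tested.
hits, list_size, n_matrices = 36, 113, 65
# Hypothetical background: 10,000 promoters, 1,500 of them with a hit.
background_total, background_hits = 10_000, 1_500

p_raw = hypergeom_upper_tail(hits, list_size, background_hits, background_total)
p_adj = bonferroni(p_raw, n_matrices)
```

Because every matrix is tested against the same promoter list, the raw P value is multiplied by the number of matrices before it is compared with the significance threshold.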
Significance of a genome-wide approach to enterocyte transcriptional gene regulation.
In the present work we took a systems biology approach to enterocyte differentiation and physiology, an approach based on metabolome and quantitative gene expression data from endoderm, crypt, and villus epithelium. Transcriptional regulation of gene expression during enterocyte differentiation has previously been approached by studying single genes. Such studies formulated the hypothesis that HNF-1 and CDX2 were transcription factors that might be of a more general importance for gene expression in the differentiated enterocyte (39). Much focus has subsequently been placed on the CDX2 transcription factor, which was found to regulate the small intestine-specific disaccharidases sucrase-isomaltase (37) and lactase-phlorizin hydrolase (40). Surprisingly, potential CDX2 binding sites were not found to be overrepresented in the genes with higher villus than crypt or endodermal expression in our work. Our search algorithm did detect CDX2 and HNF-1 binding sites in the correct positions of the human sucrase-isomaltase promoter. The lack of overrepresentation of CDX2 and HNF-1 binding sites in our analyses is therefore not due to a poor PWM; rather, it is due to the low number of potential target promoters found. The sucrase-isomaltase and lactase-phlorizin hydrolase genes were not represented on the Affymetrix MOE430 A 2.0 GeneChip used in our work, and they were therefore not included in our analysis. The lack of a few genes, however, did not disturb the overrepresentation analysis. The genes upregulated from a medium level in the crypts to a high level of expression in the villi had, for example, only 16 promoters with hits for the CDX2 PWM (M00729), whereas 202 promoters were without hits. HNF-4 in contrast had 58 promoters with hits and 160 promoters without hits for its PWM. Thus a substantial increase in the number of promoters with hits would be needed to yield significant overrepresentation of CDX2 binding sites in the upregulated promoters. 
CDX2 is undoubtedly very important for intestine-specific gene expression, and it might also target some important regulators of enterocyte differentiation; yet it seems to be less important than HNF-4 when it comes to activating differentiation-induced genes, which carry out the physiological functions of the differentiated enterocyte. The same arguments are true for HNF-1. The PWM (M00790) used for HNF-1 can find the correctly positioned hits in known target promoters (e.g., Anpep), but it is again the low number of potential HNF-1 binding sites in the promoters for upregulated genes along the crypt-villus axis that explains why HNF-1 is not found to be overrepresented (21 with hits vs. 197 without hits for the same list of promoters as mentioned above for CDX2).
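The hit counts quoted above for the medium-to-high crypt to villus promoter list can be put on a common scale with a short calculation. The counts are taken from the text; the sketch only converts them to fractions and does not by itself assess significance, which also depends on the background hit rate of each PWM.

```python
# (promoters with hits, promoters without hits), as quoted in the text.
counts = {
    "CDX2": (16, 202),
    "HNF-1": (21, 197),
    "HNF-4": (58, 160),
}

# All three PWMs were scored against the same list of promoters.
list_sizes = {name: hits + misses for name, (hits, misses) in counts.items()}

# Fraction of promoters in the list that carry at least one hit.
hit_fraction = {
    name: hits / (hits + misses) for name, (hits, misses) in counts.items()
}

for name in sorted(hit_fraction, key=hit_fraction.get):
    print(f"{name}: {hit_fraction[name]:.1%} of {list_sizes[name]} promoters")
```

Each PWM was evaluated on the same 218 promoters, so the fractions are directly comparable: roughly 7% for CDX2, 10% for HNF-1, and 27% for HNF-4.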
Therefore, an important lesson from our work is that a genome-wide approach is likely to generate different conclusions concerning transcription factor importance than an approach based on a few selected model promoters. Moreover, our conclusions from the genome-wide approach are supported by our findings that c-Myc and E2F transcription factors are important for crypt cell functions (see below), as is expected from the known functions of these transcription factors.
The c-Myc and E2F crypt and endoderm gene regulatory nodes are likely to reflect differences in cell proliferation and stem cell maintenance.
An important role of the family of E2F proteins is to regulate the cell cycle during G1/S transition and DNA synthesis (for reviews see Refs. 11, 13). The activities of the E2F transcription factors are under the regulation of pocket proteins of the retinoblastoma protein (Rb) family during these processes. The Rb and E2F interaction is in turn regulated by cyclin-dependent kinase complexes (for reviews see Refs. 2, 10). Elkon and colleagues (14) convincingly demonstrated this connection between E2F binding sites and genes involved in cell cycle control by coupling the gene ontology annotation with promoter cis-element analysis. In addition, potential binding sites for NF-Y, Sp1, and nuclear respiratory factor-1 were reported to be overrepresented in the promoters of genes annotated with the terms “cell cycle control,” “mitotic cell cycle,” and “DNA metabolism.” Here we report the overrepresentation of potential binding sites for E2F in the promoters of genes that are downregulated during crypt to villus differentiation. In addition, overrepresentation of potential binding sites for E2F and NF-Y is found in the promoters of genes with higher expression in the endoderm than in the villus. Considering the fact that the crypts and endoderm harbor proliferative epithelial cells, the E2F and NF-Y gene regulatory nodes most likely reflect activity in the cell cycle process in the crypts and in the endoderm. This is also supported by the functional annotation analysis that showed overrepresentation of genes annotated with functions related to the cell cycle and DNA metabolism among the downregulated genes.
In the promoters of genes with higher expression in endoderms and crypts than in villi, we also find overrepresentation of potential c-Myc binding sites. The c-Myc oncoprotein is known to be a downstream nuclear target in the Wingless signaling pathway (18). The secreted Wnt proteins bind to seven transmembrane receptors (from the Frizzled family) and mediate β-catenin stabilization. The stabilization process allows β-catenin to associate with the T-cell factor (Tcf)-4 transcription factor in the small intestinal epithelial cells. The Tcf-4/β-catenin complex subsequently translocates to the nucleus and activates target genes including Myc (for a review see Ref. 7). Our GeneChip analysis also shows that the Myc probe set expression measure is 2.3-fold higher in the crypts than in the villi and 4.2-fold higher in the endoderm than in the villi (use NM_010849 as search criteria at the MouseCVDB and FETALINTDB web pages). A preferential expression of c-Myc mRNA and c-Myc protein in crypts was recently also reported by Mariadason and colleagues (25), further suggesting that the c-Myc gene regulatory node is indeed important in the intestinal crypt cells. An intact Wingless signaling pathway has been shown to be crucial for survival of small intestinal stem cells, since the inactivation of both the mouse Tcf4 alleles leads to stem cell depletion and small intestinal dysfunction in early postnatal life (23). In our opinion, the c-Myc crypt gene regulatory node therefore most likely reflects Wingless signaling in stem cells and their immediate progeny.
The villus HNF-4 gene regulatory node integrates enterocyte physiology.
Lipid carbon chain metabolites distinguished the villi from the crypts in the NMR metabolite spectra. Fat absorption takes place in the differentiated villus enterocytes and is one major metabolic difference existing between crypt and villus cells (35). The synthesis of specialized lipoproteins, the chylomicrons, is essential for fat absorption. The NMR chemical shift reported for lipoprotein (17) coincides with the villus lipid peaks reported here (Fig. 11). It is therefore likely that a higher concentration of lipoproteins was observed in villi compared with crypts. To support the hypothesis that the differences in the lipid metabolite profiles between the villi and the crypt enterocytes are related to lipoprotein synthesis, we inspected the list of genes that both have increased expression in the villi (to a medium or a high level of expression) and contain potential HNF-4 binding sites in their promoters (Supplementary Table 10). Three genes with clear relevance to chylomicron synthesis were found on this list; these are the apolipoprotein C-III (Apoc3) gene, the Apoa4 gene, and the microsomal triglyceride transfer protein gene. The formation of the chylomicron precursor occurs in the endoplasmic reticulum and is followed by the transfer of triglycerides into the chylomicron precursor, a process catalyzed by the microsomal triglyceride transfer protein (for reviews see Refs. 19, 49). Apoc3 is an apolipoprotein found in mature chylomicrons, whereas Apoa4 stimulates chylomicron formation by an unknown mechanism (for a review see Ref. 42). In addition, we directly demonstrated by chromatin immunoprecipitation that the Apoa4 promoter binds HNF-4 in the villus enterocytes. These findings can therefore explain how HNF-4 in the villus enterocytes directs the villus-specific expression of genes that can establish marked differences in the lipid profile between villus and crypt cells.
The list of genes with higher villus than crypt expression and with potential HNF-4 binding sites in their promoters contained other interesting genes that might play important functional roles. Of particular note is the Mep1a gene, which encodes a brush border metalloproteinase (for a review see Ref. 44). In the present work, we directly demonstrate the binding of HNF-4 to the Mep1a promoter in villus epithelial cells and the activation of the promoter in HeLa cells by HNF-4 overexpression. The human lactase-phlorizin hydrolase gene was recently demonstrated to be regulated by an upstream enhancer that also contained an HNF-4 binding site (24). Thus HNF-4 also affects the expression of genes involved in the extracellular hydrolysis of carbohydrates and proteins.
Figure 12 depicts our integrated model, which is the outcome of our experimental strategy depicted in Fig. 1. Three significant gene regulatory nodes with clear functions are detected by our bioinformatic cis-element overrepresentation analysis. The E2F and c-Myc transcription factors form the two crypt gene regulatory nodes, and these transcription factors are involved in regulating cell proliferation and stem cell maintenance. The involvement of E2F and c-Myc in these processes has already been demonstrated experimentally by others; in our work, we tie the two transcription factors to the function of the undifferentiated intestinal epithelial cells using a completely different approach. The HNF-4 transcription factor forms a villus gene regulatory node. The crypt-villus expression gradient of potentially HNF-4-regulated genes correlates with the crypt-villus concentration gradient of metabolites with lipid carbon chains. Together with the functional annotation analysis, this supports the hypothesis that one consequence of HNF-4-mediated transcription in the villus enterocyte is to increase the villus content of lipids by stimulating the expression of genes involved in lipid metabolism.
This work was supported by grants from The Danish Medical Research Council, The Novo Nordic Foundation, The Lundbeck Foundation, the Alfred Nielsen and Wife’s foundation, and Institut National de la Santé et de la Recherche Médicale. L. Ritie is a recipient of a fellowship from the French Ministry of Research and Education.
Susanne Smed from the MicroArray Center (Rigshospitalet, Copenhagen, Denmark) is thanked for valuable assistance with the Affymetrix GeneChip hybridizations and scannings. LiseLotte Laustsen is thanked for valuable technical assistance. Professor Hans Sjöström is thanked for fruitful discussions during the whole project period. Drs. Chaim Linhart and Rani Elkon are thanked for providing a Linux-executable version of the PRIMA program.
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: J. Olsen, Dept. of Medical Biochemistry & Genetics, The Panum Inst., Bldg. 6.4, Univ. of Copenhagen, DK-2200N Copenhagen, Denmark.
- Copyright © 2006 the American Physiological Society