1 Gastrointestinal Research Group, Yale University School of Medicine, New Haven, Connecticut
2 Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
3 Keck Laboratory, Yale University School of Medicine, New Haven, Connecticut
4 Institute of Pathophysiology, Centre for Molecular Medicine, Medical University of Graz, Austria
| ABSTRACT |
|---|
GAPDH; housekeeping; microarray
| INTRODUCTION |
|---|
Despite publications identifying numerous novel housekeeping genes (47), the majority of investigators use only a single gene from a small panel of housekeeping genes to normalize real-time PCR results (1). This panel comprises β-actin (ACTB), β2-microglobulin, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), hydroxymethylbilane synthase, hypoxanthine phosphoribosyl-transferase 1 (HPRT1), ribosomal protein L13a, ribosomal 18S RNA, succinate dehydrogenase complex subunit A, TATA box binding protein, ubiquitin C, and tyrosine 3-monooxygenase. Genes commonly used for normalization in neuroendocrine tumor (NET) tissue studies include GAPDH (23, 25, 37, 46), ACTB (37), HPRT1 (34), and glucose phosphate isomerase (4). A growing body of evidence, however, indicates that transcript levels of these commonly used housekeeping genes may vary considerably in different tissue types or under different experimental conditions, and that a "universal control gene" does not exist (15, 44, 47, 48). Furthermore, a recent study revealed that conventional normalization strategies based on a single housekeeping gene can lead to normalization errors of up to 3- and 6-fold in 25% and 10% of cases, respectively, with some cases showing error values >20-fold (45).
The determination of a panel of genes that have robust expression in the specific experimental system being studied is essential to ensure accurate normalization and interpretation of results. A software program written by Vandesompele et al. (45), geNorm, identifies the most stably expressed gene or set of genes from among a pool of genes and estimates the number of genes required to calculate a robust normalization factor based on the geometric mean of these genes. The gene stability measure "M" that geNorm determines is defined as the average pairwise variation of a particular gene with all other potential reference genes (45). This measure is based on the principle that the expression ratio of two ideal control genes should be identical in all samples; thus genes with the lowest M value are the most stably expressed. This normalization algorithm has been successfully applied and validated in several recent studies (11, 13, 30–33, 36), and the clinical applicability of this approach has been recently highlighted in prostate cancer tumors (31). Normalization of expression levels of the matrix metalloproteinase (MMP) RECK by geNorm resulted in significantly different results (25% decrease vs. matched normal tissue) compared with the standard normalization approach using ACTB (10% increase in RECK in tumors). Expression of this gene is normally decreased in tumor samples. Choosing the "correct" normalization approach therefore is both essential and critical for obtaining reliable expression data, particularly when clinical markers are evaluated.
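The stability measure M described above can be illustrated with a short sketch. It follows the published definition (average pairwise variation of one gene with all others), taking the pairwise variation of two genes as the standard deviation of their log2-transformed expression ratios across samples. The function name `genorm_m`, the use of the sample standard deviation, and the toy data are assumptions made for illustration; this is not the geNorm applet itself.

```python
import math
from statistics import mean, stdev

def genorm_m(quantities):
    """Return a dict gene -> M for relative quantities given as gene -> list per sample.

    Illustrative sketch only: pairwise variation is the sample standard
    deviation of log2 expression ratios (an assumption about the SD variant).
    """
    genes = list(quantities)
    m_values = {}
    for g in genes:
        pairwise = []
        for h in genes:
            if h == g:
                continue
            # Ratio of two ideal control genes is constant, so its log has SD 0.
            log_ratios = [math.log2(a / b)
                          for a, b in zip(quantities[g], quantities[h])]
            pairwise.append(stdev(log_ratios))
        m_values[g] = mean(pairwise)
    return m_values

# Hypothetical data: A and B keep a constant ratio, C fluctuates.
toy = {"A": [1.0, 2.0, 4.0], "B": [2.0, 4.0, 8.0], "C": [1.0, 4.0, 2.0]}
m = genorm_m(toy)
```

In this toy input, genes A and B obtain a lower M than gene C, which is the behavior expected for a pair of genes whose expression ratio is identical in all samples.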
A question that remains unresolved is how to identify the appropriate housekeeping gene(s) for a particular experimental system. The choice of reference genes is often based on historical precedent, e.g., GAPDH or β-actin, both originally used in Northern blots and RNase protection assays (15). The simultaneous measurement of transcript levels of thousands of genes using gene chip technology has, however, provided large databases of transcript information from various experimental systems that can be used as alternative resources to identify reference genes (47). In this manuscript, we propose using these data resources to identify novel endogenous control genes that provide a more robust basis for establishing a reference gene set in a particular tissue or experimental set-up. Our report documents the identification and evaluation of a panel of eight reference genes as well as the commonly used GAPDH for real-time PCR normalization in a series of gastroenteropancreatic NETs (GEP NETs; previously referred to as carcinoid tumors), normal gastrointestinal (GI) tissues, and a novel small intestinal carcinoid cell line (KRJ-I) (35). These eight novel genes were selected based on statistical algorithms (outlier detection, robust feature selection methodology) applied to data from 36 Affymetrix U133A GeneChip samples (GEP NETs and normal tissue) to detect genes with low variability. Expression levels of these reference genes were then measured by real-time RT-PCR in an independent set of GI tissue samples (n = 42) and the small intestinal carcinoid cell line (KRJ-I). The geometric averaging method (45) was used to identify the most robustly expressed control genes in the tissue set we interrogated.
To further assess the efficacy of this strategy, the expression level of MTA1, a gene previously identified by our group as a potential marker of small intestinal NET malignancy (22), was then examined in the same set of GI tissue samples to establish which approach (geNorm or GAPDH) was superior for measuring target gene expression in normal GI tissue and GI NETs. Thereafter, to evaluate the utility of our normalization approach in non-NETs, we prospectively examined the expression levels of MTA1 in adenocarcinomas (n = 44) from the colon, pancreas, and breast and compared this with normal tissue (n = 26) from these GI and non-GI sites.
These approaches resulted in the identification of three reference genes, ALG9, TFCP2 and ZNF410, that may be used for a robust normalization of target gene expression measured by real-time PCR in both GI and non-GI adenocarcinomas and NET samples.
| MATERIALS AND METHODS |
|---|
GI and non-GI adenocarcinomas and normal tissues.
Seventy tissue samples comprising forty-four adenocarcinomas (21 colon, 10 pancreas, and 13 breast) and twenty-six normal tissues including samples from the colon (n = 10), pancreas (n = 8), and breast (n = 8) were studied. Because MTA1 overexpression is associated with malignancy (evidence of lymph node metastases) in these tumors (10, 17, 18, 41), we categorized each tumor using previously reported criteria (10, 17, 18, 41) into either tumors of low malignant potential (no pathological evidence of lymph node metastasis) or tumors with a high malignant potential (pathological evidence of metastasis).
All samples were obtained from the Cooperative Human Tissue Network and Yale New Haven Hospital. Tissues were either frozen at –70°C or placed in liquid nitrogen for storage before RNA extraction.
GeneChip
RNA extraction.
Total RNA was extracted from the 36 samples indicated in Table 1 using TRIzol (Invitrogen) followed by the Qiagen RNeasy kit, and the RNA quality was assessed using an Agilent Bioanalyzer (Agilent Technologies, Palo Alto, CA) to visually verify the absence of genomic DNA contamination, integrity, and ratio of 28S and 18S bands. Only samples with an absorbance ratio at 260 and 280 nm (A260/A280) ≥1.9 were used. Ten micrograms of very-high-quality total RNA were provided to the Keck Affymetrix facility, where cRNA labeling, hybridization (U133A GeneChip), and data analysis were performed as described previously (22).
Hybridization.
The Affymetrix U133A array consists of approximately 22,000 probe sets targeting 18,400 transcripts and variants, including 14,500 well-characterized human genes (http://www.affymetrix.com/products/arrays/specific/hgu133.affx). The hybridized arrays were scanned using a confocal laser fluorescence scanner (Agilent Microarray Scanner, Agilent Technologies). Arrays were scaled to an average intensity of 500 and analyzed independently using Microarray Suite (MAS) 5.0 software (Affymetrix, Santa Clara, CA).
Statistical analyses of Affymetrix GeneChip data.
The aim of the statistical analysis was to robustly identify candidate genes to be reference genes. To this end, raw expression data for each of the 36 samples and 22,838 probes/sample on the Affymetrix U133A chips were log transformed using Matlab (v.7; The Mathworks, Natick, MA). This is a standard approach that compresses the dynamic range of expression values, thus facilitating data interpretation (12). Log transformation of candidate reference genes previously identified in other tissues (45) in our sample data set identified a mean expression range of 6.2–10.1 with a standard deviation range of 0.23–0.49. Real-time PCR studies have previously determined that genes with large standard deviations could not be considered "stable" genes using the geNorm criteria (45). Since our aim was the identification of stable reference genes, we focused on target genes that showed a low standard deviation (<0.22) based on accepted array analysis criteria (12). To exclude genes that were called "absent" on arrays, we used a lower limit of mean log-transformed expression of 4.
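The variability filter described above can be sketched as follows. The thresholds (standard deviation <0.22, mean log-transformed expression of at least 4) come from the text; the function name `select_stable_probes`, the choice of the natural logarithm (the base is not stated in the text), the sample standard deviation, and the probe names are assumptions. The original analysis was performed in Matlab, so this is only an equivalent illustration.

```python
import math
from statistics import mean, stdev

def select_stable_probes(raw, max_sd=0.22, min_mean=4.0, log=math.log):
    """Keep probes whose log-transformed expression is stable and not 'absent'.

    raw: dict probe -> list of raw expression values (one per sample).
    log: transformation to apply; natural log is an assumption.
    """
    selected = {}
    for probe, values in raw.items():
        logged = [log(v) for v in values]
        m, s = mean(logged), stdev(logged)
        # Low spread marks a candidate reference gene; the mean cut-off
        # excludes probes that are effectively not expressed.
        if s < max_sd and m >= min_mean:
            selected[probe] = (m, s)
    return selected

# Hypothetical probes: one stable, one variable, one near background.
raw = {
    "stable": [1000.0, 1100.0, 950.0, 1050.0],
    "variable": [100.0, 1000.0, 5000.0, 300.0],
    "absent": [20.0, 22.0, 21.0, 19.0],
}
stable = select_stable_probes(raw)
```

Passing `log=math.log2` would switch the transformation; the thresholds would then need to be reconsidered, since they depend on the base used.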
To identify biologically relevant genes whose expression was neither dependent on the cell cycle nor transcriptionally linked, we 1) determined whether expression correlated with cell proliferation, 2) determined whether expression of the candidate genes was highly correlated with one another, and 3) focused on genes with known biological function. Probes that showed a correlation of r² > 0.6 (regression analysis) with probes for the proliferation-associated genes Ki-67 and PCNA, or that were highly correlated with one another, were excluded from further analyses. Finally, only those genes with average raw expression values of >300 in all 36 samples were retained, resulting in a panel of 8 potential housekeeping genes that could be evaluated because of the availability of Assays-on-Demand products for real-time PCR analysis.
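A minimal sketch of the correlation-based exclusion is given below. The r² > 0.6 cut-off against proliferation markers is taken from the text. Reusing the same cut-off for the mutual correlation between candidates, keeping the first probe of a correlated pair, and all names (`r_squared`, `drop_correlated`, the probe labels) are assumptions for illustration, since the text does not specify these details.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two equally long, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def drop_correlated(candidates, markers, threshold=0.6):
    """Remove probes tracking proliferation markers or an already kept probe."""
    kept = {}
    for name, values in candidates.items():
        if any(r_squared(values, m) > threshold for m in markers.values()):
            continue  # expression follows a proliferation marker
        if any(r_squared(values, other) > threshold for other in kept.values()):
            continue  # assumed rule: keep only the first of a correlated pair
        kept[name] = values
    return kept

# Hypothetical log-expression profiles across five samples.
markers = {"MKI67": [1.0, 2.0, 3.0, 4.0, 5.0]}
candidates = {
    "g1": [2.0, 4.0, 6.0, 8.0, 10.5],  # rises with the marker
    "g2": [5.0, 1.0, 4.0, 2.0, 3.0],   # unrelated to the marker
    "g3": [5.1, 1.2, 4.0, 2.1, 3.0],   # near copy of g2
}
kept = drop_correlated(candidates, markers)
```

Because the mutual-correlation step here is greedy, the result depends on the order of the candidates; a different tie-breaking rule could be substituted without changing the marker-based exclusion.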
Real-Time PCR
RNA extraction.
Total RNA (2 µg) was extracted from 114 samples using TRIzol (Invitrogen) and then cleaned using the Qiagen RNeasy kit in conjunction with the DNeasy Tissue kit (Qiagen) to ensure that no contaminating genomic DNA was present (21). The clean RNA was then converted to cDNA using the High Capacity cDNA Archive kit (Applied Biosystems).
Real-time amplification.
Real-time RT-PCR analysis was performed using the Assays-on-Demand products listed in Table 2 (21). All samples were adjusted to 20 ng/µl cDNA before experiments; 1 µl of template cDNA was used per reaction. In addition to the eight candidate reference genes, the expression of MTA1, previously shown by our group to be a potential marker of neoplasia in GI NETs (22, 24), was evaluated in all GI samples (n = 42) using the following Assays-on-Demand product: Hs00183042_m1 (22). The expression of GAPDH (Hs99999905_m1) was also assessed, since this is the most commonly used reference gene for PCR normalization (15, 44). In a second set of studies in 70 tissue samples, ALG9, TFCP2, ZNF410, GAPDH, and MTA1 levels were measured by real-time PCR in colon, pancreas, and breast samples.
Cycling and fluorescence detection were undertaken using the ABI7900 Sequence Detection System. Non-RT controls were included in triplicate in each real-time RT-PCR experiment to ensure the absence of genomic DNA contamination. Cycling was performed under standard conditions (TaqMan Universal PCR Master Mix protocol), and the raw cycle threshold (CT) values were exported.
Determination of stable control genes.
The geNorm VBA applet for Microsoft Excel was used to determine the most stable genes from among the eight candidate reference genes (45). Raw CT values were transformed to quantities using the comparative delta CT method (26), where the highest relative quantity for each gene is set to 1 for input into geNorm. The gene expression stability (M) value for each gene was calculated by geNorm (Fig. 1). To estimate how many reference genes should be used, normalization factors based on the geometric mean of the expression levels of the n best reference genes were calculated by stepwise inclusion of one additional, less stable reference gene at a time (45).
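The two calculations named above, conversion of raw CT values to relative quantities and the geometric-mean normalization factor, can be sketched as follows. An amplification efficiency of 2 (perfect doubling per cycle) is an assumption, as are the function names and the example values; the geNorm applet itself is not reproduced here.

```python
import math

def ct_to_quantities(ct_values, efficiency=2.0):
    """Convert raw CT values of one gene to relative quantities.

    The sample with the lowest CT (highest expression) is set to 1.
    efficiency=2.0 assumes perfect doubling per cycle.
    """
    min_ct = min(ct_values)
    return [efficiency ** (min_ct - ct) for ct in ct_values]

def normalization_factors(quantities_by_gene):
    """Geometric mean of the reference-gene quantities for each sample."""
    genes = list(quantities_by_gene)
    n_samples = len(quantities_by_gene[genes[0]])
    factors = []
    for i in range(n_samples):
        log_sum = sum(math.log(quantities_by_gene[g][i]) for g in genes)
        factors.append(math.exp(log_sum / len(genes)))
    return factors

# Hypothetical CT values for one gene in three samples.
q = ct_to_quantities([25.0, 26.0, 27.0])

# Hypothetical relative quantities of two reference genes in two samples.
nf = normalization_factors({"refA": [1.0, 0.5], "refB": [0.25, 1.0]})
```

A target gene's relative quantity in a sample would then be divided by that sample's normalization factor to obtain the normalized expression level.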
| RESULTS |
|---|
Normalization
The ranking of the eight potential reference genes and GAPDH examined according to stability (least variability) was as follows: ALG9 > TFCP2 > ZNF410 > MCRS1 > UBR2 > POLR2B > NCOR1 > ZW10 > GAPDH, with GAPDH the least stably (most variably) expressed gene and ALG9 the most stably (least variably) expressed gene (19). The three reference genes estimated by geNorm to provide the most reliable normalization factor were ALG9, TFCP2, and ZNF410 (geNormATZ).
An examination of the raw CT values for each of these three genes and for GAPDH demonstrated that these four genes were differently distributed (P < 0.001; Kruskal-Wallis test). The three reference genes were less variably distributed than GAPDH (coefficients of variation: 9.7–10.58% vs. 18.11%) (Fig. 2A), tended to be highly correlated with one another (R² 0.86–0.96, P < 0.0001) (Fig. 2B), and exhibited a lower correlation with GAPDH (R² 0.253–0.28, P < 0.001) (Fig. 2C).
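For reference, the coefficient of variation used in this comparison is the standard deviation expressed as a percentage of the mean. The sketch below uses hypothetical CT values and assumes the sample standard deviation; it does not reproduce the study data.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical raw CT values for a tightly and a loosely distributed gene.
cv_tight = coefficient_of_variation([24.0, 26.0, 28.0])
cv_wide = coefficient_of_variation([18.0, 26.0, 34.0])
```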
Importance of Robust Gene Normalization in GI NET Malignancy
To establish the utility of the geNorm approach in GI NETs, we first evaluated the reproducibility of the normalization factor calculated using geNorm from the three reference genes and compared this to GAPDH. An evaluation of the normalization factor demonstrated that this had a higher interassay reproducibility (2 separate real-time PCR studies) (Spearman = 0.93, P = 0.0011) than GAPDH (Spearman = 0.667, P = 0.0415).
We next compared the relative expression levels of MTA1 normalized by geNormATZ with MTA1 normalized by GAPDH in normal tissues to identify whether normalizing gene expression may have an organ-specific bias. An examination of the distribution of either MTA1/geNorm or MTA1/GAPDH within normal tissues demonstrated that there were no organ-specific differences in expression.
Having demonstrated no organ-specific differences, we examined the effect of each of the two normalization approaches on expression levels of MTA1. Normalization by geNormATZ resulted in significantly lower expression values for MTA1 than when this gene was normalized with GAPDH (P < 0.028, 2-tailed Wilcoxon signed rank test) (19). Using regression analysis, we could identify no relationship between genes normalized by geNormATZ and GAPDH (R² < 0.3).
To evaluate the utility of each of these normalization approaches (geNormATZ and GAPDH), we next assessed the application of this stratification analysis strategy in normal tissue and compared this with primary NETs and metastases (Fig. 3). MTA1 was significantly elevated (P = 0.0006, 2-tailed Mann-Whitney test) in primary NETs compared with normal mucosa (Fig. 3A). Levels of MTA1 were increased, but not significantly, in metastases compared with normal tissue, but primary NET levels of MTA1 were significantly elevated compared with metastases (P = 0.0314). When GAPDH was used to normalize the data, these trends were evident, but differences in expression were not significant between any of the samples (Fig. 3B).
In colon samples, tumors that were classified as malignant (i.e., associated with lymph node metastases) had significantly elevated levels of MTA1 compared with either normal mucosa (P = 0.015, 2-tailed Mann-Whitney test) or nonmalignant tumors (P = 0.006, 2-tailed Mann-Whitney test) (Fig. 4A). When GAPDH was used to normalize the data, differences in expression levels were not significant between any of the samples (Fig. 4B).
In breast tissue samples, malignant tumors had significantly elevated levels of MTA1 compared with either normal breast tissue (P = 0.0007) or nonmalignant tumors (P = 0.0012) (Fig. 4E). When GAPDH was used to normalize the data, differences in expression levels were not significant between any of the samples (Fig. 4F).
KRJ-I Cell Line
Finally, we evaluated reference and target gene expression in the small intestinal carcinoid cell line KRJ-I. An analysis of ALG9, TFCP2, and ZNF410 transcript expression in this cell line demonstrated that raw CT values ranged from 24.9 to 27.4. GAPDH was expressed at higher levels in this cell line than the three reference genes (CT = 21.1–21.7). Normalization of potential NET marker gene levels, Ki-67, MAGE-D2, MTA1, and NAP1L1, by either geNormATZ or GAPDH confirmed lower values (approximately 10-fold less) using the former approach. In general, both approaches were reproducible, although interassay variability using geNormATZ was less than when using GAPDH, and gene expression showed less of a difference when normalized using geNormATZ compared with GAPDH (Spearman r = 0.77, P = 0.051, vs. r = 0.54, P = 0.149).
| DISCUSSION |
|---|
The selection of candidate reference genes that showed little variation but high expression on GeneChip arrays, followed by real-time PCR and geNorm analysis, resulted in the identification of three reference genes that can be used for normalization of PCR data from GI NETs and their metastases.
The three genes identified by geNorm as the most robust reference genes, ALG9, TFCP2, and ZNF410, exhibit the following features. ALG9 (asparagine-linked glycosylation 9 homolog) is encoded on chromosome 11 and catalyzes the transfer of mannose from Dol-P-Man to lipid-linked oligosaccharides (8); TFCP2 (transcription factor CP2) is encoded on chromosome 12 and is also recognized as α-globin transcription factor CP2, with homology to Drosophila transcription factor Elf-1/NTF-1 (7); and ZNF410 (zinc finger protein 410), encoded on chromosome 14, is a transcription factor that activates transcription of matrix-remodeling genes such as MMP1 during fibroblast senescence (3). Expression of ZNF410 increases in senescent fibroblasts, but this only occurs at the protein level (3); mRNA levels appear to be constant throughout the fibroblast cell life span (3). Overall, relatively little is known about the transcriptional regulation of ALG9, TFCP2, and ZNF410, although they are all considered to be ubiquitously expressed (3, 39). The different functions of these genes, a mannosyltransferase and two transcription factors involved in unrelated cellular processes, further emphasize their utility as reference genes. In our sample sets, these three genes had a significantly lower coefficient of variation than GAPDH and were not differently expressed in GI tissue and metastatic targets (lymph nodes and liver), and their expression levels were tightly correlated.
Overexpression of MTA1 mRNA and protein correlates with tumor invasion and metastasis in a variety of tumors including breast, hepatocellular, esophageal, gastric, pancreatic, and colorectal carcinomas (28, 40–43). Overexpression of this gene has been identified in small intestinal NETs (22), and expression levels are elevated in tumors with neuroendocrine features (14). An examination of the expression of this gene in normal GI tissue demonstrated that, following normalization, expression levels were similar in the stomach, small bowel, lymph nodes, and liver. Expression levels were significantly increased in primary NETs when appropriately normalized (geNormATZ), but this difference was less evident following normalization with GAPDH. The latter was similar to the results of normalization of MTA1 by the three less robustly expressed genes, NCOR1, POLR2B, and ZW10 (Fig. 1), which demonstrated an increase in expression in tumor samples but not significantly so (P = 0.092). These results demonstrating increased MTA1 in primary tumors are supported by two earlier studies (14, 22) and suggest that a geNormATZ approach may be more sensitive than normalization approaches using GAPDH or other currently available reference genes.
Use of these three particular reference genes is not limited to normalization of marker genes in GI NETs. Approximately 40% of colon, pancreatic, and breast cancers overexpress MTA1, particularly when they exhibit lymph node metastases (10, 17, 18, 41). In our sample sets, MTA1 expression following geNormATZ was significantly elevated in malignant tumors from each of these organs: colon, pancreas, and breast. Of note was the observation that, in colon and breast tumors, this approach segregated the malignant tumor group from the nonmalignant tumors, which were not different from normal tissue. Tumors from the pancreas were also segregated using GAPDH as a normalization strategy, indicating that, while our normalization approach has a broad utility, for more precise assessment of specific neoplasia, it may be necessary to consider identifying organ-specific reference genes. The approach of identifying candidate reference genes in microarray databases provides an opportunity to identify suitable organ-specific reference genes for other organs and tumor types.
Identification of appropriate reference genes provides one mechanism to control for variation in real-time PCR studies. However, large interassay variability can be a feature of transcript measurements, particularly in clinical samples, with reports ranging from 2.7 to 25% (5, 9, 16). To minimize potential interassay variability, we examined gene expression in the small intestinal KRJ-I cell line. This was undertaken with the future objective of designing a PCR-based test useful in the clinical setting for quantitating transcript expression of candidate markers of malignancy.
KRJ-I is a continuous small intestinal carcinoid cell line with a rapid doubling time (1.7–1.9 days) that displays classic morphological and immunocytochemical features of an enterochromaffin cell carcinoid (20). It was established in 1992 from a multifocal ileal NET with an insular histological appearance (type I) (35). An examination of expression of four genes in this cell line demonstrated that normalization with geNormATZ was highly reproducible and showed lower interassay variability than a single gene-based normalization approach. This identification and characterization of marker gene expression in the KRJ-I cell line provides a reference point that can be included in future analyses of GI carcinoids and NETs. This will facilitate interassay comparisons, a requirement for any long-term laboratory analytical tool and of particular importance for tumors that are relatively rare (48).
On the basis of the material we examined, it appears that the routine use of GAPDH to normalize data from either GI NETs or colon and breast adenocarcinomas should be avoided, as this gene showed the greatest fluctuation among the samples examined. In this study, we have provided the rationale for examination of large transcript databases to identify reference genes in experimental set-ups that can be used as viable alternatives to the traditionally accepted housekeeping genes used thus far. This approach may be widely applicable to other neoplasia or biological situations where quantification of transcript expression is a critical element of the study design.
| GRANTS |
|---|
| FOOTNOTES |
|---|
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
| REFERENCES |
|---|