1 Renal Division, Department of Medicine
2 Center for Neurologic Diseases, Brigham and Women's Hospital, Harvard Medical School, Boston 02115
3 Department of Physics, Wesleyan University, Middletown, Connecticut 06459
4 Bioinformatics and Metabolic Engineering Laboratory, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 01890
5 Bioinformatics Program
6 Department of Biomedical Engineering, Bioinformatics Program, Boston University, Boston 02215
7 Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston 02115
8 Center for Neurologic Diseases, and Division of Neuropathology, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston 02115
9 Molecular Neurogenetics Unit, Massachusetts General Hospital, Charlestown, Massachusetts 02129
10 Division of Women's and Perinatal Pathology, Department of Pathology
11 Division of Thoracic Surgery
12 Cardiovascular Division, Brigham and Women's Hospital, Boston, Massachusetts 02115
13 Affymetrix, Inc., Santa Clara, California 95051
| ABSTRACT |
|---|
7,000 genes analyzed, 451 genes are expressed in all tissue types and designated as housekeeping genes. These genes display significant variation in expression levels among tissues and are sufficient for discerning tissue-specific expression signatures, indicative of fundamental differences in biochemical processes. In addition, subsets of tissue-selective genes are identified that define key biological processes characterizing each organ. This compendium highlights similarities and differences among organ systems and different individuals and also provides a publicly available resource (Human Gene Expression Index, the HuGE Index, http://www.hugeindex.org) for future studies of pathophysiology.
microarrays; human tissues; gene expression; bioinformatics
| INTRODUCTION |
|---|
30,000 human genes. Toward this end, a fundamental and primary objective is to define global patterns of gene expression that characterize human tissues in normal and disease states. DNA microarrays, along with other high-throughput approaches, can successfully elucidate expression patterns that distinguish disease states such as different types of cancers (2, 3, 6, 11, 15, 28). Individually, these distinguished genes are potential molecular markers or potential therapeutic targets for a disease process (5, 13, 14, 18, 23, 28, 38, 41). Establishment of baseline expression patterns in normal tissues is an essential element in accurate interpretation of those changes associated with pathological states. In the present study we use oligonucleotide microarrays (GeneChip HuGeneFL) to analyze expression of 7,070 unique sequences in 59 tissue samples representing 19 healthy human tissue types. The purpose is to create a database that can serve as a reference or compendium of expression profiles for studies of human disease. Using a variety of statistical approaches, we identify gene expression patterns that characterize different tissue types. The results reveal striking quantitative similarities and differences among tissues, even for those genes expressed constitutively.
| METHODS AND MATERIALS |
|---|
Histology.
Samples were fixed at room temperature in neutral pH phosphate-buffered 10% formalin, dehydrated in graded alcohols, and embedded in paraffin using an automated tissue processor. Four-micrometer-thick paraffin sections were rehydrated and stained routinely with hematoxylin and eosin. Light microscope examination was performed to confirm normal tissue morphology. Histological sections of the tissues are available at http://www.hugeindex.org.
RNA preparation for hybridization.
Total RNA was isolated using Trizol solution (GIBCO-BRL, Life Technologies, Rockville, MD). Seven micrograms of total RNA was used for amplification, and the amplified product was labeled with biotin following a procedure described previously (7, 25, 39). Briefly, double-stranded cDNA was synthesized using the SuperScript Choice System (GIBCO-BRL) and a T7-(dT)-24 primer (Geneset Oligos, La Jolla, CA). The cDNA was purified by phenol/chloroform/isoamyl alcohol extraction with Phase Lock Gel (5 Prime → 3 Prime, Boulder, CO) and concentrated by ethanol precipitation. In vitro transcription was performed to produce biotin-labeled cRNA using a BioArray HighYield RNA Transcript Labeling Kit (Affymetrix) according to the manufacturer's instructions. cRNA was linearly amplified with T7 polymerase. The biotinylated RNA was cleaned with an RNeasy Mini kit (Qiagen, Valencia, CA).
Labeled cRNA (20 µg) was fragmented and hybridized using the protocol described previously (25). Briefly, the hybridization mixture was incubated at 99°C for 5 min, followed by incubation at 45°C for 5 min. The hybridization was then carried out at 45°C for 16–18 h. After being washed, the array was stained with streptavidin-phycoerythrin (Molecular Probes, Eugene, OR), amplified by biotinylated anti-streptavidin (Vector Laboratories, Burlingame, CA), and then scanned on an HP Gene Array scanner. The intensity for each feature of the array was captured with Affymetrix GeneChip Software, according to standard Affymetrix procedures (25), by performing typical scaling (with a target intensity of 100) and normalization for all probe sets.
Quality control of samples.
Approximately 50% of the total RNA samples collected from tissues were discarded because of unsatisfactory quality on a 1% agarose gel. Each probe array contains several prokaryotic genes (e.g., bioB, bioC, and bioD are genes of the biotin synthesis pathway from the bacterium Escherichia coli; Cre is the recombinase gene from P1 bacteriophage), which serve as hybridization controls. In addition, the ratio of 3' to 5' expression signal was evaluated for both β-actin and glyceraldehyde-3-phosphate dehydrogenase (GAPDH); the 3'/5' ratio should be less than 3 according to the manufacturer's instructions. Data that failed to meet this criterion were excluded from analysis.
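The 3'/5' ratio criterion described above can be sketched as a simple filter. This is only an illustration: the function name, the input layout (a 3' and a 5' signal per control gene), and the example values are assumptions, not part of the Affymetrix software.

```python
def passes_ratio_qc(control_signals, max_ratio=3.0):
    """Return True if every control gene has a 3'/5' signal ratio below max_ratio.

    control_signals: dict mapping a control gene name to a
    (three_prime_signal, five_prime_signal) tuple (hypothetical layout).
    """
    for three_prime, five_prime in control_signals.values():
        # A non-positive 5' signal cannot give a meaningful ratio; fail the array.
        if five_prime <= 0:
            return False
        if three_prime / five_prime >= max_ratio:
            return False
    return True


# Hypothetical signals for two arrays.
good_array = {"beta-actin": (300.0, 150.0), "GAPDH": (200.0, 180.0)}
bad_array = {"beta-actin": (900.0, 200.0), "GAPDH": (200.0, 180.0)}
```

With these made-up values, `good_array` passes (ratios 2.0 and about 1.1) and `bad_array` fails because the β-actin ratio is 4.5.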
Statistical analysis.
The Affymetrix GeneChip 3.1 Expression Analysis Algorithm present (P) or absent (A) calls were used to identify maintenance/housekeeping genes. All genes with a present call in at least one sample of each tissue type were included in the maintenance/housekeeping set [marginal (M) calls were conservatively treated as absent]. A hierarchical clustering algorithm (AGNES) (22) in the statistical analysis package SPLUS (37) was used to group the tissue samples using only the housekeeping genes. Using the "Manhattan" distance metric, variables standardized, and the "Ward" linkage algorithm, we found that the 451 housekeeping genes alone were sufficient to clearly group the different tissue types.
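The selection rule for maintenance/housekeeping genes (a present call in at least one sample of every tissue type, with marginal calls counted as absent) can be sketched as follows. The data layout, function name, and example calls are assumptions made for illustration; the clustering step (AGNES in SPLUS) is not reproduced here.

```python
def housekeeping_genes(calls, tissue_of):
    """Select genes called present ('P') in at least one sample of each tissue type.

    calls: dict gene -> dict sample -> call, where call is 'P', 'M', or 'A'.
    tissue_of: dict sample -> tissue type.
    Marginal ('M') calls are treated as absent, as in the text.
    """
    all_tissues = set(tissue_of.values())
    selected = []
    for gene, by_sample in calls.items():
        # Tissue types in which this gene has at least one present call.
        present_in = {tissue_of[s] for s, call in by_sample.items() if call == "P"}
        if present_in == all_tissues:
            selected.append(gene)
    return sorted(selected)


# Hypothetical example: two kidney samples and one liver sample.
tissue_of = {"k1": "kidney", "k2": "kidney", "l1": "liver"}
calls = {
    "GAPDH": {"k1": "P", "k2": "A", "l1": "P"},  # present in both tissue types
    "ALB": {"k1": "A", "k2": "M", "l1": "P"},    # marginal in kidney counts as absent
}
selected = housekeeping_genes(calls, tissue_of)
```

In this toy example only `GAPDH` is selected, because the single marginal call for `ALB` in kidney does not count as present.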
To identify tissue-selective genes, we used a two-tailed t-test to distinguish the gene expression levels in each tissue type from all other tissue samples at a 99.99% confidence level. The two-tailed t-test makes underlying assumptions about the distribution of the data, and this high confidence level was chosen to ensure that the list of tissue-selective genes obtained would still be reasonable, even though the assumptions may be met only in part (34). The tissue-selective genes obtained were ranked by their significance value, which determines the probability of observing a given level of discrimination for a gene by random chance. The lower the P value, the better the tissue-selective nature of the gene. A subset of 98 genes with the lowest P values, 14 from each tissue-selective subset from the brain, kidney, liver, lung, muscle, prostate, and vulva, were then used in a principal component analysis (PCA) to separate the tissue samples in PC space. Before performing the PCA, the data were autoscaled such that each gene had a mean of zero and unit standard deviation. This analysis was done using MATLAB. Finally, the coefficient of variation (CV = standard deviation/mean) for each tissue type was calculated to identify the tissue-variant genes.
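The three calculations named above (a two-sample t statistic, autoscaling, and the CV) can be sketched with the Python standard library. Several choices here are assumptions rather than details given in the text: the pooled-variance (Student) form of the t statistic, the use of the sample standard deviation, and all function names. Converting the statistic to a two-tailed P value needs a t-distribution routine from a statistics package and is left out.

```python
import math
import statistics


def t_statistic(tissue_values, other_values):
    """Pooled-variance two-sample t statistic for one gene (assumed form)."""
    n1, n2 = len(tissue_values), len(other_values)
    m1, m2 = statistics.mean(tissue_values), statistics.mean(other_values)
    v1, v2 = statistics.variance(tissue_values), statistics.variance(other_values)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def autoscale(values):
    """Rescale one gene's values to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def coefficient_of_variation(values):
    """CV = standard deviation / mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical expression values for one gene.
t = t_statistic([10.0, 12.0, 14.0], [1.0, 2.0, 3.0])
scaled = autoscale([1.0, 2.0, 3.0])
cv = coefficient_of_variation([2.0, 4.0, 6.0])
```

Ranking genes by the absolute value of `t` gives the same order as ranking by P value when every comparison has the same degrees of freedom, which is why a sketch without the P value can still show the ordering step.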
| RESULTS |
|---|
-enolase, and ion transporters (e.g., β1-subunit of Na+-K+-ATPase, Na-Cl electroneutral thiazide-sensitive cotransporter, K-inwardly-rectifying channel, bumetanide-sensitive Na-K-2Cl cotransporter, amiloride-sensitive epithelial sodium channel, and amiloride binding protein 1). In addition, hydroxysteroid (11-β) dehydrogenase 2 (11β-HSD2), a gene that inactivates glucocorticoids and prevents them from binding to the nonselective mineralocorticoid receptor, is also highly expressed. In the kidney, it is this NAD-dependent high-affinity isoform that is thought to confer specificity on the receptor, acting as an autocrine protector of the mineralocorticoid receptor, and to play an important role in cardiovascular homeostasis (24).
As anticipated, the liver-selective genes include those associated with the coagulation pathway (e.g., factors II, V, VII, IX–XII, fibrinogen, plasminogen, protein S, and antithrombin III), complement pathway (e.g., C2, C4, C5, C8, C9), alcohol metabolism (e.g., alcohol dehydrogenase), lipid process (e.g., apolipoproteins), bile metabolism (e.g., bile acid CoA:amino acid N-acyltransferase), antitrypsin member 8, and xenobiotic metabolism (e.g., cytochrome P-450). Additionally, serum amyloid A1 and A4, constitutive for amyloid fibril formation, angiogenin, ribonuclease, RNase for angiogenesis, 1-glycine amidinotransferase for creatine biosynthesis, cysteine dioxygenase, type I for cysteine metabolism, and genes associated with growth, such as insulin-like growth factor, growth hormone receptor, and hepatocyte growth factor (HGF) activator, are highly expressed in liver as well. Unlike 11β-HSD2, which is selective for kidney, 11β-HSD1 is the gene in liver involved in steroid metabolism.
The lung-selective genes include those associated with extracellular matrix (e.g., pulmonary surfactant associated protein), HLA/cytokine (e.g., MHC II, γ-interferon inducible protein 30) and others (von Willebrand factor, claudin 5, palmitoyl-protein thioesterase 2, mannose receptor, lung cytochrome P-450). The muscle-selective genes include those associated with the cytoskeleton (e.g., actin, α1, actinin α2–3), contraction (e.g., tropomyosin, troponin, myosin), mitochondria (e.g., cytochrome C-1, ubiquitin, creatine kinase), and metabolism of glucose, glycogen, and lipids (e.g., lactate dehydrogenase, phosphoglucomutase 1, carnitine palmitoyltransferase). Furthermore, carbonic anhydrase III for CO2 metabolism, creatine kinase, mitochondrial 2 (sarcomeric) for energy transduction, and a gene for thermal regulation (neurotrophic tyrosine kinase, receptor, type 1) are also highly expressed in muscle.
The prostate-selective genes include those that are associated with hormones (e.g., prostate secretory protein), redox pathways (e.g., prostatic acid phosphatase, aldehyde dehydrogenase 6), cytoskeleton (actin-binding protein-278), and others (e.g., prostate-specific antigen, T-cell receptor-, TGF-β3, estrogen regulated LIV-1 protein). The vulva-selective genes include those associated with the cytoskeleton (e.g., keratin, ladinin, loricrin), extracellular matrix (e.g., desmoplakin 1, profilaggrin, epican, connexin 26, galectin 7, desmocollin), and hair follicle-related protein (e.g., basic/acidic hair keratin).
Identification of variant genes within a tissue.
Another question of significant interest is whether there are genes whose tissue-specific expression is highly variable between different individuals. We addressed this by calculating the CV for genes called "present" in all samples. Figure 4 shows a histogram depicting the distribution of CV among different kidney specimens. The mean CV for the distribution was 0.31 with a standard deviation of 0.25. The genes with a CV more than two standard deviations from the mean are highlighted, indicating those that are most variable in kidney. These transcripts include several known to be associated with disease phenotypes. For example, the Na-Cl electroneutral thiazide-sensitive cotransporter is the target of a major antihypertensive diuretic, and mutations in this gene can cause Gitelman's syndrome, an autosomal recessive disease characterized by diverse abnormalities in electrolyte homeostasis (7, 36). In addition, aldose reductase plays a key role in the diabetic complications of kidney, nerve, and retina (10, 19, 29–31). We also observed similar distribution patterns of CV for brain, liver, lung, muscle, and vulva and identified small subsets (<2%) of genes that are highly variable (Supplement 5, published online at the Physiological Genomics web site and at http://www.hugeindex.org). In lung, the most variant genes include integrin-β2, which has been shown to predispose individuals to recurrent bacterial infections (16, 20), and antileukoproteinase, which is involved in several chronic and acute diseases of the respiratory tract (4, 35). In liver, the most variant genes include insulin-like growth factor 2, a putative susceptibility factor for obesity (9, 12, 32); fibrinogen-, defects of which are a cause of thrombophilia (26, 27); and hepatic lipase, a complete deficiency of which causes coronary atherosclerosis and premature dyslipidemia (17, 33). Each of the other 13 tissue types contains samples from fewer than three different individuals; therefore, CVs were not calculated for them.
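The two-standard-deviation rule used to flag tissue-variant genes can be sketched as below. With the reported kidney values, the upper cutoff is 0.31 + 2 × 0.25 = 0.81; the lower bound (0.31 − 0.50) is negative, so only the upper tail can contain genes. The function name, the use of the sample standard deviation, and the example CVs are assumptions for illustration.

```python
import statistics


def highly_variable(cv_by_gene, n_sd=2.0):
    """Return (cutoff, genes) where genes have CV above mean + n_sd * SD.

    cv_by_gene: dict gene -> coefficient of variation within one tissue type.
    """
    cvs = list(cv_by_gene.values())
    cutoff = statistics.mean(cvs) + n_sd * statistics.stdev(cvs)
    flagged = sorted(g for g, cv in cv_by_gene.items() if cv > cutoff)
    return cutoff, flagged


# Hypothetical CVs: nine stable genes and one highly variable gene.
example = {f"g{i}": 0.2 for i in range(9)}
example["g9"] = 1.5
cutoff, flagged = highly_variable(example)
```

In this toy set the cutoff is about 1.15 and only `g9` exceeds it.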
| DISCUSSION |
|---|
7,000 expressed sequences from 19 normal adult human tissue types. We identified a subset of 451 genes expressed in all normal adult human tissue types. This result supplements a previous report of 535 genes that were expressed in 11 fetal and adult tissues (39); 358 of these maintenance/housekeeping genes are common to both lists. Functional annotation revealed that these genes participate in many active cellular processes. We also found that expression of many of these maintenance/housekeeping genes is highly variable. In particular, we report here that these maintenance/housekeeping genes alone contain "tissue-specific" expression patterns (Fig. 2), which may be used to distinguish an individual tissue type. These results suggest that the gene expression patterns of maintenance/housekeeping genes reflect intrinsic differences among the individual tissues, most likely related to differences in metabolic activity and cytoarchitecture. The ability of housekeeping genes to define different biological states suggests that they may be suitable for distinguishing different disease states as well. In a practical sense, these genes could be used as standard controls in all gene expression studies to facilitate data comparison among laboratories and across platforms.
We identified subsets of genes that are highly expressed in one tissue type but not in others. We labeled these "tissue-selective genes" rather than "tissue-specific genes," since very few were expressed in only a single tissue type and a number of human tissues were not included in our analysis. These tissue-selective genes, ranging from 75 to 621 genes per tissue, have enough power to provide class prediction using a three-dimensional PCA. Additionally, these subsets of genes are found to be closely associated with the major functions carried out by each specific tissue type, e.g., genes related to myelin proteins and glial differentiation in brain; genes involved in coagulation and complement pathways in liver; genes associated with channels and transporters in kidney; and genes for pulmonary surfactant proteins in lung. We suggest that these genes may represent potential "signature" genes for the specific tissues with the important caveat that more tissues need to be sampled (e.g., endocrine system) to refine the "tissue-selective" fingerprints. Furthermore, ongoing efforts to deduce the functions of orphan genes will benefit from defining those that have tissue-selective expression patterns, as this will highlight a limited number of biological processes that should be considered.
Although all samples used in this study are normal tissues from a histological perspective, one important observation from our study is the demonstration that, for a given tissue, different individuals have a small set of genes with highly variable expression. Logically, tissues from either biopsy or autopsy often contain multiple cell types, which are in various states. It is conceivable that the variations are due to differences in the cell types and their states when the tissues were collected. In addition, the differences of age, gender, underlying health, and medications may also play roles in the variation. Further studies will be needed to address these issues. Even so, the presence of tissue-variant genes is consistent with the notion that the genotypes and the inherent plasticity of human tissues of an individual may contribute to gene-specific expression.
| ACKNOWLEDGMENTS |
|---|
This work was supported by the Merck Genome Research Institute. In addition, support was provided by National Institutes of Health Grants DK-36031 and DK-58849 (to S. R. Gullans), DK-09987 (to L.-L. Hsiao), CA-80084 (to F. Dangond), NS-16367 (to M. Mahadevappa), and DK-58533 (to Gregory Stephanopoulos). This work was also partially supported by Grants DE-FG02-94ER-14487 and DE-FG02-99ER-15015 (to Gregory Stephanopoulos) from the Engineering Research Program of the Office of Basic Energy Science at the Dept. of Energy, by the BSG foundation (to R. Bueno), and by Integrative Graduate Education and Research Traineeship 9870710 (to P. Haverty).
| FOOTNOTES |
|---|
Address for reprint requests and other correspondence: S. R. Gullans, 65 Landsdowne, Cambridge, MA 02140 (E-mail: sgullans@rics.bwh.harvard.edu).
10.1152/physiolgenomics.00040.2001.
1 Supplementary Material (Supplements 1–5) to this article is available online at http://physiolgenomics.physiology.org/cgi/content/full/7/2/97/DC1.
| REFERENCES |
|---|
Cys substitution associated with thrombosis. Thromb Haemost 84: 263–270, 2000.