|
|
||||||||
1 Cardeza Foundation of Hematologic Research
2 Daniel Baugh Institute for Functional Genomics and Computational Biology, Department of Pathology, Jefferson Medical College, Philadelphia, Pennsylvania
| ABSTRACT |
|---|
|
|
|---|
erythropoiesis; hematopoietic progenitor; transcriptional regulation; expression profiling
| INTRODUCTION |
|---|
|
|
|---|
- and two ß-like globin chains coordinated with a heme moiety for oxygen binding and delivery. In humans, expression of globin genes is under developmental control, with different
- and ß-like globin chains expressed in a temporal fashion (for review, see Ref. 49a). Studies of the erythroid transcriptome in the context of hemoglobinopathies, a common class of diseases involving these genes, will offer insight into the regulation of globin gene expression. Creation of a model system that recapitulates erythropoiesis in the normal adult will allow dissection of the red cell and globin gene expression program. One consequence of these studies may be insights regarding reactivation of the fetal globin (Hb F) program, which may be of clinical benefit for patients with sickle cell disease and ß-thalassemia. Human HPCs, characterized by surface expression of CD34, have been used to study erythroid differentiation in vitro. CD34+ cells can be differentiated from bone marrow, fetal liver, cord and/or peripheral blood samples in semisolid and/or liquid cultures. Liquid culture systems yield relatively pure, reasonably synchronized erythroid cells recapitulating the developmental program in vivo (29, 37, 39, 47, 51). The early erythroid progenitors (burst-forming units, erythroid; or BFUe) proliferate and develop into colony-forming units, erythroid (CFUe), which divide and give rise to proerythroblasts, orthochromatic normoblasts, and enucleated erythrocytes. These systems generate sufficient material for molecular analysis, having been used to study accumulation of a limited number of mRNAs including globins, transcription factors, and cytokine receptors (47). Earlier studies showed fetal bovine serum (FBS)-induced Hb F in cultures from adults (37); thus we implemented a single-phase, serum-free, liquid culture system in which to differentiate CD34+ HPCs into erythroid progenitors. This system allows adult-derived HPCs to mature into erythroid progenitors that have downregulated fetal hemoglobin expression, thus recapitulating all the initial steps in erythroid development.
Genome-wide expression profiling allows insights into development of normal cell types. Recent studies focused on defining transcriptomic profiles of various human stem cell sources [bone marrow, cord blood, granulocyte-colony stimulating factor (G-CSF)-mobilized peripheral blood CD34+ cells] and subsets (CD34+Lin and CD34+Lin+) (3, 13, 17, 33, 41, 50, 56) to better understand distinctions between these cell populations as well as examine the effects of stroma (57), both of which are important and relevant to studies of human transplantation. Examinations of the erythroid transcriptome have involved use of embryonic stem cell-derived erythroid colonies (62), long-term cultured murine FDCP-mix cells (8), the G1E-ER4 GATA-1-null cell line (58), retrovirally transduced CD34+ cells (10), and our studies and those of others using the K562 chronic myelogenous leukemic cell line to model erythroid differentiation (1, 29, 39, 64). More recently, studies of in vitro erythroid differentiation of human bone marrow-derived HPCs were performed (27); however, the in vitro culture medium contained FBS, such that the regulation of the globin program in this setting may not be reflective of the in vivo environment. If erythroid differentiation and globin gene regulation are to be understood at the transcriptional level, transcriptome analysis must be performed sequentially during erythroid differentiation under conditions that recapitulate the in vivo globin gene expression program. In the present study, the changing transcriptome of in vitro differentiated erythroid progenitors from adult peripheral blood was examined using a culture system optimized to yield adult-derived erythroid progenitor cells expressing low Hb F. In this setting, we performed a bioinformatic characterization of the transcriptome at six time points during red cell development, allowing us to examine the expression patterns of known regulators of developmentally controlled globin programs and to trace their role during early erythroid development.
In the present study, we analyzed expression profiles in timed samples during erythroid differentiation, following an unbiased approach to define the minimal number of statistically significantly distinct clusters of coregulated genes across this time course. To overcome limitations of available clustering algorithms such as K-means clustering (52) and self-organizing maps (53), in which data sets are arbitrarily fit to a user-defined number of clusters, we used the silhouette coefficient (SC) metric to separate the differentially expressed (DE) set into clusters distinct from randomly permuted expression data (45). Then, we analyzed the promoters of these 1,504 DE genes, unclustered or divided into two, three, four, five, or six clusters, for enrichment or overrepresentation of transcription factor binding sites (TFBS) compared with a reference gene set using transcriptional regulatory network analysis (TRNA). We used our promoter analysis and interaction network toolset (PAINT) software (Ref. 54; http://www.dbi.tju.edu/dbi/tools/paint), which automates promoter retrieval and mining of existing databases for known regulatory information for a large number of genes identified in a particular biological experiment. Using this analysis, we identified enriched TFBS in the DE gene set as a whole as well as at the level of two, three, four, five, or six clusters of DE genes. The list of candidate regulators includes GATA, whose role in erythroid development has been well documented, PITX and ATF, whose functions in red cell development were identified only recently, and Nkx2.5 and Ecotropic virus integration site-1 (Evi-1), which have not been implicated previously in this process.
| EXPERIMENTAL PROCEDURES |
|---|
|
|
|---|
40 ml) from 500 ml of peripheral blood was diluted 1:1 (vol/vol) with phosphate-buffered saline (PBS), pH 7.4, and gently layered on Ficoll-Hypaque (density 1.077 g/ml). Tubes were centrifuged (600 g, 15 min) at room temperature without applying the brake. The interphase was isolated, diluted with PBS, and centrifuged (
300 g, 5 min), and the mononuclear cell pellet was washed twice with PBS. The yield of mononuclear cells was generally 25 x 108 cells/buffy coat.
Isolation of CD34+ Cells, Erythroid Culture, and Cell Staining
Washed mononuclear cells (2 x 108 to 5 x 108 cells) were resuspended in 1 ml of PBS containing 2% (vol/vol) FBS and 1 mM EDTA in 12 x 75-mm polystyrene round-bottom tubes (BD, Franklin Lakes, NJ). EasySep CD34+ Positive Selection kit was used, as per the manufacturer's instructions (Stem Cell Technologies, Vancouver, BC, Canada). Purified CD34+ cells were cultured for 1 day in DMEM containing 20% (vol/vol) serum substitute (Stem Cell Technologies), 2 mM glutamine, 100 µg/ml each penicillin and streptomycin, 105 M ß-mercaptoethanol, 0.3 mg/ml holo-transferrin, 10 ng/ml IL-3, 10 ng/ml stem cell factor, and 4 U/ml erythropoietin. At day 1, nonadherent cells (1 x 104 cells /ml) were transferred to new flasks and cultured for up to 14 consecutive days at 37°C in 5% CO2. At days 7 and 10, cultures were supplemented with 2 ml of complete medium. The cells were harvested at the following time points: days 1, 3, 5, 7, 9, and 11. Each of three samples was pooled from three cultures for days 1 and 3 (from a total of 9 donors), while HPCs from three different donors were used to generate timed cell harvests at days 5, 7, 9, and 11. Cytospins were performed on days 1, 3, 5, 7, 9, and 11, and cell morphology assessed by Giemsa staining (36).
Benzidine Staining
PBS-washed cells were stained in a benzidine solution containing 0.6% (wt/vol) benzidine base, 2% (vol/vol) hydrogen peroxide, and 12% (vol/vol) acetic acid. At least 500 cells were counted at each time point using a light microscope to assess the percentage of cells that appeared blue because of staining of heme-containing globin tetramers.
Protein Analysis and Enzyme-Linked Immunosorbent Assays
Protein was estimated by the bicinchoninic acid (BCA) protein assay kit (Pierce, Rockford, IL). Erythroid progenitors were washed with PBS and lysed in RBC Lysis Buffer (Gentra Systems, Minneapolis, MN). Fetal hemoglobin concentration in cultured cells was determined using a colorimetric enzyme-linked immunosorbent assay (ELISA). The Hb F ELISA (Bethyl Lab, Montgomery, TX) uses a two-antibody sandwich to detect Hb F. The percent Hb F was determined by dividing micrograms of Hb F by micrograms of total hemoglobin, measured spectrophotometrically at 415 nm (NanoDrop ND-1000; NanoDrop Technologies, Rockland, DE) and calculated using 125 as the millimolar extinction coefficient for human hemoglobin (e.g., 128 µg/ml has an optical density of 1.0 at 415 nm) (55).
Hemoglobin Analysis via HPLC
Cord blood hemolysates were examined for Hb F via HPLC and used as standards in the ELISA assay. After centrifugation of hemolysates, the supernatant was filtered through Ultrafree-MC devices (Millipore, Bedford, MA) before cation exchange chromatography. Hemoglobins were separated on a 100 x 4-mm POLYCATA column (PolyLC, Columbia, MD) fitted to a Waters automatic HPLC system using a gradient of Buffer A (50 mM Na3PO4, pH 5.5, 2 mM KCN) and Buffer B (50 mM Na3PO4, 500 mM NaCl, 2 mM KCN). Hemoglobin was detected by absorbance at 540 nm. Ratios of Hb F to Hb A were calculated by peak integration using the Dynamax R Data Reprocessing Program (Rainin Instrument, Oakland, CA). Purified Hb standards (FASC) were used for reference (Helena Laboratories, Beaumont, TX). Modeling experiments in which mixtures of adult and cord hemolysates were analyzed by HPLC indicated that <2% Hb F in a 100-µg total hemoglobin sample is detectable.
Total RNA Isolation and cDNA Synthesis
DNA-free total RNA of cultured cells was isolated with RNeasy microkit (Qiagen, Valencia, CA), according to the manufacturer's instructions. In brief, 1 x 106 cells from triplicate cultures (days 1, 3, 5, 7, 9, and 11) were pelleted, lysed in RLT buffer containing 1% (vol/vol) ß-mercaptoethanol. DNase-treated RNA was ethanol precipitated and quantified on a NanoDrop ND-1000 spectrophotometer, followed by RNA quality assessment by analysis on an Agilent 2100 bioanalyzer (Agilent, Palo Alto, California). First-strand cDNA was synthesized using oligo(dT) and Superscript II RT (Invitrogen, Grand Island, NY). Alternatively, cDNA was prepared using OVATION RNA Amplification System (NuGen Technologies, San Carlos, CA).
Real-Time RT-PCR
Validation of DE genes.
cDNA was assayed in quadruplicate reactions using 2x SYBR Green JumpStart TAQ Ready Mix (Sigma-Aldrich, St. Louis, MO) according to the SYBR Green protocol at the following input RNA concentrations: 2 ng, 0.2 ng, and 0.02 ng. Relative steady-state levels of mRNA expression of each gene and a reference gene (GAPDH) were assayed on an ABI 7900 (Applied Biosystems, Foster City, CA) using the relative quantification or "2
CT" method, essentially as described previously (1).
Determination of
/
+ß globin mRNA levels.
For globin gene mRNA expression, real-time PCR assays were performed in quadruplicate using degenerate primers (forward 5'-GTC TAC CCW TGG ACC CAG AGG TTC-3', reverse 5'-GGC AAA GGT GCC CTT GAG R-3') (Integrated DNA Technologies, Coralville, IA) that simultaneously amplify both
- and ß-globin gene regions. Gene-specific probes (G 5'-[FAM] AGA TGC CAT AAA GCA CC-3', B 5'-[TET] GGC CTG GCT CAC C-3') (Applied Biosystems) were used in separate reactions to detect
- and ß-globin-amplified products. To document specificity of detection, plasmids containing cDNA for
- or ß-globin were used in individual real-time PCR assays, with detection of
-globin template limited to reactions containing the
-globin-specific probe and visa versa. A standard curve generated using increasing amounts of cultured erythroid progenitor cDNA (day 9) was used to confirm that the amplification efficiency of
- and ß-globin transcripts was equivalent. The difference in cycle thresholds (CT) for
- and ß-globin amplification (
CT) defines the fold increase in mRNA using the formula 2e
CT, where e is the efficiency of amplification, derived from cDNA titration experiments. A titration of cDNA input showed a constant ratio of
-globin to ß-globin mRNA over a range of input amounts and facilitated determination of the percentage of
-globin mRNA (
/
+ß) present in erythroid progenitors during development, using the following two equations:
1)
+ ß = 100, where the percentages of
-globin and ß-globin mRNA = 100%; and
2) ß/
= 2e
CT, where e is the average of the average efficiency of amplification for
- and ß-globin mRNA, CT is the cycle threshold, and
CT represents the difference in cycle thresholds for ß- and
-globin signals.
Microarray Methods
Linear amplification.
Ribo-SPIA-based RNA amplifications and target preparations were performed according to the manufacturer's instructions (Ovation Biotin System, NuGen). Briefly, first-strand cDNA was synthesized from 50 ng of total RNA using RT with a unique oligo(dT)/RNA chimeric primer. RNA was degraded by heating, and fragments served as primers for second-strand synthesis, yielding double-stranded cDNAs with RNA/DNA hetero-duplexes at one end. RNA in the hetero-duplexes was digested using RNase H added to the reaction with DNA polymerase and a second chimeric cDNA/cRNA primer (SPIA amplification primer). Amplification was continued using primer extension product hybridization to the target to reveal part of the priming site for subsequent primer hybridization and extension by strand displacement DNA synthesis. Amplified cDNA products were purified by Zymo Research DNA clean and concentrator (Zymo Research, Orange, CA).
Fragmentation and biotin labeling.
cDNA amplification products were fragmented and chemically labeled with biotin to generate biotinylated cDNA targets. Finally, biotin-labeled product was purified on a DyeEx 2.0 spin column (Qiagen, Germantown, MD).
Hybridization.
Fragmented and biotin-labeled target (2.5 µg) in 200 µl of hybridization cocktail was used for each Affymetrix HG U133 Plus 2.0 array (Affymetrix, Santa Clara, CA). Target denaturation was done at 99°C for 2 min, and hybridization was performed for 18 h. Arrays were washed and stained using GeneChip Fluidic Station 450, and hybridization signals were amplified using antibody amplification with goat IgG (Sigma-Aldrich) and anti-streptavidin biotinylated antibody (Vector Laboratories, Burlingame, CA). GeneChips were scanned using a GeneArray scanner 3000 (Affymetrix).
Bioinformatic analysis of mRNA expression profiling.
Fragmented biotin-labeled cDNA was hybridized to the Human Genome 133 Plus 2.0 oligonucleotide array chip (Affymetrix) containing 56,000 probe sets representing 34,000 well-characterized human genes. Chips were scanned with Affymetrix GeneChip Scanner 3000, and the data were scaled from each array to a target intensity value of 500 using GeneChip Operating Software (GCOS) v3.0. GeneSpring v7.2 (Silicon Genetics, Redwood City, CA) was utilized to set intensities less than zero to zero, normalize signal per chip to the 50th percentile, and normalize signal per gene to the median of each gene. The raw microarray data set (series no. GSE4655) can be accessed at the Gene Expression Omnibus (GEO) website (http://www.ncbi.nlm.nih.gov/geo/). The complete list of day 1 Present (P) calls is available in the Supplemental Materials (the online version of this article contains the supplemental data) including raw intensities of probe sets and gene identification information (Supplemental Table 1). Statistically significantly DE genes were identified from the normalized data using one-way analysis of variance (ANOVA), with analysis of the local false discovery rate employing a sliding window of 50 probe set P values (2, 14); the size of the window was chosen based on our group's experience with multiple data sets (11, 61). In contrast to overall false discovery rate (FDR) estimates, the local false discovery rate (fdr) estimates the false positive rate within a neighborhood of genes (chosen as 50 here). Often, as is the case in our data set, the choice of FDR threshold is not clear (e.g., why 10 and not 12%, or 14%, etc.?; with less-restrictive threshold resulting in an increasing no. of DE probe sets). In contrast, within a certain fdr range, the number of DE genes is relatively insensitive to the choice of a particular fdr threshold (2). This allows us to derive a differential gene expression list with a particular fdr threshold. A total of 1,953 probe sets were chosen as DE at a 10% local fdr threshold, which corresponds to allowing
7% overall false positives.
Gene annotation.
The expressed and DE lists were linked to the genome database NetAffx at the NetAffx Analysis Center (http://www.affymetrix.com) using Microsoft Excel and Access, and discrepancies identified by cross-reference to http://mriweb.moffitt.usf.edu/mpv/ (19) were manually corrected in Supplemental Table 2. Gene Ontology (GO) functions were assigned using Database for Annotation, Visualization and Integrated Discovery (DAVID v2.0; http://david.abcc.ncifcrf.gov/).
Clustering of DE Probe Sets
Clusters of different sizes ranging from 2 to 10 were obtained using partitioning around mediods (PAM) (23). The quality of the clusters was evaluated using SC metric (49). SC is a combined measure of cohesion within a cluster and separation between clusters utilizing both inter- and intracluster distances as a ratio. Following the computational negative control approach of Pearson et al. (46), the SC of different PAM clusters of actual DE data were compared with the SC of the randomly permuted DE data. Differences between SC of the actual and randomized data allowed determination of the number of nonrandom clusters that were well distinguished from clusters of randomized data. On the basis of this difference, we chose to analyze PAM results based on two to six clusters. We manually constructed a hierarchical schematic to illustrate the dominant movement of probe sets from the DE set into increasing numbers of clusters.
Transcriptional Regulatory Network Analysis
With the use of PAINT, TFBS were analyzed within 1,000 base pairs (bp) upstream of the transcriptional start sites (TSS) in the DE gene set. Several array probes correspond to the same gene, and this redundancy must be removed before further analysis. The UniGene cluster identification number (ID) corresponding to each probe set was obtained through cross-reference with GenBank accession number. From the UniGene database, the corresponding Entrez gene ID was obtained and used to cross-reference with Ensembl IDs. The 1,953 DE probe sets mapped to 1,853 genomic locations (Ensembl annotation), of which 1,397 were unique. The TFBS were identified using TransfacPro 9.2 database (35) and associated MATCH software (25) with option for minimizing the sum of false positives and false negatives in the TFBS results. The enrichment analysis was performed for different cluster sets (26, from above), and the DE list was compared with different background reference sets. In the enrichment analysis, the "Factor" corresponding to the TFBS (as indicated by MATCH/TRANSFACPro) was considered instead of the individual TFBS. This mapping allows us to account for the correlations in the position-weight matrices (PWMs) to a certain extent. A total of 264 TFBS families were considered. The enrichment P values based on hypergeometric distribution were adjusted for multiple testing using a FDR estimate (5). At each of the clustering levels, we do account for multiple testing using FDR correction for different TFBS enrichment P values. It should be noted that this FDR estimate may be conservative, as it does not account for the correlations among the TFBS present on different promoters. The correlations could arise because of the biological nature of coordinate action of different transcription factors (TFs) and because of similarity of PWMs of several TFs in the TRANSFAC database. The FDR method used might identify a higher number of TFs as being significant than if all correlations were taken into account. However, this is not fatal, given the limited number of predictions from our discovery approach. In the case of the hypothetical factor X family, all binding sites corresponding to the X family of TFs (FACTORX-1/V$FACTORX-1_02, FACTORX-1/V$FACTORX-1_03, FACTORX-2/V$FACTORX-2_02, FACTORX-2/V$FACTORX2_03) would be grouped into one set. Those binding sites present in
10% of promoters in the particular set are presented.
Isolation of Nuclear Extracts and Assessment of TF:DNA Binding Activity
Cells were harvested, and nuclear extracts were prepared from erythroid cultures on day 1 and day 8 using Panomics Nuclear Extraction kit (Panomics, Fremont, CA). Briefly, 1 x 106 cells were incubated in hypotonic buffer and dispersed, and nuclei were lysed in high-salt buffer. Protein was quantified using the BCA protein assay kit and NanoDrop spectrophotometer. Approximately 75 µg of protein were isolated and stored in the presence of protease inhibitor cocktail. Nuclear extracts from day 1 and day 8 (
7.5 µg) were incubated with the biotin-labeled probe mix from the Panomics Protein/DNA TranSignal comboarray. Protein-bound probes were isolated via streptavidin, protein was removed, and the probes were hybridized to the array and visualized using chemiluminescence detection on X-ray film.
| RESULTS |
|---|
|
|
|---|
2 million on day 5 to
30 million on day 13 (Fig. 2A), and the number of cells containing hemoglobin peaked on day 9 at 8090% (Fig. 2B). Cells contained
4 pg/cell of hemoglobin on day 5, peaking at 58.5 pg/cell between days 9 and 11 (M. A. Keller, unpublished observations). Hb F determination showed that one of the cultures expressed <2% Hb F at all time points, while another expressed 3.2% Hb F at day 9 before declining to 2.7% at day 11 (Fig. 2D). Microscopic examination of stained, cytospun samples showed that the majority of cells were erythroid with condensed nuclei as well as enucleating red blood cells after 11 days (Fig. 2C).
|
|
/
+ß Globin mRNA Levels
-globin mRNA over a range of input amounts and facilitated determination of the percentage of
-globin mRNA (
/
+ß) present in erythroid progenitors during development (data not shown). This approach eliminates the need to calculate amounts of each mRNA relative to an internal control (GAPDH), greatly facilitating determination of
/
+ß mRNA ratios in cultured cells. Transcript analysis from day 5 to day 11 demonstrated that the ratio of
/
+ß is <3% at all times examined in two of the cultures, and in one culture, it is highest on day 5 (12.8%), followed by a decrease to 3% on day 11 (Fig. 3). Similar percentages of Hb F mRNA have been seen by others using in vitro erythroid differentiation of adult-derived HPC (38). These findings demonstrate individual variation in
-globin gene expression that correlates with Hb F expression in this in vitro differentiation system. Thus we have cultured progenitors from adult blood under conditions that minimize expression of Hb F and recapitulate the in vivo state using a single-phase, totally defined, serum-free medium.
|
-globin chain stability (16), is not expressed on day 1, is upregulated >10-fold on day 3, and is highly expressed at later time points as well. The downregulated gene set includes TFs (ELK3, ETS1, EVI2A, GATA-2, KLF12, NMYC, RUNX3 and ZNF521), signaling molecules (FLT3, JAK3, PRKCB1), and structural proteins [CD34, CD109, HLA-A, -B and -C, platelet endothelial cell adhesion molecule-1 (PECAM1)] whose expression has been seen in human peripheral blood CD34+ cells (17). The complete list of DE probe sets with intensities at each time point is available as Supplemental Material (Supplemental Table 2).
|
- and ß-globin genes increasing
15-fold and 10-fold, respectively, over the 11-day time course. Real-time RT-PCR using SYBR Green detection showed larger fold changes for
- (>100-fold) and ß-globin (>1,000-fold) mRNA (data not shown). Differences in the fold change comparing microarray and real-time analysis are not unexpected, given that real-time RT-PCR analysis has a wider dynamic range compared with microarray analysis.
|
100 genes were DE in both data sets (see Supplemental Table 3). Aside from erythroid-specific genes such as blood group antigens, heme biosynthesis enzymes and globins, the list showed several TFs (ATF3, CEBPß, v-maf F, and several zinc-finger proteins) whose expression pattern increased during differentiation in both data sets and a transcriptional regulator (Wilm's tumor suppressor Wt1) whose expression decreased in both data sets. Interestingly, some probe sets that are DE in both data sets show divergent expression patterns, (e.g., upregulated in the BFUe data set and downregulated in the K562 data set), reinforcing well-known distinctions between primary cells and cancer cell lines.
Identification of Coregulated Clusters of DE Genes
After identifying the 1,953 DE probe sets, we analyzed them for shared expression patterns. A clustering metric (silhouette coefficient) (18) was used to examine the divergence from randomly permuted DE data when the DE list was segregated into increasing numbers of clusters, from 2 to 10. The DE probe sets can be divided into a number of clusters that are distinct from randomly permuted DE data. As shown in Fig. 6A, two to six clusters are distinct from randomly permuted DE data. We present a graphic representation of cluster assignments with temporal gene expression patterns in Fig. 6B. Cluster memberships are included in Supplemental Table 2. Although the greatest distance from random, and therefore the greatest confidence, is seen when the DE gene set is divided in two clusters, one up- and one downregulated, it is likely that regulatory information can be gained by further clustering into additional expression patterns. Therefore, transcriptional regulatory network analysis was done at all levels of clustering found to be distinct from randomly permuted data (e.g., 2, 3, 4, 5, and 6). Mean expression data from representative probe sets from each cluster are shown in Fig. 7. At the most basic level of two clusters, the upregulated genes in cluster C2-2 include many genes necessary to make a red blood cell, including the globins, heme biosynthesis enzymes, and membrane and surface proteins. On further clustering of the upregulated genes, they are distributed into three clusters, with C6-2 containing
- and ß-globin, ß-spectrin, and glycophorins-A, -B, and -E; C6-4 containing Duffy blood group antigen and NFE2; and C6-6 containing ankryin, the erythropoietin receptor, and
-globin genes.
|
|
2% of the genes were graphed, either in the unclustered set (DE1953) or at the six-cluster level (Fig. 8). In cluster C6-2, which contains several globin genes, blood group antigens, and glycophorins, there is an increased proportion of carrier and metal ion, protein, and receptor binding functions. It is the only cluster with <2% of gene functions made up of TFs. Cluster C6-3 contains genes with enzyme inhibitor functions, while C6-4 contains both transcription cofactor and ligase activity subsets, and C6-5 contains genes with oxidoreductase function.
|
The unclustered DE gene set was examined for evidence of coregulated gene expression. Seventeen binding sites identified as statistically significantly overrepresented (FDR <35%) in the DE group are summarized in the first column of Table 1 ("x") with P values and binding site enumerations listed in Supplemental Table 4. In clusters where adjusted P values were >35%, the top several binding sites for TFs known to play a role in hematopoiesis are included in this group. Among these are GATA-1, Lmo2 complex, MAZ, and RUNX1 (AML1) as well as cell cycle-specific TFs E2F and CREB-ATF.
|
Of the 17 TFBS identified as enriched in the unclustered list, 5 (E12, CP2, polyA binding, NF-
B, and Sox-5) are enriched in the downregulated clusters (Table 1, downward arrows) starting with the 1,221 promoters in C2-1 through to the 546 promoters in C6-1. Sp3, Nkx2.5, c-Ets-2, SMAD3 IRF, and NRF are newly identified in the downregulated cluster C3-1. The promoters of the other downregulated cluster, C3-3, are enriched for the TFBS of NRF2, GABP, Elk-1, E2F, and v-myb. At the level of six clusters, two TF binding sites (FOXL1 and FOX homolog XFD-2) are newly identified as enriched in the downregulated cluster C6-1.
In examining TFBS identified in upregulated clusters (those indicated by upward arrows, Table 1), we found fewer sites are enriched at the lower levels of clustering. At the two-cluster level, the TFBS for ATF and ATF4 are overrepresented in C2-2, with both sites persisting in the upregulated clusters through to C6-4 (ATF) and both C6-4 and C6-6 (ATF4). At the four-cluster level, upregulated cluster C4-4 shows enrichment for five newly identified factor binding sites: CREB, E2F1, Pax3, Pax6, and PITX2. All continue to be enriched in C5-4. When the clustering reaches six, enrichment of Pax3 sites segregates to C6-4, while Pax6 enrichment is seen in C6-6. Eight TFBS are newly identified at the six-cluster level, overrepresented in C6-4, including Evi-1 and TCF4.
Reference Set Differences
In total, 44 TFBS were identified as enriched (Table 1). While the majority of TFBS enriched in promoters of genes in downregulated clusters (downward arrows in Table 1) were identified in both the array and expressed reference set comparisons ("A, E" in "RefS" column in Table 1), the majority of the TFBS enriched in promoters of genes in upregulated clusters (upward arrows in Table 1) were identified when the reference set was the array or the DE set, not the expressed set. The cluster-to-list analysis, examining enrichment in each cluster compared with the DE set, resulted in 11 TFBS not identified with the other reference sets (those with "C" only in RefS column, Table 1).
Compilation of Candidate Regulators of Up- and Downregulated Genes
A listing of candidate transcriptional regulators whose TFBS were found to be enriched in one or more clusters of up- or downregulated sets is summarized in Table 2. Closely related TFs are found in the same list, for example, several CREB and E2F family members in the upregulated genes, GABP and NRF2 as putative regulators of the downregulated genes, and the RUNX family members as putative regulators of both up- and downregulated genes. This list contains TF families already known to play a role in hematopoietic gene regulation (GATA, Evi-1, RUNX, myb) along with many factors whose role in hematopoietic lineage commitment and differentiation has heretofore been unknown.
|
7.5 µg) and are exposed for equivalent amounts of time (20 min). Binding activity to Evi-1 TFBS is not detectable on day 8, even with a longer exposure to X-ray film (2 h, data not shown). The ability to use nuclear extracts from primary human HPCs to examine TF function will be useful in examining gene regulation during erythroid differentiation. Thus both steady-state mRNA levels and DNA binding activity to Evi-1 TFBS suggest a transcriptional silencer role for this factor in upregulation of some genes upregulated late in culture.
|
| DISCUSSION |
|---|
|
|
|---|
Identification of DE Genes During Erythroid Differentiation
The expression profile study design, using three replicates at each time point, allowed for statistical determination of DE without using a raw signal intensity cutoff to determine differential expression, such that low-abundance mRNAs including those encoding transcriptional regulators and signaling molecules were included in the DE list. Upregulated genes included those known to be critical to red cell development including structural proteins and enzymes specific to the erythroid lineage, such as the globins, blood group antigens and heme biosynthesis enzymes. Downregulated genes included cell cycle regulators and TFs regulating hematopoietic progenitors.
Three genes, P-selectin, PF4, and platelet basic protein, heretofore thought to be platelet specific, were found to be upregulated by microarray, and this upregulation was validated by real-time RT-PCR analysis. There are at least two explanations for this finding. It is possible that a minor fraction of megakaryocyte precursors survive under erythroid culturing conditions. Alternatively, these genes may have an unrecognized role in normal erythropoiesis. Recently, integrin-
IIb (15) and PECAM (4) were found to be expressed in erythroid progenitors. Since the erythroid and megakaryocytic lineages share a common progenitor, these findings may not be unexpected.
Clustering Metric Identifies Levels of Coexpression
Given that clustering of DE genes can facilitate identification of genes sharing common temporal expression patterns, we hypothesized that examination of promoters of genes represented in a given cluster that share TFBS would uncover coregulation by transcriptional regulators. While many bioinformatics packages, including GeneSpring, allow for clustering of DE genes into an arbitrary, user-defined number of clusters, we used a clustering metric to examine the potential number of clusters whose members share an expression pattern distinct from randomly permuted data. In this way, we determined that our DE gene set can be divided into clusters ranging from two to six with different temporal patterns of expression. Further clustering was not distinct from such randomly permuted data. Next, we used a family analysis, examining the enrichment of TFBS, made up of all transcriptional regulatory elements that are bound by a given factor. Enrichment analysis is based on the principle that, if the site or sites for the factor were present in a greater number compared with a reference set, this site may play a role in the regulation of the genes in that cluster containing the site. Obvious caveats include the fact that enrichment of one site may not be sufficient, since multiple interactions may be required involving multiple TFBS. Furthermore, enrichment and functional utilization are not necessarily equivalent, and further experimentation involving gel shifts with TFBS-containing oligonucleotides and knockdown, knockout, and/or chromatin occupancy studies are required to validate functionally the role of enriched TFBS. Enrichment for a TFBS could signify binding of a different factor than the one initially described to bind to this site. Finally, a TFBS highly prevalent in the genome could play a critical role in the process being studied but be missed in this analysis, since its frequency already may be high in the reference lists used for comparison.
Comparison of DE List with Other Studies of Erythroid Development
In addition, the DE gene list was compared with that of Komor et al. (27) in which bone marrow-derived CD34+ cells were differentiated toward the erythroid, granulocyte, or megakaryocyte lineage over an 11-day period. Of the DE probe sets in the erythroid lineage in their study, there were 60 unique UniGene IDs, and of these, 31 are shared with the present study and include many erythroid-specific genes such as ALAS2, GYPA, RHD, and ANK1. The "signature" of the erythroid lineage described in their data set was highlighted by upregulation of two GTPases, RAP1GA1 and ARHGAP8, both upregulated also in our data set, and downregulation of Mina53, GLMN (FAP48), and MAX gene-associated protein (MGA). GLMN is in our downregulated DE set. Probe sets for MGA and MINA show decreased expression over time, but neither were classified as DE after ANOVA. Although there are significant commonalities between the two studies, our study of erythroid differentiation identifies a considerably larger set of DE genes, and, importantly, the culturing conditions used by Komor et al. included FBS, and no characterization of globin expression was presented.
We examined our data set for genes whose gene products recently have been implicated in erythroid development. Annexin 1 (ANXA1) was reported to be critical to erythroid differentiation after showing that its knockdown in K562 cells decreased hemin-induced erythroid differentiation (20). However, in our study, ANXA1 is downregulated 20-fold during erythroid differentiation, suggesting its role may be limited to hemin-induced hemoglobinization of K562 cells. Another study in K562 cells suggested that expression of transglutaminase 2 (TGM2) was important in erythroid differentiation (22), and, in our study, expression of TGM2 is upregulated on arrays 63-fold during erythroid differentiation. Growth arrest and DNA damage-inducible transcript 34 (GADD34), also known as protein phosphatase 1, regulatory subunit 15A (PPP1R15A), was shown to be important to hemoglobin synthesis in a Gadd34-null mouse model (44). Competition was proposed between GADD34 and eukaryotic translation initiation factor-2
(eIF2
) kinase 1, also known as HRI, for regulation of the phosphorylation state of eIF2
, such that GADD34-null mice demonstrate a thalassemic phenotype. In our system, GADD34 is upregulated more than fivefold during erythroid commitment.
Transcriptional Regulatory Network Analysis Yields a List of Candidate Regulators
Using PAINT software, we performed TRNA to identify enriched TFBS in the promoters of DE genes, with or without clustering, and using three different reference sets. A total of 44 TFBS were identified with similar numbers of candidates identified as potential regulators of up- and downregulated genes (see Table 1). Interestingly, 10 factors were identified only at the six-cluster level. The use of different reference sets allowed us to identify regulators of hematopoietic gene expression, such as HLF and KROX, identified when the DE set was compared with the array, as well as regulators of erythroid differentiation, such as MYEF-2 and PITX, identified when clusters were compared with the DE set.
Furthermore, we explored the literature for validation of erythroid gene regulation by some of the candidate factors identified by our enrichment analyses. First, the binding site for the homeobox protein Pitx2 was enriched in upregulated clusters C4-4, C5-4, and C6-4 in the cluster-to-list analysis. Two recent studies of Pitx2-deficient mice show that Pitx2-null fetal livers have a reduced erythroid component, and that there is partial rescue of the hematopoietic potential when stromal cells are rescued with Pitx2 +/+ cells (26, 63). Our finding that Pitx2 binding sites are enriched in the promoters of upregulated genes during erythroid maturation of human hematopoietic progenitors supports an emerging role for Pitx2 and/or Pitx2-like factors.
Second, TFBS for Evi-1 were enriched compared with all three reference sets, consistently overrepresented in cluster C6-4. Evi-1, first identified as a key regulator in myeloid leukemias and myelodysplastic syndromes (9, 40, 43), is also critical to early erythroid differentiation. Aberrant expression of Evi-1 in patients is associated with abnormal erythroid development (6), and overexpression of Evi-1 during in vitro erythroid differentiation results in reduced CFUe (28). Evi-1 is a GATA binding factor, and the GATA-1 consensus site (WGATAR) is contained in the Evi-1 consensus site. Recently, Evi-1 was shown to regulate GATA-2 expression (59, 60). The 76 promoters within C6-4 that contain these sites included CD36, CPOX, and NFE2. A role for Evi-1 or related factors in regulation of expression of these genes is supported by binding assays performed with nuclear extracts from day 1 and day 8 erythroid progenitor cells (Fig. 9, inset). This sets the stage for further examination of the role of Evi-1 in transcriptional silencing of target genes identified in this study as well as its role in erythroid differentiation using this model system.
Third, ATF binding sites were enriched in the upregulated gene clusters compared with the array, and ATF4 binding sites were enriched when the clusters were compared with the DE list. The ATF/CREB TFs are basic leucine-zipper transcriptional regulators. Interestingly, ATF4 deficiency results in severe fetal anemia due to a defect in proliferation of hematopoietic progenitors (34), and, as mentioned earlier, it is known to bind to the promoter and regulate expression of GADD34 (32, 42). As mentioned earlier, GADD34 was upregulated, and, at the six-cluster level, was found in cluster C6-6 where TFBS for ATF4 were overrepresented.
Fourth, TFBS for Nkx2.5 were overrepresented in the downregulated gene clusters, from C2-1 to C6-1, independent of the reference list used. Nkx2.5 is a member of the homeobox family, with a critical role in cardiac development (30). Nkx2.5 is known to interact with a GATA family member, GATA-4, and recently GATA-6 has been shown to be necessary for upregulation of Nkx2.5 (7). A potential role for Nkx2.5 or related factors in hematopoietic development is interesting to postulate. The genes represented in cluster C6-1 showed a significant decline in mRNA expression from day 1 to day 3, suggesting that transcriptional repression must occur early. At day 1, expression of Nkx family members was low or absent in our microarray data set. Nkx transcription may be transient, perhaps measurable at day 0 (CD34+ cells), a population not examined in this analysis. Alternatively, enrichment of Nkx2.5 sites in the downregulated clusters may be indicative of another factor that binds a similar sequence. Functional follow-up will be necessary to address this.
Some Critical Regulators Not Identified Using TRNA
Just as intriguing as the list of candidate regulators (Table 2) are those not identified in our analyses, including a family of regulators critical to hemoglobin gene regulation. Kruppel-like factors, such as erythroid KLF (KLF1), are known to play a role in expression of erythroid genes (12), many of which (HBA1, HBA2, HBB) are expressed in C6-2. We performed a post hoc analysis of binding site enrichment for these factors, which are classified as CACCC binding factors. Although these sites did not emerge as significantly enriched after multiple testing, the CACCC binding site is enriched in C6-2 (unadjusted P value = 0.15) in the cluster-to-list analysis, with 10% of the promoters in this cluster containing one or more of these sites compared with 6% in the unclustered promoter list. When we examined our data set for expression of EKLF target genes identified through the use of an EKLF knockout mouse model (12), heme biosynthesis enzyme ALAS2,
-globin stabilizing protein (ERAF), peroxiredoxin 2 (PRDX2), and the erythroid-specific membrane protein Band 4.9 (EPB49) were contained in our DE list. EKLF target genes EIF2AK1 (HRI) and transferrin receptor (CD71 or TFRC) were upregulated more than twofold in our system but were not statistically significantly DE, as assessed by ANOVA. Another EKLF target gene, Gardos channel (KCNN4), showed very little change in expression in our system but is represented by a single probe set on the U133 Plus 2.0 array.
The TFBS for GATA family members was enriched in the downregulated gene set but absent from the upregulated set (Table 2). The enrichment for GATA sites is complicated by several factors. First, the GATA consensus is common, being present on 57% of the promoters of the genome, 56% of promoters of the expressed set, and 59% of the DE set. Second, GATA is known to play a role in both up- and downregulation of gene expression in hematopoiesis, through interaction with various corepressors and coactivators (48), and therefore would be expected to play a role in both up- and downregulated genes. GATA was found to be enriched in the downregulated clusters in both the comparison to the array and to the expressed gene set. At the six-cluster level, GATA binding sites are enriched in C6-1, present on 62% of promoters in that cluster. Further examination of enrichment of TFBS for GATA in conjunction with binding sites for coregulators, using predictive tools for composite binding site analysis (24, 31), may help elucidate the role of this complex family of regulators in erythroid development. This type of composite TFBS analysis also may be necessary to determine whether the segregation of the adult and fetal globin genes into different clusters at the six-cluster level (adult in C6-2, fetal in C6-6) is due to factors critical to developmental-specific regulation of globin expression.
Improved annotation is a continuous process, and, in that context, our analysis has the same inherent limitations that affect all microarray studies. To estimate annotation discrepancies, we performed a comparison of the NetAffx annotation we employed at the time of original submission with alternative annotation available at http://mriweb.moffitt.usf.edu/mpv/. We found
1.2% discordance at the level of UniGene cluster ID. Considering the FDR threshold used in our analysis of gene clusters for transcriptional coregulation, we do not expect this to substantially affect the enrichment results.
In conclusion, this study of erythroid differentiation in primary, nontransformed cells provides a detailed characterization of the developing erythroid transcriptome. Our bioinformatics tools allowed us to use this transcriptome information to define gene clusters and to examine the promoters of the genes in these clusters for shared transcriptional regulation. We have identified candidate transcriptional regulators, some established, some novel, and some of which are gathering interest based on multiple studies for a potential role in erythroid development. Thus, we have demonstrated that a computational analysis of transcriptomic information during erythroid development is a valid approach for identification of candidate regulators in hematopoietic lineage commitment and red cell development, and have generated a focused list of candidates that will be the subject of future functional studies. Furthermore, we have demonstrated that candidate transcriptional regulators identified by this analysis can be studied through TF binding analysis in primary human HPCs.
| GRANTS |
|---|
|
|
|---|
| DISCLOSURES |
|---|
|
|
|---|
| ACKNOWLEDGMENTS |
|---|
Present address of M. A. Keller: Coriell Institute for Medical Research, 403 Haddon Ave., Camden, NJ 08103-1505.
| FOOTNOTES |
|---|
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
| REFERENCES |
|---|
|
|
|---|