The propensity for developing atherosclerosis is dependent on underlying genetic risk and varies as a function of age and exposure to environmental risk factors. Employing three mouse models with different disease susceptibility, two diets, and a longitudinal experimental design, it was possible to manipulate each of these factors to focus analysis on genes most likely to have a specific disease-related function. To identify differences in longitudinal gene expression patterns of atherosclerosis, we have developed and employed a statistical algorithm that relies on generalized regression and permutation analysis. Comprehensive annotation of the array with ontology and pathway terms has allowed rigorous identification of molecular and biological processes that underlie disease pathophysiology. The repertoire of atherosclerosis-related immunomodulatory genes has been extended, and additional fundamental pathways have been identified. This highly disease-specific group of mouse genes was combined with an extensive human coronary artery data set to identify a shared group of genes differentially regulated among atherosclerotic tissues from different species and different vascular beds. A small core subset of these differentially regulated genes was sufficient to accurately classify various stages of the disease in mouse. The same gene subset was also found to accurately classify human coronary lesion severity. In addition, this classifier gene set was able to distinguish with high accuracy atherectomy specimens from native coronary artery disease vs. those collected from in-stent restenosis lesions, thus identifying molecular differences between these two processes. These studies significantly focus efforts aimed at identifying central gene regulatory pathways that mediate atherosclerotic disease, and the identification of classification gene sets offers unique insights into potential diagnostic and therapeutic strategies in atherosclerotic disease.
- vascular disease
Atherosclerosis is the primary cause of heart disease and stroke and thus the most common cause of morbidity and mortality worldwide (40b, 40c). The prolonged, chronic, and unpredictable nature of the disease in humans, which is a function of both genetic and environmental factors, has prohibited systematic temporal gene expression studies in humans. However, mouse genetic models of atherosclerosis do allow systematic analysis of gene expression and provide a good representation of the human disease process (9). Apolipoprotein (Apo)E-deficient mice predictably develop spontaneous atherosclerotic plaques with numerous features similar to human lesions (39, 40, 52). On a high-fat diet, the rate and extent of progression of lesions are accelerated. In addition to environmental influences such as diet, the genetic background of mice has also been found to have an important role in disease development and progression. Whereas C57Bl/6 (C57) mice are susceptible to developing atherosclerosis, the C3H/HeJ (C3H) strain of mice is resistant (25, 56, 63). It is imperative, therefore, to consider the genetic differences as well as environmental influences when studying atherosclerotic plaque-specific gene expression.
It is likely that the various stages of atherosclerosis are governed by a set of genes that are expressed by a variety of cell types present in the vessel wall (55). A small number of genes that are differentially expressed in vascular disease have been identified and a few of these genes linked through mechanistic studies to disease processes (9, 22, 36). Recent efforts to identify disease-related gene expression patterns have employed transcriptional profiling with DNA microarrays; however, these studies have included relatively small arrays (67) as well as limited time points, with the primary comparison being between normal and late-stage diseased tissue (1, 18, 37, 50, 55, 70). With the utilization of microarrays in animal models, where a disease process can be studied over time, the impact of individual risk factors and perturbations on the expression of individual genes during disease development can be studied systematically without a priori knowledge of gene identity. The temporal expression patterns of the genes can then be correlated with the well-described disease stages. A recent transcriptional profiling study of apoE-deficient mice suggested that temporal analysis would allow greater sensitivity for the identification of disease-related genes (67). Transcriptional profiling of the entire aorta provides the benefit of studying complex interactions between multiple cell types and matrix during the time course of the disease development. Pathway enrichment methodologies used for analysis of the genes identified by this approach provide strong biological and analytical evidence for involvement of particular biological processes and molecular functions in the development and progression of atherosclerosis. Obviously, these whole tissue studies will have to be followed up with studies at the individual gene level to better understand the mechanisms and implications of their differential regulation.
To more fully characterize the vascular wall gene expression patterns that are associated with atherosclerosis, we have undertaken a systematic large-scale transcriptional profiling study that takes advantage of a longitudinal experimental design, and mouse genetic model and diet combinations that provide varying susceptibility to atherosclerosis. Previously, we have demonstrated the genetic-based diet- and age-induced transcriptional differences between the C57 and C3H strains (63). Here we turn our focus to study atherosclerosis-associated genes independently of other variables. Primarily, these studies have investigated differential gene expression over time in apoE-deficient mice on an atherogenic diet, with comparison to apoE-deficient mice (C57BL/6J-Apoetm1Unc) on a normal diet as well as C57 and C3H mice on both normal chow and atherogenic diet. Identification of atherosclerosis-associated genes was facilitated by development of permutation-based statistical tools for microarray analysis, which take advantage of the statistical power of time-course experimental design and multiple biological and technical replicates. Using these tools, we have identified hundreds of known and novel genes that are involved in all stages of atherosclerotic plaque, from fatty streak to end-stage lesions. To further examine the expression of individual genes in the context of particular biological or molecular pathways, we utilized pathway enrichment analysis with Gene Ontology (GO) terms for functional annotation. Using classification algorithms, we were able to identify a signature pattern of expression for a core group of mouse atherosclerosis genes and to validate the significance of these classifier genes with additional mouse and human atherosclerosis samples. These studies have identified atherosclerosis-related genes and molecular pathways.
Atherosclerotic Lesion Analysis
For select time points of various experimental groups, five to seven female mice were used for histological lesion analysis. Atherosclerosis lesion area was determined as described previously (63). Briefly, the arterial tree was perfused with PBS (pH 7.3) and then perfusion fixed with phosphate-buffered paraformaldehyde (3%, pH 7.3). The heart and full length of the aorta-to-iliac bifurcation was exposed and dissected carefully from any surrounding tissues. Aortas were then opened along the ventral midline and dissected free of the animal and pinned out flat, intimal side up, onto black wax. Aortic images were captured with a Polaroid digital camera (DMC1) mounted on a Leica MZ6 stereomicroscope and analyzed using Fovea Pro (Reindeer Graphics, Asheville, NC). Percent lesion area was calculated as total lesion area divided by total surface area.
Experimental Design, RNA Preparation, and Hybridization to Microarrays
All experiments were performed in accordance with Stanford University Animal Care Guidelines (53); protocols were approved by the Stanford University Institutional Review Board. Three-week-old female apoE knockout mice (C57BL/6J-Apoetm1Unc), C57Bl/6J, and C3H mice were purchased from Jackson Laboratories (Bar Harbor, ME). At 4 wk of age, the mice were either continued on normal chow or were fed a high-fat diet that included 21% anhydrous milkfat and 0.15% cholesterol (Dyets no. 101511; Dyets, Bethlehem, PA) for a maximum period of 40 wk. At each of the time points, including 0 (baseline), 4, 10, 24 and 40 wk, for each of the conditions (strain-diet combination), 15 mice (3 pools of 5) were harvested for RNA isolation for a total of 405 mice. Additional mice were used for histology for quantification of atherosclerotic lesions as described above. A separate cohort of 16-wk-old apoE-deficient mice on high-fat diet for 2 wk (4 pools of 3 aortas) was also used for classification purposes. After perfusion of mice with saline, the aortas were carefully dissected in their entirety from the aortic root to the common iliacs and subsequently were flash frozen in liquid nitrogen. Total RNA was isolated as described previously (62), using a modified two-step purification protocol. RNA integrity was also assessed using the Agilent 2100 Bioanalyzer System with RNA 6000 Pico LabChip Kit (Agilent). First-strand cDNA was synthesized from 10 μg of total RNA from each pool and from whole embryonic day 17.5 embryo for reference RNA in the presence of Cy5 or Cy3 dCTP, respectively. Hybridization to the mouse 60-mer oligo microarray (G4120A; Agilent Technologies, Palo Alto, CA) (11) was performed following the manufacturer's instructions, generating three biological replicates for each of the time points. The RNA from the group of 16-wk-old mice was linearly amplified and hybridized to a different array (G4121A, Agilent Technologies). 
Primers and probes for 10 representative differentially expressed genes were obtained from Applied Biosystems Assays-on-Demand. A total of 90 reactions including triplicate assays on 3 pools of 5 aortas were performed from representative RNA samples used for microarray experiments, demonstrating a high correlation between the two platforms (Pearson correlation of 0.82) (see Supplemental Fig. SC; available at the Physiological Genomics web site).
Image acquisition of the mouse oligo microarrays was performed on an Agilent G2565AA Microarray Scanner System, and feature extraction was performed with Agilent Feature Extraction software (version A.6.1.1, Agilent Technologies). Normalization was carried out using a LOWESS algorithm. Dye-normalized signals of Cy3 and Cy5 channels were used in calculating log ratios. Features with a reference value of <2.5 standard deviations for the negative control features were regarded as missing values. Those features with values in at least two-thirds of the experiments and present in at least one of the replicates were retained for further analysis. Reproducibility of microarray results, as measured by the variation between arrays for signal intensities, was assessed using box plots (GeneData, South San Francisco, CA). For further statistical analysis of the data, a K-nearest-neighbor (KNN) algorithm was applied to impute missing values (64). Numerical raw data were then migrated into an Oracle relational database (CoBi) that has been designed specifically for microarray data analysis (GeneData). Heat maps were generated using HeatMap Builder software (5). All microarray data were submitted to the National Center for Biotechnology Information's Gene Expression Omnibus (GEO; GSE1560, http://www.ncbi.nlm.nih.gov/geo/).
Principal components analysis.
For each gene, we computed the average log expression values at the four postbaseline observation times, 4, 10, 24, and 40 wk. This was done separately for the six different (diet, strain) combinations, for example, ApoE on high fat, presumably the most atherogenic combination. Differences of these vectors were taken for various interesting contrasts, e.g., for ApoE, high-fat, minus C3H, normal chow, giving N = 20,280 vectors of length 4, one for each gene. Principal components analysis of the N vectors showed a consistent pattern, with the first principal vector indicating a roughly linear increase with observation time.
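The contrast-then-PCA step described above can be illustrated with a minimal sketch. It is not the analysis code used in this study: the data are synthetic, the helper names (`contrast_vector`, `first_principal_vector`, `synthetic_contrasts`) are hypothetical, and the leading principal vector is obtained by simple power iteration on the 4 x 4 covariance matrix rather than by a statistics package.

```python
import math
import random

WEEKS = [4, 10, 24, 40]  # postbaseline observation times from the text


def contrast_vector(group_a_means, group_b_means):
    """Difference of two length-4 mean log-expression profiles for one gene."""
    return [a - b for a, b in zip(group_a_means, group_b_means)]


def first_principal_vector(vectors, iterations=200):
    """Leading eigenvector of the sample covariance matrix, by power iteration."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - means[j] for j in range(d)] for v in vectors]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    vec = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length starting vector
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Eigenvectors are defined up to sign; orient so the last entry is positive.
    return vec if vec[-1] >= 0 else [-x for x in vec]


def synthetic_contrasts(n_genes=500, seed=0):
    """Made-up data: each gene's contrast drifts linearly with time plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_genes):
        slope = rng.gauss(0.0, 0.05)
        apoe_high_fat = [slope * w + rng.gauss(0.0, 0.1) for w in WEEKS]
        c3h_chow = [rng.gauss(0.0, 0.1) for _ in WEEKS]
        rows.append(contrast_vector(apoe_high_fat, c3h_chow))
    return rows


pc1 = first_principal_vector(synthetic_contrasts())
```

On data of this kind the entries of `pc1` grow with observation time, which is the pattern that motivates a linear time-slope model.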
Time-course regression analysis.
A standard ANACOVA model was fit separately to the log expression values for each gene, using a model incorporating strain, diet, and time-period effects. A single important “z-value” was extracted from each ANACOVA analysis, for example, corresponding to the significance of the time-slope difference between the ApoE-high fat combination and the average of the other five combinations. The N z-values were then analyzed simultaneously, using empirical Bayes false discovery rate methods, described previously (15–17). These analyses identified a set of several hundred genes clearly associated with atherosclerosis progression. Detailed methods are described online (http://quertermous.stanford.edu/mouseatherosignature/home.htm).
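As a rough illustration of the time-slope contrast, the sketch below fits a separate straight line to each strain-diet group for a single gene, pools the residual variance, and forms a t-like statistic for the slope of one group minus the average slope of the other five. This separate-lines model is an assumption made for the example; the exact parameterization of the model, the use of the four postbaseline time points, and the conversion to z-values and empirical Bayes FDR are described in the cited references and are not reproduced here. All names and data are hypothetical.

```python
import math
import random

GROUPS = ["apoE/high-fat", "apoE/chow", "C57/high-fat",
          "C57/chow", "C3H/high-fat", "C3H/chow"]


def fit_line(times, values):
    """Least-squares slope, sum of squares of time, and residual sum of squares."""
    n = len(times)
    t_bar, y_bar = sum(times) / n, sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    rss = sum((y - intercept - slope * t) ** 2 for t, y in zip(times, values))
    return slope, sxx, rss


def slope_contrast_statistic(groups, target):
    """t-like statistic: slope of `target` minus mean slope of the other groups."""
    fits = {name: fit_line(t, y) for name, (t, y) in groups.items()}
    others = [name for name in fits if name != target]
    n_obs = sum(len(t) for t, _ in groups.values())
    df = n_obs - 2 * len(groups)  # one intercept and one slope per group
    sigma2 = sum(rss for _, _, rss in fits.values()) / df
    contrast = fits[target][0] - sum(fits[g][0] for g in others) / len(others)
    variance = sigma2 * (1.0 / fits[target][1]
                         + sum(1.0 / fits[g][1] for g in others) / len(others) ** 2)
    return contrast / math.sqrt(variance)


def synthetic_gene(target_slope, seed=1):
    """Made-up log-expression values: three replicate pools per time point."""
    rng = random.Random(seed)
    times = [4, 10, 24, 40] * 3
    groups = {}
    for name in GROUPS:
        slope = target_slope if name == "apoE/high-fat" else 0.0
        groups[name] = (times, [slope * t + rng.gauss(0.0, 0.1) for t in times])
    return groups


stat_disease_gene = slope_contrast_statistic(synthetic_gene(0.05), "apoE/high-fat")
stat_null_gene = slope_contrast_statistic(synthetic_gene(0.0), "apoE/high-fat")
```

A gene whose expression rises only in the apoE, high-fat group yields a large positive statistic, whereas a gene with no group-specific trend yields a value near zero.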
Time-course area under the curve analysis.
Area under the curve (AUC) analysis was employed as described previously (63). For each sequence of four triplicate gene expression measurements over time, we first subtracted the measurement at time 0 from all values. We then computed the signed AUC. The area is a natural measure of change over time. These areas were then used to compute an F-statistic for the six groups (3 mouse strains and 2 diets) and three replicates (between sum of squares/within sum of squares). A permutation analysis, similar to that employed in significance analysis of microarrays (SAM) (65), was carried out to estimate the false discovery rate (q-value or FDR) for different levels of the F-statistic.
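The AUC procedure can be sketched for a single gene as follows. This is a simplified illustration under stated assumptions, not the published implementation: the area is computed with the trapezoidal rule, the F-statistic is the usual one-way ratio of mean squares (which for a fixed design ranks genes identically to the raw ratio of sums of squares), and the permutation step returns a per-gene permutation P value rather than the SAM-style FDR estimated across all genes. Function names are hypothetical.

```python
import random


def signed_auc(times, values):
    """Trapezoidal area of a profile after subtracting its time-0 value."""
    shifted = [v - values[0] for v in values]
    return sum((t1 - t0) * (y0 + y1) / 2.0
               for t0, t1, y0, y1 in zip(times, times[1:], shifted, shifted[1:]))


def f_statistic(groups):
    """One-way F: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (between / df_between) / (within / df_within)


def permutation_p_value(groups, n_permutations=200, seed=0):
    """Share of label shuffles whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        exceed += f_statistic(shuffled) >= observed
    return (exceed + 1) / (n_permutations + 1)


# Made-up AUC values: six strain-diet groups, three replicates each.
example_aucs = [[10.0 * g + r for r in (0.0, 1.0, 2.0)] for g in range(6)]
example_p = permutation_p_value(example_aucs)
```

Because the area is signed, a transient decrease below baseline offsets a later increase, so genes with opposite early and late changes can have small areas despite being regulated.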
For enrichment analysis, we used the Expressionist software (GeneData), which employs Fisher's exact test to derive biological themes within particular gene sets defined by functional annotation with GO terms (20) and Biocarta pathways (4). In this way, overrepresentation of a particular annotation term corresponding to a group of genes was quantified.
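For readers unfamiliar with the test, overrepresentation of an annotation term can be expressed as a one-sided Fisher's exact test, which is equivalent to an upper-tail hypergeometric probability. The sketch below is a generic illustration and does not reproduce the Expressionist implementation; the function name and the term counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided Fisher's exact (hypergeometric upper-tail) P value.

    Probability of drawing at least `hits_in_set` annotated genes when
    `set_size` genes are sampled without replacement from a background of
    `background_size` genes, of which `hits_in_background` carry the term.
    """
    max_hits = min(set_size, hits_in_background)
    total = comb(background_size, set_size)
    tail = sum(comb(hits_in_background, k)
               * comb(background_size - hits_in_background, set_size - k)
               for k in range(hits_in_set, max_hits + 1))
    return tail / total


# Hypothetical term: 300 of 20,280 arrayed genes are annotated with it,
# and 40 of those fall within a 667-gene upregulated set.
example_p = enrichment_p_value(40, 667, 300, 20280)
```

Because many terms are tested, the resulting P values would ordinarily be interpreted with a correction for multiple testing.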
Support vector machine for gene selection.
For supervised analyses, we used the Expressionist software (GeneData), which employs a support vector machine (SVM) algorithm (10) to rank genes based on their utility for class discrimination between time points of 0, 4, 10, 24, and 40 wk in apoE mice on a high-fat diet. SVM is a binary classifier, so to classify multiple categories, we created N classifiers that classify one group vs. a combination of the rest of the groups, “one-vs.-all” classifiers (49). The larger set of genes identified by the time-course analysis was used for this analysis. We then used this method to determine the optimal number of ranked genes to classify the experiments into their correct groups at minimal error rate. The optimal error rate or misclassification is calculated by cross validation with 25% of the experiments as the test group and the rest as the training group. This is reiterated 1,000 times (see Fig. 4A). In this study we used a linear kernel, since a nonlinear Gaussian kernel yielded similar results. Detailed methods are described online (http://quertermous.stanford.edu/mouseatherosignature/home.htm). This minimal subset of classifier genes was then used for cross validation as well as classification of other independent gene expression-profiling data sets.
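The search for an optimal number of ranked genes can be sketched as a loop over candidate gene counts, each evaluated by repeated random 75%/25% splits. To keep the example self-contained, a nearest-centroid classifier is substituted for the SVM; this is a stand-in chosen for brevity and is not the classifier used in this study. The columns of the data matrix are assumed to be already ordered by gene rank, and all names and data are hypothetical.

```python
import random


def fit_centroids(xs, ys):
    """Mean expression vector per class (stand-in for training a classifier)."""
    centroids = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def repeated_split_error(xs, ys, n_genes, n_splits=1000, test_fraction=0.25, seed=0):
    """Misclassification rate over repeated random train/test splits,
    using only the first `n_genes` (top-ranked) columns."""
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(xs)))
    indices = list(range(len(xs)))
    errors = total = 0
    for _ in range(n_splits):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        centroids = fit_centroids([xs[i][:n_genes] for i in train],
                                  [ys[i] for i in train])
        for i in test:
            errors += predict(centroids, xs[i][:n_genes]) != ys[i]
            total += 1
    return errors / total


def synthetic_experiments(n_classes=5, per_class=4, n_noise_genes=20, seed=0):
    """Made-up data: one informative top-ranked gene followed by noise genes."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label in range(n_classes):
        for _ in range(per_class):
            informative = [2.0 * label + rng.gauss(0.0, 0.1)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise_genes)]
            xs.append(informative + noise)
            ys.append(label)
    return xs, ys


xs, ys = synthetic_experiments()
errors_by_size = {k: repeated_split_error(xs, ys, k, n_splits=200) for k in (1, 5, 21)}
best_size = min(errors_by_size, key=errors_by_size.get)
```

With few replicates per class, a random split can occasionally place every sample of one class in the test set; those samples are necessarily misclassified, so a stratified split may be preferable in practice.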
Analysis of independent data sets.
We utilized the SVM algorithm for classification of independent groups of experiments (69). In this analysis, we used the primary time-course experiments (corresponding to the 5 time points mentioned above) as the training set and the independent set of experiments (different array and labeling methodology) as the test set. SVM output for each experiment based on one-vs.-all comparisons was represented graphically in a heatmap format (see Fig. 4B), which is the normalized margin value for each of the five SVM classifiers mentioned above. The SVM output allows us to view how a new experiment is classified according to the five SVM hyperplanes. The SVM algorithm (linear kernel) was also utilized for external validation by classifying different sets of human expression data. In these analyses, a confusion matrix was generated using cross validation with repeated splits into 75% training and 25% test sets to determine the accuracy of classification based on the small subset of genes identified earlier. Results are represented in tabular fashion (see Table 2). Detailed methods are described online (http://quertermous.stanford.edu/mouseatherosignature/home.htm).
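To make the one-vs.-all output concrete, the sketch below scores a new experiment against a set of already-trained linear hyperplanes, assigns the class with the largest margin, and tallies a confusion matrix. The hyperplane weights are invented for illustration, and normalizing each margin by the norm of its weight vector (a signed distance to the hyperplane) is an assumption; the normalization used for Fig. 4B is not specified here.

```python
import math
from collections import Counter


def one_vs_all_margins(hyperplanes, x):
    """Signed distance of sample `x` to each one-vs.-all hyperplane.

    `hyperplanes` maps a class label to (weights, bias) of a linear classifier.
    """
    margins = {}
    for label, (weights, bias) in hyperplanes.items():
        norm = math.sqrt(sum(w * w for w in weights))
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        margins[label] = score / norm
    return margins


def classify(hyperplanes, x):
    """Assign the class whose classifier gives the largest margin."""
    margins = one_vs_all_margins(hyperplanes, x)
    return max(margins, key=margins.get)


def confusion_matrix(true_labels, predicted_labels):
    """Counts of (true class, predicted class) pairs."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical two-gene hyperplanes for two of the time-point classes.
example_hyperplanes = {
    "4 wk": ([3.0, 4.0], 0.0),
    "40 wk": ([-3.0, -4.0], 0.0),
}
example_margins = one_vs_all_margins(example_hyperplanes, [1.0, 0.0])
example_class = classify(example_hyperplanes, [1.0, 0.0])
```

Each row of a margin heatmap corresponds to one such dictionary of margins, so an experiment that lies between two stages appears with intermediate values for both classifiers rather than a single hard label.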
Transcriptional Profiling of Human Atherosclerotic Tissue and Atherectomy Samples
Approval to use human tissue samples was granted by the Institutional Review Board of Stanford University. Informed consent was obtained from all participants. For one set of samples, coronary arteries were dissected from explanted hearts of patients undergoing orthotopic heart transplantation. Arteries were divided into 1.5-cm segments, classified as lesion or nonlesion after inspection of the luminal surface under a dissecting microscope. RNA was isolated from each individual sample and hybridized to an individual microarray. A central portion (1–2 mm) of each segment was removed and stored in optimum cutting temperature compound for later histological staining (hematoxylin and eosin, Masson's trichrome). Samples (n = 40) were derived from 17 patients (male 13, female 4; mean age 43 yr). Six patients had a diagnosis of ischemic cardiomyopathy, while eleven were classified as nonischemic, although some vessel segments from the latter had microscopic evidence of coronary artery disease. Of 21 diseased segments, seven were classified as grade I, four as grade III, and nine as grade V according to the modified American Heart Association criteria (66), and one sample had only macroscopic information available. For the second set of tissues, coronary atherectomy samples were obtained with a cutting atherectomy catheter system (Fox Hollow, Redwood City, CA) for chronic atherosclerosis lesions (n = 28) and in-stent restenosis lesions (n = 14). Patient characteristics in both groups were similar (male 78 vs. 71%, mean age 64 vs. 67 yr). RNA was isolated from each individual sample, labeled by direct or linear amplification methods, and hybridized as described above to a 22k feature custom cardiovascular oligonucleotide microarray designed by this laboratory and Agilent Technologies (G2509A, Agilent). Common reference RNA for all human hybridizations was a mixture of 80% HeLa cell RNA and 20% human umbilical vein endothelial cell RNA.
Data processing and analysis were as described above. For two-class comparison of gene expression, we used SAM (http://www-stat.stanford.edu/~tibs/SAM/) (62, 65).
RESULTS AND DISCUSSION
Atherosclerosis in the Genetic Models
To correlate the gene expression results with the extent of disease in each experimental group, the total atherosclerotic plaque burden in the aorta was determined by calculating a percent lesion area from the ratio of atherosclerotic area to total surface area. ApoE-deficient mice (C57BL/6J-Apoetm1Unc; n = 7) on high-fat diet were compared with other control mice (n = 5–7 for each mouse-diet combination). Representative time intervals were used for analysis, including baseline measurements in mice before initiation of high-fat diet at 4 wk as well as end-point measurements corresponding to 40 wk on either high-fat or normal diet (Fig. 1). Gross histological evaluation of these mice demonstrated increased atherosclerotic lesions in ApoE-deficient mice on a high-fat diet, involving ∼50% of the entire aorta, and lesser area involved in ApoE-deficient mice on normal diet (Fig. 1, B and C). As expected, the control mice on either diet did not demonstrate evidence of atherosclerosis throughout the course of the experiment (30, 41). Although some fatty infiltrates were noted on histological evaluation of the aortic root in C57 mice on high-fat diet, there were no obvious changes in inflammatory cell infiltrate (63). The metabolic and lipid profiles of these mice were not obtained in this study, given that they are well described in the literature (25, 41, 42). These studies have shown that C3H mice have consistently elevated cholesterol compared with C57 mice.
Temporal Patterns of Gene Expression
Employing a number of mouse models with different propensities to develop atherosclerosis, two different diets, and a longitudinal experimental design, it was possible to factor out differentially regulated genes that are unlikely to be related to the vascular disease process in the apoE-deficient model. For instance, age-related and diet-related gene expression patterns that are not linked to vascular disease were eliminated by virtue of their expression in the genetic models that did not develop atherosclerosis. However, the complexity of the experimental design provided significant difficulties related to statistical analysis. Although analytic methods have been proposed to address a single set of time-course microarray data (35, 45, 46, 68), there was no accepted algorithm for comparing differences in patterns of gene expression across multiple longitudinal data sets.
Using principal component analysis (Supplemental Fig. SA), we determined that the greatest variation in the data was between time points, correlating with the progression of disease described previously for the apoE knockout mouse on high-fat diet (39, 52). Given this finding, we pursued a linear regression model to identify genes that were differentially expressed in ApoE-deficient mice on high-fat diet compared with all other experimental groups across time. (For details, please see Supplemental Methods.) This comparison across strains and dietary groups was employed to focus the analysis on atherosclerosis-specific genes, taking into account gene expression changes in the vessel wall associated with aging, diet, and genetic background. We employed empirical Bayes and permutation methods to derive an FDR and minimize false detection due to multiple-testing error. With high stringency limits (global FDR < 0.05 and local FDR < 0.3), 667 genes demonstrated a linear increase with time, whereas only 64 genes showed the opposite profile (Fig. 2). Genes and biological pathways whose expression is correlated with degree of atherosclerosis are prime candidates for further scientific investigation as well as important targets for future diagnostic and therapeutic strategies.
Genes with Increased Expression in the Atherosclerotic Vessel Wall
As expected, we identified a number of known genes previously linked to atherosclerosis, validating the methodology and analysis algorithm. Most striking in this regard were inflammatory genes, including chemokines and chemokine receptors such as Ccl2, Ccl9, Ccr2, Ccr5, Cklfsf7, Cxcl1, Cxcl12, Cxcl16, and Cxcr4 (Fig. 2, Supplemental Table SA). Also upregulated were interleukin receptor genes, including IL1r, IL2rg, IL4ra, IL7r, IL10ra, IL13ra, and IL15ra, and major histocompatibility complex (MHC) molecules such as H2-EB1 and H2-Ab. The value of transcriptional profiling in this disease was demonstrated by the identification of numerous inflammatory genes not previously linked to atherosclerosis, including CD38, Fcer1g, and oncostatin M (Osm) and its receptor (Osmr), which provide novel insights into the atherosclerosis process and are potential targets for therapeutic intervention.
Oncostatin M (Osm) and its cognate receptor (Osmr) are likely to have significant roles in atherosclerosis, based on a number of studies that suggest several important related functions for these genes (38). Osm is a member of a cytokine family that regulates production of other cytokines by endothelial cells, including Il6, G-CSF, and GM-CSF. Osm also induces Mmp3 and Timp3 gene expression via JAK/STAT signaling (34). It induces cyclooxygenase-2 expression in human vascular smooth muscle cells (3) as well as Abca1 in HepG2 cells (32). Interestingly, Stat1, Jak3, Cox2, and Abca1 were among the disease-associated upregulated genes. Additionally, Osm produced by macrophages may contribute to the development of vascular calcification (57). This may occur via regulation of osteopontin or osteoprotegerin (44), both of which have demonstrated significant changes in our data set. These observations suggest that vascular calcification may play an even more complex role in vascular inflammation and atherosclerosis than previously recognized, and should be investigated in more detail. Recent studies have suggested modulation of vascular calcification with statin drugs (31, 48), offering evidence linking hyperlipidemia, inflammation, and vascular calcification.
Osteopontin (Spp1) is thought to mediate type-1 immune responses (2). While Spp1 has been extensively studied in atherosclerosis and other immune diseases, some of the osteopontin-related genes identified through these studies are novel and provide additional links between inflammation and calcification. Some of these include Cd44, Hgf, osteoprotegerin, Mglap, Il10ra, Infgr, Runx2, and Ccnd1. Ibsp (sialoprotein II) was also noted to be upregulated in these studies. Despite its similar expression profile to Spp1 in various cancer types and its binding to the same α-v/β-3 integrin, the role of Ibsp in atherosclerosis has not been elucidated.
Known and novel genes were identified for many other protein classes that have been studied in atherosclerosis. Genes encoding endothelial cell adhesion molecules were among these groups, including Alcam and Vcam1. Extracellular matrix and matrix remodeling proteins were found to be upregulated, including fibronectin, Col8a1, Ibsp, Igsf4, Itga6, and thrombospondin-1. Matrix metalloproteinase genes such as gelatinases and Mmp2 and Mmp14 as well as those encoding tissue inhibitors of metalloproteinases, including Timp1, were also among the upregulated genes. Many transcription factors and lipid metabolism and vascular calcification genes, as well as macrophage and smooth muscle cell-specific genes, were among those found to be upregulated. New genes were identified in each of these classes; for example, members of the ATP-binding-cassette family not previously associated with atherosclerosis were identified through these studies, including Abcc3 and Abcb1b.
Interesting genes linked to atherosclerosis for the first time through these studies encode a variety of functional classes of proteins. For example, genes encoding the transcription factors Runx2 and Runx3 were newly linked to atherosclerosis in these studies. Cytoplasmic signaling molecules Vav1, Hras1, and Kras2 are factors that are well known to have critical signaling functions, but their role in atherosclerosis has not yet been defined. Wisp1 is a secreted wnt-stimulated cysteine-rich protein that is a member of a family of factors with oncogenic and angiogenic activity. Rgs10 is a member of a family of cytoplasmic factors that regulate signaling through Toll-like receptors and chemokine receptors in immune cells. Among the new classes of genes identified through these studies to be upregulated in atherosclerosis were those encoding histone deacetylases (HDACs). Among those genes identified were Hdac7 and Hdac2. Although there is significant evidence that HDACs have important functions regulating growth, differentiation, and inflammation, these molecules have not been well studied in the context of atherosclerosis (14, 28). HDAC inhibitors have been postulated to modulate inflammatory responses (61).
Our data have also yielded numerous expressed sequence tags (ESTs) and uncharacterized genes. These genes may be attractive candidates for further characterization. One example of such an EST is 2510004L01Rik, a gene termed “viral hemorrhagic septicemia virus-induced gene” (VHSV) originally cloned from interferon-stimulated macrophages. This gene is enriched in bone marrow macrophages, is upregulated by cytomegalovirus infection, and is similar to human inflammatory response protein-6 (12). Several ESTs such as 5930412E23Rik and 2700094L05Rik have been cloned from hematopoietic stem cells (60), consistent with data suggesting cells in the diseased vessel wall may emanate from the bone marrow (51).
Genes with Decreased Expression in the Atherosclerotic Vessel Wall
The 64 genes that showed decreased expression during progression of atherosclerosis were of interest, given the lack of previous attention to such genes. Genes that are downregulated in atherosclerosis may have a potential protective role and may provide for potential therapeutic strategies. Sparcl1 (Hevin) is an extracellular matrix protein that is downregulated in our data set and may have anti-adhesive (21) and anti-proliferative (13) properties. It has been shown to be downregulated in neointimal formation and suggested to have a possible protective effect in the vessel wall (19). Another gene with decreased expression, Tgfb3, may also have a protective effect. Tgfb3 has been shown to decrease scar formation and to exert an inhibitory effect on granulocyte colony-stimulating factor (G-CSF), suggesting an anti-inflammatory role that would counter proinflammatory factors in the vascular wall (27, 29).
Interestingly, numerous genes characteristic of various muscle lineages were shown to be downregulated. For smooth muscle cells, this might reflect decreased expression of differentiation markers. For example, the smooth muscle cell gene caldesmon encodes a marker of differentiated smooth muscle cells (58), and previous studies have noted that the population of differentiated contractile smooth muscle cells that express caldesmon is relatively lower in atherosclerotic plaque (23). Other potential smooth muscle cell marker genes with decreased expression included Csrp1 and Mylk. Other downregulated skeletal and cardiac muscle genes included calsequestrin, which is expressed in fast-twitch skeletal muscle; Usmg4, which is upregulated during skeletal muscle growth; Xin, which is involved in cardiac and skeletal muscle development; and Sgcg, which is strongly expressed in skeletal and heart muscle as well as proliferating myoblasts. The possible association of these and other myocyte/smooth muscle-related genes identified in our study with normal vascular function is not known, but deserves further study.
To identify important biological themes represented by genes differentially expressed in the atherosclerotic lesions, we functionally annotated the genes using GO terms (20) and curated pathway information. Enrichment analysis with Fisher's exact test demonstrated several statistically significant ontologies (Table 1, Supplemental Table SB), including several associated with inflammation. Inflammatory processes such as immune response, chemotaxis, defense response, antigen processing, and inflammatory response as well as molecular functions such as interleukin receptor activity, cytokine activity, cytokine binding, chemokine and chemokine receptor activity, Tnf-receptor, and MHC I and II receptor activity were noted to be significantly overrepresented in the group of genes upregulated with atherosclerosis. Subanalysis of the inflammatory response pathways revealed genes characteristic of the macrophage lineage as well as both the TH-1 and TH-2 T-cell populations to be overrepresented. Biocarta terms further delineated novel genes that were associated with pathways within the inflammation category, including classical complement, Rac-CyclinD, Egf, and Mrp pathways, as well as those known to be differentially regulated in atherosclerosis, such as Il2, Il7, Il22, Cxcr4, Ccr3, Ccr5, Fcer1, and Infg pathways.
In addition to inflammation, other biological processes and molecular functions were overrepresented in the group of differentially upregulated genes. These included expected pathways such as wound healing, ossification, proteo- and peptidolysis, apoptosis, nitric oxide-mediated signal transduction, cell adhesion and migration, and scavenger receptor activity. However, several pathways that are less known for their role in atherosclerosis were also identified, including carbohydrate metabolism, complement activation, calcium ion homeostasis, collagen catabolism, glycosyl bonds and hydrolase activity, taurine transporter activity, heparin activity, and so forth. The lack of oxygen radical metabolism among the significant processes was surprising but consistent with upregulation of genes related to oxygen radical metabolism in all groups with aging.
Taken together, these pathway analyses support prior observations regarding the importance of inflammatory molecular pathways in atherosclerosis but additionally expand the repertoire of molecular pathways that are involved in this disease process. Such analyses can also be used to shed light on molecular mechanisms of action of current and future atherosclerosis therapies.
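The enrichment test behind the pathway analyses above can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test, which for over-representation is equivalent to the upper tail of the hypergeometric distribution. The function name and all gene counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact P value for over-representation of a term.

    N: annotated genes on the array (background)
    K: background genes annotated with the term
    n: genes in the differentially expressed list
    k: genes in that list annotated with the term
    """
    upper = min(K, n)
    total = comb(N, n)
    # Probability of observing k or more annotated genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only.
p_value = fisher_enrichment_p(k=40, n=600, K=300, N=12000)
print(f"enrichment P value: {p_value:.3g}")
```

In practice, one P value would be computed per ontology or pathway term and then corrected for multiple testing.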
Identification of Genes Involved in Early Atherogenesis and Other Time-Related Expression Patterns in Atherosclerosis
The above analysis has examined in detail those genes with increasing expression levels that correlate with atherosclerotic plaque development. However, additional patterns of gene expression may also be identified in these longitudinal studies and may reveal classes of genes and pathways not previously identified. Almost all previous studies have looked at late stages of atherosclerosis. Because of our longitudinal design, we have also been able to identify genes that are transiently regulated at early stages of the disease, which could act as early diagnostic markers of disease. For these analyses, we employed the AUC algorithm, which measures expression changes over time, compares the different strain/diet longitudinal data sets to identify gene expression changes specific for the apoE knockout model on high-fat diet, and uses permutation to estimate the FDR (63). Using this methodology, we were able to identify several distinct gene expression patterns and pathways that reflect particular biological processes (Fig. 3, Supplemental Fig. SB, Supplemental Table SD). For instance, some disease-related pathways were upregulated very early in the disease process and downregulated thereafter (pattern 6). Others were upregulated early and maintained at a relatively high expression throughout the time course of the disease (pattern 8). Whereas pattern 6 is enriched in pathways representing biological processes such as extracellular matrix and collagen metabolism as well as DNA replication and response to stress, pattern 8 is enriched in pathways representing biological processes such as fatty acid metabolism, oxidoreductase activity, and heat-shock protein activity. Some disease-related pathways were upregulated in both early and late phases of disease development (pattern 3), including several associated with metabolism such as glycolysis and gluconeogenesis.
Other patterns (pattern 4) are represented by key pathways regulating plaque development, including growth factor, cytokine, and cell adhesion activity. Interestingly, inflammation is represented in almost all of the patterns described here. Further study of individual genes in these and other pathways with unique temporal patterns of expression should expand our molecular understanding of the disease process.
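The idea of comparing time courses by area and assessing the difference by permutation can be sketched as follows. This is a simplified illustration and not the authors' implementation: it computes a permutation P value for a single gene by shuffling replicate time courses between two groups, whereas the analysis described above estimates an FDR across the array. The time points, expression values, and function names are hypothetical.

```python
import random


def auc(times, values):
    """Trapezoidal area under one expression time course."""
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))


def _mean(xs):
    return sum(xs) / len(xs)


def auc_difference(times, group_a, group_b):
    """Difference in mean AUC between two groups of replicate time courses."""
    return (_mean([auc(times, c) for c in group_a])
            - _mean([auc(times, c) for c in group_b]))


def permutation_p(times, group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation P value for the AUC difference of one gene."""
    rng = random.Random(seed)
    observed = abs(auc_difference(times, group_a, group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign replicate curves to groups at random
        perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(auc_difference(times, perm_a, perm_b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical log-ratio time courses (weeks of age), three replicates per group.
weeks = [4, 10, 24, 40]
disease = [[0.1, 0.9, 1.6, 2.2], [0.0, 1.1, 1.8, 2.0], [0.2, 0.8, 1.5, 2.4]]
control = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.1, 0.4, 0.3], [0.2, 0.3, 0.2, 0.5]]
print(permutation_p(weeks, disease, control))
```

With only a few replicates per group the number of distinct label permutations is small, so the attainable P values are coarse; pooling the permutation null across genes is one common way to estimate an FDR in that situation.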
Identification of Stage-Specific Gene Expression Signature Patterns
Classification approaches to human cancer have provided significant insights regarding the clinical features of the tumor, including propensity to metastasis, drug responsiveness, and long-term prognosis (24, 33, 43, 59). For atherosclerosis, the clinical utility of classification algorithms will be in prediction of future events. To establish a panel of genes whose expression in the vessel wall can accurately classify disease stage, and which may thus be useful for clinical genomic and biomarker applications, we have employed the SVM algorithm on this comprehensive mouse model disease data set. With this algorithm, we identified 38 genes that accurately classified each experiment into one of five defined stages of atherosclerosis in mice (Fig. 4A). Our results demonstrated that these genes can distinguish normal from severe lesions with 100% accuracy. The intermediate stages of the disease are also distinguished from the other stages with a high degree of accuracy (88–97%) (Table 2).
The validity of a set of classifier genes can be demonstrated by its ability to correctly classify independent tissues analyzed using different arrays and techniques. We therefore investigated the ability of the genes identified by SVM to accurately categorize an independent group of 16-wk-old apoE knockout mice, which were evaluated with a different array and labeling methodology. In the latter case, the microarray utilized different probes for some of the same genes. Moreover, the labeling methodology for these arrays uses a linear amplification step. Using the SVM classification algorithm, we were able to accurately classify each of the four replicate experiments with the correct stage of the disease process (Fig. 4B). As indicated by the greater correlation between gene expression in this independent group of mice and gene expression patterns in the original experimental group aged 24 wk, the classifier genes accurately matched this validation data set to the closest time point in the database. This approach offers a potentially novel platform for drug development where unique signature patterns of gene expression can be used to assay the efficacy of potential therapeutic compounds in altering the progression of atherosclerosis.
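The matching of a validation sample to the closest time point by correlation can be illustrated with a short sketch. Note that this is a plain Pearson-correlation lookup used as a stand-in, not the SVM classifier itself. The stage labels, expression values, and function names are hypothetical, and each profile is assumed to have nonzero variance across the classifier genes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither vector is constant


def closest_stage(sample, stage_profiles):
    """Return (label, r) for the reference profile best correlated with sample."""
    scored = ((label, pearson(sample, profile))
              for label, profile in stage_profiles.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical mean classifier-gene profiles per reference time point.
stage_profiles = {
    "4 wk": [0.1, 0.0, 0.2, 0.1, 0.0],
    "10 wk": [0.5, 0.2, 0.9, 0.4, 0.1],
    "24 wk": [1.4, 0.3, 2.1, 1.0, 0.2],
}
validation_sample = [1.2, 0.4, 1.9, 0.8, 0.3]
label, r = closest_stage(validation_sample, stage_profiles)
print(f"closest reference time point: {label}, r = {r:.2f}")
```

Because correlation ignores overall scale, such a lookup is somewhat tolerant of platform differences in probe design and labeling, although profiles that differ mainly in magnitude would be hard to separate this way.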
Identification of Mouse Disease Gene Expression Patterns in Human Coronary Atherosclerosis
The goal of these mouse model studies was to gain information that could be applied to an understanding of human coronary artery disease. However, the differences in species, the disparate vascular beds involved, and the specific lipid-driven nature of disease in the apoE knockout make direct extrapolation of individual gene expression data to human disease problematic. Thus we investigated the expression profile of differentially regulated mouse genes in human coronary artery atherosclerosis. For transcriptional profiling of human atherosclerotic plaque, we used 40 coronary artery samples dissected from explanted hearts of 17 patients undergoing orthotopic heart transplantation. Of the 21 diseased segments, lesions ranged in severity from grade I to V (modified American Heart Association criteria based on morphological description, Ref. 66). For the purpose of this analysis, human artery segments were classified as nonlesion or lesion (combined all grades). Atherosclerosis-related mouse genes were matched to human orthologs by gene symbol or by known homology (40a). Comparison of expression of the mouse genes between lesion and nonlesion human samples using the SAM algorithm (FDR < 0.025) revealed >100 mouse genes with higher expression in the diseased human tissue (Fig. 5, Supplemental Table SC). Given the differences between the tissue samples used in these gene expression experiments, these are highly likely to constitute an important common set of disease-relevant genes.
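Matching mouse genes to human orthologs by gene symbol, with a fallback to known homology, can be sketched as below. The sketch relies on the convention that mouse symbols are written with an initial capital (for example Ccr5) and human symbols in upper case (CCR5), so a case-insensitive comparison captures symbol matches. The function name and the small homology table are hypothetical; the study's actual homology source is the one cited in the text (40a).

```python
def match_orthologs(mouse_symbols, human_symbols, homology=None):
    """Map mouse gene symbols to human symbols.

    A case-insensitive symbol match is tried first. Genes without one are
    looked up in an optional mouse-to-human homology table. Returns a pair
    (matched, unmatched): a dict of mouse -> human symbols and a list.
    """
    homology = homology or {}
    human_set = set(human_symbols)
    human_by_upper = {symbol.upper(): symbol for symbol in human_symbols}
    matched, unmatched = {}, []
    for mouse in mouse_symbols:
        if mouse.upper() in human_by_upper:
            matched[mouse] = human_by_upper[mouse.upper()]
        elif homology.get(mouse) in human_set:
            matched[mouse] = homology[mouse]
        else:
            unmatched.append(mouse)
    return matched, unmatched


# Illustrative inputs: Trp53/TP53 is an ortholog pair whose symbols differ.
mouse_genes = ["Ccr5", "Cxcr4", "Trp53", "Xyz1"]  # "Xyz1" is a made-up symbol
human_genes = ["CCR5", "CXCR4", "TP53"]
matched, unmatched = match_orthologs(mouse_genes, human_genes, {"Trp53": "TP53"})
print(matched, unmatched)
```

Only the matched genes would then be carried forward to the lesion versus nonlesion comparison in the human data set.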
To further test the relevance of our findings in mouse atherosclerosis, we studied the accuracy of the mouse classifier genes in human atherosclerotic disease, employing established statistical methods. We first used the mouse classifier genes to predict various stages of coronary artery disease in the human arterial samples. Our results demonstrated a high degree of accuracy in predicting atherosclerotic disease severity (71.2–84.7% accuracy) (Table 2).
Additionally, we used the mouse classifier genes to categorize human atherectomy tissue obtained from coronary vessels treated for chronic atherosclerosis or in-stent restenosis. The pathophysiological basis of restenosis is quite distinct from that of chronic coronary atherosclerosis, and it was of interest to demonstrate that our classifier genes could distinguish the disease processes (47). Our results (Table 2) demonstrated significant accuracy in distinguishing the two types of lesions (85.4–93.7% accuracy), further validating the significance of the mouse atherosclerosis gene expression patterns in human disease. The greater accuracy of classification with these samples compared with the arterial segments likely reflects less variation in the clinical profile of the patients, who had much less complex medication regimens and fewer comorbid features than the precardiac transplant patients in the above analysis.
In summary, this study is, to our knowledge, the most comprehensive investigation of gene expression patterns in atherosclerosis to date. Every effort has been made to focus the analysis on differentially regulated genes that are associated specifically with the disease process, and we have developed new experimental design and analysis tools to accomplish this goal. In addition to extensive, rigorous, descriptive characterization of differentially regulated genes and their associated pathways, we have expanded the significance of this work through correlation to extensive human microarray data sets. Most significantly, we have identified genes that are capable of distinguishing the grade of human coronary disease and classifying atherectomy samples. Such gene signatures will have application in future human genetic epidemiology studies and in disease monitoring and drug assessment efforts.
This work was supported by the Donald W. Reynolds Cardiovascular Clinical Research Center at Stanford University.
We thank Arnold Liao, Jürgen Cox, David Deng, Nick Sampas, and Peng Zhang for expert assistance.
The Supplemental Material for this article (Supplemental Figs. SA–SC, Supplemental Tables SA–SD, and Supplemental Methods) is available online at http://physiolgenomics.physiology.org/cgi/content/full/00001.2005/DC1.
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: R. Tabibiazar or T. Quertermous, Stanford Medical School, Division of Cardiovascular Medicine, 300 Pasteur Drive, Falk CVRC, Stanford, CA 94305.
Copyright © 2005 the American Physiological Society