Serum inflammatory markers correlate with outcome and response to therapy in subjects with cardiovascular disease. However, current individual markers lack specificity for the diagnosis of coronary artery disease (CAD). We hypothesize that a multimarker proteomic approach measuring serum levels of vascular derived inflammatory biomarkers could reveal a “signature of disease” that can serve as a highly accurate method to assess for the presence of coronary atherosclerosis. We simultaneously measured serum levels of seven chemokines [CXCL10 (IP-10), CCL11 (eotaxin), CCL3 (MIP1α), CCL2 (MCP1), CCL8 (MCP2), CCL7 (MCP3), and CCL13 (MCP4)] in 48 subjects with clinically significant CAD (“cases”) and 44 controls from the ADVANCE Study. We applied three classification algorithms to identify the combination of variables that would best predict case-control status and assessed the diagnostic performance of these models with receiver operating characteristic (ROC) curves. The serum levels of six chemokines were significantly higher in cases compared with controls (P < 0.05). All three classification algorithms entered three chemokines in their final model, and only logistic regression selected clinical variables. Logistic regression produced the highest area under the ROC curve of the three algorithms (AUC = 0.95; SE = 0.03), which was markedly better than the AUC for the logistic regression model of traditional risk factors of CAD without (AUC = 0.67; SE = 0.06) or with CRP (AUC = 0.68; SE = 0.06). A combination of serum levels of multiple chemokines identifies subjects with clinically significant atherosclerotic heart disease with a very high degree of accuracy. These results need to be replicated in larger cross-sectional studies and their prognostic value explored.
- coronary artery disease
- protein microarray
Atherosclerotic cardiovascular disease (ASCVD) is the primary cause of morbidity and mortality in the developed world (3, 4). Despite the chronic nature of the disease, ASCVD is generally undiagnosed before onset of symptoms or complications. The first clinical presentation of more than half of the subjects with coronary artery disease (CAD) is either myocardial infarction or death (16). This grim reality is at least in part due to the lack of markers that accurately identify active atherosclerotic disease before complications occur.
Inflammation has been implicated in all stages of ASCVD and is considered to contribute to the pathophysiological basis of atherogenesis (10, 19, 24). Inflammation may therefore serve as a potential marker of the disease process itself. In large epidemiological studies, various serum markers of systemic inflammation such as C-reactive protein (CRP), fibrinogen, and interleukin-6 (IL-6) have been shown to predict cardiovascular events and to correlate with response to therapy (22, 23). Although potentially useful in risk stratification, the current systemic markers of inflammation lack sufficient disease specificity to be used satisfactorily as a screening tool in the diagnosis of CAD (21). The inaccuracy of current markers may reflect the fact that they are neither derived primarily from the vascular wall nor produced primarily by cells involved in the vascular inflammatory process. Furthermore, they may signal inflammation in a number of different organs and tissues, which may or may not have direct implications for the vasculature. Recently, a number of studies have examined several other biomarkers on an individual basis as potential novel risk factors for ASCVD (12, 25). Although these studies demonstrate that some of these inflammatory markers (those known to be expressed in the diseased blood vessel) can predict the onset of clinically significant ASCVD, none of the markers provides clinically meaningful incremental value over traditional risk factors in predicting CAD complications. However, it is highly likely that, due to the heterogeneity of the disease phenotype in the population at risk, a single marker may not provide sufficient biological information for an accurate assessment of vascular damage in the coronary circulation.
Thus, there remains a critical need to develop noninvasive tests that more accurately detect the presence and activity of ASCVD and improve our ability to predict and prevent clinical events. In this proof-of-concept study, we hypothesized that a measure of multiple serum proteins of inflammation derived from the blood vessel wall can be used to reveal a “signature of disease” that reliably identifies individuals with CAD. To test our hypothesis, we studied a subset of “cases” with confirmed CAD and “controls” without history of clinically significant CAD from the ADVANCE (Atherosclerotic Disease, VAscular function, & genetiC Epidemiology) study, a population-based case-control study focused on uncovering novel genetic determinants of coronary atherosclerosis. Using serum samples collected at the time of the first study clinic visit, we simultaneously measured seven chemokines with a commercially available protein microarray and compared the ability of these measurements to identify subjects with clinically significant CAD to that of traditional cardiovascular risk factors.
Source population and study subjects.
The ADVANCE study was approved by the Institutional Review Boards at both Stanford University and the Kaiser Permanente of Northern California (KPNC) Division of Research. The source population included adults (age ≥18 yr) living in or near the San Francisco Bay Area who were receiving medical care within KPNC, a population generally representative of the local and statewide insured adult population (13, 18). Between October 28, 2001 and December 31, 2003, 3,179 subjects from the source population were recruited into five separate groups: 1) subjects with clinically significant CAD at an age <45 yr for males and <55 yr for females (n = 500), 2) subjects with incident stable angina at an older age (n = 468), 3) subjects with incident acute myocardial infarction (AMI) at an older age (n = 924), 4) young subjects with no history of CAD (n = 264), and 5) subjects aged 60 to 72 at the time of their study clinic visit with no history of CAD, ischemic stroke (CVA), or peripheral arterial disease (PAD) (n = 1,023). Extensive details of the eligibility criteria, sampling strategy, and recruitment statistics for all cohorts are available elsewhere (http://med.stanford.edu/advance/) (11, 15, 29).
For this study, we selected a stratified random sample based on race, age, and gender of 50 subjects from group 3 with incident AMI to serve as our cases and 48 subjects from group 5 to serve as our healthy controls. All cases were in the age range of 60–69 yr at the time of their qualifying event, and all controls were in the same age range at the time they were identified by the computerized databases as eligible to participate in 2001 (29). All subjects were also of white/European descent, and half in each group were selected to be female.
Clinical information was derived from a comprehensive questionnaire completed by all participants, which was reviewed and collected at the first study clinic visit. For cases, this clinic visit occurred a median of 3.2 mo (range 1.9–18.8 mo) after the qualifying AMI. Thereafter, a targeted physical exam was performed to document the blood pressure, heart rate, height, weight, and waist circumference of all participants. Finally, blood was collected for measurements of various serum proteins including fasting insulin, fasting glucose, and CRP. Plasma concentrations of glucose and insulin were measured with standard methodologies. CRP was determined by standard high-sensitivity ELISA.
Hypertension, high cholesterol, and diabetes were defined based on self-report. For cases, each of these risk factors was considered present only if subjects reported that the risk factor was diagnosed by a health care provider prior to the time of the qualifying AMI. Smoking was classified as “ever” vs. “never” relative to the clinic date. “Ever” smokers were subjects who reported smoking cigarettes regularly for at least 3 mo at any time prior to the study visit as well as at least 100 cigarettes throughout their lifetime.
Protein microarray hybridization and data processing.
To assess the concentrations of nine different chemokines, CXCL10 [IFN-γ-inducible protein 10 (IP-10)], CCL11 (eotaxin), CCL3 [macrophage inflammatory protein (MIP1α)], CCL2 [monocyte chemotactic protein (MCP1)], CCL8 (MCP2), CCL7 (MCP3), CCL13 (MCP4), CXCL8 (IL-8), and CCL5 [regulated on activation, normal T cell expressed, and presumably secreted (RANTES)], we used a commercially available Schleicher and Schuell protein microspot array (FastQuant Human Chemokine; S&S Biosciences, Keene, NH). This array platform utilizes multiple highly specific monoclonal antibodies spotted onto standard microscope slides coated with a three-dimensional nitrocellulose surface. The sensitivity and specificity of these markers and their correlation with conventional ELISA have been demonstrated previously (17). Lack of cross-reactivity among these markers has been established previously (17, 28). Plasma samples were hybridized to protein arrays according to the manufacturer's instructions, followed by addition of a biotinylated secondary antibody and Cy5-streptavidin conjugate. Resulting fluorescence intensity was measured using an Axon Genepix 4000B microarray scanner in conjunction with feature extraction software (Array Vision Fast 8.0, S&S Biosciences) to convert the scanned image into numeric intensities. Absolute concentrations were measured by interpolation of intensity values with internal standard references run in parallel. Depending on the specific analyte, FastQuant protein arrays present control variability ranging from 3 to ∼15% and sensitivity from 1 to 10 pg/ml. The accuracy of FastQuant protein arrays is comparable to that of the corresponding ELISA determinations (1, 2), with a similar linear range.
Detailed supplemental methods and quality control results for the current study, including array reproducibility and standard curves, are provided online on the publisher's website (supplemental material).1 Numerical raw data were subsequently both analyzed on local Windows workstations and migrated into an Oracle relational database specifically designed for microarray data analysis.
RANTES and IL-8 were not considered for further analyses. The RANTES standard curve was nonsigmoidal and, therefore, did not have a linear portion for calculating concentrations. In both case and control subjects, most of the IL-8 values were outside the standard curve limits.
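The conversion of a fluorescence intensity into a concentration, and the exclusion of values outside the standard curve limits, can be illustrated with a minimal sketch. It assumes simple piecewise-linear interpolation between standards; the curve-fitting procedure actually used by the vendor software is not described here, and the function name and example values are hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def interpolate_concentration(
    intensity: float,
    std_intensities: Sequence[float],
    std_concentrations: Sequence[float],
) -> Optional[float]:
    """Piecewise-linear interpolation of a concentration from a standard curve.

    std_intensities must be strictly increasing and paired one-to-one with
    std_concentrations. Returns None when the intensity lies outside the
    range spanned by the standards (value not quantifiable).
    """
    if len(std_intensities) != len(std_concentrations) or len(std_intensities) < 2:
        raise ValueError("need at least two paired standards")
    if intensity < std_intensities[0] or intensity > std_intensities[-1]:
        return None  # outside the standard curve limits
    i = bisect_left(std_intensities, intensity)
    if std_intensities[i] == intensity:
        return float(std_concentrations[i])  # exactly on a standard
    # Interpolate between the two standards that bracket the intensity.
    x0, x1 = std_intensities[i - 1], std_intensities[i]
    y0, y1 = std_concentrations[i - 1], std_concentrations[i]
    return y0 + (y1 - y0) * (intensity - x0) / (x1 - x0)


# Hypothetical standards: intensities (arbitrary units) vs. concentrations (pg/ml).
standards_au = [100.0, 200.0, 400.0]
standards_pg_ml = [10.0, 20.0, 40.0]
inside = interpolate_concentration(150.0, standards_au, standards_pg_ml)
outside = interpolate_concentration(500.0, standards_au, standards_pg_ml)
```

In this sketch a reading above the highest standard is returned as missing rather than extrapolated, which mirrors the handling of out-of-range values described above.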
Subjects missing only one chemokine value (6 cases and 1 control) were maintained in the analysis by imputing this missing value using the K-nearest neighbor (KNN) method with the nearest 10 neighbors (31). Subjects missing two or more of the seven chemokine values (2 cases and 4 controls) were excluded from further analyses since these methods are not robust with >15% missing values (31). Missing values for glucose, insulin, and CRP for two subjects were also imputed by the KNN method.
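The imputation step can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: neighbors are taken to be other subjects, distance is the root-mean-square difference over the variables observed in both subjects, and the imputed value is the unweighted mean of the neighbors' values. The function name is hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(rows: List[Row], k: int = 10) -> List[List[float]]:
    """Replace each missing value (None) with the mean of that variable over
    the k nearest rows in which the variable is observed.

    Distance between two rows is the root-mean-square difference over the
    variables observed in both rows (an assumption of this sketch).
    """
    completed = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must have the missing variable observed
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    distance = math.sqrt(sum(shared) / len(shared))
                    candidates.append((distance, other[j]))
            if not candidates:
                raise ValueError(f"no donor rows for row {i}, column {j}")
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            filled[j] = sum(nearest) / len(nearest)
        completed.append(filled)
    return completed


# Hypothetical toy data: the first subject is missing the third variable.
toy = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
imputed = knn_impute(toy, k=2)
```

In practice the variables would usually be put on a comparable scale (for example, log-transformed or standardized) before distances are computed, so that no single chemokine dominates the neighbor search.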
We examined relationships between variables in our final dataset independent of case-control status in several ways. First, we performed a two-dimensional hierarchical clustering analysis (2D-HC) and built heat maps using the open-source software TMev, ver. 3.0 (TM4 suite; The Institute for Genomic Research, Rockville, MD) (26) with complete linkage and Pearson's correlation as the distance metric. Second, we used multidimensional scaling (MDS) as implemented in the function “cmdscale” of the R language. Third, we performed a multivariate linear regression analysis to test the marginal effect of each traditional risk factor of CAD and the use of specific medications on each of the seven chemokines independent of case-control status. Finally, we calculated Spearman correlation coefficients between chemokine levels and the number of days elapsed from the time of the qualifying AMI to the time of the clinic visit in cases. Detailed methods are provided in the online supplemental material.
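For readers unfamiliar with the rank-based correlation used for the time-from-event analysis, a minimal sketch of the Spearman coefficient is given here. It assumes tied values receive the average of their ranks and computes the coefficient as the Pearson correlation of the ranks; the function names and the example values are hypothetical and are not study data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: days since event vs. a chemokine level.
days_since_event = [60, 95, 120, 300, 560]
chemokine_level = [410.0, 395.0, 430.0, 402.0, 418.0]
rho = spearman(days_since_event, chemokine_level)
```

The same `pearson` helper could also supply a correlation-based distance such as 1 − r for clustering; whether that exact form matches the clustering software's definition is an assumption.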
Differences in clinical characteristics between cases and controls were examined using standard parametric and nonparametric methods including a t-test for normally distributed continuous variables, a Wilcoxon test for nonnormally distributed variables, and a χ2-test for all binary variables.
We explored the performance of three different classification algorithms on our data: logistic regression (LR), linear discriminant analysis (LDA), and recursive partitioning (RP). Three different algorithms were used to maximize the probability of identifying the combination of clinical and biomarker variables that would most accurately predict case-control status. For logistic regression, we used a forward sequential automated model selection technique, setting the α (SLENTRY) to 0.10 to select independent predictors of case-control status. A stepwise approach was used with LDA, with the performance metric being the “correctness rate” (9). For recursive partitioning, we used the library “rpart” of the R language (30). Further details on the LDA and RP algorithms we used are provided in the supplementary appendix. We also used logistic regression to test whether a forward sequential automated selection technique would select any of the chemokines after first forcing all traditional risk factors into the model.
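The core step of recursive partitioning, the search for the threshold on one variable that best separates cases from controls, can be sketched as follows. This is an illustration only: it assumes Gini impurity as the splitting criterion and covers a single split on a single variable, whereas a full tree-building procedure such as the one in “rpart” also recurses on the resulting subsets and prunes the tree. Function names and example values are hypothetical.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary labels (1 = case, 0 = control)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, weighted_impurity) for the best rule 'value < threshold'.

    Candidate thresholds are midpoints between consecutive distinct sorted
    values. If no split lowers the impurity, the threshold is NaN and the
    impurity of the unsplit data is returned.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, impurity)
    return best


# Hypothetical marker values for two controls (0) and two cases (1).
threshold, impurity = best_split([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

A tree is obtained by applying this search to every candidate variable, keeping the best split, and repeating within each resulting subset.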
To quantify the diagnostic performance of the final model derived from each classification algorithm, we built receiver operating characteristic (ROC) curves and calculated the area under the curve (AUC) (33). Using LR, we also generated ROC curves to calculate the AUC for each biomarker of inflammation in isolation and for a model that included all traditional factors without and with CRP. To test for differences in the AUCs of any two models derived by LR, we used a cross-validation approach (see supplementary appendix for detailed methodology).
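The area under the ROC curve can be computed directly from model scores as the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below illustrates this, together with one widely used approximation for the standard error (the Hanley and McNeil formula). Whether this is the SE estimator used in the present study is an assumption, and the function names and example scores are hypothetical.

```python
import math
from typing import Sequence


def auc_mann_whitney(case_scores: Sequence[float],
                     control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case-control pairs in which the case scores
    higher than the control; ties count as one half."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def hanley_mcneil_se(auc: float, n_cases: int, n_controls: int) -> float:
    """Approximate standard error of an AUC (Hanley and McNeil formula)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_cases - 1) * (q1 - auc * auc)
        + (n_controls - 1) * (q2 - auc * auc)
    ) / (n_cases * n_controls)
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


# Hypothetical predicted probabilities from a fitted model.
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.3, 0.2]
auc = auc_mann_whitney(cases, controls)
se = hanley_mcneil_se(auc, len(cases), len(controls))
```

The pairwise definition makes clear that the AUC depends only on the ranking of subjects by the model score, not on the calibration of the predicted probabilities.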
Unsupervised data analysis and independent predictors of chemokine levels.
Two-dimensional hierarchical clustering indicated that case and control subjects tend to form large homogeneous clusters, although occasionally individual cases and controls remain outside these large clusters (Fig. 1), suggesting a common profile within each group. MDS demonstrates that chemokines cluster together and away from traditional risk factors and parameters related to insulin resistance (Fig. 2). The chemokines explain the most variance within the dataset. Of interest, CRP clusters more closely with metabolic parameters than with chemokines. After adjustment for case-control status, levels of all seven chemokines were significantly correlated with diabetes and body mass index (BMI) (Table 1). Chemokine levels varied less consistently with age and hypertension. There were no significant associations between chemokine levels and other clinical variables including the use of aspirin, statins, and ACE inhibitors. We did not find an association between chemokine levels and days elapsed from the time of the qualifying event in cases to the time of the sample collection (Spearman's correlations range: −0.08 to 0.06, P value range: 0.59 to 0.94).
Characteristics of the subjects stratified by case-control status.
Most clinical characteristics of cases and controls differed as expected (Table 2). Despite sampling cases and controls in the same age range, controls were slightly older than cases. This association, in the direction opposite of what one would expect, is a consequence of the eligibility criterion for age set in the two cohorts from which we randomly selected our cases and controls (5, 11, 29). Despite this slight difference in age, cases as expected had a higher BMI, waist circumference, insulin, and glucose compared with controls, as well as a higher prevalence of smoking, hypertension, high cholesterol and diabetes. Cases also had a much higher prevalence of use of drugs that are routinely prescribed for secondary prevention of CAD, including aspirin, statins, and ACE inhibitors. The majority of chemokines measured were also significantly higher in cases compared with controls. None of the cases or the controls reported suffering from a chronic inflammatory disorder such as rheumatoid arthritis, lupus, or inflammatory bowel disease (details not shown).
Variables selected for prediction of case-control status using different classification algorithms.
Each of the three classification algorithms entered three chemokines into its final model, although each selected a slightly different combination (Table 3). Only the logistic regression algorithm entered traditional risk factors into the final model. The classification algorithm that produced the best AUC was logistic regression (AUC = 0.95). Of note, the variable “age” was forced into the LR and LDA models given the spurious association between age and case-control status described above. For recursive partitioning, we could not force age into the model. For LR and LDA, we also initially attempted to force in the variables indicating aspirin, statin, and ACE inhibitor use, but doing so led to model convergence problems, probably as a consequence of our small sample size and the high correlation between these variables and case-control status (Table 1). Therefore, these medication use variables were omitted from these analyses. In the absence of any significant association between these variables and each of the seven chemokines (Table 1), it is unlikely that their omission had any meaningful impact on the model derived by each classification algorithm. For the LR model where all traditional risk factors were forced in, we had to reduce the SLENTRY from 0.10 to 0.01 to avoid convergence problems. At this SLENTRY, two chemokines entered the model (IP-10 and MCP3).
ROC plots and AUC for individual biomarkers of inflammation, models using traditional risk factors, and best models derived by classification algorithms.
Figure 3A shows the ROC plots for each of the chemokines individually and for CRP, while Fig. 3B shows the ROC plots for the combination of traditional risk factors of CAD (age, waist circumference, diabetes, hypertension, high cholesterol, and smoking) without and with CRP and for the model derived by our LR classification algorithm. Several of the chemokines alone had a higher AUC than the models using traditional risk factors, but none had an AUC higher than the best model derived by LR. CRP did not improve the AUC of the model using traditional risk factors. The best model derived by LR appears far more accurate than either model using traditional risk factors. The best model derived by LDA (AUC = 0.94; SE 0.03) and the best model derived by RP (AUC = 0.89; SE 0.03) are also more accurate than either model using traditional risk factors (data not shown).
Pair-wise comparisons of AUC for models derived by logistic regression.
The AUC of the best LR model derived by a forward sequential automated model selection technique was statistically significantly higher than the AUC of the model using all the traditional risk factors (Table 4). The AUC for the best LR model was also higher than the AUC of the two chemokines that best predicted case-control status in isolation. Although the difference in these AUCs did not reach statistical significance at the 0.05 level, the results demonstrated a very strong trend toward significance (for IP-10, P = 0.06; for MCP4, P = 0.1). IP-10, eotaxin, MCP1, and MCP4 each in isolation performed better than the model using traditional risk factors without or with CRP (P < 0.05 for all, details not shown).
There is a great need for improved tools to diagnose active ASCVD prior to clinical presentation. Although insights into the mechanisms and circumstances of atherosclerosis are expanding, methods for identifying subjects with active disease and predicting the efficacy of primary prevention strategies remain suboptimal. We hypothesized that a multidimensional approach utilizing profiles of several biomarkers of inflammation could reveal a “signature” of atherosclerosis-related vascular inflammation. The present study provides strong preliminary experimental support for this hypothesis and suggests that measurement of multiple biomarkers may reliably identify subjects with confirmed coronary heart disease.
Since vascular inflammation is an underlying pathophysiological basis of atherosclerosis, chemokines, which are produced in the atherosclerotic vessel, are prime candidates as markers of CAD. Chemokines form a network of chemotactic proteins produced by activated leukocytes as well as vascular endothelial and smooth muscle cells (7). Their main role is to promote accumulation and activation of leukocytes in tissues, and their interaction with several cellular receptors contributes to the specificity of the inflammatory infiltrate (20, 27). Chemokines are often present as groups with varying composition, and the biological effect of such groups can be quite different from that of individual factors in isolation. Thus, measuring global patterns of cytokine and chemokine expression may plausibly yield more relevant biological information than individual protein assays.
Our data demonstrate that serum concentrations of the chemokines are differentially regulated in individuals with clinical CAD compared with subjects with no history of CAD. Although chemokines were correlated with traditional risk factors, these correlations were weak for most risk factors. Furthermore, our MDS analyses showed that chemokine levels did not cluster with traditional risk factors or CRP. Models using multiple chemokines more accurately distinguished cases and controls compared with models using traditional risk factors. These comparisons reached statistical significance despite a majority of cases being on medical therapy, which can acutely suppress markers of inflammation (22). Models using multiple chemokines also demonstrated a very strong trend toward being more valuable than individual chemokines despite the relatively small sample size. Of note, CRP was not selected by any of the classification algorithms. Furthermore, CRP had relatively poor diagnostic performance alone or in combination with traditional risk factors. Taken together, our findings suggest that the chemokine profile represents a strong signal of vascular disease which appears to be much more specific than traditional risk factors and CRP.
Our study has several limitations. First, the serum samples from the case subjects were generally collected several months after the myocardial infarction. Prior studies indicate that inflammatory markers such as CRP (6) and metalloproteinases (32) are labile after acute events. However, little is known regarding circulating chemokines under these circumstances. As such, we cannot be certain that the levels we measured in cases after initial AMI are a stable reflection of the underlying biology. However, given that measures of inflammation taken after acute events generally subside to preevent levels within days, it appears unlikely that the chemokine profile at 3–18 mo is simply a reaction to myocardial infarction. Moreover, a linear model to evaluate a possible relationship between chemokine levels and time-from-event did not reveal a correlation, supporting the notion that the chemokine levels are not due to the acute event. Furthermore, the complete lack of correlation between chemokine levels and time elapsed from the qualifying event to the clinic visit also suggests that the AMI had no enduring impact on serum chemokine levels beyond perhaps a few days postevent. Second, some of the clinical variables not related to chemokine levels were not optimally measured, which may have reduced the AUC for the models using these variables. For example, replacing self-report of hypertension and high cholesterol with accurate measures of blood pressure and serum cholesterol would probably improve the diagnostic ability of the models using traditional risk factors. Furthermore, the model including traditional risk factors and CRP may have had a higher AUC if CRP had been measured in subjects off medications known to influence its levels, such as statins and ACE inhibitors (22). Third, this cross-sectional study does not establish prognostic value for the models derived by the classification algorithms.
However, given their high discriminatory performance and the participation of chemokines in atherosclerosis, it is likely that some prognostic information can be gleaned with appropriate study design. In fact, two recent prospective studies (8, 14) have demonstrated the prognostic value of MCP1. A third prospective study did not clearly demonstrate incremental prognostic value of chemokines. The different results of the three studies could be explained by the different follow-up periods of each study. Whereas the first two studies had mean follow-up times of <10 mo and 5.3 yr, respectively, the third study had a mean follow-up time of 11 yr. Based on these trends in prospective studies and the results of several case-control studies including ours, it appears that plasma chemokine levels may be most valuable in the prediction of near-term CAD events. Lastly, chemokine measurements used in this study have not been standardized, which precludes their routine use in clinical practice at the present time. In our study, interassay coefficients of variation for these biomarkers were relatively high (>5%), which might have reduced the strength of the associations uncovered.
We do not consider the panel of biomarkers of inflammation studied here to be comprehensive. Indeed, the use of a wider array of analytes may improve sensitivity and specificity for the diagnosis of active ASCVD. However, this study demonstrates the feasibility of using protein microarrays to simultaneously measure multiple biomarkers and the potential usefulness of these measurements for identifying signatures of active ASCVD based on the levels of these biomarkers.
In summary, serum levels of a combination of chemokines measured using a multimarker approach can accurately distinguish subjects with clinically significant CAD from those with no prior history of CAD. A larger-scale study is needed to validate our findings, as measures of model discrimination are generally higher in derivation samples, and prospective studies are also necessary to determine the incremental prognostic value of these measures over current risk stratification tools. Algorithms based on panels of informed biomarkers may ultimately lead to improved screening, diagnosis, and monitoring of cardiovascular disease.
This work was supported by a grant from the Donald W. Reynolds Foundation, Las Vegas, Nevada.
Aviir Inc. has patent-licensing arrangements with Stanford University and Kaiser Permanente of Northern California related to the work that has been described. R. Tabibiazar, E. Hytopoulos, P. S. Tsao, and T. Quertermous are named co-inventors on the patents. R. Tabibiazar and E. Hytopoulos are currently employees of Aviir Inc., while T. L. Assimes, P. S. Tsao, and T. Quertermous are paid advisors to the company, with P. S. Tsao and T. Quertermous maintaining stock ownership in the company. S. P. Fortmann, M. Hlatky, A. S. Go, and C. Iribarren are recipients of a grant from Aviir to support additional studies of the ADVANCE cohort.
We thank the many individuals who contributed to the ADVANCE study, including Joan Fair (co-investigator); Phenius Lathon (patient recruitment); Malini Chandra, Solomon Henry, Gail Husson, Mohammed Mahbouba, Balasubramanian Narasimhan, Richard Olshen, Ann Varady (data management); and Mary Chen, Heideh Fattaey, Emiko Miyamoto (project administration).
* D. Ardigo and T. Assimes contributed equally to this work.
1 The online version of this article contains supplemental material.
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: R. Tabibiazar or T. Quertermous, Stanford Medical School, Div. of Cardiovascular Medicine, 300 Pasteur Dr., Falk CVRC, Stanford, CA 94305.
- Copyright © 2007 the American Physiological Society