In this report we evaluate three methods for labeling nucleic acids to be hybridized to a cDNA microarray: direct labeling, indirect amino-allyl labeling, and the dendrimer labeling method (Genisphere). The dendrimer method requires the smallest quantity of sample, 2.5 μg of total RNA compared with 20 μg for the direct or indirect methods. Therefore, we wanted to know whether the performance of the dendrimer method is comparable to that of the other methods, or whether significant information is lost. Performance can be considered in terms of sensitivity, dynamic range, and reproducibility of the quantitative signals for gene intensity. We compared the three labeling methods by generating three sets of eight self-to-self hybridizations using the same total RNA sample in all cases (“replicate study”). In our analysis, we controlled for the effects of print-tip and background subtraction biases. We also performed a smaller study, namely, a dilution series study with five dilution points per labeling method, to evaluate one aspect of predictive ability. In the replicate study, the dendrimer method appeared to perform as well as, and often better than, the other two methods with respect to reproducibility and ability to detect expression. However, in the dilution series study, the dendrimer method was outperformed by the other two in terms of predictive ability and did not perform very well. These findings are helping to guide our decisions on which labeling method to use for subsequent studies, based on the purpose of a specific study and its limitations in terms of available material.
- evaluation of labeling protocols
- direct labeling
- indirect amino-allyl labeling
- dendrimer labeling
Microarray technology has been available for several years (7, 8); however, optimization of many of the steps involved in performing a microarray experiment has yet to be published. This is likely due to the complexity of these steps (array preparation, sample preparation, labeling the target and array hybridization, image quantification/analysis), a growing understanding of the issues involved, and a continuing evolution of methods. In this report we focus on one of these steps: the labeling of target RNA. (For an overall review of a microarray experiment, the reader is directed to Ref. 2.) Labeling of target RNA is both common to all microarray experiments and critical. The three labeling methods compared here are direct labeling, indirect labeling, and dendrimer labeling (Genisphere). The direct labeling method incorporates dUTP fluorescently labeled with bulky dye adducts (Cy3 or Cy5) during reverse transcription of RNA (2). The indirect labeling method also incorporates a modified (amino-allyl) dUTP during reverse transcription. Subsequent to the reverse transcriptase reaction, the fluors are covalently coupled to the cDNA. Therefore, in the direct and indirect labeling methods the amount of label incorporated depends upon the efficiency of incorporation of the modified dUTPs and upon the sequence of the clone itself. In contrast, the dendrimer labeling method is entirely dependent upon nucleic acid hybridization kinetics. The initial reverse transcriptase reaction is primed with an oligonucleotide containing a specific Cy3 or Cy5 “capture” sequence. The cDNAs containing the “capture” sequences are first hybridized to the dendrimers and then to the array. A dendrimer is a complex nucleic acid structure created by hybridizing nucleotide oligomers to specifically promote the formation of a complex branched structure (5). The dendrimer used in this study contains 250 fluor molecules. Therefore, each cDNA is labeled with a relatively constant number of fluors.
Dendrimer labeling has the advantages of requiring less starting material and exhibiting minimal sequence or length dependencies.
We performed a side-by-side comparison of the three methods using a single source of total RNA and multiple replicates to assess the reproducibility and the ability to detect expression of the methods. This study is referred to as “the replicate study.” We also performed a smaller study, with five arrays per labeling method, where we varied the amount of total RNA used in one channel (Cy3), hence, the ratio of Cy3-labeled RNA to Cy5-labeled RNA, to assess one aspect of predictive ability of each of the methods. This study is referred to as “the dilution series study.”
MATERIALS AND METHODS
Glass Microarray Preparation
The Pancreas 2 glass microarrays are described in detail in Ref. 6. Briefly, clones were selected (based on experiments using genome-wide Incyte GEM arrays and ESTs from pancreatic libraries deposited in dbEST) and ordered from Research Genetics and Incyte Genomics. Plasmid DNA was prepared for each clone, and the purified DNA was used as template to amplify the inserts with PCR. The PCR products were purified, eluted in deionized sterile water, diluted 50% with DMSO (Sigma product no. 8-8418) (2), and printed on poly-l-lysine-coated slides with a MicroGrid II arrayer (BioRobotics).
The microarray used for all experiments in the replicate study was the Pancreas 2.1 Array containing 3,840 spots. The list of clones arrayed is available at http://www.cbil.upenn.edu/EPConDB. Sixteen print-tips were used, resulting in a 4×4 grid layout, with 15×16 spots per grid (print-tip group). Of these 3,840 spots, 155 were left blank, 16 represent yeast negative controls (8 yeast controls, from Incyte Genomics, spotted in duplicate), 3 represent cDNA controls, 8 represent anchors (Cy3 end-labeled random 70-mers), and the remaining 3,658 represent primarily mouse genes expressed in pancreas as described above. Of the 3,840 spots on the array, 10 (the 8 anchors and 2 of the blanks) had no values upon image quantification.
The microarray used for all experiments in the dilution series study was the Pancreas 2.1.1 Array. This was identical to the Pancreas 2.1 Array, except that the eight anchors in the latter were replaced by blanks in the Pancreas 2.1.1 Array.
Preparation of RNA
Six adult CD1 female mice were euthanized; the pancreas of each and the livers from four were immediately homogenized in 10 ml denaturing solution (4 M guanidinium thiocyanate, 0.1 M Tris-Cl pH 7.5, 1% β-mercaptoethanol). Total RNA was extracted using an acid-phenol extraction procedure (1). Approximately 150 μg of total RNA from each individual pancreas sample was pooled, resulting in 900 μg of pooled pancreas RNA. This pancreas RNA pool was used for both channels of the labeling experiments in the replicate study. For the dilution series study, 800 μg of liver total RNA (200 μg from each liver) were pooled with 1,800 μg of pancreas total RNA (300 μg from each pancreas) to prepare a liver/pancreas RNA pool.
Labeling Methods and Hybridization
Prehybridization was performed for all arrays (2). A coplin jar containing 50 ml of prehybridization buffer (5× SSC, 0.1% SDS, and 1% BSA) was brought to 42°C. The arrays were incubated for 45 min, rinsed five times in deionized water at room temperature, once in isopropanol, and then placed into a 50-ml conical tube and centrifuged 1 min at 1,000 rpm. The prehybridization was done no more than 1 h prior to hybridization.
The direct labeling protocol was adapted from Ref. 2. Briefly, RNA was labeled in a reverse transcription reaction with fluorescently labeled dUTP (Cy3 PA53022 and Cy5 PA55022, Amersham Pharmacia) and purified. The purified labeled cDNA was precipitated with 1/10 vol sodium acetate and 3 vol ethanol. In the replicate study, 20 μg of pooled pancreas RNA was labeled with each of the two fluorophors. In the dilution series study, 20 μg of the pooled liver and pancreas RNA was labeled with Cy5 for each slide. The amount of RNA labeled with Cy3 was 5, 10, 20, 40, and 80 μg, respectively, so that the ratio of the Cy3-labeled RNA to Cy5-labeled RNA was ¼, ½, 1, 2, and 4, respectively.
Indirect or amino-allyl labeling was adapted from the Brown web site (http://cmgm.stanford.edu/pbrown/protocols). Total RNA and 2.5 μg of oligo dT were brought to 25 μl with sterile, deionized DEPC-treated water and denatured for 5 min at 70°C. The reaction was then cooled to 42°C and an equal volume of RT reaction mix [2× first-strand buffer (InVitrogen, Y02321), 400 U SuperScript II (InVitrogen, Y02226), 1 mM dATP, 1 mM dGTP, 1 mM dCTP, 0.4 mM amino-allyl-dUTP, 0.6 mM TTP, 20 mM DTT, and 20 U RNasin (Promega, product no. 12772505)] was added, and the reaction incubated at 42°C for 2 h. The reaction was stopped by incubating at 70°C for 5 min. RNase H (2.5 U, US Biochemicals product no. 70054Y) was added and the mixture incubated at 37°C for 15 min. The reaction was denatured by bringing it to 0.2 N NaOH, 0.1 M EDTA, and then neutralized by adjusting the reaction to 0.29 M Tris-Cl pH 7.5. The buffer was removed with a Microcon YM-30 (Amicon no. 42410), and the cDNA was dried in a SpeedVac. Monofunctional Cy3 (PA23001) or Cy5 dye (PA25001, Amersham Pharmacia) was coupled to the cDNA in 30% DMSO, 66 mM sodium bicarbonate buffer, pH 9.0, in the dark at room temperature for 1 h. The reaction was quenched with 4.5 μl hydroxylamine (Sigma) for 15 min. The two dye reactions were combined, and the labeled cDNA was purified with a Qia-Quick PCR purification kit (Qiagen). The purified labeled cDNA was precipitated with 1 μl polyacryl carrier (Molecular Research Center, no. PC 152), 1/10 vol of 3 M sodium acetate, pH 5.2, and 3 vol ethanol (−20°C). In the replicate study, 20 μg of pooled pancreas RNA were labeled with both fluorophors. In the dilution series study, 20 μg of the pooled liver and pancreas RNA were labeled with Cy5 for each slide. The amount of RNA labeled with Cy3 was 5, 10, 20, 40, and 80 μg, respectively, so that the ratio of the Cy3- to Cy5-labeled RNA was ¼, ½, 1, 2, and 4, respectively. 
In preparation for hybridization, the labeled cDNA from the direct or the indirect labeling reaction was resuspended in 15 μl of sterile, deionized water with 2.5 μg of oligo dT21 and 2.5 μg of mouse cot1 DNA (500 mg/ml, GIBCO-BRL, no. 1844-016) and denatured for 5 min at 95°C. An equal volume of hybridization buffer (50% formamide, 10× SSC, 0.2% SDS) was added. The cDNA hybridization mix was placed on a prehybridized glass microarray and incubated overnight at 42°C in a Corning hybridization chamber. Both the direct and indirect slides were washed under the same conditions after hybridization: once in 2× SSC, 0.1% SDS to remove the coverslip, once in 0.2× SSC, 0.1% SDS at 40°C for 5 min with agitation, and once in 0.2× SSC at room temperature for 5 min with agitation.
3DNA dendrimer (Genisphere) labeling was done with the 3DNA submicro Array kit (Genisphere, Cy3 A100731V12, Cy5 A100741V12) according to the manufacturer’s protocol and recommendations. Common control total RNA (2.5 μg) and 2 pmol Cy3 capture sequence primer or Cy5 capture sequence primer were brought to 10 μl with DEPC-treated water and incubated for 10 min at 80°C. At 42°C an equal volume of reaction mix [2× first-strand buffer (InVitrogen, Y02321), 1 mM dATP, 1 mM dGTP, 1 mM dCTP, 1 mM TTP, 20 mM DTT, 40 U RNasin (Promega, 12772505), and 200 U Superscript II reverse transcriptase (InVitrogen, Y0226)] was added, and the reaction was incubated 2 h at 42°C. The reaction was terminated by bringing it to 0.074 N NaOH and 7.4 mM EDTA and incubating it at 65°C for 10 min. The reaction was neutralized by bringing it to 0.175 M Tris-Cl, pH 7.5. The Cy3 and Cy5 reactions were then combined and precipitated with 20 μg of linear polyacrylamide (Ambion no. 9520), 1 vol 7.5 M ammonium acetate, and 9 vol ethanol at −20°C for 30 min. Following precipitation the pellet was air dried. In preparation for hybridization the cDNA pellet was resuspended in 5 μl sterile deionized water. Then, 2.5 μl of the Cy3 dendrimer, 2.5 μl of the Cy5 dendrimer, and 1 μl high-end differential enhancer (Genisphere, vial 10) were added to the cDNA. Mouse cot1 DNA, 2.5 μg (1 mg/ml, GIBCO-BRL, no. 1844-016), 2.5 μg oligo dT21 and 1 μl of anti-fade reagent (Genisphere, vial 8) were added to 100 μl of hybridization buffer (40% formamide, 4× SSC, 1% SDS, Genisphere, vial 7), and the hybridization buffer was brought to 45°C. The prepared hybridization buffer, 19 μl, was added to the cDNA/dendrimer mix and incubated at 45°C for 15 min. This hybridization mix was added to a prehybridized glass microarray, covered with a glass coverslip, and incubated in a Corning hybridization chamber containing 10 μl of water in each reservoir overnight at 45°C. 
The Genisphere labeled arrays were washed for 10 min each, once at 55°C in 2× SSC, 0.2% SDS, once in 2× SSC at room temperature, once in 0.2× SSC at room temperature, and dried by placing them into a 50-ml conical tube and centrifuging them at 1,000 rpm for 3 min. In the replicate study, 2.5 μg of pooled pancreas RNA was labeled with each of the two fluorophors. In the dilution series study, 2.5 μg of the pooled liver and pancreas RNA was labeled with Cy5 for each slide. The amount of RNA labeled with Cy3 was 0.625, 1.25, 2.5, 5, and 10 μg, respectively, so that the ratio of the Cy3- to Cy5-labeled RNA was maintained at ¼, ½, 1, 2, and 4, respectively.
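The dilution series amounts for all three methods follow from the fixed Cy5 amount and the five target ratios. A minimal sketch of that arithmetic is given here; the function name `cy3_amounts` is purely illustrative and is not part of any protocol.

```python
# Target ratios of Cy3-labeled to Cy5-labeled RNA in the dilution series.
RATIOS = [0.25, 0.5, 1.0, 2.0, 4.0]


def cy3_amounts(cy5_ug, ratios=RATIOS):
    """Return the micrograms of RNA to label with Cy3 for each target ratio,
    given the fixed amount (in micrograms) labeled with Cy5."""
    return [cy5_ug * r for r in ratios]


print(cy3_amounts(20))   # direct and indirect labeling: [5.0, 10.0, 20.0, 40.0, 80.0]
print(cy3_amounts(2.5))  # dendrimer labeling: [0.625, 1.25, 2.5, 5.0, 10.0]
```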
Scanning and Image Analysis
All slides were scanned immediately following hybridization using an Affymetrix (formerly GMS) 418 scanner. The laser power was set to 100%, and the PMT settings varied depending upon the intensity of the array. (Note that in the dilution series study, the PMT settings varied from method to method but were constant across the five arrays of each method.) Our goal was to scan at a setting that would avoid signal saturation in any spots. For the replicate study, 9–10 hybridizations were performed for each labeling method. After scanning the arrays, images were visually inspected, and eight slides with no apparent major flaws were selected for each labeling procedure.
The image analysis was performed with ArrayVision 6.0 (Imaging Research). The segmentation adaptive function of the program was enabled, and the local background values were computed from diamond-shaped regions between the spots. The mean measure of pixel intensities was used both for the foreground and for the background at each spot.
After quantification, the data were stored in the relational database RAD (4, 10). Access to the data will be provided through the EPConDB web site (http://www.cbil.upenn.edu/EPConDB). Our data will also be deposited in the public repository ArrayExpress (http://www.ebi.ac.uk/microarray/ArrayExpress/arrayexpress.html).
Preprocessing and Analyses
We used version 1.3.1 of the statistical software package R (3) for all the statistical calculations and plots in this paper. To fit curves to scatter plots, we used the “lowess” function (a robust scatter plot smoother) implemented in R, with f typically set to 0.3 or 0.4.
For the comparisons reported under Labeling Method Reproducibility (below, in results), which use the replicate study, the M values were normalized via the print-tip group lowess normalization described in Ref. 12 (http://www.stat.Berkeley.EDU/users/terry/zarray/Html/normspie.html) and implemented in the R package SMA by those authors. This method normalizes log (base 2) ratios of Cy5 signal to Cy3 signal in a way that is not only array dependent, but also intensity and print-tip dependent. The method is applicable to the experiments in the replicate study since they consist of self-to-self hybridizations (of equal amounts of total RNA); therefore, the assumptions underlying this kind of normalization are satisfied.
In what follows, we adopt the notation of Yang et al. (12): for a given array and a given spot, the A and M values are defined as A = (log2 R + log2 G)/2 and M = log2(R/G). Here, R and G denote, respectively, the red (Cy5) and green (Cy3) signals for that spot on that array. Because there is currently much debate on whether it is best to subtract background when it has been obtained with local background techniques (as in ArrayVision), we have carried out our comparisons using signals both with and without background subtraction.
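The A and M definitions can be written as a short sketch. The function name `a_and_m` and the choice to return `None` for non-positive signals (which can arise after background subtraction) are assumptions made for illustration; the text does not say how such spots were handled.

```python
import math


def a_and_m(red, green):
    """Return (A, M) for one spot from its Cy5 (red) and Cy3 (green) signals.

    A is the mean of the two log2 signals; M is the log2 ratio R/G.
    Returns None when either signal is not positive, because the logarithm
    is undefined there (an illustrative choice, not the authors' rule).
    """
    if red <= 0 or green <= 0:
        return None
    log_r = math.log2(red)
    log_g = math.log2(green)
    return (log_r + log_g) / 2, log_r - log_g


print(a_and_m(1024, 256))  # (9.0, 2.0)
```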
Thresholds for Useful Signals
Evaluation of sensitivity and reproducibility of microarray data must take into account which signals are likely to be real indicators of expression and which simply reflect autofluorescence or nonspecific binding of labeled material. To investigate this issue, we used as controls the signals obtained in the replicate study from the blank spots and the yeast sequences. One of the yeast sequences (yeast 400) exhibited high signal most likely due to excessive cross-hybridization with a mouse mRNA and was therefore excluded. All other yeast spots consistently gave very low fluorescence signals. Two of the blank spots had missing values after analysis with ArrayVision and could not be used. Therefore, we utilized 167 control spots on the array. For each array, the 90th percentile of the A values for these controls was computed to provide a threshold for useful signals. On each array, spots with A failing to exceed the threshold were flagged. (Blank, yeast controls, and anchor spots were flagged as well.)
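The flagging rule described above can be sketched as follows. The percentile is computed by linear interpolation between order statistics (the default rule of R's `quantile` function); the text does not say which percentile definition was used, so that choice and the names `percentile` and `flag_spots` are assumptions.

```python
def percentile(values, q):
    """q-th quantile (0 <= q <= 1) by linear interpolation between
    order statistics (assumed definition)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def flag_spots(a_values, control_indices, q=0.90):
    """Return (threshold, flags) for one array.

    A spot is flagged (True) when its A value fails to exceed the q-th
    percentile of the control A values; control spots are always flagged.
    """
    threshold = percentile([a_values[i] for i in control_indices], q)
    flags = [a <= threshold for a in a_values]
    for i in control_indices:
        flags[i] = True
    return threshold, flags


# Made-up example: spots 0..10 are controls, spots 11 and 12 are genes.
a_values = [float(i) for i in range(11)] + [9.5, 8.0]
threshold, flags = flag_spots(a_values, range(11))
print(threshold, flags[11], flags[12])
```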
In this study, the three different labeling procedures were tested using Cy5 and Cy3 on the same amount and source of total RNA extracted from adult mouse pancreas. Since we used a pancreas array, hybridization of pancreas total RNA should result in useful signals for a large fraction of spots. We considered a signal useful on an array if the spot was not flagged by the procedure described above. For each labeling method and each spot, we counted the number of unflagged replicates for that spot across the eight arrays for that labeling method. Then, for each labeling method, we examined the number of spots with a high number of unflagged replicates (7 or 8) to check sensitivity together with reproducibility. All of the above was done both with and without background subtraction, and the results are reported below. Briefly, both the indirect and the dendrimer methods outperform the direct method in this respect, but they do not differ significantly from each other if the number of spots with at least seven unflagged replicates is considered. If only the number of spots with eight unflagged replicates is considered, then the dendrimer method outperforms the indirect.
It should be noted that the tests above check one of the measures of sensitivity, namely, the ability to detect expression, and they do not establish the sensitivity for each gene. In the Dilution Series Study (below), we use that smaller set of experiments as a first step into the investigation of another aspect of sensitivity, namely, the degree of linear response, with adequate slope, to different dilutions. In what follows, we often use for short the term “predictive ability” to denote the latter, which in reality is just one aspect of predictive ability.
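Counting unflagged replicates per spot is a simple tally. In this sketch, "good" means at least seven unflagged replicates and "very good" means all eight unflagged; the function name and the input layout (one list of flags per array) are illustrative assumptions.

```python
def count_unflagged(flags_by_array, good_minimum=7):
    """flags_by_array holds one list of booleans per array (True = flagged).

    Returns (unflagged_counts, n_good, n_very_good), where a spot is "good"
    with at least `good_minimum` unflagged replicates and "very good" when
    it is unflagged on every array.
    """
    n_arrays = len(flags_by_array)
    n_spots = len(flags_by_array[0])
    counts = [
        sum(1 for array in flags_by_array if not array[s])
        for s in range(n_spots)
    ]
    n_good = sum(1 for c in counts if c >= good_minimum)
    n_very_good = sum(1 for c in counts if c == n_arrays)
    return counts, n_good, n_very_good


# Made-up flags for 3 spots on 8 arrays.
flags = [[False, False, False] for _ in range(8)]
flags[0][1] = True                   # spot 1 flagged on one array
for k in range(3):
    flags[k][2] = True               # spot 2 flagged on three arrays
print(count_unflagged(flags))        # ([8, 7, 5], 2, 1)
```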
Without background subtraction.
For each labeling method and each spot, we computed the average value of A over the eight replicates for that spot. The ranges of such average A values for the three labeling methods over all spots are reported in Table 1. The A thresholds for each replicate in each labeling method are also reported in this table.
Of the 3,658 spots representing primarily mouse pancreas genes, the numbers with at least 7 unflagged replicates (denoted as “good” spots in Table 1) were: 1,657 for the direct labeling, 2,435 for the indirect labeling, and 2,507 for the dendrimer labeling. Both the indirect and the dendrimer methods differ significantly from the direct method in this respect (in both cases P < 2.2e−16, using a χ²-test), but they do not differ significantly from each other (P = 0.08). If we consider the numbers of spots with 8 unflagged replicates (denoted as “very good” spots in Table 1) for the latter two methods, then there are 1,954 for the indirect and 2,354 for the dendrimer, which differ significantly (P < 2.2e−16).
With background subtraction.
For each labeling method and each spot, we computed the average value of A over the eight replicates for that spot. The ranges of such average A values for the three labeling methods over all spots are reported in Table 2. The A thresholds for each replicate in each labeling method are also reported in Table 2.
Of the 3,658 spots representing primarily mouse pancreas genes, the numbers with at least 7 unflagged replicates (denoted as “good” spots in Table 2) were 2,088 for the direct labeling, 2,229 for the indirect labeling, and 2,184 for the dendrimer labeling. The indirect method differs significantly from the direct method in this respect (P = 0.0009, using a χ²-test). The dendrimer method differs significantly from the direct method (P = 0.02) and even more sharply if we consider the numbers of spots with eight unflagged replicates (1,705 for the direct method, and 2,008 for the dendrimer method, P = 1.637e−12; these are denoted as “very good” spots in Table 2). The indirect and the dendrimer methods do not differ significantly from each other with respect to the number of spots with at least seven unflagged replicates (P = 0.29). If we consider the number of spots with eight unflagged replicates for the latter two methods (1,847 for the indirect and 2,008 for the dendrimer), they differ significantly (P = 0.0002).
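The comparisons of "good" spot counts can be approximated with a two-proportion χ² test on 1 degree of freedom. The sketch below assumes a Yates continuity correction (the default for 2×2 tables in R's `chisq.test`); the text does not state whether a correction was applied, so that choice and the function name `chi2_two_proportions` are assumptions. Under that assumption the computed P values should be close to those quoted above.

```python
import math


def chi2_two_proportions(k1, k2, n, yates=True):
    """Chi-square test (1 df) comparing k1 of n with k2 of n.

    Returns (statistic, p_value). With yates=True a continuity correction
    of 0.5 is applied to each |observed - expected| (assumed here).
    """
    table = [[k1, n - k1], [k2, n - k2]]
    column_totals = [k1 + k2, 2 * n - k1 - k2]
    statistic = 0.0
    for row in table:
        for observed, column_total in zip(row, column_totals):
            expected = column_total / 2.0  # both rows have the same total n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            statistic += diff * diff / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Counts of "good" spots out of 3,658, taken from the text.
print(chi2_two_proportions(2435, 2507, 3658))  # indirect vs. dendrimer, no subtraction
print(chi2_two_proportions(2088, 2229, 3658))  # direct vs. indirect, with subtraction
```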
Correlation of Relative Signal Intensities Between Labeling Methods
To have a better sense of how the three labeling methods perform with respect to quantities of interest (e.g., the signal-to-noise ratios, or the Cy5-to-Cy3 ratios), it is useful to examine, for each method, how each such quantity depends on the relative intensity of the signals. To do this across the three methods, it is convenient to first know whether genes are high or low expressers consistently across the three methods. In other words, was the position of a gene’s intensity in the spectrum of intensities dependent on the labeling method used? To determine this, we utilized the replicate study, and, for each labeling method and each spot, we calculated the average value of A over the eight replicates for that spot and determined which quantile such a value represented over the distribution of average A values across all spots. Then, for each pair of labeling methods, we examined the correlation between such numbers. We did this both with and without background subtraction; figures of the scatter plot matrices in each case are available in the Supplementary Material published online at our web site. Correlation coefficients between the indirect and each of the other two methods were greater than 0.9, and between the direct and the dendrimer methods they were around 0.8, regardless of whether background was subtracted. Therefore, there was a good correlation between the relative intensities for a spot between pairs of labeling methods. This justifies the use of the average A value across all 24 arrays (which we refer to as the “grand average A”) as a measure of spot intensities common to all three methods, based upon which fair comparisons between methods can be made. The correlation results above show that genes with high grand average A are relatively high expressers in all three methods and similarly for those with low grand average A.
When no background subtraction was performed, the range of the grand average A was [7.2, 14.6] with interquartile range [7.8, 9.9] and 90th percentile 11. When we assigned missing values to all spots flagged as above, in Thresholds for Useful Signals, and then computed the grand average A, the interquartile range was [8.2, 10] and the 90th percentile was again roughly 11. If background subtraction was performed, then the range of grand average A was [1.9, 14.6] with interquartile range [5.2, 9.5] and 90th percentile 10.9. The interquartile range after assigning missing values to all flagged spots was [7, 9.8] and the 90th percentile was roughly 11.
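The quantile comparison between pairs of labeling methods can be sketched as a correlation of quantile positions. The sketch assumes a Pearson correlation of the positions and ignores ties; the text does not specify either detail, and the function names and input values are made up for illustration.

```python
import math


def quantile_positions(values):
    """Map each value to its position (0 to 1) in the sorted distribution.
    Ties are broken by input order (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    positions = [0.0] * len(values)
    for rank, i in enumerate(order):
        positions[i] = rank / (len(values) - 1)
    return positions


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up average A values for four spots under two labeling methods.
method_1 = [7.5, 9.0, 8.2, 12.1]
method_2 = [6.0, 8.5, 9.0, 11.0]
r = pearson(quantile_positions(method_1), quantile_positions(method_2))
print(round(r, 3))
```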
One of the measures generated by ArrayVision is the signal-to-noise ratio (S/N). This is calculated, for each spot and each channel on an array, as (foreground intensity − background intensity)/(standard deviation of the background). The higher the S/N for a spot, the better. Using the replicate study, we explored how S/N behaved as a function of signal intensity in each of the methods in two different ways.
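The S/N definition can be illustrated with a few lines operating on pixel intensities. This is only a sketch of the formula as stated, not ArrayVision's implementation; the use of mean intensities and of the sample standard deviation of the background pixels are assumptions, as is the function name.

```python
import statistics


def signal_to_noise(foreground_pixels, background_pixels):
    """(mean foreground - mean background) / standard deviation of background.

    Uses the sample standard deviation (assumed; the software's exact
    estimator is not stated in the text).
    """
    foreground = statistics.mean(foreground_pixels)
    background = statistics.mean(background_pixels)
    noise = statistics.stdev(background_pixels)
    return (foreground - background) / noise


# Made-up pixel values for one spot in one channel.
print(signal_to_noise([300, 320, 340], [90, 100, 110]))
```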
First, for each labeling method and each spot, we computed the average S/N over all the 16 S/N values (8 replicates × 2 channels) for that spot. We then analyzed the scatter plot of the average S/N vs. the grand average A for each method and fitted a curve to this scatter plot, as described in Preprocessing and Analyses, in methods, above (f = 0.3). We did this using A values both with and without background subtraction. The three scatter plots and curves for the no-background subtraction case are shown in Fig. 1. These indicate that the dendrimer method tends to outperform the other two methods for values of grand average A greater than 9, which represents most of the range for useful signals.
As an independent analysis, we used a method of casting votes based on the S/N. In this case the spots were divided into 10 bins, the nth being those spots whose grand average A was between the 10(n − 1)th and 10nth percentile of all grand average A values. Therefore, each bin had 383 spots (1/10th of the 3,830 spots for which we had values from ArrayVision). For each bin and each method, we calculated the percent of spots whose S/N was greatest in that method. One can think of it as each spot casting a vote for one of the methods, and the “winner labeling method” being calculated separately for each bin to gauge how the results depend on the intensity level of the spots. We refer to this below as the “vote casting” method of analysis. As before, the dendrimer method performed well; the winners and their percentages of winning votes for each bin are given in Table 3. Thus, for data in the top 80% of grand average A, the dendrimer method won over the other two methods with a margin of victory increasing with the intensities.
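The vote casting procedure can be sketched as follows. Spots are sorted by grand average A, split into equal-sized bins, and each spot votes for the method with the best score. The function name, the handling of a remainder (placed in the last bin), and the tie-breaking (first method listed) are assumptions; the same sketch applies to a score where lower is better by setting `higher_is_better=False`.

```python
def vote_by_bin(grand_a, scores, n_bins=10, higher_is_better=True):
    """Return one (winning_method, percent_of_votes) pair per intensity bin.

    grand_a: grand average A per spot.
    scores:  dict mapping method name to a list of per-spot scores.
    """
    order = sorted(range(len(grand_a)), key=lambda i: grand_a[i])
    size = len(order) // n_bins
    pick = max if higher_is_better else min
    results = []
    for b in range(n_bins):
        start = b * size
        stop = (b + 1) * size if b < n_bins - 1 else len(order)
        members = order[start:stop]
        votes = {method: 0 for method in scores}
        for i in members:
            best = pick(scores, key=lambda method: scores[method][i])
            votes[best] += 1
        winner = max(votes, key=votes.get)
        results.append((winner, 100.0 * votes[winner] / len(members)))
    return results


# Made-up S/N values for six spots and two methods, split into two bins.
grand_a = [7, 8, 9, 10, 11, 12]
scores = {"direct": [5, 5, 2, 1, 1, 1], "dendrimer": [1, 1, 6, 9, 9, 9]}
print(vote_by_bin(grand_a, scores, n_bins=2))
```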
With background subtracted A values the results were very similar, both for the scatter plots and for the vote casting (figures are available in the Supplementary Material).
Labeling Method Reproducibility
There are various statistical measures that can be used to assess reproducibility when replicate experiments are available, e.g., the coefficient of variation of a collection of measurements. In our replicate study, a more natural statistic was the root mean square (rms) of M to measure the deviations of M from its ideal value of 0 at every spot on every slide (modulo removal of dye and print-tip/spatial biases). This is justified because the experiments are self-to-self comparisons (with the same amount of labeled material for each channel). We normalized the M values within each array as illustrated in Preprocessing and Analyses (in methods, above). For each labeling method and each spot, we then examined the rms of M for that spot over the eight replicates [i.e., the square root of the average M² for that spot; the smaller the rms(M) for a spot, the better]. After generating scatter plots (as described below) of rms(M) vs. the grand average A and fitting curves to these (using lowess with f = 0.4) for each labeling method, we evaluated the three resulting graphs obtained in each case to compare the rms(M) across the range of intensities. Of the normalization methods proposed in Ref. 12, we chose print-tip group lowess normalization, preferring this to the scaled-print-tip group lowess normalization and to the across slide scale normalization also described in the same paper (one of the reasons for this choice was that this was the method which seemed to perform best according to the studies carried out in Ref. 12). Because of this, we decided to examine the rms(M) vs. grand average A not only across all spots on the array, but also print-tip by print-tip.
Finally, we also applied a vote casting procedure in this situation, this time based on rms(M).
We did all of the above using values both with and without background subtraction. Results are summarized below. All relevant plots are available in the Supplementary Material.
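The rms(M) statistic for each spot is the square root of the mean of the squared M values over the replicate arrays. A minimal sketch, with an illustrative function name and made-up normalized M values:

```python
import math


def rms_per_spot(m_by_array):
    """m_by_array holds one list of normalized M values per array.

    Returns, for each spot, the square root of the average of M squared
    over the replicate arrays (smaller is better for self-to-self data).
    """
    n_arrays = len(m_by_array)
    n_spots = len(m_by_array[0])
    return [
        math.sqrt(sum(array[s] ** 2 for array in m_by_array) / n_arrays)
        for s in range(n_spots)
    ]


# Made-up M values: two arrays, two spots.
print(rms_per_spot([[0.5, -1.0], [-0.5, 1.0]]))  # [0.5, 1.0]
```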
Without background subtraction.
Figure 2 shows the scatter plots and the corresponding lowess curves of rms(M) vs. grand average A for the three labeling methods: rms(M) values are in general very close to 0 in all three methods, and the methods are roughly equivalent in this respect. The dendrimer method slightly outperforms the other two methods over the interquartile range of grand average A. The indirect method slightly outperforms the other two for higher values of grand average A, but one should keep in mind that 11 is roughly the 90th percentile of these values. We repeated the same procedure print-tip by print-tip and obtained analogous results.
As with the S/N analysis, we also performed the voting method; this time spots cast votes based on the method with lowest rms(M). As before, dendrimer labeling performed well; the winners and their percentages of winning votes for each bin are given in Table 4.
With background subtraction.
If background is subtracted, then the rms(M) values typically show decreasing patterns over the range of grand average A. This is explained by the fact that background subtraction most affects the variability of low-intensity signals. Figure 3 shows the scatter plots and the corresponding lowess curves of rms(M) vs. grand average A for the three labeling methods. For the lowest intensity signals the direct method performs best, but for the range of useful signals (those above the first quartile of grand average A after excluding flagged spots) the dendrimer and the indirect methods slightly outperform the direct one. A similar behavior occurs in most of the print-tip by print-tip plots.
Casting votes based on rms(M) led to concordant conclusions. The winners and their percentages of winning votes for each bin are given in Table 5.
Dilution Series Study
Besides the ability to detect expressed transcripts, which was discussed above (Thresholds for Useful Signals), another aspect of sensitivity that is of interest in array experiments is predictive ability. We used a dilution series study as a first pass investigation of the latter. For each labeling method, five hybridizations were performed where the amount of RNA labeled with Cy5 was kept constant in all five, whereas the amount of RNA labeled with Cy3 was varied so that the ratio of the latter to the former was, respectively, ¼, ½, 1, 2, and 4. We excluded from the analyses reported below spots for which we had evidence of PCR failures and spots that were flagged during visual inspection of the images, due to artifacts (specks of dust, etc.). Scatter plots of the measured Cy3 intensities vs. the measured Cy5 intensities (both with and without background subtraction) for each of the 15 hybridizations are available in the Supplementary Material. The values for r², and for the slope and intercept of the regression line, are given in Table 6. The indirect and dendrimer methods have r² values closer to 1 than the direct method. However, the slopes of the regression lines of the direct method are closer to the ideal slopes of 0.25, 0.5, 1, 2, and 4, respectively. These scatter plots refer to values as derived by the ArrayVision software, without normalization. A possible normalization for this kind of experiment would be a lowess-type normalization which would center the M values about the lines M = 2, 1, 0, −1, and −2, respectively, for the five dilution points, but since we were interested in exploring the actual “raw” response of the three methods to different dilution ratios, such a normalization would have obscured what we were looking for.
On the other hand, also because the PMT settings used varied from method to method (but not within a method), to put the methods on equal footing, we examined quantities of interest for each method relative to their values at the 1:1 dilution point (which we simply refer to as “dilution 1”), in effect utilizing this point as a baseline within each method. For example, we compared each of the slopes listed in Table 6 to the slope at dilution 1 of the corresponding method (whose ideal value should be 1). Figure 4 shows, for each method, in log2 scale, the observed vs. the expected ratios of each of the five slopes to the slope at dilution 1, in other words, the observed vs. the expected relative slopes. The direct method is the closest to the ideal values, followed by the indirect method. Note that at the highest dilution we start seeing some saturation in the dendrimer method, which is most likely explained by the fact that the amount of Cy3 dendrimers added was not in sufficient excess to cover the amount of RNA used for this dilution point.
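The per-array regression and the relative slopes can be sketched as follows. The sketch assumes an ordinary least-squares fit of Cy3 on Cy5 intensities; the function names are illustrative, and all numbers are made up rather than taken from Table 6.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def relative_log2_slopes(slopes, baseline_index=2):
    """log2 of each slope divided by the slope at dilution 1 (the baseline)."""
    base = slopes[baseline_index]
    return [math.log2(s / base) for s in slopes]


# Made-up intensities for one array: Cy3 is exactly 2 * Cy5 + 10.
cy5 = [100.0, 200.0, 400.0, 800.0]
cy3 = [2.0 * v + 10.0 for v in cy5]
print(linear_fit(cy5, cy3))

# Made-up slopes for the five dilution points of one method. The expected
# relative values are log2 of the ratios 1/4, 1/2, 1, 2, 4: -2, -1, 0, 1, 2.
print(relative_log2_slopes([0.5, 1.0, 2.0, 4.0, 8.0]))
```

Dividing by the slope at dilution 1 removes any constant scale factor between channels within a method, which is why differing PMT settings between methods do not affect the relative slopes.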
We have also investigated this dataset from a different angle, to examine the average gene response across the five dilution points for different expression level categories. We divided the spots with defined values in all 15 experiments into different expression level categories: “very high,” “high,” “medium,” and “low.” This was done by examining the average A values of the three experiments at dilution 1 (one experiment per labeling method). The “very high expressers” category consisted of the spots in the top 5% of the average A range; the “high expressers” category consisted of the spots in the top 25% but not the top 5%; the “medium expressers” category consisted of the spots in the top 50% but not the top 25%; and the “low expressers” category consisted of the spots in the top 75% but not the top 50%. Spots in the bottom 25% were not considered, as their average A values were at the level of the blanks and yeast controls. For each expression level category and each labeling method, we averaged the log2(Cy3/Cy5) values of the spots in that category for each of the five dilution points and again examined these results relative to the dilution 1 experiment, by subtracting the average log2(Cy3/Cy5) at dilution 1 from each of the five average log2(Cy3/Cy5). We plotted the results vs. the ideal values of −2, −1, 0, 1, and 2. Figures 5 and 6 show the plots for the very high expresser category and for the high expresser category. The plots for the other two categories are available in the Supplemental Material (we have also included in the Supplemental Material a graphical display of the slope and r² of the regression line for each gene as functions of the expression level). Using the median instead of the average yielded similar plots. We also examined analogous plots [i.e., relative log2(Cy3/Cy5) values compared with ideal ones] for individual genes in different expression level categories.
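The binning and averaging just described can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run for this study: the percentage cutoffs are interpreted here as fractions of spots ranked by average A (the text could also be read as fractions of the A range), ties are broken arbitrarily by the sort, and both function names are hypothetical.

```python
def assign_categories(avg_a):
    """Label each spot by its rank in average A (highest first)."""
    n = len(avg_a)
    order = sorted(range(n), key=lambda i: avg_a[i], reverse=True)
    labels = [None] * n
    for rank, i in enumerate(order):
        # Integer arithmetic: percentage of spots at or above this rank.
        pct_times_n = 100 * (rank + 1)
        if pct_times_n <= 5 * n:
            labels[i] = "very high"
        elif pct_times_n <= 25 * n:
            labels[i] = "high"
        elif pct_times_n <= 50 * n:
            labels[i] = "medium"
        elif pct_times_n <= 75 * n:
            labels[i] = "low"
        else:
            labels[i] = "excluded"  # bottom 25%, not considered
    return labels


def relative_mean_log_ratios(log_ratios, labels, category, baseline_index=2):
    """Mean log2(Cy3/Cy5) per dilution point for one category,
    minus the mean at the baseline (dilution 1) point.

    log_ratios holds one list of five values per spot, ordered by dilution.
    """
    rows = [r for r, lab in zip(log_ratios, labels) if lab == category]
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [m - means[baseline_index] for m in means]
```

The values returned by `relative_mean_log_ratios` are the quantities that would be plotted against the ideal values of −2, −1, 0, 1, and 2.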
In addition to evidence of saturation at the highest dilution, which we have discussed above, we noticed from these plots that in general the dendrimer method did not perform as well as the other two methods. This is most evident for lower levels of expression (where, as seen in Fig. 1, the difference in S/N between the three methods is not as sharp), where more “flattening” of the values occurs (see figures 28 and 29 of the Supplemental Material). We also observed that for medium and low expressers, for the direct and indirect methods, subtracting the background yielded plots closer to ideal than not subtracting it. The indirect method shows some compression at higher dilutions. This compression is probably not related to the reverse transcriptase reaction itself, since the parameters were the same as those used for the direct labeling technique. It could instead be an effect of steps specific to the indirect technique, anything from the dye coupling to the more extensive purification procedures it involves. In contrast to the dendrimer method, where the five dilution points show a trend of compression on both ends, the compression observed here is for a single point. More experiments with replicates are needed before definitively stating that the indirect method shows compression at high RNA concentrations.
Many factors have an impact on the reliability of signals. Not all were addressed in this study. The scanner used is important to consider, as it is generating the actual signals to be quantified. For one of the labeling conditions (indirect amino-allyl), the hybridized slides for the replicate study were also scanned using another scanner, namely, the Genepix 4000 (Axon Instruments). The images generated were quantified using the same software (ArrayVision). The Affymetrix (formerly GMS) 418 scanner used for the major study did consistently have a slightly higher S/N, but our assessment is that the results of this study were not highly dependent on which scanner was used.
The choice of image analysis software is clearly critical in the production of reliable intensity values. In this study, a single software package (ArrayVision) was used to quantify the results from the three labeling methods under evaluation. A (growing) number of image quantification software packages are available with varying abilities to accurately identify a spot and to measure foreground and background signals. Yang et al. (11) have carried out a comparison of several of these packages. A relevant conclusion drawn from that study is that the method for determining background varies between these packages and can significantly affect the final signals used. There is some debate on whether it is best to subtract background when this has been obtained with local background techniques (here background subtraction typically has a strong impact on low-intensity values), as was the case with the package we used. This prompted us to carry out our comparisons both with and without subtracting background.
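The reason background subtraction matters mostly for low-intensity spots can be seen in a small numerical sketch. The intensities below are made up for illustration, the function name is hypothetical, and treating a spot as undefined when a background-subtracted channel is not positive is an assumption of this sketch, not a description of how ArrayVision or our analysis handled such spots.

```python
import math


def log2_ratio(fg_cy3, bg_cy3, fg_cy5, bg_cy5, subtract_background=True):
    """log2(Cy3/Cy5) for one spot, or None if a channel is not positive."""
    if subtract_background:
        cy3 = fg_cy3 - bg_cy3
        cy5 = fg_cy5 - bg_cy5
    else:
        cy3, cy5 = fg_cy3, fg_cy5
    if cy3 <= 0 or cy5 <= 0:
        return None  # the log ratio is undefined for this spot
    return math.log2(cy3 / cy5)


# Bright spot: foreground far above a local background of 100.
bright_with = log2_ratio(5100.0, 100.0, 2600.0, 100.0, True)
bright_without = log2_ratio(5100.0, 100.0, 2600.0, 100.0, False)

# Faint spot: foreground only slightly above the same background.
faint_with = log2_ratio(120.0, 100.0, 110.0, 100.0, True)
faint_without = log2_ratio(120.0, 100.0, 110.0, 100.0, False)
```

For the bright spot the two versions differ only slightly, whereas for the faint spot the background-subtracted ratio (20/10) is far from the unsubtracted one (120/110).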
In each of our two studies (the replicate study and the dilution series study), we have utilized a single RNA source and a single print run of microarrays to evaluate three different labeling methods. Yue et al. (14) utilized a series of replicate self-to-self hybridizations with one particular array technology and protocol to assess reproducibility of this method as well as performance in terms of detecting differential expression. In our studies, the focus was on the comparison of three different labeling protocols in terms of reproducibility and sensitivity.
For reproducibility evaluation in the replicate study, we opted for the rms(M) instead of the coefficient of variation (cv) of Cy5-to-Cy3 ratios, as the former is a more natural statistical measure to evaluate deviations in self-to-self comparisons (the cv of Cy5/Cy3 would be measuring how the variance of these ratios compares to their mean and could be small even in cases where the mean is very different from 1, the ideal value). We also found it useful to examine this measure as a function of signal intensity.
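The argument in parentheses can be made concrete with a small sketch. The definitions used here are assumptions made for illustration: rms(M) is taken as the root mean square of M = log2(Cy5/Cy3) over the replicates of a spot, the cv uses the population standard deviation, and the ratios are invented numbers, not data from this study.

```python
import math


def rms_m(ratios):
    """Root mean square of M = log2(ratio); 0 is ideal for self-to-self."""
    m_values = [math.log2(r) for r in ratios]
    return math.sqrt(sum(m * m for m in m_values) / len(m_values))


def cv(ratios):
    """Coefficient of variation: population standard deviation over mean."""
    mean = sum(ratios) / len(ratios)
    variance = sum((r - mean) ** 2 for r in ratios) / len(ratios)
    return math.sqrt(variance) / mean


# Hypothetical Cy5/Cy3 ratios for one spot across replicates: tightly
# clustered, but around 2 rather than the ideal self-to-self value of 1.
biased_ratios = [1.9, 2.0, 2.1]
biased_cv = cv(biased_ratios)        # small: the ratios agree with each other
biased_rms = rms_m(biased_ratios)    # close to 1: they disagree with the ideal
```

In this example the cv is small because the ratios agree with one another, while rms(M) remains near 1 because each ratio is about twofold away from the ideal.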
As for sensitivity, we used our replicate study to investigate the ability of the methods to generate sufficient signal to detect expression from most of the genes on our pancreas-specific microarray, as pancreas RNA was hybridized to it. We also carried out a smaller study, consisting of three five-point dilution series (one per labeling method) as a first step investigation into predictive ability.
In terms of the parameters we have investigated in this study, the Genisphere dendrimer method performed at least as well as the other two and often slightly better as it pertains to reproducibility and detection of expression, but not as well as it pertains to response to different dilutions. Using dendrimers to label cDNA for microarrays was first described as a means of amplifying the fluorescent signal by Stears et al. (9), who concluded that the dendrimer method has many desirable properties. In that paper they compared the dendrimer labeling with direct labeling, as we have done. In their comparison of methods they focused on two main issues: the amount of RNA required to obtain comparable signal strengths, and the levels of background as a function of the amount of RNA. They found that the dendrimer method gave a signal strength with 2.5 μg that was comparable to the direct method when 40 μg were used. With the direct labeling method they showed that the background signal increased significantly with amount of total RNA, whereas with the dendrimer method the background signal remained nearly constant. They did not, however, compare the methods in terms of the signal to noise or signal variation. We have done this, and further investigated how such comparisons depend on the intensity level of the signal. In Ref. 9 there is also a claim that with the dendrimer method signal strength was proportional to the amount of RNA probe (this was apparently done using four hybridizations with total RNA amounts ranging from 1 to 20 μg, see figure 2A of Ref. 9), albeit some signal saturation was observed for the most highly expressed genes. We carried out a first pass comparison of responses to different total RNA amounts between the three labeling methods in our dilution series study. 
We, too, observed some saturation in the dendrimer method (in our case at the highest dilution point), but from our results we also observed that in general the dendrimer method did not seem to perform as well as the others in terms of response to different dilutions, with compression at both ends of the spectrum.
The results from our two studies do not contradict each other: the studies examine different aspects of the performance of the three labeling methods and together provide a more rounded view of it. On one hand, from our replicate study, the dendrimer method appeared to perform well in terms of ability to detect expression, in terms of S/N, and in terms of reproducibility of log ratios in self-to-self comparisons, which is concordant with some of the results of Ref. 13. On the other hand, the compression observed in our dilution series study at both ends of the spectrum in the dendrimer data raises some concerns.
The dendrimer method has the advantage of requiring less starting material than the other two methods. When that is a limiting factor and when the study of interest focuses on screening for genes that are expressed in a given sample, our analysis supports its use (at least with an experimental design that utilizes this sample in self-to-self hybridizations to find A values above a certain threshold). Otherwise, for the moment we will continue our use of the indirect method, which over all the parameters we have investigated in our two studies performed relatively well (and was consistently not the worst).
We realize that our dilution series study was a relatively small one (15 hybridizations in total) with no replicates, and replicates are always desirable. We have also just learned that Genisphere is now releasing a new set of protocol recommendations, which might show a different performance in terms of response to different dilutions. Moreover, other types of studies could be carried out to investigate this issue. For example, instead of varying the total amount of RNA used in one channel, the total amount could be held constant and the amount of labeled RNA for this channel varied, or spiking studies could be used. Response to different dilutions is only one aspect of predictive ability. Ultimately it is the ability to identify real differences in gene expression levels that is desired, so an interesting direction for future work is to carefully design and carry out an appropriate set of experiments to investigate how the three methods compare in a scenario where the Cy5-to-Cy3 ratios might vary from gene to gene (the availability of our replicate study data should provide a better understanding of microarray data under the null hypothesis of no change). Thus a fuller investigation of predictive ability calls for a combination of different types of studies, each involving replicate hybridizations.
We thank Shannon McWeeney, Angel Pizarro, and Phillip Phuc Le for help with the loading of the labeling study data into RAD and with the web interfaces. Angel Pizarro is also thanked for his help in submitting the data to ArrayExpress. We also thank Deborah Pinney, Joan Mazzarelli, and the other members of the EPConDB team for helpful discussions.
We gratefully acknowledge support through National Institute of Diabetes and Digestive and Kidney Diseases Grant DK-56947 (to K. H. Kaestner). At the time this paper was written, G. R. Grant was under the support of National Institutes of Health (NIH) Award K25-HG-00052 and E. Manduchi was under the support of NIH Award K25-HG-02296.
1 Supplementary material (all additional figures relative to this work) is available at http://www.cbil.upenn.edu/EPConDB/labeling_method_comparisons. Moreover, any future feedback or possible errata will be posted at this web site.
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: E. Manduchi, Center for Bioinformatics, Univ. of Pennsylvania, 1428 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104-6021.
- Copyright © 2002 the American Physiological Society