## Abstract

Over the last few years, there has been a dramatic increase in the use of cDNA microarrays to monitor gene expression changes in biological systems. Data from these experiments are usually transformed into expression ratios between experimental samples and a common reference sample for subsequent data analysis. The accuracy of this critical transformation depends on two major parameters: the signal intensities and the normalization of the experiment vs. reference signal intensities. Here we describe and validate a new model for microarray signal intensity that has one multiplicative variation and one additive background variation. Using replicative experiments and simulated data, we found that the signal intensity is the most critical parameter that influences the performance of normalization, accuracy of ratio estimates, reproducibility, specificity, and sensitivity of microarray experiments. Therefore, we developed a statistical procedure to flag spots with weak signal intensity based on the standard deviation (δ* _{ij}*) of background differences between a spot and the neighboring spots, i.e., a spot is considered as too weak if the signal is weaker than

*c*δ

*. Our studies suggest that normalization and ratio estimates were unacceptable when this threshold (*

_{ij}*c*) is small. We further showed that when a reasonable compromise of

*c*(

*c*= 6) is applied, normalization using trimmed mean of log ratios performed slightly better than global intensity and mean of ratios. These studies suggest that decreasing the background noise is critical to improve the quality of microarray experiments.

- microarray
- gene expression
- statistics
- normalization
- functional genomics

the tremendous advance of the human genome project and development of new high-throughput technologies has created unparalleled opportunities to study the mechanism of disease, monitor disease progression, and evaluate effective therapies. As more and more genes are being identified, it has become extremely important to understand the function of these genes and their pathways. Global gene expression analysis is a critical component of this ambitious endeavor. Microarray technologies offer investigators an opportunity to simultaneously monitor the expression of a large number of genes in the context of their biological system.

For this study, we concentrated on microarray-based studies monitoring RNA expression levels using cDNA microarrays printed on glass microscope slides. Brown and coworkers (2, 3, 5, 11, 12) developed the protocols widely used to do this type of assay. The basic strategy for this type of analysis is to isolate RNA from two sources, a reference and an experimental sample. The RNA samples are converted to cDNA and labeled with a fluorophore, typically Cy3 or Cy5. The two probes are combined and hybridized to the microarray. Following the hybridization and washes, the array is scanned at both 532 nm (Cy3) and 635 nm (Cy5) to detect the labeled cDNA that has hybridized to the array. The two images produced from the scanner are combined, and the data for each spot (gene) is collected (along with background and error measurements). The data used for subsequent analysis is typically expressed in the form of a ratio of experimental expression to reference expression. The hybridizations are repeated multiple times to ensure reproducibility and confidence in the measurement. Once the data from several hybridizations are available, a variety of clustering and statistical methods can be used to determine which genes to study further (6, 7, 13).

One of the most critical steps in microarray experiments is to accurately assess the expression ratios between the sample and reference in a dual color experiment because most subsequent analyses of the data, such as cluster analysis (6), depend on the accuracy of these ratios. Even statistical decision rules such as shrinkage estimation (9) or hypothesis testing on yes/no types of expression differences depend on these original ratios. The ratio estimation consists of two major steps: *1*) to eliminate elements with weak expression that are statistically too close to the background to avoid the detrimental effect in the ratio and *2*) to adjust the expression for each gene by the overall expression. The ratio estimate follows by simply taking the ratio of the two adjusted expressions or the difference of the log transformation of the expressions. We used experimental replicates to determine the appropriate rules for each of the two steps.

## MATERIALS AND METHODS

### cDNA Clones and PCR Products

Each microarray used in this study contained 11,520 mouse cDNA clones. This set of clones was derived from a normalized nonobese diabetic (NOD) spleen cDNA library created in our laboratory by the subtraction of NOD spleen cDNA from B6 spleen cDNA using a published PCR strategy and a kit from Clontech (4). To produce microarrays, cDNA clone inserts were amplified directly from bacterial culture in 96-well plates using PCR primer pairs that anneal to the vector adjacent to the cloning site for these inserts. PCR products were purified by ethanol precipitation. The target DNA was then resuspended in 3× SSC and stored at −20°C until needed for printing.

### Microarray Printing

The glass slides used in the production of the arrays were coated in-house with poly-l-lysine using a previously described protocol (10). The microarrays were printed using a GMS 417 arrayer (Affymetrix, Santa Clara, CA) and subsequently processed using previously described protocols (10). The details of these protocols can be found at **http://www.genomics.ufl.edu/microarray**. The microarrays can then be stored for several months in a dehumidified chamber.

### RNA/Probe Preparation, Hybridization, and Data Capture

Total RNA was isolated from the spleen of NOD mice at 4 wk, 5 wk, and 20 wk of age using the Qiagen RNA extraction kits per the manufacturer’s instructions. The reference RNA was a pool of total RNA from 4-wk-old spleens from 10 NOD and 10 C57BL/6 animals. In our current assays, we can obtain excellent hybridization signals with 10–15 μg of total RNA. The hybridization probes are produced using an amino-allyl labeling strategy. This labeling strategy incorporates an amino-allyl modified dUTP [5-(3-aminoallyl)-2′-deoxyuridine 5′-triphosphate (aadUTP), from Sigma] during cDNA synthesis. Following the synthesis, a monofunctional NHS-ester Cye dye (either Cy5 or Cy3) is coupled to the modified dUTP in a sodium bicarbonate buffer. This labeling procedure circumvents the problem of a low incorporation rate with either the Cy3- or Cy5-coupled dUTP. The aadUTP will incorporate into the growing cDNA strands at rates similar to the unmodified nucleotide. After coupling the Cye dye to the aadUTP, the unincorporated dyes are removed using a Qia-Quick PCR purification kit (Qiagen). Once the reference and experimental probes are created, the hybridization is accomplished under a coverslip in a specially designed hybridization chamber (Array-It). The hybridization of the probes to the array is allowed to proceed for 16 h. Once the hybridizations were complete, detection of the hybridized probes to the array is accomplished using the GMS 418 Scanner (Affymetrix, Santa Clara, CA). The image data was captured by scanning the slide twice, the first time at 532 nm (Cy3-labeled probes) and the second time at 635 nm (Cy5-labeled probes). This process produces two 16-bit TIFF files, one for each wavelength. These images were then merged and analyzed using Scanalyze by Michael Eisen (6).

Four experiments (*exp 1*, *2*, *3*, and *4*) were carried out for this study. The first experiment consists of three independent reference/reference hybridizations (*RR1*, *RR2*, and *RR3*) using the reference RNA (a pool of total spleen RNA samples from ten 4-wk-old NOD and ten 4-wk-old B6 mice) labeled independently with both Cy3 and Cy5. Since the same RNA was labeled with both fluorophores, the expected ratio is one for every target gene on the slide. *Experiments 2*, *3*, and *4* were hybridizations using the same reference sample against spleen RNA from an NOD mouse at 4, 5, and 20 wk (diabetic) of age, respectively. Each experiment consists of three replicates.

### Statistical Model, Estimation Theory, and Justification

Data output from a dual color microarray experiment contains the average fluorescence intensities for the target and background intensities for both the reference and experimental samples. On each microarray, let 1represent the target and background fluorescence intensities for sample (color) *i*, (*i *= 1,2), at spot *j* (*j* = 1, 2,…, *n*), where *n* is the number of spots in the microarray.

The observed fluorescent intensities for each spot *j *is determined primarily by two parameters: the amount of probes available for hybridization and the amount of target DNA on the microarray (*d _{j}* for spot

*j*). In dual color microarray experiments, the amount of target DNA for a given spot is identical for the reference and experimental sample. The amount of probe is proportional to the mRNA expression levels (the fraction of mRNA for gene

*j*among the total mRNA for all expressed genes) in the reference and experimental samples (θ

_{1j}and θ

_{2j}for gene

*j*) and the total amount of mRNA used for labeling in the corresponding samples (

*a*

_{1}and

*a*

_{2}, respectively). The amount of probe available for hybridization to spot

*j*should be

*a*

_{1}θ

_{1j}and

*a*

_{2}θ

_{2j}, and the fluorescence intensities at spot

*j*should be

*a*

_{1}

*d*θ

_{j}_{1j}and

*a*

_{2}

*d*θ

_{j}_{2j}after hybridization. Both

*a*

_{1}

*d*θ

_{j}_{1j}and

*a*

_{2}

*d*θ

_{j}_{2j}are subject to random error due to variation in measuring mRNA concentration (measurement error and purity), efficiency of probe preparation (labeling and purification), and hybridization efficiency. It is reasonable to assume that the variation (noise) is proportional to the true values, i.e., the fluorescence after hybridization can be represented as

*a*θ

_{i}d_{j}*× exp(ε*

_{ij}*) where ε*

_{ij}*is a random variable with an ideal value of 0. A proportional constant may be added to the expression*

_{ij}*a*θ

_{i}d_{j}*× exp(ε*

_{ij}*); however, it is not necessary, because we are only interested in the ratio of expressions. A proportional constant would disappear in the ratio. We let ω*

_{ij}*=*

_{ij}*a*θ

_{i}d_{j}*as the ideal expression level for gene*

_{ij}*j*in color

*i*.

There is another type of noise from the fluorescence of the background due to the experimental conditions of the hybridization. The magnitude of this variation (noise) is variable across the glass slide. We assume that this noise is additive. Combined with the multiplicative variation (ε* _{ij}*), the observed target intensity for sample

*i*at spot

*j*is 2where

*b*

^{′}

*represents the additive background intensity inside the spots. This inside background intensity cannot be measured, although the background noise*

_{ij}*b*surrounding the spots can be measured. The variables

_{ij}*b*

^{′}

*and*

_{ij}*b*are not identical, but we assume they have the same distribution.

_{ij}The true sample to reference expression ratio (*r _{j}*) and its log transformation (λ

*) are defined as 3Since the observed values (*

_{j}*t*and

_{ij}*b*) are not sufficient to estimate

_{ij}*a*

_{1},

*d*, θ

_{j}*, ε*

_{ij}*, and*

_{ij}*b*

^{′}

*, we will make the following assumptions on the distribution of*

_{ij}*b*

^{′}

*−*

_{ij}*b*and ε

_{ij}_{1j}− ε

_{2j}.

#### Assumption 1.

The background noise is additive as defined in *Eq. 2*. Moreover, the differences of *b*^{′}* _{ij}* and

*b*(

_{ij}*b*

^{′}

*−*

_{ij}*b*) are independent random variables with 0 mean and variance δ

_{ij}_{ij}

^{2}, where δ

_{ij}

^{2}is almost a constant among neighboring spots. In other words, the large variation in background noise can be stabilized by taking the difference of adjacent spots.

#### Assumption 2.

The differences ε_{1j} − ε_{2j} are independent and identically distributed normal random variables for all spots *j*.

We will adjust the observed intensities by subtracting the background measurement as 4A spot will be eliminated from further analysis if *t*^{′}* _{ij}* is small. The appropriate threshold for elimination depends on the background noise. When the background standard deviation (δ

*) defined in*

_{ij}*assumption 1*is estimated as δ̂

*, a spot is discarded if 5where*

_{ij}*c*is a user-defined constant that will be determined later. It turns out, quite intuitively, that

*c*cannot be too small. When

*c*is large, the background noise contributes very little to the observed signal in

*Eq. 4*, and then if we take the logarithm on

*Eq. 4*, it becomes 6where γ

_{ij}= ln[1 + (

*b*

^{′}

*−*

_{ij}*b*)/(

_{ij}*a*θ

_{i}d_{j}

_{ij}e^{εij})] ≈ (

*b*

^{′}

*−*

_{ij}*b*)/(

_{ij}*a*θ

_{i}d_{j}

_{ij}e^{εij}) is approximately a small zero mean random variable and becomes negligible if the signal is strong. Therefore, the observed log ratio (defined as

*y*) can be represented as 7 where

_{j}*ρ*= ln(

*a*

_{2}/

*a*

_{1}) and is the normalization constant, λ

*= ln(*

_{j}*r*), and τ

_{j}*represents the last four random terms. The value*

_{j}*ρ*(the normalization constant) is used to compensate for the fact that each color is scanned independently under a different laser power and gain for each wavelength in a two-color microarray experiment. The true ratio

*r*(or log ratio λ

_{j}*) is not estimable without first determining this constant. However, to estimate the normalization constant*

_{j}*ρ*, we face the confounding problem of

*ρ*and the mean value of λ

*. Actually the mean of λ*

_{j}*is inseparable from*

_{j}*ρ*because an increase in

*a*

_{2}is equivalent to the increase in all

*r*. Using mathematical notation, the confounding effect can be expressed as

_{j}*y*= (

_{j}*ρ*+ μ) + (λ

*− μ) + τ*

_{j}*, i.e., we can claim the log ratio to be (λ*

_{j}*− μ) for any μ without affecting the validity of the model. If we are interested in the relative expression level of genes within the same array, then the constant μ plays no role. But, if we are interested in the expression of the same gene in different arrays, then the values of μ affect the level of their expressions when compared. Normalization could be used to put the reference mRNA expression in two different arrays on an equal footing. For example, we may adjust the expressions so that the total expression of the target and reference are the same. This is reasonable for normalization; however, it is not compatible with the commonly used logarithmic transformation for the ratio. In addition, because the majority of the genes have weak expression, they affect the accuracy of the total estimation whether or not they are eliminated. When the logarithmic transformation is taken for the ratio, we may use the model in*

_{j}*Eq. 7*and assume that ∑λ

*= 0 as in Kerr et al. (8). This assumption is convenient for the usual analysis of variance model setting,*

_{j}*Eq. 7*, but not so easy to interpret from biological viewpoint. Another assumption would be that ”most“ of the genes (so-called housekeeping genes) have the same expression. In other words, ”most“ of the λ

*are 0. We will define ”most“ later and will see that each of the assumptions will lead to a different method to estimate the*

_{j}*ρ*. We will compare them. All the assumptions are equivalent to setting μ = 0 in the expression

*y*= (ρ + μ) + (λ

_{j}*− μ) + τ*

_{j}*. Once*

_{j}*ρ*can be estimated, the estimate of the log ratio (λ

*) should be 8where*

_{j}*ρܐ*is the estimate of

*ρ*.

We have not found a method that can estimate the variance of τ* _{j}* in a single experiment, because the variances of λ

*and τ*

_{j}*are confounded. However, if the hybridization is repeated, then we can find the variance of τ*

_{j}*and consequently a confidence bound of λ*

_{j}*. Let the second experiment be represented in a similar notation to*

_{j}*Eq. 7*as 9The difference between

*Eq. 7*and

*Eq. 9*produces 10Therefore, the variance of τ

_{j}− τ

^{′}

_{j}can be estimated by the sample variance of Δ

*y*. We will denote this as

_{j}*s*

^{2}. If we let the variances of τ

*and τ*

_{j}^{′}

*be ς*

_{j}_{τ}

^{2}and ς

_{τ′}

^{2}, then

*s*

^{2}estimates ς

_{τ}

^{2}and ς

_{τ′}

^{2}. The estimate for the log ratio (λ

*) based on the two experiments should be 11where λ̂*

_{j}^{′}

_{j}is the λ̂

*in*

_{j}*Eq. 8*for the second experiment. Moreover, a 1 − α confidence interval for the true log ratio λ

*is 12where*

_{j}*z*

_{α/2}is the upper α/2 quantile from a standard normal distribution. It can be easily shown that

*Eq. 12*does not need to assume that the variances of τ

*and τ*

_{j}^{′}

*are equal. Extension to more than two replicates can be worked out; however, we will omit the details here.*

_{j}## RESULTS

### Validation of the Model

#### Validation 1a: The background noise is additive.

If the background noise is additive, then the distribution of the difference between signal and background (*t _{ij}* −

*b*) for very weak genes (

_{ij}*t*is close to

_{ij}*b*) should be the same as

_{ij}*b*

^{′}

_{ij}_{−}

*b*, where

_{ij}*j*′ is an adjacent spot. We define a target signal as very weak if

*t*−

_{ij}*b*is smaller than 0.5 standard deviation of

_{ij}*b*. Comparisons of the standard deviation and the quarter range of

_{ij}*t*−

_{ij}*b*for very weak spots and

_{ij}*b*

^{′}

*−*

_{ij}*b*revealed that the two distributions match very well (Table 1), except for a small increase (with respect to std) in the mean of S vs. B (Table 1). This is not unexpected, because some of the weak spots may still contain a small amount of true signal.

_{ij}#### Validation 1b: The difference between two neighboring background values (Δb_{ij}) is nearly normally distributed.

Figure 1 shows the histograms of the difference in background values between adjacent spots (Δ*b _{ij}* =

*b*

_{2j}−

*b*

_{2(j+1)}) from the replicates in

*experiment 1*. These histograms illustrate that Δ

*b*tends to have a bell-shape histogram with a heavier concentration at the center than the normal distribution. Comparison of the standard deviations of

_{ij}*b*and Δ

_{ij}*b*revealed a great reduction of variation for Δ

_{ij}*b*compared with the variation for

_{ij}*b*(Table 2). The correlation between |Δ

_{ij}*b*| and the average background noise (

_{ij}*b*+

_{ij}*b*

_{i}_{(j+1)})/2 (last column in Table 2) assesses whether the local variation of

*b*(measured by |Δ

_{ij}*b*|) is affected by the magnitude of the background noise [measured by (

_{ij}*b*+

_{ij}*b*

_{i}_{(j+1)})/2]. The small correlation indicates that the variation of Δ

*b*is not related to the uneven spread of the background.

_{ij}The reduction of variation for Δ*b _{ij}* compared with the variation for

*b*is also illustrated by the distribution of

_{ij}*b*(Fig. 2

_{ij}*A*) and Δ

*b*(Fig. 2

_{ij}*B*) associated with each spot. As can be seen, the distribution of Δ

*b*(Fig. 2

_{ij}*B*) is much more uniform than the original background (

*b*) (Fig. 2

_{ij}*A*). Although our data showed that assuming Δ

*b*as independent and identically distributed was acceptable, a few outliers of Δ

_{ij}*b*still exist in microarrays containing a large number of spots (Fig. 2

_{ij}*B*). Therefore, we estimated the background noise standard deviation δ

*for each spot locally. We tried several spot matrix to define “local” for the δ*

_{ij}*estimation and found that using the eight nearest neighbors gives the best estimates δ*

_{ij}*.*

_{ij}#### Validation 2: The multiplicative variation (ε_{1j} − ε_{2j}) is independent of the magnitude of the signal and normally distributed.

From *assumption 2*, the log ratio estimates λ̂* _{j}* in

*Eq. 8*should be normally distributed with 0 mean for two identical RNA samples, in which all log ratios (λ

*) are 0. This expectation is tested using the reference vs. reference hybridizations in*

_{j}*experiment 1*. As the assumption predicted, the histograms for the three replicates are all very close to a normal curve (Fig. 3). To evaluate that “(ε

_{1j}− ε

_{2j})” is independent of spot

*j*as predicted in

*assumption 2*, we used a linear regression analysis to assess the relationship between the absolute mean value of three log ratio estimates and their variance. We expect the variance to be independent of the mean. Indeed, the

*R*

^{2}values from the regression analysis are close to zero (

*R*

^{2}= 0.136, 0.014, 0.00, and 0.013 for

*experiments 1*,

*2*,

*3*, and

*4*, respectively). The slightly positive value for

*experiment 1*(Ref vs. Ref) is not surprising given that all true log ratios for this experiment are zero. Any large value in the three ratios will produce both a large mean and a large variance and consequently a positive correlation. However, the

*R*

^{2}is not large enough to warrant a modification of the identical distribution assumption.

### Applications of the Model

With the validated assumptions, one can use the model for a number of purposes. We have developed a statistically based procedure to flag weak spots before further analyses and to estimate the log ratios and their accuracy.

#### Flagging of weak spots.

Intuitively, the quality of the spots on microarrays is an important parameter that determines the accuracy and reproducibility of microarray data. As discussed in *validation 1*, the quality of the spots can be evaluated by comparing the background adjusted signal intensity (*t*^{′}* _{ij}*) and the standard deviation (δ

*) of the local background difference (Δ*

_{ij}*b*). If the background-adjusted intensity

_{ij}*t*

^{′}

*is smaller than a certain threshold (*

_{ij}*c*) of δ

*, the spot is then flagged. We have found that the threshold (*

_{ij}*c*) is the most important parameter for normalization and accuracy of the microarray data.

#### Normalization.

As we explained, after *Eq. 7* that there are various ways to do the normalization. The method using the two-color total expressions to adjust each gene expression is equivalent to 13where the sum can be all the spots or only the spots surviving an initial elimination based on background. Under the assumption that “most” genes have the same expression, if we define “most” of the genes as more than 50%, then we should estimate ρ using a 25% trimmed mean method, i.e., to trim the top and bottom 25% of *y _{j}* before taking the average. If we do not want to assume 50%, then we may use the mode to estimate ρ. Obviously, under the assumption ∑λ

*= 0, ρ should be estimated by the average of all the*

_{j}*y*or the 0% trimmed mean. Another often-used method which uses the mean of the unadjusted ratios is also considered. The estimation of ρ in this case is −ln[∑(

_{j}*t*

^{′}

_{2j}/

*t*

^{′}

_{1j})/

*n*], where

*n*is the sample size, and this is referred to as the mean of ratio method. The mode estimation was based on an Splus subroutine with bandwidth equal to 1.57 × (

*q*

_{0.75}−

*q*

_{0.25})/

*n*, where

*q*is the αth quantile of data with size

_{α}*n*. We have evaluated the consistency of five different normalization methods using the average standard deviation of the log ratio estimates (

*Eq. 8*) from the three replicates (Table 3). The smaller the standard deviation, the more consistent the method. Since the estimates in general depend on how weaker signals are eliminated, three thresholds,

*c*= 0,

*c*= 3, and

*c*= 6 in

*Eq. 5*were evaluated. As shown in Table 3, little difference was observed between the different normalization methods. However, Table 3 demonstrates a flagging threshold of six (

*c*= 6) performs better than a threshold of three (

*c*= 3), and

*c*= 0 is much worse than the other two thresholds in terms of consistency of ratio estimates. Evaluation of different thresholds (

*c*) suggested that the larger the

*c*, the more consistent the results. This is because a stronger signal has less chance of being blurred by the background noise. However, a large

*c*will leave one with a smaller number of genes for analysis. Thus there is a trade-off between consistency and possibly missing a positive. We recommend the threshold of six as a reasonable compromise between the number of remaining genes and the quality of the data.

#### Estimation of multiplicative variation τ_{j} in Eq. 7.

Note that the *s*^{2} mentioned in *Eq. 10* estimates ς_{τ}^{2} + ς_{τ′}^{2}. The two variances are inseparable. When there are three replicates, we have one *s*^{2} for each pair, and we can find ς^{2}_{τ} for each replica by solving three simultaneous equations. Table 4 shows ς_{τ} obtained for all the replicas in each experiment. The data indicate that there is considerable variation between replicates of the experiments. Note the ς_{τ} is the standard deviation of the noise ε* _{ij}* in the exponent in

*Eq. 4*. To recover the multiplicative noise, we list

*e*

^{ςτ}and a 95% confidence interval for the ratio.

In our experiments, as in most microarray experiments, the true expressions are unknown. Simulation studies can fill this void to a certain extent, because the true expressions are known. Therefore, we performed a large number of simulations using the observed expression values in the reference sample and the normal and beta distributions for the log ratios ln(*r _{j}*). These two distributions were chosen to mimic the previously observed distributions of the log ratios, i.e., a symmetric bell curve sometimes tilted toward the positive side like a beta density function (9). The background noise in the simulated data was taken from the background noise of real data. All the data presented are based on 1,000 simulations. Since the true ratios are known in the simulation study, we can evaluate the difference between the true and the estimated ratios. Table 5 gives an example of the simulation results. Due to the large simulation sample, the differences between most estimates are statistically significant. It is evident that normalization using spots with strong signal intensities (

*c*= 6) is much more consistent than normalization with all genes (

*c*= 0). Furthermore, both the 25% trimmed mean and mode of log ratios gave better estimates of the true expression levels than global intensity and mean of ratios.

#### Confidence intervals for ratio estimates.

Table 6 presents the confidence intervals (defined by *Eq. 12*) from the simulated data. A 95% confidence interval is suppose to cover the true λ* _{j}* 95% of the time. Table 6 shows that the coverage is very accurate for those spots with a strong signal (ω

_{1j}> 10δ

_{1j}). As expected, the confidence intervals for those spots with weaker signals (ω

_{1j}< 6δ

_{1j}) are not as good. This implies that genes with strong true expression level (ω

*> 10δ*

_{ij}*) are likely to be picked, and the expression ratios can be reliably estimated by the confidence interval formula in*

_{ij}*Eq. 12*.

#### Specificity and sensitivity.

One critical decision for microarray analysis is to determine whether an observed difference between two samples is statistically significant beyond experimental variation. Many investigators consider a twofold change as significant, i.e., two samples are considered to have different expression levels if the observed ratio (r̂* _{j}*) is greater than 2 or less that 0.5. Others have suggested a threefold threshold. To make this rule general, one could claim a difference in expression if 14where

*f*is the suggested threshold measured in folds. To evaluate this decision, we convert it into the usual frame for hypothesis testing 15The conventional way to evaluate this decision rule is to use the specificity (probability of accepting H

_{0}when H

_{0}is true, or equivalently 1 − α) and the sensitivity or power (1 − β, probability of accepting H

_{1}when H

_{1}is true). Determination of the specificity and sensitivity in Eq. 15 is more complicated than the usual hypothesis testing, because these depend on both the expression intensity ω

*and*

_{ij}*r*. A reasonable requirement for easily interpretable microarray results would be:

_{j}*1*) if

*r*<

*R*

_{0}, then the specificity is at least 1 − α, or the false-positive rate is at most α;

*2*) if

*r*>

*R*

_{1}and the expression level ω

*> ω*

_{ij}_{0}, then the sensitivity is at least 1 − β, where

*R*

_{0},

*R*

_{1}, ω

_{0}, α, and β are user-specified values. Based on 1,000 simulations, Table 7 presents the specificity and sensitivity data for a microarray experiment under various values for

*c*and

*f*when ς

_{τ}is set at 0.20 and 0.4, a reasonable error range from our experiments (see Table 4). Table 7 shows the trade-off between specificity and sensitivity for various thresholds of

*c*and

*f*as well as the multiplicative error ς

_{τ}. For example, if the ς

_{τ}is 0.2 and we want a specificity of 0.99 for

*r*< 1.25 and a sensitivity > 0.85 when

*r*> 3.5 for a strong signal (ω

*> 10δ*

_{ij}*), then we should let*

_{ij}*c*= 6 and

*f*= 2.5.

## DISCUSSION

In this study, we have developed a model that fits the observed signal intensities for a two-color microarray experiment. The observed signal (*t _{ij}*) is composed of the true expression level with an additive noise due to background (

*b*) and a multiplicative noise (ε

_{ij}*) due to experimental variation from the probe preparation and hybridization efficiency. Similar models have been developed by other investigators. The assumption of multiplicative variation was first proposed by Chen et al. (1). In their model, the likelihood function was based on the null hypothesis that all genes have the same expression. Therefore, the application of their model is limited to known house keeping genes. Kerr et al. (8) proposed an analysis of variance model that is similar to our*

_{ij}*Eq. 6*. However, they did not consider the background noise in their model. In addition, they did not extend

*Eq. 6*to

*Eq. 7*, and consequently their model requires the independence of the multiplicative noise for sample (ε

_{1j}) and reference (ε

_{2j}) as well as equal variance between experiments. The former assumption is not easy to justify because both errors are from the same spot and the latter assumption was shown not to be a good approximation to the real situation (Table 4). We believe that our model has better assumptions reflecting the true experimental conditions.

Intuitively, the accuracy of microarray data depends on the signal/background ratio of the spots. It is a customized practice to eliminate weak spots on microarrays before subsequent data analyses. However, this practice was based on the user’s discretion, and a statistically sound procedure was lacking. Discovery of the distribution of additive variation allowed us to propose a model-based method to flag spots with weak expression levels, a critical procedure that can influence all subsequent analyses of microarray data such as normalization and ratio estimates. Since the background has a local distribution, the best way to flag weak spots is to identify those spots that have an adjusted signal (*t*^{′}* _{ij}*) less than a certain threshold of standard deviation of background intensity difference between the spot and its neighboring spots (

*c*δ

*). We found that the higher the threshold, the more reliable the results. However, increased threshold will decrease the number of genes that can be evaluated. A compromise would be*

_{ij}*c*= 6. We have developed a computer program for this purpose, and it is available upon request. To provide flexibility to the users, we have used a 10-flag scheme with three different

*c*values (

*c*= 2, 4, or 6) for both the sample [channel 1 (CH1)] and reference (CH2):

*flag 0*(4δ

*> CH1 > 2δ*

_{ij}*and 4δ*

_{ij}*> CH2 > 2δ*

_{ij}*),*

_{ij}*flag 1*(CH1 < 2δ

*and 4δ*

_{ij}*> CH2 > 2δ*

_{ij}*);*

_{ij}*flag 2*(4δ

*> CH1 > 2δ*

_{ij}*and CH2 < 2δ*

_{ij}*);*

_{ij}*flag 3*(CH1 < 2δ

*and CH2 < 2δ*

_{ij}*);*

_{ij}*flag 4*(6δ

*> CH1 > 4δ*

_{ij}*and 6δ*

_{ij}*> CH2 > 4δ*

_{ij}*);*

_{ij}*flag 5*(CH1 < 4δ

*and 6δ*

_{ij}*> CH2 > 4δ*

_{ij}*);*

_{ij}*flag 6*(6δ

*> CH1 > 4δ*

_{ij}*and CH2 < 4δ*

_{ij}*);*

_{ij}*flag 7*(CH1 > 6δ

*and CH2 > 6δ*

_{ij}*);*

_{ij}*flag 8*(CH1 < 6δ

*and CH2 > 6δ*

_{ij}*);*

_{ij}*flag 9*(CH1 > 6δ

*and CH2 < 6δ*

_{ij}*).*

_{ij}Normalization is the next critical step for analyzing microarray data. Our experiments showed that the threshold for background noise (*c*) used for elimination of weak spots is the most critical parameter that influences the performance of normalization. Normalization with all weak spots (*c* = 0) gives high variation for ratio estimates, and it is not acceptable. When *c* = 6 is chosen, all normalization methods tested in this study performed well, although the trimmed mean of log ratios performed better than the global intensity and mean of ratio.

Since a large number of genes are simultaneously analyzed by microarray experiments, it is critical to assess the effects of multiple decisions. Using simulations based on parameters derived from experimental data, we have assessed the specificity and sensitivity of detecting expression differences depending on several variables including *c*, ς_{τ}, *f*, and *r* (Table 7). However, the reproducibility is not solely controlled by these parameters. It also depends on the number of analyzed genes and the distributions of the expression ratios (*r _{j}*) (or number of genes with different expression vs. number of genes with the same expression) and the true expression levels (ω

*). For example, assuming that the ratio is 1 for 99% of the genes (*

_{ij}*r*= 1) and the remaining 1% of genes with strong signal expression (

*ω*> 10δ

_{ij}*) have a ratio of 3.0, we will observe a number of genes with different expression levels using an elimination threshold of six (*

_{ij}*c*= 6) and cutoff of 2.0 (

*f*= 2.0). Using the numbers in

*row 8*of Table 7, the chance that one of the chosen genes really has a different expression is, by the usual Bayes’ formula where 0.01 and 0.99 are the prior probabilities for a gene with

*r*= 0 and

*r*= 3 (respectively), and 0.92 and 0.02 are the probabilities (from the table) that they will be claimed as genes with different expressions. This suggests that most of the claimed differences would be false alarms. This is because the number of false-positive genes is large in proportion to the true positive even though the false-positive rate is low (2%).

The false-positive rate can be dramatically reduced when the experiment is duplicated and only those genes that have different expressions in both hybridizations are considered. The *p* then becomes This is clearly a tremendous gain in confidence. If an experiment is repeated three times and only the genes that agreed all three times are considered, then the chance of making a wrong decision is almost zero. Of course, selecting genes with consistent results in two or multiple experiments decreases the power of detecting a gene with true expression difference. But in most practical situations, it is preferable to select a smaller number of genes with high confidence than a larger number of genes with a high false-positive rate.

In summary, we have developed a statistical model that allows investigators to improve the normalization, accuracy of ratio estimates, and evaluation of the experimental variation, as well as sensitivity and specificity of their microarray experiment.

## Acknowledgments

This work was partly supported by National Institute of Diabetes, Digestive and Kidney Diseases Grant 1R24-DK-58778, National Institute of Child Health and Development Grant 1RO1-HD-37800 and Juvenile Diabetes Research Foundation, and National Institute of Allergy and Infectious Diseases Grant 1P01-AI-42288.

## Footnotes

Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).

Address for reprint requests and other correspondence: J.-X. She, Dept. of Pathology, Immunology and Laboratory Medicine, Box 100275, College of Medicine, Univ. of Florida, Gainesville, FL 32610-0275 (E-mail: she{at}ufl.edu).

- Copyright © 2001 the American Physiological Society