## Abstract

Epistasis effects (gene interactions) have been increasingly recognized as important genetic factors underlying complex traits. The existence of a large number of single nucleotide polymorphisms (SNPs) provides opportunities and challenges to screen DNA variations affecting complex traits using a candidate gene analysis. In this article, four types of epistasis effects of two candidate gene SNPs with Hardy-Weinberg disequilibrium (HWD) and linkage disequilibrium (LD) are considered: additive × additive, additive × dominance, dominance × additive, and dominance × dominance. The Kempthorne genetic model was chosen for its appealing genetic interpretations of the epistasis effects. The method in this study consists of extension of Kempthorne's definitions of 35 individual genetic effects to allow HWD and LD, genetic contrasts of the 35 extended individual genetic effects to define the 4 epistasis effects, and a linear model method for testing epistasis effects. Formulas to predict statistical power (as a function of contrast heritability, sample size, and type I error) and sample size (as a function of contrast heritability, type I error, and type II error) for detecting each epistasis effect were derived, and the theoretical predictions agreed well with simulation studies. The accuracy in estimating each epistasis effect and rates of false positives in the absence of all or three epistasis effects were evaluated using simulations. The method for epistasis testing can be a useful tool to understand the exact mode of epistasis, to assemble genome-wide SNPs into an epistasis network, and to assemble all SNP effects affecting a phenotype using pairwise epistasis tests.

- single nucleotide polymorphism
- quantitative trait
- linkage disequilibrium
- Hardy-Weinberg disequilibrium

the significance of epistasis in complex traits has been well recognized (3, 12, 16, 17). It has been hypothesized that epistasis is ubiquitous in determining susceptibility to common human diseases. Observations supporting this hypothesis include deviations from Mendelian ratios, molecular interactions in gene regulation and biochemical and metabolic systems, nonrepeatable positive effects of single polymorphisms, and gene interaction effects commonly found when properly investigated (16). Large epistasis networks found in yeast (19, 21) also point to the ubiquitous nature of epistasis. The presence of a large number of single nucleotide polymorphisms (SNPs) provides opportunities and challenges to identify DNA variations associated with complex traits using SNPs as candidate genes. A number of studies on SNP-disease association have been reported (2, 7, 8, 20, 24), but most current candidate gene studies focus on single gene effects. Ongoing studies of geneotyping up to 500,000 SNPs on 2,000 individuals have been reported (1). Such large-scale SNP studies would be capable of detecting certain epistasis effects.^{1}

For two bi-allelic genes unaffected by the environment, a total of 512 disease models are possible, but the number of nonredundant models is in the range of 50–102 (12). For two bi-allelic genes such as two SNP loci with quantitative measures, epistasis effects are typically modeled by a linear partition of the nine possible genotypic values in quantitative genetics. Fisher (6) partitioned the nine genotypic values into single gene effects (additive and dominance effects) and epistasis effects assuming Hardy-Weinberg equilibrium (HWE) and linkage equilibrium (LE). Also assuming HWE and LE, Cockerham (5) and Kempthorne (10) partitioned Fisher's epistasis effect into four components, additive × additive, additive × dominance, dominance × additive, and dominance × dominance epistasis effect with the genetic interpretation of allele × allele, allele × genotype, genotype × allele, and genotype × genotype interactions, respectively. This partitioning can be used as a tool for identifying the exact mode of a gene interaction effect. Although these four types of epistasis effects could be further partitioned, such a more detailed partitioning may not have practical value, because nine genotypic values allow only nine independent parameters (common mean and 8 genetic effects) to be estimated and eight independent genetic effects to be tested using linear model methods. A statistical model that includes the main effect of each locus and the interaction between the two loci can be used to detect the gene interaction between the two loci, but this type of epistasis detection does not offer a mechanism to test each of the four epistasis effects under the Cockerham and Kempthorne models. Cockerham's model uses orthogonal contrasts of genotypic values, whereas Kempthorne's model uses the deviation of “genetic combination” from the sum of individual genetic factors. For example, an additive × additive effect (combination effect of 2 alleles, each from a different locus) is a deviation of the gametic mean from the two single allelic means. This is clearly a measure for the gene combination effect and hence has a more straightforward genetic interpretation of the epistasis effect. A genetic model (4) without the derivations of Cockerham and Kempthorne could also be used for testing SNP epistasis effects, but this model yields the same epistasis results as the Cockerham and Kempthorne models under the assumptions of HWE, LE, and equal allele frequencies (see Supplemental Materials; online version of article contains Supplemental Materials). With LD, an epistasis model defining each epistasis effect as a weighted average of genotypic values has been described (9). This model could easily be extended to account for HWD. Among the above epistasis models, the Kempthorne model has more straightforward genetic interpretations of epistasis effects and can be easily extended to allow HWD and LD.

In this article, we extend the Kempthorne model for genetic effects of quantitative traits to allow HWD and LD, define contrasts of the extended Kempthorne genetic effects for testing the four epistasis effects of two SNP loci, develop a linear model procedure for testing each epistasis effect, and present analytical and simulation results on statistical power and sample size required for detecting each type of epistasis effect. The accuracy in estimating each epistasis effect, the rate of false-positive results in the absence of the true epistasis effects, and the rate of potential confounding between the four types of epistasis effects are evaluated using simulations. The methods in this article should provide a useful tool for genome-wide pairwise testing of SNP epistasis where HWD and LD may exist, and for the assembly of total SNP effects or an epistasis network of a phenotype.

## MATERIALS AND METHODS

The development of SNP epistasis testing methods in this article consists of three steps: extend the Kempthorne model for genetic effects to allow HWD and LD, construct genetic and statistical contrast for testing each epistasis effect, and evaluate the performance of the resulting method for testing epistasis with HWD and LD. Two bi-allelic linked loci, *locus 1* with *A* and *a* alleles and *locus 2* with *B* and *b*, are assumed to affect a complex trait of quantitative nature. Each locus is allowed to have HWD. For these two bi-allelic loci, nine genotypes are possible. Let *g*_{ijkl} and *p*_{ijkl} denote the genotypic value and genotypic probability of individuals possessing genotype *ij* at *locus 1* and *kl* at *locus 2*, where *i* and *j* denote alleles *A* and *a* of *locus 1*, and *k* and *l* denote alleles *B* and *b* of *locus 2* (Table 1). Under the Kempthorne model, each genetic effect is defined as a deviation of genetic combination effect from the sum of individual genetic factors in the genetic combination. Such definitions of gene effects require the calculation of 27 means of genotypic values: 1 population mean and 26 marginal means, which include 4 allelic means, 6 single locus genotypic means, 4 gametic means, 6 allele-genotype means, and 6 genotype-allele means.

To extend the Kempthorne definitions of genetic effects to allow HWD and LD, the calculation of the population mean requires the use of the observed genotypic frequencies (*p*_{ijkl}), and the calculations of the 26 marginal means require the use of a series of conditional probabilities based on the observed genotypic frequencies rather than using allele frequencies under HWE and LE, as shown in the Supplemental Material. Let μ = the mean genotypic value in the population, μ_{i} = the marginal mean of genotypic values for individuals with allele *i* (*i* = *A*, *a*), μ_{k} = the marginal mean of genotypic values for individuals with allele *k* (*k* = *B*, *b*), μ_{ij} = the marginal mean of genotypic values for individuals with genotype *ij* at the first locus (*ij* = *AA*, *Aa*, *aa*), μ_{ik} = the marginal mean of genotypic values for individuals with allele *i* at *locus 1* and allele *k* at *locus 2* (*i* = *A*, *a*; *k* = *B*, *b*), μ_{ikl} = the marginal mean of genotypic values for individuals with allele *i* at *locus 1* and genotype *kl* at *locus 2* (*i* = *A*, *a*; *kk* = *BB*, *Bb*, *bb*), and μ_{ijk} = the marginal mean of genotypic values for individuals with genotype *ij* at *locus 1* and allele *k* at *locus 2* (*ij* = *AA*, *Aa*, *aa*; *k* = *B*, *b*). With the use of 27 means, typical expressions of 35 possible individual genetic effects of two bi-allelic loci using Kempthorne's definitions can be expressed as follows (1.1) (1.2) (1.3) (1.4) (1.5) (1.6) (1.7) (1.8) where *a*_{i} = additive effect of allele *i* at *locus 1* (*i* = *A, a*), *a*_{k} = additive effect of allele *k* at *locus 2* (*k* = *B, b*), *d*_{ij} = dominance effect of genotype *ij* at *locus 1*, *d*_{kl} = dominance effect of genotype *kl* at *locus 2*, (*aa*)_{ik} = additive × additive epistasis effect of genotypes with alleles *ik*, (*ad*)_{ikl} = additive × dominance epistasis effect of genotypes with alleles *ikl*, (*da*)_{ijk} = dominance × additive epistasis effect of genotypes with alleles *ijk*, and (*dd*)_{ijkl} = dominance × dominance epistasis effect of *ijkl* genotype. The mathematical expressions of genetic effects of *Eqs. 1.1*–*1.8* do not change with or without HWD and LD. The only change required to allow HWD and LD is in the calculations of the 27 means. With the definitions of *Eqs. 1.1*–*1.8*, the Kempthorne model of a genotypic value for two loci is (2) With the extended Kempthorne genetic effects, genetic contrasts for testing the eight individual genetic effects of two loci allowing HWD and LD can be defined. The resulting contrasts for epistasis testing will be evaluated for its performance, including statistical power, rate of false positives, and accuracy in estimating each epistasis effect. Analytical formulas will be derived to predict the statistical power and sample size requirement for detecting epistasis effects, and the theoretical results will be compared with simulation results. Two types of false-positive results will be considered, namely false positives in the absence of any epistasis effects and false positives in the presence of one type of epistasis effect. The former is to evaluate whether the tests have more false positives than expected for random reasons, and the latter is to evaluate whether any epistasis effect has more false positives due to confounding with other epistasis effects. The rates of these two types of false-positive results will be evaluated using simulations. The accuracy in estimating each epistasis effect will be measured by the mean square error (MSE) from simulation results, where MSE is the sum of the variance and squared bias of the epistasis estimates.

## RESULTS

#### Contrasts of gene effects allowing HWD and LD.

On the basis of the definitions of genetic effects by *Eqs. 1.1*∼*1.8* with the extension to allow HWD and LD, eight genetic contrasts can be defined to test for four single gene effects and four epistasis effects as follows (3.1) (3.2) (3.3) (3.4) (3.5) (3.6) (3.7) (3.8) where α_{i} = additive effect (gene substitution effect) of locus *i* (*i* = 1,2), *d*_{i} = dominance effect of locus *i* (*i* = 1,2), *i*_{aa} = additive × additive effect, *i*_{ad} = additive × dominance effect, *i*_{da} = dominance × additive effect, *i*_{dd} = dominance × dominance effect, **k**_{j} = contrast vector of **β** (*j* = 2,…9), **β** = column vector of the population mean and the 35 genetic effects represented by *Eqs. 1.1*–*1.8*, **g** = (*g*_{AABB}, *g*_{AABb}, *g*_{AAbb}, *g*_{AaBB}, *g*_{AaBb}, *g*_{Aabb}, *g*_{aaBB}, *g*_{aaBb}, *g*_{aabb})′, and **s**_{i} is a function of conditional probabilities (Supplemental Material). Under the null hypothesis of no gene combination effect, each of the four epistasis contrasts of *Eqs. 3.5*–*3.8* is expected to be zero. *Eqs. 3.1*–*3.8* are orthogonal comparisons of the 35 individual genetic effects represented by *Eqs. 1.1*–*1.8*. Therefore, the test for each genetic effect is independent of the other genetic effects.

#### Linear model method for epistasis testing.

To test the significance of each genetic contrast, a statistical contrast needs to be used in place of the genetic contrast. Each statistical contrast is obtained by replacing **g** with the estimate of **g** (**ĝ**) using the following procedure. A phenotypic value is modeled as the summation of a genotypic value (*g*_{ijkl}) of the two genes underlying the complex trait and a random residual (*e*) following *N*(0,σ) distribution, i.e., phenotypic observation *m* with *g*_{ijkl} genotypic value is modeled as *y*_{ijklm} = *g*_{ijkl} + *e*_{ijklm}, or **y** = **Xg** + **e** in matrix notations, where **y** is the column vector of phenotypic values, **X** is the design matrix, and **e** is the vector of random residuals. Note that the above linear model can be extended to include nongenetic effects such as age and gender so that the genetic effects can be tested without the influence of those nongenetic effects. This flexibility of accounting for nongenetic effects is an advantage of linear model methods over frequency-based methods in genetic analysis. The normal equations in matrix notations are **X′Xg** = **X′y**, and the least squares estimate of **g** is given by **ĝ** = (**X′X**)^{−1}**X′y**. Then, from *Eqs. 3.5*–*3.8*, the four marker contrasts for testing epistasis effects can be expressed as (4) The epistasis estimator represented by *Eq. 4* is unbiased, i.e., (5) The null and alternative hypotheses for testing each epistasis effect are *H*_{x0}: *i*_{x} = 0 and *H*_{x1}: *i*_{x} ≠ 0, respectively, where *x* = *aa*, *ad*, *da*, *dd*. It is readily seen from *Eq. 5* that *H*_{x0} is equivalent to the hypothesis *L*_{x} = 0. The test statistic for testing *H*_{x0} is (6) where *s*^{2}=(**y**−**X****ĝ**)′(**y**−**X****ĝ**)/(*n*−*k*). Under *H*_{x0}, this statistic has a Student's *t*-distribution with degrees of freedom *n* − *k* (18), where *n* is the number of observations and *k* is the rank of **X**. To test the overall epistasis effects is to test the general null hypothesis that no epistasis effect exists, i.e., *H*_{0}: *i*_{aa} = 0, *i*_{ad} = 0, *i*_{da} = 0, and *i*_{dd} = 0. The alternative hypothesis is that at least one form of epistasis effect exists. It is easily seen from *Eq. 5* that *H*_{0} is equivalent to the hypothesis **S**_{E}**g** = 0, where **S**_{E} = (**s**′_{6}, **s**′_{7}, **s**′_{8}, **s**′_{9})′. The *F*-test statistic is (7) Under the null hypothesis *H*_{0}: **S**_{E}**g** = 0, the *F*-statistic given by *Eq. 7* follows an *F-*distribution with degrees of freedom 4 and *n* − *k* (18). This *F*-test can also be used to test all the genetic effects, including single gene effects or a combination of the genetic effects by modifying *Eq. 7*.

#### Statistical power and sample size.

Analytic formulas for predicting statistical power and sample size determination for a given set of statistical and genetic parameters were derived (*Eqs. C5–C18* in Supplemental Material) using an approach similar to that used in Refs. 13, 14, and 15. Table 2 shows the sample size requirement for each selected level of epistasis effect assuming a 95% statistical power and a 5% type I error rate. The results show that detecting *d* × *d* effect requires a considerably larger sample size than detecting the other epistasis effects. The reason is that the *d* × *d* variance is small even when the size of the *d* × *d* effect is the same as that of *a* × *a*. Under the assumptions of HWE, LE, and equal allele frequencies, σ= 2σ= 2σ= 4σ when |*i*_{aa}| = |*i*_{ad}| = |*i*_{da}| = |*i*_{dd}| based on the results of Cockerham (5) and Kempthorne (10), where σ= *a* × *a* variance, σ= *a* × *d* variance, σ = *d* × *a* variance, and σ_{DD}^{2} = *d* × *d* variance. Consequently, the contrast heritability of *i*_{dd}, which is the fraction of the phenotypic variance accounted for by the *d* × *d* contrast variance, is ∼0.25 of the contrast heritability of *i*_{aa}. For the same contrast heritability, the sample size required is the same for detecting each of the four epistasis effects. The sample size numbers in Table 2 are merely illustrative, and the actual sample size for a given set of data and threshold values of statistical power and type I error could vary significantly. In general, statistical power is an increasing function of the effect size, sample size, and type I error, while sample size is a decreasing function of type I error, type II error, and effect size.

#### Evaluation of statistical power, accuracy of estimating epistasis effects, and rates of false positives, using simulations.

The simulated phenotypic value of each individual is obtained as the summation of the individual quantitative trait locus (QTL) genotypic value and a random residual following *N*(0,1) distribution. Each simulation generated 10,000 repeats. Statistical powers observed from the simulated data agreed well with the predicted powers (Figs. 1 and 2). When more than one type of epistasis effect exists, the joint testing of the overall epistasis effect had the highest statistical power (predicted QTL and observed QTL in Figs. 1 and 2). Among the four types of epistasis effects involving two loci, the effect with largest contrast heritability (or contrast variance) is the easiest to detect. However, predicting the epistasis effect with the largest variance or contrast variance is not as straightforward as under the assumptions of HWE and LE, where the rank of statistical power is in the order of *a* × *a*, *a* × *d* (or *d* × *a*), and *d* × *d*, because the *a* × *a* effect has the largest contrast variance or heritability, i.e., *H*= 2*H*= 2*H*= 4*H*_{dd}^{2}. Under HWD and LD, this order may not hold because of the existence of covariances between the eight genetic effects (*Eq. C3* in Supplemental Materials and Ref. 9). For example, the *a* × *d* effect in Fig. 1 actually has slightly higher statistical power than the *a* × *a* effect. Figure 1 and all the other figures show that the *d* × *d* effect is the most difficult to detect. It is important to note that good statistical power and accuracy for *d* × *d* are possible if the *d* × *d* effect is sufficiently large. For example, when the *d* × *d* contrast heritability is ∼2% of the phenotypic variance (corresponding to 5% *a* × *a* contrast heritability; Table 2), the statistical power is nearly as good as for the other three epistasis effects (Fig. 2). In terms of accuracy of estimating epistasis effects measured by MSE, estimates for *a* × *a* and *a* × *d* (or *d* × *a*) had similar accuracy, while estimates for *d* × *d* had the worst accuracy (Fig. 3). Again, the poor accuracy in estimating the *d* × *d* effect was due to the fact that *d* × *d* had the smallest variance under the assumption of equal effect sizes for all epistasis effects. As sample size increases, all estimates tend to have similar accuracies (Fig. 3).

To evaluate the rate of false positives in the absence of any epistasis effect, the phenotypic value was defined as the random residual following *N*(0,1) distribution, so that the phenotypic value has no genetic effect. With 10,000 repeats, the observed rate of false positives for any of the four epistasis effects was virtually identical to the cutoff type I error used as the nominal significance level. This shows that the method of testing does not have inflated type I errors over those for random reasons expected by the nominal significance levels. To evaluate the rate of false positives due to confounding among the four types of epistasis effects in the presence of one type of epistasis effect, the phenotypic value was defined as the sum of a genotypic value and a random residual following *N*(0,1) distribution, where the genotypic value contains only one type of epistasis effect and is obtained by *Eq. C2* in Supplemental Materials. With 10,000 repeats, the observed rate of false positives for any of the three epistasis effects not present in the phenotypic value was virtually identical to the cutoff type I error used as the nominal significance level. This demonstrated that the method of testing did not have inflated type I errors due to confounding among the eight genetic effects over those expected for random reasons, as expected by *Eqs. 3.1*–*3.8*.

#### Comparison with methods requiring HWE and LE.

The method based on the extended Kempthorne model was compared with two methods that have been used in human epistasis testing (17): Cockerham's model (5) under the assumptions of HWE and LE and the genetic model of Cheverud (4). Theoretical analysis showed that the Cheverud model yields identical epistasis results as the Cockerham model under HWE and LE with equal allele frequencies (Supplemental Materials). For this reason, the Cheverud model will be considered as a method requiring HWE and LE in the following discussions. Both theoretical analysis and data simulation showed that the extended Kempthorne model had identical results for testing epistasis effects under HWE and LE with equal allele frequencies as the other two methods requiring HWE and LE (Supplemental Materials). However, application of methods requiring HWE and LE to data with HWD and LD generally had three problems: decreased statistical power, decreased accuracy in parameter estimation (except for *a* × *a* effect), and confounding between epistasis effects (or increased rate of false positives). Comparison of Fig. 1 with Fig. 4 shows that the two methods requiring HWE and LE had considerably lower statistical power than our method based on the extended Kempthorne model for detecting each of the *a* × *a*, *a* × *d*, *d* × *a*, and *d* × *d* epistasis effects when applied to data with HWD and LD. In terms of accuracy in estimating epistasis effects, the two methods requiring HWE and LE had larger MSEs (lower accuracies) than our method for *a* × *d*, *d* × *a*, and *d* × *d* epistasis effects, particularly when the sample size was small (Figs. 3 and 5). For detecting *a* × *a* effect, the methods requiring HWE and LE in fact had smaller MSEs. Confounding or increased rate of false positives is another problem when applying methods requiring HWE and LE to data with HWD and LD. Unlike other statistical problems such as power and accuracy problems that tend to diminish as sample size increases, the problem of confounding or false positives of the methods requiring HWE and LE becomes more serious as sample size increases (Figs. 6 and 7). The most serious confounding was that between *d* × *d* and *a* × *a* effects (Fig. 6), where *d* × *d* is the only genetic effect present in the phenotypic value but a high frequency of significant *a* × *a* effect was also observed. Confounding between *a* × *d* and *d* × *a* effects was also observed (Fig. 7) but was less serious than those in Fig. 6.

#### Computer implementation.

A computer program, epiSNP, has been developed to implement the epistasis testing method in this article using a two-step least squares analysis similar to that of Wolfinger et al. (23). The first step corrects the phenotypic values to remove nongenetic systematic effects such as gender, treatment, and living conditions, and the second step conducts a significance test for SNP effects using the corrected phenotypic values as the observations. The two-step analysis estimates nongenetic systematic effect only once and offers considerable computational efficiency when the number of SNPs is large. The computer program is freely available at **http://animalgene.umn.edu**.

## DISCUSSION

The pairwise epistasis testing method in this article can be used to assemble interactive SNPs over all autosomes and to assemble all SNP effects affecting a phenotype. The X chromosome can be included for pairwise analysis, but only female individuals can be used. This is because each female has two copies of the X chromosome, so that the analysis of the X chromosome in females is the same as that for autosomes. Similarly, the Z chromosome in birds can be included, but only male individuals can be used in the analysis. The final SNP assembly may contain single gene effects and epistasis effects of SNPs affecting the trait. This pairwise strategy could be used to identify higher order epistasis effects that can be placed in one “epistasis network” analogous to establishing a linkage group, i.e., if locus *A* interacts with locus *B* and locus *B* interacts with locus *C*, then *A*, *B*, and *C* are within one epistasis network. The pairwise analysis also can be used to assemble all SNP effects as the total SNP effect on a phenotype and is more practical than a higher order analysis. We caution that pairwise analysis does not offer the exact estimate of any higher order epistasis effect. Difficulties associated with a higher order analysis include mathematical complexity and computational difficulty of the method as well as sample size requirement. As the number of loci increases, the mathematical tediousness of the methodology is expected to grow rapidly but should still be manageable for a small number of loci. Potential computation difficulty is an issue that should be noted. The total number of pairwise tests is *n*(*n* − 1)/2, which is about 125 billion for 500,000 SNPs. Testing three loci at a time will increase the number of tests to more than 20,000 trillion. This magnitude of increase in the number of tests could easily reach the point where the numerically intensive computations exceed the capacities of the most advanced computing facilities. Another challenge of higher order analysis is in the sample size requirement. The sample size required to add each locus in the higher order epistasis analysis may need to be increased substantially to ensure that each subclass has a sufficient number of observations.

## GRANTS

This research was supported in part by funding from the National Research Initiative Competitive Grants Program/United States Department of Agriculture (grant no. 03275) and the Agriculture Experimental Station of the University of Minnesota (project MN-16-043).

## Acknowledgments

We thank an anonymous reviewer for the thorough reading of the manuscript and thoughtful suggestions that improved the manuscript.

## Footnotes

↵1 The 2nd International Symposium on Animal Functional Genomics was held May 16–19, 2006, at Michigan State University in East Lansing, MI, and was organized by Jeanne Burton of Michigan State University and Guilherme J. M. Rosa of University of Wisconsin, Madison, WI (see meeting report by Drs. Burton and Rosa,

*Physiol Genomics*28: 1-4, 2006).Address for reprint requests and other correspondence: Y. Da, 265D Haecker Hall, Dept. of Animal Science, Univ. of Minnesota, St. Paul, MN 55108 (e-mail: yda{at}umn.edu).

Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).

- Copyright © 2007 the American Physiological Society