## Abstract

We present a statistical model for testing and estimating the effects of maternal-offspring genome interaction on the embryo and endosperm traits during seed development in autogamous plants. Our model is constructed within the context of maximum likelihood implemented with the EM algorithm. Extensive simulations were performed to investigate the statistical properties of our approach. We have successfully identified a quantitative trait locus that exerts a significant maternal-offspring interaction effect on amino acid contents of the endosperm in maize, demonstrating the power of our approach. This approach will be broadly useful in mapping endosperm traits for many agriculturally important crop plants and also make it possible to study the genetic significance of double fertilization in the evolution of higher plants.

- autogamous plants
- EM algorithm
- linkage
- maternal-offspring interaction
- quantitative trait loci

Higher plants are characterized by a complex life cycle that consists of alternating haploid and diploid generations. The diploid plant life form, called the “sporophyte,” supports meiosis, which produces the haploid male and female spores that initiate the gametophytic generation. The sporophyte also nurtures the reproductive structures, such as the integuments within which the embryo develops (4). Gametogenesis and fertilization take place in an environment where gametophytic and sporophytic structures interact and are placed under several layers of haploid and diploid genetic controls (4).

This interaction culminates in the formation of a new diploid generation during a complex process called “double fertilization” (12). Following meiosis, three of the four megaspores degenerate, and the surviving megaspore produces the female gametophyte (embryo sac), which typically contains eight nuclei and seven cells. Two cells are female gametes: the haploid egg cell and the homodiploid central cell. The product of meiosis in the male gametophyte (pollen) produces a tip-growing pollen tube that grows from the stigma, eventually enters the ovule through the (sporophytic) micropyle, and delivers two sperm cells into the embryo sac. Two zygotic products are formed, each following fusion with one of the two sperm cells: the diploid zygote that develops as the daughter plant and a triploid cell that develops as endosperm with a maternal-to-paternal genome ratio of 2m:1p (3, 5).

Higher plant reproduction is thus characterized by five developmental phases: the diploid sporophyte, the haploid female gametophyte, the haploid male gametophyte, the developing diploid embryo, and the developing triploid endosperm. The development of the embryo sac and the seed is under the control of genes of both sporophytic and female gametophytic origin. The paternal gametophytic and postfertilization sporophytic controls are additional levels of the complex genetic interactions that govern seed development. Recent genetic studies have identified different classes of maternal effect genes involved in seed development. These include genes required in the sporophyte for proper development of the embryo sac (14), genes required in the (maternal) sporophyte for normal embryo development (7), and genes required in the female gametophyte for proper embryo development (20). More recently, Evans and Kermicle (10) isolated a mutant in maize with effects on postfertilization development. By performing quantitative genetic analysis of different generations initiated with inbred lines, Dilkes et al. (9) detected significant evidence of sporophytic gene control over endoreduplication in maize endosperm.

It is expected that molecular markers, in conjunction with segregating plant pedigrees, will provide greater power and precision for detecting maternal effect genes affecting embryo and endosperm development in higher plants. The current molecular dissection of endosperm is mostly based on the assumption that endosperm-specific traits are controlled only by genes from the maternal sporophyte (22, 23). With this assumption, a traditional interval mapping method for diploid tissues (15) can be directly used. Wu et al. (29) proposed an improved statistical model for dissecting endosperm traits by taking the trisomic inheritance of the endosperm into consideration. The traditional interval mapping method may be appropriate for unraveling the genetic basis of early seed development, because at this stage the seed’s own genome has not yet played a role. For example, a recent study suggested that a large part of the paternal genome is silenced during early seed development (21). However, for agriculturally important, mature seed traits, which are to an increasing extent controlled by the seed’s genome (17), the triploid model of Wu et al. (29) should be biologically more relevant. In a mature maize endosperm analysis (30), the model of Wu et al. detected more significant quantitative trait loci (QTL) than the method of Lander and Botstein (15).

Because seed development is under the control of both the sporophytic (maternal) genome and the seed’s own (offspring) genome, joint maternal-offspring effects should be modeled to capture the control mechanisms influencing seed development. In this article, we develop a new statistical model for mapping seed-specific QTL expressed in both the sporophytic and offspring genomes. Our model is based on a statistical mixture model, consisting of quantitative genetic parameters contained in each normal density and the proportion of each genome-of-origin-specific QTL genotype. Maximum likelihood implemented with the EM algorithm (8) is employed to estimate the QTL effect and position parameters. An extensive simulation study is used to examine the statistical behavior of our mapping model.

## THE GENETIC MODEL

Seed development in angiosperms includes two major components, the embryo and the endosperm. These two tissues have different ploidy levels and are formed through different inheritance mechanisms. Therefore, we consider their underlying genetic models separately.

#### The embryo model.

For a QTL of two alleles (designated by *Q* and *q*) affecting a seed trait, tissue of diploid origin can have one of three possible genotypes, *QQ*, *Qq* and *qq*. Because a seed-specific trait is under control of both the sporophytic maternal genome and the offspring genome, its genetic value should be described by a joint effect of the two genomes. More specifically, modeling the overall genotypic value of an embryo trait in the seed needs to consider gene transition from the sporophyte (generation *t*) to its zygotic offspring (generation *t* + 1). For an autogamous species, the sporophytic genotype *QQ*(*t*) generates one embryo genotype *QQ*(*t* + 1); the sporophytic genotype *Qq*(*t*) generates three embryo genotypes *QQ*(*t* + 1), *Qq*(*t* + 1), and *qq*(*t* + 1) with the respective probabilities of 1/4, 1/2, and 1/4; and the sporophytic genotype *qq*(*t*) generates one embryo genotype *qq*(*t* + 1).

Our quantitative genetic model for seed development will be constructed on the basis of the combination of two-generation (maternal and offspring) QTL genotypes at the putative QTL. Let *a* and *d* be the additive and dominant effects of the QTL, respectively. Thus the genotypic values of the three genotypes *QQ*, *Qq*, and *qq* can be specified as μ + *a*, μ + *d*, and μ − *a*, where μ is the overall mean. For a joint maternal-offspring QTL homozygote, only the additive effects are involved; for example, the genotypic values of *QQ*(*t*)*QQ*(*t* + 1) and *qq*(*t*)*qq*(*t* + 1) can be denoted by μ + 2*a* and μ − 2*a*, respectively. For the joint maternal-offspring QTL heterozygotes, *Qq*(*t*)*QQ*(*t* + 1), *Qq*(*t*)*Qq*(*t* + 1), and *Qq*(*t*)*qq*(*t* + 1), we need to model both the additive and dominant effects, and their genotypic values are expressed as μ + *a* + β_{1}, μ + β_{2}, and μ − *a* + β_{3} (Table 1). The dominant effect of a joint heterozygote (β_{1}, β_{2}, or β_{3}) can be partitioned into two components due to the intra-locus interaction within (*d*) and between the generations. For the diploid embryo, we let *b*_{1} and *b*_{2} denote the between-generation dominant effects between the maternal heterozygote and offspring homozygote and between the maternal heterozygote and offspring heterozygote, respectively. We thus have

$$\beta_1 = d + b_1, \qquad \beta_2 = 2d + b_2, \qquad \beta_3 = d - b_1. \tag{1}$$

The compositions of different joint maternal-offspring QTL genotypes for the embryo are given in Table 1.
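
As a minimal sketch of this parameterization, the Python snippet below maps the five effect parameters to the genotypic values of the five joint embryo genotypes and back. It assumes the decomposition β_{1} = *d* + *b*_{1}, β_{2} = 2*d* + *b*_{2}, and β_{3} = *d* − *b*_{1}, which is the form consistent with the test constraints stated under Hypothesis tests. The function names are illustrative and are not part of the original method.

```python
def embryo_means(mu, a, d, b1, b2):
    """Genotypic values of the five joint maternal-offspring embryo genotypes.

    Assumed decomposition of the joint dominant effects:
    beta1 = d + b1, beta2 = 2d + b2, beta3 = d - b1.
    """
    return [
        mu + 2 * a,         # QQ(t) QQ(t+1)
        mu + a + d + b1,    # Qq(t) QQ(t+1)
        mu + 2 * d + b2,    # Qq(t) Qq(t+1)
        mu - a + d - b1,    # Qq(t) qq(t+1)
        mu - 2 * a,         # qq(t) qq(t+1)
    ]


def embryo_effects(means):
    """Invert embryo_means: recover (mu, a, d, b1, b2) from the five means."""
    m1, m2, m3, m4, m5 = means
    mu = (m1 + m5) / 2
    a = (m1 - m5) / 4
    d = (m2 + m4) / 2 - mu
    b1 = (m2 - m4) / 2 - a
    b2 = m3 - mu - 2 * d
    return mu, a, d, b1, b2
```

Because the map is linear with five parameters and five means, the estimated genotypic means can be converted to effect estimates without any further constraint under this assumed form.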

#### The endosperm model.

Three F_{2} sporophytic QTL genotypes are self-crossed to form different endosperm genotypes, i.e., *QQ*(*t*) to *QQQ*(*t* + 1); *Qq*(*t*) to *QQQ*(*t* + 1), *QQq*(*t* + 1), *Qqq*(*t* + 1), and *qqq*(*t* + 1), each with a probability of 1/4; and *qq*(*t*) to *qqq*(*t* + 1).

For the triploid endosperm, the within-generation dominant effect can be due to the interactions between different numbers of dominant vs. recessive alleles. Let *d*_{1} and *d*_{2} be the dominant effects of two *Q* vs. one *q* (*QQq*) and one *Q* vs. two *q* (*Qqq*), respectively. Thus four endosperm QTL genotypes, *QQQ*, *QQq*, *Qqq*, and *qqq*, can be modeled by μ + (3/2)*a*, μ + (1/2)*a* + *d*_{1}, μ − (1/2)*a* + *d*_{2}, and μ − (3/2)*a*. We use *b*_{1} to denote the between-generation dominant effect between the maternal heterozygote and the offspring homozygote, and we use *b*_{2} and *b*_{3} to denote the between-generation dominant effects between the maternal heterozygote and offspring heterozygotes *QQq* and *Qqq*, respectively. Thus the dominant effects of joint maternal-offspring endosperm heterozygotes, *Qq*(*t*)*QQQ*(*t* + 1), *Qq*(*t*)*QQq*(*t* + 1), *Qq*(*t*)*Qqq*(*t* + 1), and *Qq*(*t*)*qqq*(*t* + 1), are expressed, respectively, as

$$\beta_1 = d + b_1, \qquad \beta_2 = d + d_1 + b_2, \qquad \beta_3 = d + d_2 + b_3, \qquad \beta_4 = d - b_1, \tag{2}$$

where *d* is the dominant effect of the diploid maternal heterozygote. See Table 1 for the compositions of different joint maternal-offspring QTL genotypes for the endosperm.
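
The parameter count of the endosperm model can be checked numerically. The sketch below builds a design matrix that maps (μ, *a*, *d*, *d*_{1}, *d*_{2}, *b*_{1}, *b*_{2}, *b*_{3}) to the six joint genotypic values and computes its rank with exact fractions. It assumes that the joint homozygotes take the values μ ± (5/2)*a* (maternal *a* plus endosperm (3/2)*a*), that the maternal heterozygote contributes *d* to every joint heterozygote, and that *b*_{1} enters with opposite signs for *QQQ* and *qqq* offspring. These assumptions agree with the constraint 5(μ_{2} − μ_{5}) = 3(μ_{1} − μ_{6}) used under Hypothesis tests; the matrix layout and function names are illustrative only.

```python
from fractions import Fraction

H = Fraction(1, 2)

# Columns: mu, a, d, d1, d2, b1, b2, b3 (assumed layout, see lead-in)
DESIGN = [
    [1, 5 * H, 0, 0, 0, 0, 0, 0],     # QQ(t) QQQ(t+1)
    [1, 3 * H, 1, 0, 0, 1, 0, 0],     # Qq(t) QQQ(t+1)
    [1, H, 1, 1, 0, 0, 1, 0],         # Qq(t) QQq(t+1)
    [1, -H, 1, 0, 1, 0, 0, 1],        # Qq(t) Qqq(t+1)
    [1, -3 * H, 1, 0, 0, -1, 0, 0],   # Qq(t) qqq(t+1)
    [1, -5 * H, 0, 0, 0, 0, 0, 0],    # qq(t) qqq(t+1)
]


def rank(rows):
    """Rank of a matrix by Gauss-Jordan elimination over exact fractions."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk


# Merging d1 + b2 into e1 and d2 + b3 into e2 leaves six columns:
# mu, a, d, b1, e1, e2.
MERGED = [[r[0], r[1], r[2], r[5], r[3], r[4]] for r in DESIGN]
```

Under these assumptions `rank(DESIGN)` is 6 although there are 8 columns, so two directions in parameter space (trading *d*_{1} against *b*_{2}, and *d*_{2} against *b*_{3}) leave every genotypic value unchanged. `rank(MERGED)` is also 6, which matches its six columns.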

## EXPERIMENTAL DESIGN

For tissue of ploidy level >2, genotypic characterization using molecular markers can be difficult. For this reason, the triploid endosperm is generally not genotyped in endosperm traits mapping. Marker genotypes can be derived from two different tissues, the sporophyte (generation *t*) and the embryo (generation *t* + 1). This thus establishes a two-stage hierarchical design for genotyping. Suppose there is an F_{2} population of size *M*, initiated with two contrasting inbred lines. A number of molecular markers (denoted by **M**) are genotyped both for the F_{2} samples and the embryos from their seeds. Let *n*_{*i*} denote the number of seeds collected from F_{2} plant *i*. From these sampled seeds, various phenotypes of interest are measured for the diploid embryo and triploid endosperm.

The conditional probabilities of joint QTL genotypes for the maternal plant and offspring, conditional upon the genotypes of two flanking markers **M**_{η} and **M**_{η+1} (with a recombination fraction of *r*) from the sporophytic plant and its embryo, can be derived on the basis of gene transition patterns for different F_{2} genotypes. We use *r*_{η1} and *r*_{η2} to denote the recombination fractions between the left marker **M**_{η} and the QTL, and between the QTL and the right marker **M**_{η+1}, respectively. A general expression of the conditional probabilities is written as

$$\pi_{ijk} = \Pr\left[g_k(t+1) \mid G_i(t),\, G_{ij}(t+1)\right], \qquad k = 1, \ldots, L, \tag{3}$$

where *g*_{*k*}(*t* + 1) is the joint maternal-offspring QTL genotype for an embryo (*L* = 5) or an endosperm (*L* = 6) (Table 1), *G*_{*i*}(*t*) is the marker genotype of F_{2} plant *i* (in generation *t*), and *G*_{*ij*}(*t* + 1) is the embryo marker genotype of the *j*th seed (in generation *t* + 1) produced by sporophytic plant *i*. The table in Fig. 1 gives the conditional probabilities of *Eq. 3* for the embryo. A similar conditional probability matrix can also be derived for the endosperm.
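
One way to obtain such conditional probabilities for the embryo is brute-force enumeration of three-locus haplotypes (left marker, QTL, right marker). The sketch below is not the derivation used for Fig. 1; it assumes an F_{1} in coupling phase, no crossover interference, and unphased marker genotypes coded as allele counts (0, 1, or 2). All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def f1_gamete_prob(g, r1, r2):
    """Probability of three-locus gamete g produced by the F1.

    Alleles are coded 1 (first inbred parent) or 0.
    """
    p = 0.5
    p *= r1 if g[0] != g[1] else 1 - r1
    p *= r2 if g[1] != g[2] else 1 - r2
    return p


def selfed_gamete_probs(h1, h2, r1, r2):
    """Gamete distribution of an F2 plant carrying haplotypes h1 and h2."""
    haps = (h1, h2)
    out = defaultdict(float)
    for s in product((0, 1), repeat=3):   # strand copied at each locus
        p = 0.5
        p *= r1 if s[0] != s[1] else 1 - r1
        p *= r2 if s[1] != s[2] else 1 - r2
        out[tuple(haps[s[k]][k] for k in range(3))] += p
    return out


def joint_table(r1, r2):
    """P(marker data, joint QTL genotype) for an F2 plant and one selfed embryo.

    Keys: (mother left, mother right, embryo left, embryo right) marker
    counts, then (mother QTL count, embryo QTL count).
    """
    table = defaultdict(lambda: defaultdict(float))
    gametes = list(product((0, 1), repeat=3))
    for h1, h2 in product(gametes, repeat=2):
        p_mother = f1_gamete_prob(h1, r1, r2) * f1_gamete_prob(h2, r1, r2)
        mother = tuple(x + y for x, y in zip(h1, h2))
        g = selfed_gamete_probs(h1, h2, r1, r2)
        for (ga, pa), (gb, pb) in product(g.items(), repeat=2):
            embryo = tuple(x + y for x, y in zip(ga, gb))
            markers = (mother[0], mother[2], embryo[0], embryo[2])
            table[markers][(mother[1], embryo[1])] += p_mother * pa * pb
    return table


def conditional(table, markers):
    """Normalize one row of the joint table into conditional probabilities."""
    row = table[markers]
    total = sum(row.values())
    return {k: v / total for k, v in row.items()}
```

For example, `conditional(joint_table(0.05, 0.15), (1, 1, 2, 1))` returns the probabilities of the joint QTL genotypes for a doubly heterozygous F_{2} plant whose embryo is homozygous at the left marker and heterozygous at the right marker.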

## THE STATISTICAL MODEL

#### The mixture model.

A fundamental statistical model for mapping QTL is based on a mixture model that has been previously developed (15, 28). In the mixture model, each observation *y* is assumed to have arisen from one of *L* components, with each component being modeled by a density from the parametric family *f*. In this study, the phenotype *y*_{*ij*} derived from the *j*th seed of the *i*th F_{2} plant is assumed to be determined by one of the *L* joint maternal-offspring QTL genotypes, plus a random error, with the likelihood function expressed as a mixture model as follows:

$$p(y_{ij} \mid \boldsymbol{\pi}_{ij}, \boldsymbol{\mu}, \sigma^2) = \sum_{k=1}^{L} \pi_{ijk}\, f_k(y_{ij}; \mu_k, \sigma^2), \tag{4}$$

where π_{*ij*} = (π_{*ij*1}, …, π_{*ijL*})^{T} are the mixture proportions specified by the conditional probabilities of the joint maternal-offspring QTL genotypes given the two-stage hierarchical marker genotype for the *j*th seed from the *i*th F_{2} plant; μ = (μ_{1}, …, μ_{*L*})^{T} are the expected genotypic values of the different QTL genotypes; σ^{2} is the residual variance within each QTL genotype; and *f*_{*k*} is the normal density with mean μ_{*k*} and variance σ^{2}.

#### The EM algorithm.

Under the two-stage hierarchical genotyping design, we have the likelihood of all observations as

$$\mathcal{L}(\boldsymbol{\mu}, \sigma^2, r_{\eta 1} \mid \mathbf{y}, \mathbf{M}) = \prod_{i=1}^{M} \prod_{j=1}^{n_i} \sum_{k=1}^{L} \pi_{ijk}\, f_k(y_{ij}; \mu_k, \sigma^2). \tag{5}$$

We have formulated a procedure for implementing the EM algorithm to obtain the maximum likelihood estimates (MLEs) of the unknown parameters, including the QTL effects and residual variance (μ_{*k*}, σ^{2}) and the QTL position (*r*_{η1}) contained within π_{*ijk*} (table in Fig. 1). The EM algorithm is described as follows.

In the E step, the conditional probabilities (priors) of the QTL genotypes given the marker genotypes and the normal distribution function are used to calculate

$$\Pi_{ijk} = \frac{\pi_{ijk}\, f_k(y_{ij}; \mu_k, \sigma^2)}{\sum_{k'=1}^{L} \pi_{ijk'}\, f_{k'}(y_{ij}; \mu_{k'}, \sigma^2)}, \tag{6}$$

which can be thought of as the posterior probability that the *j*th seed of the *i*th F_{2} plant has the *k*th joint maternal-offspring QTL genotype.

In the M step, the calculated posterior probabilities are used to solve for the unknown parameters:

$$\hat{\mu}_k = \frac{\sum_{i=1}^{M} \sum_{j=1}^{n_i} \Pi_{ijk}\, y_{ij}}{\sum_{i=1}^{M} \sum_{j=1}^{n_i} \Pi_{ijk}}, \tag{7}$$

$$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{M} \sum_{j=1}^{n_i} \sum_{k=1}^{L} \Pi_{ijk}\, (y_{ij} - \hat{\mu}_k)^2, \qquad N = \sum_{i=1}^{M} n_i, \tag{8}$$

where Π_{*ijk*} denotes the posterior probability computed in the E step. Iterations are repeated between *Eqs. 6–8* until convergence. The values at convergence are the MLEs. With the MLEs of the μ_{*k*} values, the MLEs of the overall mean, the additive effect, and the within- and between-generation dominant effects of the QTL, as indicated in Table 1, can be obtained by solving a system of linear equations. It should be pointed out that the separation of within-generation from between-generation dominant effects for the endosperm faces two difficulties. First, the endosperm model is overparameterized because six unknown dominant parameters (see *Eq. 2*) are contained within the estimated genotypic means of four joint maternal-offspring heterozygotes (μ_{2}, …, μ_{5}; Table 1). Second, μ_{3} and μ_{4} are indistinguishable because the conditional probabilities of the corresponding QTL genotypes *Qq*(*t*)*QQq*(*t* + 1) and *Qq*(*t*)*Qqq*(*t* + 1) given the marker genotypes are identical (results not shown).
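
The E and M steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it pools the seeds of all families into one list, treats the mixing proportions as fixed inputs (the conditional probabilities at one assumed QTL position), and uses arbitrary starting values. The function names, starting values, and stopping rule are assumptions.

```python
import math


def normal_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_mixture(y, priors, n_iter=200, tol=1e-8):
    """EM for a normal mixture with fixed, seed-specific mixing proportions.

    y      : phenotypes, one per seed
    priors : priors[n][k] = conditional probability of joint genotype k
             for seed n given its two-stage marker genotype
    Returns (means, variance, log-likelihood at the last E step).
    """
    n, L = len(y), len(priors[0])
    ybar = sum(y) / n
    var = sum((v - ybar) ** 2 for v in y) / n
    sd = math.sqrt(var)
    # Spread the starting means around the sample mean to break symmetry.
    mu = [ybar + sd * (k - (L - 1) / 2) / L for k in range(L)]
    loglik = -math.inf
    for _ in range(n_iter):
        # E step: posterior probability of each joint genotype per seed.
        post, new_loglik = [], 0.0
        for yi, pi in zip(y, priors):
            w = [p * normal_pdf(yi, m, var) for p, m in zip(pi, mu)]
            s = sum(w)
            new_loglik += math.log(s)
            post.append([x / s for x in w])
        # M step: weighted means and a common residual variance.
        for k in range(L):
            wk = sum(p[k] for p in post)
            if wk > 0:
                mu[k] = sum(p[k] * yi for p, yi in zip(post, y)) / wk
        var = sum(p[k] * (yi - mu[k]) ** 2
                  for p, yi in zip(post, y) for k in range(L)) / n
        converged = new_loglik - loglik < tol
        loglik = new_loglik
        if converged:
            break
    return mu, var, loglik
```

When the marker genotypes determine the joint genotype with certainty (one-hot `priors`), the procedure reduces to group means and a pooled within-group variance, which is a convenient sanity check.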

The QTL position can be estimated using a grid approach. This approach treats *r*_{η1} (or, equivalently, *r*_{η2}) as a known parameter in the likelihood function (*Eq. 4*) while the QTL is scanned over all marker intervals. The position corresponding to the maximum of the log-likelihood ratio across a linkage group is the MLE of the QTL position.
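
The grid scan itself is a simple loop over candidate positions. In the sketch below, `lr_at` is a hypothetical callback that would compute the conditional probabilities at the given position, fit the mixture model, and return the log-likelihood ratio; the 1-cM step is an assumption, not a value taken from this study.

```python
def scan_linkage_group(marker_cm, lr_at, step_cm=1.0):
    """Return (max LR, interval index, offset in cM) over a grid of positions.

    marker_cm : ordered marker positions in cM
    lr_at     : callable (interval_index, offset_cm) -> LR (hypothetical hook)
    The right end of each interval is evaluated as offset 0 of the next
    interval, so the last marker itself is not visited.
    """
    best = None
    for i in range(len(marker_cm) - 1):
        length = marker_cm[i + 1] - marker_cm[i]
        for s in range(int(length / step_cm)):
            offset = s * step_cm
            lr = lr_at(i, offset)
            if best is None or lr > best[0]:
                best = (lr, i, offset)
    return best
```

With ten markers spaced 20 cM apart and a profile that peaks at 45 cM, the scan reports the third interval (index 2) at an offset of 5 cM.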

#### Hypothesis tests.

A number of hypothesis tests can be formulated for the seed model proposed above. The first hypothesis test considers the existence of any QTL affecting the expression of an embryo or endosperm trait. For the embryo model, for example, we have the hypotheses

$$H_0: a = d = b_1 = b_2 = 0 \quad \text{vs.} \quad H_1: \text{at least one of these effects is not zero}. \tag{9}$$

The test statistic for testing the above hypotheses is calculated as the log-likelihood ratio of the full model (*H*_{1}) over the reduced model (*H*_{0}),

$$\mathrm{LR} = -2\left[\ln \mathcal{L}(\tilde{\mu}, \tilde{\sigma}^2 \mid \mathbf{y}) - \ln \mathcal{L}(\hat{\boldsymbol{\mu}}, \hat{\sigma}^2, \hat{r}_{\eta 1} \mid \mathbf{y}, \mathbf{M})\right], \tag{10}$$

where the tilde (∼) and the caret (^) symbols denote the MLEs of the unknown parameters under *H*_{0} and *H*_{1}, respectively. The log-likelihood ratio (LR) is asymptotically χ^{2} distributed with 4 degrees of freedom. However, the critical threshold value for declaring the existence of a QTL is generally calculated on the basis of permutation tests (6).
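
The pieces of this test are easy to compute once the full model has been fitted. The sketch below evaluates the maximized null log-likelihood (a single normal distribution), the LR statistic, and the asymptotic χ^{2} tail probability using the closed form that holds for even degrees of freedom. The function names are illustrative, and the asymptotic p-value is only a reference point, since the threshold is in practice obtained empirically.

```python
import math


def null_loglik(y):
    """Maximized log-likelihood when all genotypic means are equal (H0)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n   # ML (not unbiased) variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lr_statistic(loglik_full, loglik_null):
    """LR = -2 [ln L(H0) - ln L(H1)]."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total
```

For the 4-df test of QTL existence the tail probability is exp(−LR/2)(1 + LR/2), and for the 2-df test of the between-generation effects it is exp(−LR/2).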

After a significant QTL is found, any specific components of the genotypic values can be tested. For example, the maternal-offspring intra-locus interaction effect on the embryo trait can be tested by formulating the following hypotheses,

$$H_0: b_1 = b_2 = 0 \quad \text{vs.} \quad H_1: \text{at least one of } b_1 \text{ and } b_2 \text{ is not zero}, \tag{11}$$

whose log-likelihood ratio test statistic is asymptotically χ^{2} distributed with 2 degrees of freedom. Testing *b*_{1} = 0 and *b*_{2} = 0 is equivalent to testing μ_{1} − μ_{5} = 2(μ_{2} − μ_{4}) and μ_{1} + μ_{5} = 2(μ_{2} + μ_{4} − μ_{3}), respectively.
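
These equivalences can be verified directly. Assume the embryo genotypic values μ_{1} = μ + 2*a*, μ_{2} = μ + *a* + *d* + *b*_{1}, μ_{3} = μ + 2*d* + *b*_{2}, μ_{4} = μ − *a* + *d* − *b*_{1}, and μ_{5} = μ − 2*a*, which is the decomposition consistent with the two constraints. The two contrasts then reduce to multiples of *b*_{1} and *b*_{2}:

$$
\begin{aligned}
\mu_1 - \mu_5 - 2(\mu_2 - \mu_4) &= 4a - 2(2a + 2b_1) = -4b_1, \\
\mu_1 + \mu_5 - 2(\mu_2 + \mu_4 - \mu_3) &= 2\mu - 2(\mu - b_2) = 2b_2.
\end{aligned}
$$

Each contrast therefore vanishes exactly when the corresponding between-generation effect is zero.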

Similar hypotheses can also be formulated to test whether there is a QTL affecting an endosperm trait and whether there is a significant intra-locus interaction between the maternal heterozygote and the offspring homozygote (*b*_{1}; Table 1). The latter hypothesis test can be performed under the constraint 5(μ_{2} − μ_{5}) = 3(μ_{1} − μ_{6}). One can also test whether one or both of the sums *d*_{1} + *b*_{2} and *d*_{2} + *b*_{3} (Table 1) are significantly different from zero. However, the separation of *d*_{1} from *b*_{2} or of *d*_{2} from *b*_{3} is not possible unless some particular constraints are used. The critical thresholds for all the hypotheses mentioned above can be obtained by simulation studies.

## RESULTS

#### Monte Carlo simulation.

We performed a series of simulation experiments to examine the statistical properties of the method proposed to map seed development. A linkage group length of 180 cM, comprising 10 equidistant markers ordered *M*_{1}, …, *M*_{10}, is simulated for an F_{2} population. We hypothesize a QTL affecting an embryo trait located at 5 cM from the left marker of the third interval or at 45 cM from the first marker of the linkage group. As a result of the nature of our approach, we simulate two-stage hierarchical marker genotypes for the F_{2} individuals (in generation *t*) and their autogamous progeny (in generation *t* + 1). The autogamous embryos derived from the F_{2} are affected by five joint maternal-offspring QTL genotypes, *QQ*(*t*)*QQ*(*t* + 1), *Qq*(*t*)*QQ*(*t* + 1), *Qq*(*t*)*Qq*(*t* + 1), *Qq*(*t*)*qq*(*t* + 1), and *qq*(*t*)*qq*(*t* + 1), with the frequencies of 1/4, 1/8, 1/4, 1/8, and 1/4, respectively. The genetic variance due to this QTL is calculated using assumed genetic effect values (Table 2). Given a sample size of 400, our simulation scenarios include few families (10) each with large size (40) or many families (40) each with small size (10). These two different sampling strategies (10 × 40 and 40 × 10) are combined with different levels of heritabilities (*H*^{2} = 0.2 vs. 0.6).
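
The link between the assumed genetic effects, the heritability, and the residual variance can be sketched as follows. The snippet computes the genetic variance from the five joint genotype frequencies given above, derives the residual variance from *H*^{2} = σ_g^{2}/(σ_g^{2} + σ^{2}), and draws phenotypes for a families-by-seeds design. The function names and the record layout are assumptions made for illustration.

```python
import random

# Frequencies of the five joint maternal-offspring embryo genotypes (from the text).
FREQS = [0.25, 0.125, 0.25, 0.125, 0.25]


def residual_variance(means, h2, freqs=FREQS):
    """Residual variance implied by genotypic means and heritability h2."""
    grand = sum(f * m for f, m in zip(freqs, means))
    var_g = sum(f * (m - grand) ** 2 for f, m in zip(freqs, means))
    return var_g * (1.0 - h2) / h2


def simulate_phenotypes(n_families, family_size, means, h2, rng):
    """Return (family, joint genotype index, phenotype) records."""
    sd = residual_variance(means, h2) ** 0.5
    records = []
    for fam in range(n_families):
        mother = rng.choices([0, 1, 2], weights=[1, 2, 1])[0]   # QQ, Qq, qq
        for _ in range(family_size):
            if mother == 0:
                k = 0                                            # QQ(t)QQ(t+1)
            elif mother == 2:
                k = 4                                            # qq(t)qq(t+1)
            else:
                k = rng.choices([1, 2, 3], weights=[1, 2, 1])[0]
            records.append((fam, k, rng.gauss(means[k], sd)))
    return records
```

Calling `simulate_phenotypes(10, 40, means, 0.2, random.Random(1))` or `simulate_phenotypes(40, 10, means, 0.6, random.Random(1))` would mimic the 10 × 40 and 40 × 10 sampling strategies, each with 400 seeds in total.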

Our simulation for the marker genotypes includes a two-stage hierarchy. The upper level in the hierarchy is the F_{2} genotype, whereas the lower level in the hierarchy is the autogamous embryos from the F_{2}. Let us consider the first marker that has three genotypes *M*_{1}*M*_{1}(*t*), *M*_{1}*m*_{1}(*t*), and *m*_{1}*m*_{1}(*t*), with a probability of 1/4, 1/2, and 1/4, at the upper level. Genotype *M*_{1}*M*_{1}(*t*) is self-pollinated to produce a single genotype *M*_{1}*M*_{1}(*t* + 1); genotype *M*_{1}*m*_{1}(*t*) to produce three genotypes *M*_{1}*M*_{1}(*t* + 1) (1/4), *M*_{1}*m*_{1}(*t* + 1) (1/2), and *m*_{1}*m*_{1}(*t* + 1) (1/4); and *m*_{1}*m*_{1}(*t*) to produce a single genotype *m*_{1}*m*_{1}(*t* + 1) at the lower level. Similarly, the second marker also has three genotypes at the higher level, each of which is self-pollinated to produce the corresponding genotypes at the lower level. At the higher level, three genotypes for the first marker are combined with three genotypes for the second marker to form nine 2-locus genotypes, each with a probability being a function of the recombination fraction between these two markers. Meanwhile, considering the difference of genotypes at the higher level, each marker should have five joint maternal-offspring genotypes, and thus, a pair of markers produces 25 such joint genotypes at the lower level. The probability of each of these joint genotypes depends on the probability of the corresponding two-locus genotype at the higher level and the Mendelian segregation ratios of a heterozygote, if any. This simulation strategy is extended to consider all markers. We use the Kosambi map function to convert the map distance into the recombination fraction.
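
A compact way to generate such two-stage marker data is to simulate meiosis directly rather than tabulating the joint genotype probabilities. The sketch below converts map distance to a recombination fraction with the Kosambi function, draws the two gametes that form an F_{2} plant from the F_{1}, and then draws gametes from that plant for each selfed embryo. Crossovers in adjacent intervals are drawn independently, which is a simplification; the function names are hypothetical.

```python
import math
import random


def kosambi(d_cm):
    """Kosambi map function: map distance in cM to recombination fraction."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


def meiosis(hap_a, hap_b, rec, rng):
    """One gamete from a diploid carrying haplotypes hap_a and hap_b."""
    haps = (hap_a, hap_b)
    strand = rng.randrange(2)
    gamete = [haps[strand][0]]
    for k, r in enumerate(rec, start=1):
        if rng.random() < r:        # crossover between markers k-1 and k
            strand = 1 - strand
        gamete.append(haps[strand][k])
    return gamete


def genotype(pair):
    """Unphased genotype as the count of allele 1 at each marker."""
    return [x + y for x, y in zip(*pair)]


def simulate_family(n_markers, spacing_cm, n_seeds, rng):
    """Marker genotypes of one F2 plant and n_seeds selfed embryos."""
    rec = [kosambi(spacing_cm)] * (n_markers - 1)
    p1, p2 = [1] * n_markers, [0] * n_markers    # F1 haplotypes
    mother = (meiosis(p1, p2, rec, rng), meiosis(p1, p2, rec, rng))
    embryos = [(meiosis(mother[0], mother[1], rec, rng),
                meiosis(mother[0], mother[1], rec, rng))
               for _ in range(n_seeds)]
    return genotype(mother), [genotype(e) for e in embryos]
```

Wherever the F_{2} plant is homozygous at a marker, every embryo carries the same homozygous genotype, which reproduces the single-genotype transitions described above.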

The declaration for the existence of QTL is based on a critical threshold for the log-likelihood ratio test statistic that controls the chromosome-wide type I error rate. The characterization of the threshold for declaring the existence of a QTL is a difficult issue. The simulation test is regarded as a useful approach for calculating the threshold, because it is not dependent on the distribution of the test statistic. We simulate the marker genotype data and the phenotype data under the null hypothesis that there is no QTL. The simulated data are analyzed by the proposed model. The distribution of the log-likelihood ratio values over 1,000 simulation replicates can be approximated by a χ^{2} distribution. The 99th percentiles of the distribution of the maximum are used as empirical critical values to declare the existence of a QTL on the linkage groups at the significance level α = 0.01.
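
Given the maximum LR recorded for each null replicate, the empirical critical value is simply an upper quantile of those maxima. A minimal sketch follows; the function name and the quantile convention (the smallest value with at least a fraction 1 − α of the replicates at or below it) are assumptions.

```python
import math


def empirical_threshold(max_lr_values, alpha=0.01):
    """(1 - alpha) empirical quantile of the per-replicate maximum LR."""
    ordered = sorted(max_lr_values)
    n = len(ordered)
    # The small slack guards against floating-point error in (1 - alpha) * n.
    rank = math.ceil((1.0 - alpha) * n - 1e-9)
    rank = min(max(rank, 1), n)
    return ordered[rank - 1]
```

With 1,000 replicates and α = 0.01, this returns the 990th smallest maximum, so 10 of the null replicates exceed the threshold.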

Figures 2 and 3 illustrate the profiles of the log-likelihood ratio test statistics across the simulated linkage group under different sampling strategies and heritability levels. In all situations, the QTL can be detected, given that the peaks of the profiles exceed the critical threshold, but an increase of heritability from 0.2 to 0.6 increases the power to detect the QTL. It appears that the two sampling strategies provide similar power and accuracy in detecting the QTL position (Figs. 2 and 3). In general, our model can provide reliable estimates of the QTL effects, including the additive, within-generation dominant, and between-generation dominant effects (Table 2). As expected, the additive effect and residual variance can be better estimated than the dominant effects. The between-generation dominant effect involving an offspring homozygote (*b*_{1}) can be more precisely estimated than that involving an offspring heterozygote (*b*_{2}).

The MLEs of dominant effects have large sampling errors estimated from 100 simulation replicates, but the sampling errors can be reduced when an effective measure is taken to increase the heritability level or when sampling strategy 40 × 10 is used (Table 2). The increase of heritability from 0.2 to 0.6 markedly reduces the sampling errors of dominant effect estimation. We perform a hypothesis test for the significance of between-generation dominant effects based on the hypothesis described by *Eq. 11*. Given the data set simulated under the condition as given in Table 2, we estimate the log-likelihood ratios under the hypotheses by *Eq. 11* and reject the null hypothesis *b*_{1} = *b*_{2} = 0 in all 100 simulations for different heritabilities and sampling strategies. This suggests that our model has great power to detect maternal-offspring dominant effects on an embryo trait.

Similar simulation designs were also made to study the statistical properties of the endosperm model. Because the endosperm model is overparameterized and because the genotypic means of two heterozygote QTL genotypes are indistinguishable, we cannot estimate all within-generation and between-generation dominant effects defined in *Eq. 2* and Table 1. However, it is possible to make hypothesis tests for some of the dominant effects or the sum of them.

In practice, we may simplify the endosperm model and reduce the number of dominant effect parameters involved. For example, by letting *d*_{1} + *b*_{2} = *e*_{1} and *d*_{2} + *b*_{3} = *e*_{2} so that the number of unknown parameters equals the number of equations, we can make the model more tractable. However, because the estimated genotypic values of joint maternal-offspring genotypes *Qq*(*t*)*QQq*(*t* + 1) and *Qq*(*t*)*Qqq*(*t* + 1) are unidentifiable, *e*_{1} and *e*_{2} still cannot be uniquely estimated. We simulated the endosperm data assuming that *e*_{1} and *e*_{2} are distinguishable. Results from the simulation suggest that the estimation precision of the endosperm parameters (*Eq. 2* and Table 1) is broadly consistent with that of the embryo model.

We performed an additional simulation to test the sensitivity of our model to false positives. A data set for two-stage maternal-offspring marker genotypes and offspring phenotypes was simulated under the assumption that there is no maternal effect. This simulated data set was then analyzed with our (full) model incorporating the maternal-offspring interaction effects and a (reduced) model with no such effects. We did not find significant maternal and maternal-offspring interaction effects by the full model, although both the full and reduced models can detect offspring QTL (results not shown).

#### A case study.

We use an example of maize for two endosperm traits, elongation factor 1α (eEF1A) and free amino acid (FAA) contents, to demonstrate the power of our statistical approach. An F_{2} population of 106 plants was derived from a cross between two contrasting maize inbred lines, Oh51A*o*2 (high eEF1A and low FAA content) and Oh545*o*2 (low eEF1A and high FAA content). The F_{2} and F_{2:3} progeny from this cross were prepared for genotypic and phenotypic analysis as previously described by Wang and Larkins (22). DNA was extracted from young leaves of the F_{2} plants, whereas grain protein quality traits were measured from the F_{3} kernels of the F_{2}, as described in Wang and Larkins (22) and Wang et al. (23). Simple sequence repeat (SSR) primers were selected from the Maize Microsatellite-RFLP consensus map. The primer sequences were described in the Maize Genome Database (22). A linkage map of 83 SSR markers of the F_{2} plants was constructed, based on the known order of SSR markers on maize chromosomes.

Our proposed autogamous model can be applied to allogamous maize here because the F_{2:3} progeny on which the endosperm traits were measured were derived from artificial self-pollination. Given the structure of the data used in this example, we modified our model to map joint maternal-offspring QTL effects on endosperm traits based on marker genotypes derived only from the F_{2} (maternal) plants (29). We recognize that this one-stage genotyping scheme provides limited information to separate the estimated genotypic means of the six 2-generation QTL genotypes (see Table 1). But this data set is still useful for providing a test for the existence of a maternal-offspring QTL affecting the endosperm trait using the hypothesis described by *Eq. 10*. All 10 chromosomes were scanned for the existence of QTL for the two endosperm traits. We successfully detected a QTL for eEF1A on chromosome 6, as indicated by an LR of 31.2, which is greater than the genome-wide threshold of 22.4 at the significance level α = 0.001. The estimated position of the QTL is 28 cM from the first marker of chromosome 6 (Fig. 4). The threshold for claiming the detection of a QTL was calculated on the basis of the 99.9th percentile of the LR distribution from 1,000 permutation tests.

## DISCUSSION

We have proposed a new statistical model for testing and estimating the joint effects of maternal and offspring genomes on quantitative traits expressed during seed development. Maternal inheritance has long been thought to affect animal traits (1, 13), but based on recent observations (9), maternal effects may be of greater importance in the seed formation of flowering plants than originally appreciated. The formation of seed in flowering plants results from double fertilization (11), in which one of the two sperm cells from a pollen tube fertilizes the haploid egg cell to form a diploid zygote (embryo) and the other sperm cell fertilizes the diploid central cell and fuses with the central cell (polar) nuclei, thus giving rise to the triploid endosperm. Both the embryo and endosperm affect seed size and seed quality, including oil, protein, and carbohydrate contents. An understanding of how these traits are genetically determined through the integration of the formation mechanisms of the embryo and endosperm is of paramount importance to increase grain production and quality (2).

Wu et al. (29) have derived a maximum likelihood-based one-QTL model for mapping endosperm-specific traits in autogamous plants. This endosperm model has proven to be more powerful in detecting significant QTL than conventional diploid models. But the model of Wu et al. did not take into account the effects due to the interaction between the maternal genome and the offspring genome. A considerable body of literature supports the view that the maternal and offspring genomes interact to determine the developmental processes of seeds, including those in both the embryo and the endosperm. In a recent study in *Arabidopsis*, the paternal genes were found not to be expressed during early stages of seed development (21). But this finding is not supported by other studies in the same plant species or different species (24). Our model can be used to test for the maternal-offspring interaction effect of a QTL on seed development and to separate the effects due to the maternal genome from the effects due to the offspring genome. This separation can be made by performing the hypothesis test of *Eq. 11*. Thus the approach proposed here can be used to examine how and when the paternal genome exerts effects on seed development and, ultimately, to address the above-mentioned fundamentally important debate arising from *Arabidopsis* genetic research.

Studies of joint effects of maternal and offspring genomes on offspring traits have received considerable attention in animals (25–27), but surprisingly few studies of this kind exist for plants. Our proposed model will provide a powerful tool for mapping specific genetic loci that trigger joint maternal-offspring effects in plants. Our model is based on an F_{2} population for an autogamous plant system. It is not difficult to extend the model to other reproductive systems, such as allogamous and mixed-pollinated systems, and to other mapping populations. For an autogamous plant, the egg and the two polar nuclei are self-fertilized so that the frequencies of male gamete genotypes are identical to those of female gamete genotypes. But in an allogamous plant, such as maize, each female gamete from each mother plant will be pollinated by all possible male gametes from the pollen pool. This difference should be considered when the current model is used to study the genetics of allogamous seed development. As mentioned above, the autogamous system has an inherent limitation in separating the genotypic means of two joint maternal-offspring QTL heterozygotes for the endosperm because the conditional probabilities of these two QTL heterozygotes given two-stage hierarchical marker genotypes are identical. This problem disappears for allogamous and mixed-pollinated systems in which all QTL genotypes can be uniquely determined by marker genotypes.

Understanding the maternal and paternal genetic regulation of seed development helps to answer many fundamental evolutionary questions in higher plants. There is a straightforward application of our model to evolutionary genetic studies, but this will need to consider the patterns of gene segregation and transmission in natural plant populations. Additional parameters characterizing population structure and organization, such as allele frequencies, linkage disequilibrium, and haplotype frequencies (16), should be incorporated into our seed development mapping model. In addition, a considerable body of literature has suggested different roles of the paternal and maternal loci in seed development, a phenomenon called “parent-of-origin effects” (18, 19). Our model provides a fundamental platform for detecting and characterizing these so-called imprinted genes, whose expression depends on the parent of origin. In sum, the model framework presented in this article brings us closer to unraveling the genetic basis of embryogenesis and seed development in higher plants.

## GRANTS

This work is partially supported by an Outstanding Young Investigators Award of the National Science Foundation of China (30128017), a University of Florida Research Opportunity Fund (02050259), and a University of South Florida Biodefense Grant (7222061-12) to R. Wu.

## Acknowledgments

The publication of this manuscript is approved as journal series R-10069 by the Florida Agricultural Experiment Station.

## Footnotes

Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).

Address for reprint requests and other correspondence: R. Wu, Dept. of Statistics, 533 McCarty Hall C, Univ. of Florida, Gainesville, FL 32611 (E-mail: rwu{at}stat.ufl.edu).

- Copyright © 2004 the American Physiological Society