## Abstract

“Naturally occurring” or “programmed” cell death (PCD) in which the cell uses specialized cellular machinery to kill itself is a ubiquitous phenomenon that occurs early in organ development. Such a cell suicide mechanism that enables metazoans to control cell number and eliminate cells threatening the organism’s survival has been thought to be under genetic control. In this report, we develop a novel statistical model for mapping specific genes or quantitative trait loci (QTL) that are responsible for the PCD process based on polymorphic molecular markers. This model incorporates the biological mechanisms of PCD that undergoes two different developmental stages, exponential growth and polynomial death. We derived a parametric approach to model the exponential growth and a nonparametric approach based on the Legendre function to model the polynomial death. A series of stationary and nonstationary models has been used to approximate the structure of the covariance matrix among cell numbers at a multitude of different times. The statistical behavior of our model is investigated through simulation studies and validated by a real example in rice.

- quantitative trait loci
- semiparametric model
- mean-covariance structure model
- EM-Simplex algorithm
- order selection

the question of how an organism develops into a fully functioning adult from a mass of undifferentiated cells has always attracted top researchers in diverse areas of developmental biology (19). Fundamentally, to produce a functioning adult form, a living organism should coordinate various complementary and sometimes antagonistic processes, which include cell proliferation and programmed cell death (PCD), or apoptosis, during its development (8). The molecular and genetic characterization of these processes has been useful to identify the specific signaling pathways that underlie a delicate balance between cell proliferation and PCD (32) and, in the ultimate, enhance our understanding of the roots of disease such as cancer (17, 43). In particular, PCD has been thought to be a universal phenomenon that occurs in predictable patterns in response to environmental or developmental clues, whose study has become one of the most fascinating areas in all of biology over the last decade (16, 19).

Studies of the genetic control of development have used simple model systems, the nematode *Caenorhabditis elegans* and the fruitfly *Drosophila*, from which views have been established that PCD involves specific genes and proteins and that these genes and proteins interact within the cells that die (13, 35, 31, 18). For the adult hermaphrodite *C. elegans* forms, there are 1,090 somatic cells, of which 131 die by apoptosis. Each apoptosis process is characterized by four stages (31): *1*) decision about whether a cell should die or assume another fate, *2*) death, *3*) engulfment of the dead cell by phagocytes, and *4*) degradation of the engulfed corpse. Each of these stages is regulated by a number of genes. Mutations affecting the final three stages affect all somatic cells, whereas genes affecting the death verdict affect very few cells.

The molecular genetic pathway that defines PCD in *C. elegans* and *Drosophila* provides a basis for understanding apoptosis in more complex organisms, including higher plants and humans, because genes responsible for PCD are evolutionarily conserved (18). Given the unique complexity of genetic pathways of these species, however, a rigorous, detailed, and analytic approach should be developed on its merit that allows for the genome-wide identification of genes for apoptosis in any complex organism. Genetic mapping based on molecular markers (23, 44, 26), by superimposing real biological phenotypes on genome sequence and structural polymorphisms, can provide an unbiased view of the network of gene actions and interactions of quantitative trait loci (QTL) that build a complex phenotype like PCD.

Unlike traditional QTL mapping for a complex trait, the mapping of PCD must incorporate the dynamic feature of this developmental process. Although this presents one of the most difficult tasks in genetic studies, some of the key issues have been overcome by Wu and colleagues (27, 37–41), who proposed a so-called “functional mapping” to map and identify specific QTL that underlie the developmental changes of complex traits. The rationale of functional mapping is to express the genotypic values of QTL at a series of time points in terms of a continuous growth function with respect to time *t*. Under this formulation, the parameters describing longitudinal trajectories, rather than time-dependent genotypic values as carried out in traditional mapping strategies, are estimated within a maximum likelihood framework. Also, unlike traditional strategies, functional mapping estimates the parameters that model the structure of the covariance matrix among a multitude of different time points, which, therefore, largely reduces the number of parameters being estimated for variances and covariances, especially when the dimension of data is high.

In this article, we develop a novel statistical model for the genome-wide scan of QTL that guide PCD toward an active process of cell death. This model incorporates two sequentially distinct stages of the developmental process into the mapping framework constructed within the context of Gaussian mixture models. The first stage, growth, has proven to obey some universal growth law that can be modeled mathematically by curve parameters (5). Although no proper mathematical equation can describe the second stage, death, which is subject to a fast exponential decay of cells followed by a slowly decreasing function (4), a nonparametric approach based on the Legendre function is derived to model the PCD process. The combination of parametric modeling of the growth process and nonparametric modeling of the death process lays a foundation for semiparametric functional mapping of PCD. We implement a nonstationary mean-dependent covariance model to characterize the structure of the covariance matrix among cell numbers measured at a multitude of different times. The statistical behavior of our model is investigated through simulation studies. The utility of the model in a real example of rice suggests that our model can be useful in practice.

## DIFFERENT PHASES OF PCD

In general, the whole process of PCD can be described by five reasonably distinct phases (Fig. 1; Ref. 15): lag, exponential, declining growth rate, a stationary phase, and death. Each of the phases is defined below.

### Lag Phase

The lag phase is the initial growth phase, during which cell number remains relatively constant before rapid growth. During this phase the organism prepares to grow, and unseen biochemical changes, cell division, and differentiation of tissues occur during this time.

### Exponential Phase

During the exponential phase the tissues are growing and dividing rapidly to take advantage of abundant nutrients. Growth rate, as a measure of the increase in biomass over time, is determined from the exponential phase. Growth rate is one important way of expressing the relative success of an organism in adapting to the biotic or abiotic environment imposed upon it. The duration of the exponential phase depends on the growth rate and the abundance of nutrients to support tissue growth. If the growth phase is plotted (time on *x*-axis and biomass on logarithmic *y*-axis), the exponential phase will be straightened out.

### Declining Growth

Declining growth normally occurs when either a specific requirement for cell division is limiting or something else is inhibiting reproduction. During this phase growth slows or the death rate increases. As a result, the initiation of new tissues and the senescence and death of old ones start to come into equilibrium. This phase typically occurs as nutrients become limiting for growth.

### Stationary Phase

Tissues enter the stationary phase when net growth is zero, and within a matter of time cells may undergo dramatic biochemical changes. The nature of the changes depends on the growth-limiting factor. The shutdown of many biochemical pathways as the stationary phase proceeds means that the longer the cells are held in this condition the longer the lag phase will be when cells are returned to good growth conditions.

### Death Phase

When cell metabolism can no longer be maintained, the death rate of a tissue is generally very rapid (4). The steepness of the decline is often more marked than that represented in the accompanying growth figure.

The duration and extent of each phase will depend on the organism and the environmental conditions. For example, if tissues from the stationary phase are supplied with fresh nutrients, the lag phase will be longer than for the case of tissues from the declining phase. For growing tissues from the exponential phase, organisms supplied with fresh nutrients will likely skip the lag phase. If the growth nutrient is rich, organisms will remain in the exponential growth phase for a longer period and produce a greater biomass. Furthermore, their rate of growth in the exponential phase may also be greater.

Growth curves must be drawn from a series of growth measurements at different times during the growth curve. Mathematical equations have been derived to model the growth from the lag to stationary phases (34), although there is no specific mathematical equation for the death phase.

## STATISTICAL MODEL

### The Mixture Model-Based Likelihood

Consider a standard backcross design, initiated with two contrasting homozygous inbred lines, in which there are two genotypes at each locus. Assume that a genetic linkage map covering the entire genome has been constructed with polymorphic markers, aimed to identify QTL responsible for PCD. There are a certain number of QTL forming *J* genotypes that affect PCD. The statistical foundation for functional mapping of these QTL is based on a finite mixture model. According to this mixture model, each PCD curve, *y*_{i}, for a backcross progeny (*i*) longitudinally measured at *T* time points is assumed to have arisen from one (and only one) of these *J* QTL genotypes (called components in statistics), with each being modeled by a multivariate normal distribution, i.e., (1) where *p* is the mixture of multiple multivariate normal distributions, each denoted by *f*, and ω = (ω_{1},…, ω_{J})′ are the mixture proportions (i.e., QTL genotype frequencies) of *J* QTL genotypes, which are constrained to be nonnegative and φ = (φ_{1},…, φ_{J})′ are the QTL genotype-specific parameters, and η are parameters that are common to all QTL genotypes.

Now we use the genetic linkage map constructed by molecular markers to detect and map the underlying QTL for PCD over the entire genome. Consider such a segregating QTL with alleles *Q* and *q* in the backcross population. This QTL cannot be directly observed but rather is to be inferred on the basis of known marker information. In QTL interval mapping, we will use two flanking markers to infer the genotype of a QTL that is hypothesized to be located between the two markers (23). The recombination fraction is a linkage parameter that can describe the genetic distance between two given loci. Let θ, θ_{1}, and θ_{2} be the recombination fractions between the two markers *M*_{1} and *M*_{2}, between the left marker *M*_{1} and the QTL, and between the QTL and the right marker *M*_{2}, respectively. On the basis of segregation and transmission of genes from the parent to progeny, one can derive the conditional probabilities of an unknown QTL genotype, conditional on the known marker genotypes, in terms of these recombination fractions. The unknown parameters that specify the position of QTL within a marker interval are arrayed in Ω_{s}, where *s* denotes the QTL position.

According to functional mapping (27), the mixture-based likelihood function of the longitudinal PCD trait (*y*) and marker information (*M*) collected in the backcross population at this hypothesized QTL with two genotypes (denoted by *j* = 1, 0) is constructed as (2) where ω_{1|i} and ω_{0|i} contained in Ω_{s} are the mixture proportions corresponding to the frequencies of different QTL genotypes for a progeny *i*, expressed as the conditional probabilities of QTL genotypes given marker genotypes for this progeny; contains the parameters that model time-dependent means for genotype *j*; and Ω_{v} contains the parameters that model the structure of the residual covariance matrix that is assumed to be common to all mixtures. Different from parameters φ and η in the original mixture model (1), which are unstructured, Ω_{uj} and Ω_{v} are the mathematical parameters that model the mean-covariance structure.

The multivariate normal distribution of each mixture for progeny *i* measured for *T* time points is expressed as (3) where *y*_{i} = [*y*_{i}(1),…, *y*_{i}(*T*)] is a vector of observation for progeny *i* and *u*_{j} = [*u*_{j}(1),…, *u*_{j}(*T*)] is a mean vector for all the progeny with genotype *j*. At a particular time point (say *t*), the relationship between the observation and mean can be described by a linear regression model, where ξ_{i} is an indicator variable of progeny *i* for QTL genotype defined as 1 for *j* = 1 and 0 for *j* = 0 and *e*_{i}(*t*) is the residual error which is *iid* normal with zero mean and variance σ^{2}(*t*). The errors for progeny *i* at two different time points, *t*_{1} and *t*_{2}, are correlated with covariance cov(*y*_{i}(*t*_{1}),*y*_{i}(*t*_{2})). The variances and covariances comprise the covariance matrix **∑**, whose elements are the common parameters specified by Ω_{v}. With these general settings, the statistical challenge becomes how to model the mean process and how to structure the covariance matrix.

### Semiparametric Modeling of Mean Vector

In a broad sense, the entire PCD process for a particular individual *i* can be divided into two phases, growth and death (Fig. 1). Let *t*_{i}* be the transition time point that marks the end of the growth phase and the beginning of the death phase. The mean vector in the multivariate normal distribution of PCD (*Eq. 3*) for individual *i* that carries QTL genotype *j* can now be specified by two mean subvectors expressed as (4) where **u**_{Gj|i} and **u**_{Dj|i} correspond to the growth and death vectors before and after *t*_{i}*, respectively.

#### Parametric model of growth phase.

The process of growth (before *t*_{i}*) follows universal growth laws and can be described by biologically meaningful mathematical functions. As a nearly universal biological law for living systems, the sigmoidal (or logistic) growth function can be fitted to capture age-specific change during the growth phase (34). The logistic function is mathematically described for individual *i* by (5) where *a*_{i} is the asymptotic or limiting value of *g*_{i} when *t* → ∞, *a*_{i}/(1 + *b*_{i}) is the initial value of *g*_{i} when *t* = 0, and *c*_{i} is the relative rate of growth (5). Thus with these three parameters, one can uniquely determine the shape of PCD in the growth phase for individual *i* that carries QTL genotype *j* and have the time-dependent mean vector for *Eq. 4* specified by (6) If different genotypes at a putative QTL have different combinations of the parameters (*a*_{j}, *b*_{j}, *c*_{j}), this implies that this QTL plays a role in governing the difference of growth trajectories.

#### Nonparametric model of death phase.

Because no particular mathematical function can be used to describe the death phase (after *t*_{i}*), the nonparametric approach based on the orthogonal Legendre polynomial (LEP) is used. The flexibility of LEP will greatly increase the robustness of functional mapping.

With appropriate order *r*, the time-dependent genotypic values for different QTL genotypes in the death phase can be fitted by the orthogonal LEP. A family of such polynomials with normalized time *t*′ is denoted by and a vector of genotypic-related, time-independent values with order *r* is denoted by where **v**_{j} is called the base genotypic vector for QTL genotype *j* and the parameters within the vector are called the base genotypic means. The normalized time *t*′ is obtained by adjusting the original measurement time *t* to match the orthogonal function range [−1, 1], by where *t*_{min} and *t*_{max} are the first and last time point, respectively.

With these specifications, the time-dependent genotypic values, *u*_{Dj|i(t)} in the death phase can be described as a linear combination of *v*_{j} weighted by series of the polynomials, i.e., (7) Thus for individual *i* whose QTL genotype is *j*, we use the following expression to model genotype means in the death phase (8) This approach has great flexibility in modeling longitudinal data that cannot be fitted by a parametric model. By choosing an appropriate order, the nonparametric model can better capture the intrinsic pattern of developmental PCD. The number of parameters can be reduced if the order of the polynomial should be less than the number of time points.

### Modeling Covariance Structure

To model the covariance structure for longitudinal data, we need to make the following assumptions: *1*) the error *e*_{i}(*t*) in *Eq. 1* is normally distributed with mean zero and variance σ^{2}(*t*), and *2*) the error *e*_{i}(*t*) is independent and identically distributed among different individuals. A number of statistical models have been used to model the covariance structure (12). In earlier functional mapping, the first-order autoregressive [AR(1)] model was used (27), which is expressed as (9) for the variance, and (10) for the covariance between any two time intervals *t*_{1} and *t*_{2}, where 0 < ρ < 1 is the proportion parameter with which the correlation decays with time lag. The parameters that model the structure of the (co)variance matrix are arrayed in Ω_{v}.

To remove the heteroscedastic problem of the residual variance, which violates a basic assumption of the simple AR(1) model, two approaches can be used. The first approach is to model the residual variance by a parametric function of time, as originally proposed by Pletcher and Geyer (29). However, this approach must implement additional parameters for characterizing the age-dependent change of the variance. The second approach is to embed Carroll and Ruppert’s (7) transform-both-sides (TBS) model into the growth-incorporated finite mixture model (40), which does not need any more parameters. Both empirical analyses with real examples and computer simulations suggest that the TBS-based model can increase the precision of parameter estimation and computational efficiency. Furthermore, the TBS model preserves original biological means of the curve parameters, although statistical analyses are based on transformed data.

The TBS-based model displays the potential to relax the assumption of variance stationarity, but the covariance stationarity issue remains unsolved. Zimmerman and Núñez-Antón (47) proposed a so-called structured antedependence (SAD) model to model the age-specific change of correlation in the analysis of longitudinal traits. The SAD model has been employed in several studies and displays many favorable properties (48). Zhao et al. (46) incorporated the first-order SAD [SAD(1)] model into modeling of the covariance matrix.

In this article, we use a different modeling approach that is as simple as the AR(1) and as flexible as the SAD(1). This approach has two steps. In *step 1*, the intraindividual correlation structure is modeled. In many cases, a systematic pattern of correlation is evident, which may be characterized accurately by a relatively simple model. The intraindividual correlation among the time-dependent elements of *e*_{i} for individual *i* is assumed to follow a pattern, expressed as where the correlation matrix **R**_{i}(φ) is a function of a vector of correlation parameters φ. The correlation structure can be described by the AR(1) model in which φ = ρ (*Eq. 10*).

In *step 2*, time-dependent variances are specified according to Horwitz’s rule in analytical chemistry. This rule proposes that there exists an empirical relationship between concentration and variance (1). Thus we can similarly model time-dependent variances for individual *i* by considering its genotypic means at various time points, expressed as Therefore, the corresponding covariance matrix can be modeled by (11) where . Because the covariance structure is modeled as a function of genotypic mean, it is called the mean-covariance (M-C) model. The M-C model has great flexibility for modeling the covariance matrix of the PCD process. The unknown parameters to be estimated in the M-C model are arrayed in with accepted means.

### Computation Algorithms

The EM algorithm (10) has served as a standard approach for obtaining the maximum likelihood estimates (MLEs) of the parameters in traditional QTL mapping (23). This algorithm has also been used for functional mapping of longitudinal traits to obtain the MLEs of (Ω_{s}, Ω_{u}, Ω_{v}) for the likelihood (*Eq. 2*) (27, 40). The EM algorithm is implemented with two steps: *1*) the E step, in which the posterior probability of a QTL genotype given the marker genotype of progeny *i* is calculated using and *2*) the M step, in which the calculated Π_{j|i} values are used to solve the log-likelihood equations, aimed to estimate (Ω_{s}, Ω_{uj}, Ω_{v}) defining the QTL position, the QTL genotype-specific curve parameters, and the parameters that model the covariance matrix, respectively.

An iterative procedure between the E and M steps will be processed until convergence.

The estimates at convergence are regarded as the MLEs of the unknown parameters. In practice, the QTL position parameter Ω can be viewed as a known parameter because a putative QTL can be searched at every 1 or 2 cM on a map interval bracketed by two markers throughout the entire linkage map. The amount of support for a QTL at a particular map position is often displayed graphically through the use of likelihood maps or profiles, which plot the likelihood ratio test statistic as a function of map position of the putative QTL.

In functional mapping for PCD, Ω_{u} = (Ω_{G},Ω_{D}) and Ω_{v} are contained in complex nonlinear equations, and therefore it is difficult to derive a closed form for their MLEs. The Nelder-Mead simplex algorithm as a direct search method for nonlinear unconstrained optimization, originally proposed by Nelder and Mead (28), can be used to estimate these parameters (45). This algorithm attempts to minimize a scalar-valued nonlinear function using only function values, without any derivative information (explicit or implicit). The algorithm uses linear adjustment of the parameters until some convergence criterion is met.

However, because of the complex nonlinear function being minimized by simplex algorithm, it cannot always guarantee the correct convergence of covariance parameters during the minimization process. This consequently results in negative infinity of the log-likelihood function, and convergence will never be reached. Because of these concerns, we used the simplex algorithm to estimate the mean parameters, namely, the logistic curve and Legendre polynomial parameters, and the EM algorithm to estimate the parameters that model the structure of the covariance matrix (see appendix).

Under the joint modeling framework, two mean functions, growth and death, must be connected. Two constraints are imposed to make the PCD curve continuous at the transition time point *t*_{i}* for each individual. The first constraint is to make the growth mean equal to the death mean at time *t*_{i}*. The second constraint is that the two functions have the same score at time *t*_{i}*. These two constraints are expressed as With these constraints, we obtain the expressions of one growth parameter and one death parameter for any QTL genotype *j*. For example, if the Legendre polynomial order is 3, we can solve the equations to obtain the estimates of *a*_{j} and *v*_{1j} as follows, where *t*_{i} is the adjusted time for *t*_{i} and Δ*t* = *t*_{max} − *t*_{min}.

It is possible that the algorithm described above may generate local maxima for the likelihood surface. An empirical approach for reducing the possibility of local maxima is to use multiple sets of initial values of the parameters. The initial values are determined in the light of parameter estimates from the data by assuming that no QTL is involved. We will obtain the global maxima when no further increase of the likelihood is found in a space of parameters.

### Legendre Order Selection

To determine which order of the LEP best fits the data, we must select the optimal order. One of the popular model selection criteria is the Akaike information criterion (AIC) (1). The AIC value at a particular order *r* is calculated by (12) where and are the MLEs of parameters for the growth curve function and the Legendre polynomial of order *r*, Ω_{v} contains the MLEs of the covariance parameters, and dimension(Ω_{G},Ω_{D},Ω_{v}|*r*) represents the number of free parameters under order *r*. The optimal order is one that displays the minimum AIC value.

Another model selection criterion to determine the optimal order of the Legendre function is the Bayesian information criterion (BIC) (30), which is calculated by (13) where all the parameters are defined similarly as above except that *n* is the total number of observations at a particular time point. Because the BIC adjusts the effect of sample size, the model selected by the BIC will be more parsimonious. Other criteria, such as those proposed for high-dimension parametric models (25), can also be used.

### Calculating Curve Heritability

It is easy to calculate the heritability level (*H*^{2}) when traits are measured at a single time point, but for longitudinal traits heritability calculation is difficult. We propose two ways to do it: *1*) Calculate *H*^{2} at a single time point *t* where the traits show the highest variation. For a backcross design, the genetic variation is given by where *u*_{j}(*t*) (*j* = 1,0) is the genetic mean for genotype *j* at time *t*, and the heritability is calculated, where σ_{ε}^{2}(*t*) is the residual variance at time *t*.

*2*) Calculate *H*^{2} with the area under curve (AUC). Functional mapping maps the dynamic gene effect over time. The genetic variation explained by the entire measurement period is more informative than that by individual time point. We propose to calculate the genetic variation by where AUC_{1} and AUC_{0} are the AUC for two different genotypes. The AUC is calculated by where *t** is the transition time point, *a*_{j}, *b*_{j}, and *r*_{j} are the growth parameters for genotype *j*, **P**_{r}(*t*′) is a vector of LEP with order *r*, *u*_{j} is the base genotypic mean parameters, and *t*′ is the adjusted time point.

### Hypothesis Testing

One of the major advantages of functional mapping is that it allows for a number of hypothesis tests to examine the genetic control mechanisms of growth throughout development and in response to varying environmental or developmental clues. Wu et al. (39) have formulated some of these hypothesis tests, which include the global test of genetic effects on the entire developmental process, the regional test of genetic control over a particular developmental period of interest, and the point test for the timing of developmental events. From a genetic perspective, we can also test how different genetic action modes play a role in regulating the developmental process. With such a complete set of tests, we are able to address biological questions related to the genetic control mechanisms of PCD traits.

Testing whether specific QTL exist to affect the PCD process is a first step toward the understanding of the detailed genetic architecture of complex phenotypes. The genetic control over the entire developmental process of PCD can be tested by formulating the following hypotheses: (14) *H*_{0} states that there is no QTL affecting the dynamic PCD process (the reduced model), whereas *H*_{1a} proposes that such a QTL does exist (the full model). The test statistic for testing the hypotheses is calculated as the log-likelihood ratio of the reduced to the full model as given below: (15) where Ω̃ and Ω̂ denote the MLEs of the unknown parameters under *H*_{0} and *H*_{1a}, respectively. The critical threshold value for declaring the presence of QTL can be empirically calculated based on the permutation tests (9).

Other hypotheses can be made to test whether the detected QTL only controls the growth phase with the following alternative hypothesis: (16) or whether the detected QTL only controls the death phase with the following alternative: (17) The critical thresholds for the above two hypotheses can be determined with simulation studies. Only when both *H*_{1b} and *H*_{1c} are rejected, is the detected QTL thought to pleiotropically affect the growth and death phases.

The proposed model can be used to test the influence of QTL on growth in different stages of development, lag, exponential, declining growth, stationary phase, and death. These tests can be based on the AUC during a time course of interest. Simulation studies are used to determine the critical thresholds for each test.

## RESULTS

### A Worked Example

We use the proposed model here to analyze a real data set of rice. Two inbred lines, semidwarf IR64 and tall Azucena, were crossed to generate an F_{1} progeny population. By doubling haploid chromosomes of the gametes derived the heterozygous F_{1}, a doubled haploid (DH) population of 123 lines was founded (20). Such a DH population is equivalent to a backcross population because its marker segregation follows 1:1. With 123 DH lines, Huang et al. (20) genotyped 135 RFLP and 40 isozyme and RAPD markers to construct a genetic linkage map, based on the Kosambi function, of length 2,005 cM with an average distance of 11.5 cM, representing a good coverage of 12 rice chromosomes.

The 123 DH lines and their parents, IR64 and Azucena, were planted in a randomized complete design with two blocks. Each block was divided into different plots, each containing eight plants per line. Starting from 10 days of transplanting, tiller numbers were measured every 10 days for five central plants in each plot until all lines had headed. We used the means of the two blocks for QTL analysis.

Figure 2 illustrates the dynamics of tiller numbers for each DH line measured at 9 time points. Tiller growth is thought to be an excellent example of PCD in plants (16) because it experiences several developmental stages when rice grows. At an early stage, tiller numbers increase dramatically, corresponding to the vegetative phase in rice. During the reproductive phase, the increase of tiller numbers declines with the initiation of the panicle, the emergence of the flag leaf (the last leaf), and booting, heading, and flowering of the spikelets. Tillers that do not bear panicles are called ineffective tillers and will be killed, leading to the death phase. The number of ineffective tillers is a closely examined trait in plant breeding because they are undesirable for commercial varieties. Ineffective tillers result in many unwanted problems in rice such as the overconsumption of nutrition and competition of space. The genetic control system plays an important role in reducing overproduced tillers and balancing the rice metabolism system for optimal use efficiency of nutrients.

Our semiparametric model was used to map specific QTL that determine the dynamic changes of tiller number during ontogeny. Although the growth phase of tiller number can be well modeled by a logistic equation defined by parameters *a*, *b*,and *c* (*Eq. 5*), no proper equation can be used to model the death phase. For this reason, a nonparametric approach based on the Legendre polynomial function is adopted in the framework of QTL mapping. However, this encounters the issue of order determination. To detect the best order of the Legendre polynomial function for this rice data set, we calculated the AIC and BIC for various orders (Table 1). Both criteria provide the consistent result that the death phase of tiller number can be best explained by a Legendre polynomial of order 3.

By genomewide scanning for QTL at every 2 cM within each marker interval across 12 rice chromosomes, our model identified three major QTL that trigger their effects on the overall PCD process of tiller number. As shown by the genome-wide log-likelihood ratio (LR) profile in Fig. 3, these three QTL are located between markers RG146 and RG345 and between markers RZ730 and RZ801 on chromosome 1 and on marker RZ792 on chromosome 9. Of these three detected QTL, the first is significant genome-wide, whereas the other two are significant chromosome-wide, all at the 5% significance level based on the critical thresholds determined from the permutation tests.

To know more about the behavior of the detected QTL, we tabulated the MLEs of curve parameters that specify the growth phase and genotypic basis effects that specify the death phase, along with the approximate standard errors of the estimates (Table 2). All the parameters that specify the growth and death phases for different QTL genotypes (*j* = 1 for *QQ* or 0 for *qq* in the DH population) are estimated with reasonably high precision as shown by the standard errors, although the estimation precision tends to be better for the growth parameters than for the death parameters (Table 2). The parameters that model the structure of the covariance matrix based on the M-C model can also be well estimated, suggesting good behavior of our model.

Using the MLEs of parameters for the growth and death phases, we draw the developmental trajectories of tiller number for the two different QTL genotypes (Fig. 4). Each QTL shows a unique developmental pattern over time. For example, the dynamic process of genetic effects for the QTL located between markers RZ730 and RZ801 on chromosome 1 is different from those for the other two QTL. Statistical tests based on *Eqs. 16* and *17* show that the QTL detected between markers RG146 and RG345 on chromosome 1 and on marker RZ792 on chromosome 9 merely control the growth phase, whereas the second QTL on chromosome 1 controls the entire developmental process (*P* < 0.05).

### Simulation

We performed a series of simulation studies to examine the statistical properties of the model. Six equidistant markers are simulated from a backcross population and are ordered as *M*_{1}–*M*_{6} on a linkage group with a length of 100 cM. The Haldane map function was used to convert the map distance into the recombination fraction. Different heritability levels (*H*^{2} = 0.1 vs. 0.4) and different sample sizes (*n* = 100 vs. 200) were considered in the simulation study to examine the model’s performances under different scenarios. The putative QTL is located between markers *M*_{3} and *M*_{4}, at 48 cM from the first marker. Data are simulated by assuming that the QTL controls the entire developmental process. The simulated data have nine continuous time points. The means at different time points used to model the covariance matrix based on the M-C model are the average of the two genotypic means.

Table 3 lists the results from the simulation; the true parameters are given in the first column. In general, our model can provide reasonable estimates of the QTL positions and the growth and death parameters determined by the QTL, with estimation precision depending on heritability level and sample size. In all cases of different sample sizes and heritabilities, the maximum values of the LR landscapes from 100 simulation replicates are beyond the critical thresholds at the α = 0.001 level determined from 1,000 permutation tests for the simulated data, suggesting that our model has 100% power to detect QTL in these conditions. The precision of parameter estimation is evaluated in terms of the square root of the mean squared errors (SMSE) of the MLEs. The QTL positions and effects can be better estimated when the PCD trait has a higher than lower heritability or when the sample size is larger rather than smaller (Table 3). However, the increase of *H*^{2} from 0.1 to 0.4 leads to more significant improvement for the estimation precision than the increase of *n* from 100 to 200. For example, the SMSE of the growth parameter *c*_{0} for QTL genotype *qq* reduces by more than onefold when *H*^{2} is increased from 0.1 to 0.4 for a given sample size, whereas such reduction is much smaller when *n* is increased from 100 to 200 for a given heritability. This suggests that in practice it is more important to manage experiments to reduce residual errors (and therefore increase *H*^{2}) than to simply increase sample size.

## DISCUSSION

The growth of any tissue, whether normal or malignant, is determined by the quantitative relationship between the rate at which cells proliferate and the rate at which cells die. Depending on how the rate of cell proliferation is compromised or coordinated with the rate of cell death, all tissues will undertake two distinct processes, growth or death, throughout their development (21). Unlike the detailed framework for cell proliferation, understanding of the initiation of cell death and the cellular mechanics of this process is still in its infancy. Cell death can occur as an active and orderly process of development, a process described by the term programmed cell death (PCD) (19).

PCD, also referred to as apoptosis, appears to be a universal feature of animal development, and abnormalities in PCD have been associated with a broad variety of human diseases, including certain cancers and neurodegenerative disorders (17). In plants, PCD is also ubiquitous for essential development and survival, including xylogenesis, reproduction, senescence, and pathogenesis (16). To make PCD control and execution efficient, particular genetic mechanisms should be involved in regulating and modulating this process in response to various developmental and environmental stimuli. The use of organisms with simple structure like the nematode *C. elegans* and *Drosophila* has led to the identification of numerous genes responsible for PCD (13, 18, 19. 43). The 2002 Nobel Prize in Physiology or Medicine was awarded to three scientists because of their discoveries of the genetic regulation of PCD (19).

Although the use of simple model systems can provide a wealth of information about the genetics of PCD for more complicated organisms like animals and higher plants, some key questions cannot be addressed well without a direct use of these organisms. Recent development of high-throughput molecular technologies has made it possible to generate a massive amount of genetic and genomic data for any organism almost without limit. This thus presents a pressing need for the development of vigorous, detailed, and analytical approaches to unravel the genetic control and regulation mechanisms that underlie the PCD process. In this article, we have for the first time developed a statistical model that can make a systematic scan of QTL for PCD across the entire genome with a well-covered genetic linkage map. This model has been validated by a real example for the PCD process of tiller number in rice. Three QTL were detected to affect tiller number trajectories during a growing season in the field. The locations for two of the QTL detected on chromosome 1 are consistent with those estimated from basic interval mapping of single traits (42), but the third QTL detected on chromosome 9 was previously undetected by interval mapping. It seems that our model has not been able to detect many other QTL detected by Yan et al. (42), which may be due to the difference in the threshold criteria used to claim the existence of a QTL between the two approaches.

The rationale for this PCD mapping model is similar in spirit to that established in earlier functional mapping of growth curves (27, 39–41, 46). Both types of models for functional mapping integrate the mathematical aspects of biological principles into the framework for QTL mapping constructed on the basis of Gaussian mixture models. Their significant advantages compared with traditional genetic mapping models (23) lie in increased model flexibility, stability, and statistical power for QTL detection (reviewed in Ref. 36). This is because functional mapping only needs to estimate a few number of mathematical parameters that define the curve, rather than estimating many genotypic means at all time points. Functional mapping implemented with PCD can provide a quantitative platform for testing the interplay between gene actions and PCD processes. The statistical power of functional mapping is further increased by taking advantage of structuring the covariance matrix with a much lower number of parameters.

It can also be seen that the PCD mapping model proposed here is different from earlier published functional mapping approaches purely based on parametric modeling. The present model splits the PCD process into two sequentially distinct phases—growth and death. As in a parametric model, universal growth laws (34) are used to model the growth phase, whereas nonparametric modeling is performed for the death phase. This combination of parametric and nonparametric models, referred to as semiparametric modeling, is aimed at overcoming the problem of a death phase that cannot be described mathematically. Although it is feasible for a nonparametric approach to model any form of curve including the entire growth-death curve as in Fig. 1, this approach does not make full use of biological information contained in the growth phase and, therefore, is likely to lose the advantages of parametric functional mapping in the biological interpretation of results (see Ref. 39).

The nonparametric part of our semiparametric model is based on Legendre polynomial approaches. As shown by Kirkpatrick and Heckman (22), Legendre polynomials have several favorable properties for curve fitting which include *1*) the functions are orthogonal; *2*) they are flexible to fit sparse data; *3*) higher orders are estimable for high levels of curve complexity; and *4*) computation is fast because of good convergence. Other nonparametric regression methods using kernel estimates have been considered for the mean structure of growth curve data by Altman (2), Boularan et al. (6), Wang and Ruppert (33), and Ferreira et al. (14).

Relative to nonparametric modeling of the mean structure, nonparametric covariance modeling has received little attention. Leonard and Hsu (24) derived a Bayesian approach for nonparametric estimates of the covariance structure. Diggle and Verbyla (11) used kernel-weighted local linear regression smoothing of sample variogram ordinates and of squared residuals to provide a nonparametric estimator for the covariance structure without as assuming stationarity. It is appealing to incorporate these nonparametric or semiparametric approaches into our functional mapping framework to increase the model’s flexibility.

## APPENDIX

The MLEs of the parameters Ω = (Ω_{u}, Ω_{v}) are derived as follows, with the symbol * denoting the estimates of parameters from the previous step. The values of (Ω_{u}* Ω_{v}*) estimated from the following equations will be used to provide new estimators of (Ω_{u}, Ω_{v}) in the next step.

The first derivative of the log-likelihood function in *Eq. 2* with respect to specific parameter Ω_{λ} is given by where we define (A1) The MLEs of the parameters contained in (Ω_{m}, Ω_{v}) are obtained by solving (A2) However, the MLEs cannot be obtained directly because of the mixture distribution problem. The EM-Simplex algorithm was applied to estimate the parameters. For the M-C covariance model with AR(1) correlation structure, we have ∑ = σ^{2}*u***R**(ρ)*u*, where *u* = diag{*ū*^{2}(1),…, *ū*^{2}(*T*)}, *ū*(*t*) refers to the average mean for the two genotypes at time *t*, and **R**(ρ) is the AR(1) correlation matrix. This model has the following properties: and Therefore, by solving *Eq. A2*, we have and where

### Estimation Procedures

*1*) E step: Given initial values for (Ω_{u}, Ω_{v}), calculate the posterior probability matrix Π = Π_{j|i} in *Eq. A1*.

*2*) M step: With the mean parameters contained in Ω_{u} from the previous step, calculate the mean vector **u** and update the covariance parameters σ̂^{2} and ρ̂. These updated covariance parameters are used in the simplex step to maximize the mean parameters contained in Ω_{u}.

*3*) The above procedures are iteratively repeated until a certain convergence criterion is met. The converging values are the MLEs of the parameters.

## GRANTS

This work was partially supported by National Science Foundation Grant 0540745 to R. Wu.

## Acknowledgments

We thank Dr. Jun Zhu for providing the rice data set that made it possible to write this manuscript.

## Footnotes

Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).

Address for reprint requests and other correspondence: R. Wu, Dept. of Statistics, Univ. of Florida, Gainesville, FL 32611 (e-mail: rwu{at}stat.ufl.edu).

- Copyright © 2006 the American Physiological Society