Significant variation exists for maternal nurturing ability in inbred mice. Although classical mapping approaches have identified quantitative trait loci (QTL) that may account for this variation, the underlying genes are unknown. In this study, lactation performance data among the mouse diversity panel were used to map genomic regions associated with this variation. Females from each of 32 inbred strains (n = 8–19 dams/strain) were studied during the first 8 days of lactation by allowing them to raise weight- and size-normalized cross-foster litters (10 pups/litter). Average daily weight gain (ADG) of litters served as the primary indicator of milk production. The number of pups successfully reared to 8 days (PNUM8) also served as a related indicator of maternal performance. Initial haplotype association analysis using a Bonferroni-corrected, genome-wide threshold revealed 10 and 15 associations encompassing 11 and 13 genes for ADG and PNUM8, respectively. The most significant of these associated haplotype blocks were found on MMU 8, 11, and 19 and contained the genes Nr3c2, Egfr, Sec61g, and Gnaq. Lastly, two haplotype blocks on MMU9 were detected in association with PNUM8. These overlapped with the previously described maternal performance QTL, Neogq1. These results suggest that the application of in silico QTL mapping is a useful tool in discovering the presence of novel candidate genes involved in determining lactation capacity in mice.
- genome-wide association study
lactation is the most energetically demanding stage of a female's lifetime (42). In rodents, the demands of lactation on the mother are twofold: 1) production of milk for the suckling pups, and 2) increased heat dissipation in response to higher food intake. Maternal food intake doubles with the onset of lactation, and there is significant growth in both the mammary gland and the maternal gut to support these demands. The identification of genes underlying these complex changes has been challenging, not only in rodents but also in dairy animals, which are more amenable to direct measurements of milk production.
The contribution of genetic factors to lactation has been extensively studied in agricultural species such as the cow and sheep (2, 10, 17, 20, 30, 34, 37, 47, 54, 60, 62, 68, 75). In the cow, heredity accounts for 25–50% of the variation in milk production traits (21). As a consequence genotype-phenotype associations mapped as quantitative trait loci (QTL) for lactation-associated traits have been found on all 29 autosomes (10). However, only a few of the causal genes for these QTL have been identified. The most noteworthy of these are Acyl-CoA:diacylglycerol acyltransferase 1 (Dgat1), growth hormone receptor (Ghr), ATP binding cassette sub-family G member 2(Abcg2), and osteopontin (Spp1) (4, 14, 23, 69). Similar limitations have been observed with the analysis of lactation or “maternal nurturing ability” in the mouse (44, 57, 71, 72, 79). To date at least eight QTL have been described in association with maternal nurturing ability in mice. In addition, there are estimated to be ∼15 epistatic interactions around 23 QTL distributed across the mouse genome (57). Despite these discoveries, the genes responsible for these QTL remain to be identified.
Recent advances in genomic sequence analysis and single nucleotide polymorphism (SNP) discovery now provide new and potentially more efficient resources for the discovery of QTL candidate genes using inbred mouse strains such as the mouse diversity panel (MDP) and the inbred laboratory mouse haplotype map (39, 50). These publicly available databases, have allowed for the development of a novel gene mapping approach known as in silico genome-wide association (GWA) mapping (24). Although not without controversy (12, 16), in silico GWA has now successfully been used in several instances to either refine previously described QTL intervals or identify completely new QTL genes (26, 45, 51, 52, 61, 73). The key advantage to this approach is the speed inherent in simply phenotyping existing individuals from defined panels of inbred strains (24). A major criticism of the approach is a tendency to produce false positive associations because of inherent “population structure” within the genomes of the classical inbred strains (6–8, 35). However, effective computation strategies have been developed to overcome this limitation (35).
Our laboratory has used litter cross-fostering approaches to measure relative milk production in the mouse (27–29). These approaches have allowed for indirect measurement of milk production capacity during all stages of the cycle with a degree of increased sensitivity and precision that has not been present in previous studies. The overall goal of this study was to collect quantitative data on maternal nurturing capacity from mouse strains within the MDP and use these data with in silico GWA mapping to discover novel genes that may contribute to the variation in milk production. In our analysis, we employed a correction for kinship that has been shown by others to provide an effective means of removing effects due to population structure (35, 39). On the basis of this work we confirmed the existence of one previously discovered litter gain QTL and we describe three potentially new ones.
MATERIALS AND METHODS
Males and females (6–10 wk old) from each of 31 inbred mouse strains were purchased from the Jackson Laboratories (Bar Harbor, ME). The strains used in this study were CZECHII/EiJ, PL/J, A/J, MA/MyJ, RIIIS/J, SJL/J, DBA/1J, C58/J, LP/J, BUB/BnJ, CBA/J, DBA/2J, SM/J, CAST/EiJ, AKR/J, SWR/J, I/LnJ, C57BL/6J, NZW/LacJ, CE/J, C57L/J, LG/J, KK/HlJ, 129S1/SvImJ, BALB/cByJ, C3H/HeJ, NON/LtJ, NOD/LtJ, FVB/NJ, SEA/GnJ, and BTBR T+ tf/J. The QSi5 strain was also included in this study as these mice have previously been described as being highly fecund and capable of rearing much larger litters than other inbred strains (33, 64, 65). This strain was obtained from Dr. Ian Martin of The University of Sydney, rederived into the Children's Nutrition Research Center mouse facility, and then maintained as an inbred line. To provide cross-foster litters, CD-1 females (Charles River Laboratories) were mated alongside the experimental females. To minimize variation, all CD-1 cross-foster donor dams used in this study originated from the Charles River's Raleigh, NC, production facility.
The experiments described in this paper were conducted in accordance with procedures outlined in the NIH Guide to Care and Use of Experimental Animals, and they were approved by the Baylor College of Medicine Animal Care and Use Committee. All animals were maintained at an ambient temperature of 21°C, and a light-dark cycle of 14 h:10 h. To allow all females used in the study to complete one lactation, and to provide females for subsequent study, individual matings were set up between males and females within each strain. The animals were fed the 2020X pelleted diet (Harlan Teklad, Indianapolis, IN) through day 15 of their second pregnancy. After the females were allowed to complete their first lactation they were mated a second time to CD-1 males (Charles River Laboratories, Wilmington, MA) and then studied during the second lactation. At 15 days of pregnancy, study females were switched to a powdered form of the diet AIN-93G (Harlan Teklad), which was dispensed in funnel feeders (Research Products International, Mt. Prospect, IL). Each female was also given a single nestlet pad (Ancare, Bellmore, NY) on day 15 of pregnancy to provide a standardized material for nest building. On day 1 postpartum, the number of pups born to each female was counted and litters were weighed. Each female then had her own litter replaced with a litter made up of 10 1-day-old CD-1 pups. The cross-foster litters were adjusted so that all had similar weight. Litter weight and pup number were then recorded for each of the females daily through day 8 postpartum. On day 10 postpartum, all study animals were euthanized for tissue collection. The phenotypes for this study were cross-foster litter weight gain and cross-foster litter pup number during the first 8 days postpartum. Additional phenotypes were measured, but will be the subject of future manuscripts.
Data processing, quality control, and descriptive statistics.
Data were summarized using the explore function of IBM SPSS version 19 (IBM, Armonk, NY). Outliers were identified and removed by using extreme studentized deviate method (66). Strain effects were then tested for using the univariate GLM function in SPSS. Covariates were not used in the analysis. To identify strains with unusually high or low phenotypic values, the 95% confidence interval (CI) about the mean for the entire study group was used compared with the 95% confidence intervals for individual strain means. All phenotypic data are presented as the mean ± 95% CI.
Broad sense heritability estimates for pups born, litter gain, and pup survival were calculated from the sums of squares in the ANOVA analysis as previously described (52).
SNP data source for the in silico haplotype mapping.
The Primary SNP data used in the association mapping for these studies was obtained from the Inbred Laboratory Mouse Haplotype Map resource(http://www.broadinstitute.org/mouse/hapmap/), and consisted of 132,285 SNPs mapped (132k set) across 94 strains according to National Center for Biotechnology Information build 37 of the mouse genome.
Construction of haplotype blocks.
With the 132k set, data were first filtered to exclude nonpolymorphic SNPs or those with minor-allele frequency of < 0.1, then linkage disequilibrium (LD) blocks were constructed in Haploview (3) using the four-Gamete Rule option, which generated a greater number of LD blocks compared with the other options. The tag SNPs in each LD block were also identified and marked and the LD block information was exported as text files, separately for each chromosome. These tag SNP files were then used to extract the haplotype for each strain, using R codes (74). The code is available on request from Peter C. Thomson (University of Sydney). One haplotype file was generated for each chromosome.
Construction of the similarity matrix.
Unless the genetic similarity across mouse strains is taken into account, an evaluation of association between haplotypes (or individual SNPs) and phenotypic strain means may result in spurious associations. To accommodate this, a genetic similarity matrix (S) was derived across all mouse strains in the database. The correction is essentially the same as the previously described kinship matrix approach (35). This was constructed by considering for each pair of mouse strains, the proportion of the genome they have in common, as assessed by the haplotypes. A weighting is included in this calculation for the number of tag SNPs. Formally, the calculation method for the strain similarity, sij between mouse strains i and j is where nijk = number of tag SNPs in haplotype block k; H = total number of haplotype blocks over the genome; and where
To evaluate the association between the phenotype (strain mean) and the haplotype, adjusting for strain similarity, a mixed-model framework was used, namely
where y = vector of phenotypic values, X = incidence matrix incorporating terms for the observed haplotype and any other fixed effects; β = vector of effects corresponding to these terms, Z = incidence matrix linking phenotype records to genetic groups, u = vector of polygenic genetic effects, and ε = vector of residual random errors.
The random errors are assumed to be independent with constant variance σε2, i.e., var(ε) = σε2 I, whereas the values of u are assumed to be correlated, according to the strain similarity, i.e., var(u) = σu2S. (Since only a subset of the strains were used here because of phenotypic information availability, a subset of the matrix S was extracted.) Model fitting is conducted iteratively using Henderson's mixed model equations (32), with a REML procedure used to estimate σε2 and σu2 at each iteration, as well as obtain estimates of β (and predictions of u). The effect of the haplotype is obtained by extracting the required elements of β̂, say β̂k. The significance of the haplotype association was assessed using a likelihood ratio χ2 test, comparing the fit of the model with (full) and without (reduced) the haplotype. The degrees of freedom for each test were the number of different haplotypes in the block less one. The mixed model analysis was conducted with user-written code in R (74). The code is available on request to Peter C. Thomson (University of Sydney). Haplotype associations were considered significant at a genome-wide P value threshold of 2.75 × 10−6 based on the Bonferroni correction of α/n, where n was the number of blocks tested (18,124) and α was 0.05.
Individual SNP association.
To confirm the results from the haplotype analysis and to focus on individual SNPs within the genome, an efficient mixed-model analysis was conducted as previously described (35) using the UCLA EMMA webserver (http://mouse.cs.ucla.edu/emmaserver/). The genome-wide significance threshold for this analysis was P = 10−5.
Real-time PCR analysis of gene expression.
Real-time PCR was used to measure the mRNA levels for genes identified from the genome-wide association study (GWAS) analysis as potential candidates. The genes in this analysis included epidermal growth factor receptor (Egfr), guanine nucleotide binding protein Q polypeptide (Gnaq), nuclear receptor subfamily 3, group C, member 2 (Nr3c2), and Sec61γ (Sec61g). The primer sequences for these genes are as follows: ATGCGAAGACGTCACATTGTT and CCATCACATAGGCTTCGTCA for Egfr, TGAACACAATAAGGC TCATGC and TGTAGGCAGAT AGGAAGGGTC for Gnaq, ACTTGCCTCTTGAGGACCAA and TTTGGAACTGTGCTCAGTAGC for Nr3cs, and GCAGTTTGTGGAGCCCAGCC and TGAGAAGGACTCAGCCACCCACAA for Sec61g. Total RNA from mammary tissue collected from lactating dams (five per strain) at day 10 postpartum was isolated with TRIzol reagent (Invitrogen) as per manufacturer's instruction. Reverse transcription PCR was performed using Iscript Reverse Transcription (Bio-Rad catalog 170-8841) according to the manufacturer's manual. In brief, 1,000 ng total RNA was used for each 20 μl reaction. The reverse transcription was incubated at 25°C for 5 min, followed by 42°C for 30 min and 85°C for 5 min. Quantitative (q) PCR was performed in the Bio-Rad CFX 96 Real Time System under the following conditions: 20 μl reaction contained 50 ng of cDNA template, 125 nM of each primer, and 1× SsoAdvanced SYBR Green Supermix (Bio-Rad, catalog 1725261). Samples were denatured at 95°C for 30 s, followed by a 40 cycle-PCR or 95°C for 5 s, 65°C for 30 s. Upon completion of 40 cycles the samples were heated to 72°C for 5 min and followed by dissociation analysis to determine amplicon size. Reference genes were chosen based on the presence of stable expression and a high pair-wise correlation using the Bestkeeper software (59). Relative mRNA levels were normalized and expressed as 2−ΔCT using a Bestkeeper index based on the CT of the two reference genes nucleolin (Ncl) and peptidylprolyl isomerase A (Ppia), which are both highly stable and displayed a Pearson's correlation of 0.7. The primer pairs for Ncl and Ppia were CATGGTGAAGCTCGCAAAG and TCACTATCCT CTTCCACCTCCT, and GAGCTGTTTG CAGACAAAGTTC and CCCTGGCACATGAATCCTGG, respectively.
Analysis of litter gain and pup survival.
A total of 419 dams were placed on the study at the beginning of their second parity by giving them weight-normalized litters of 10 CD-1 pups at day 1 postpartum. Of these, 379 completed the study without complications. Of these, 358 completed the study with at least 1 pup remaining by day 8 postpartum. The traits analyzed were number of pups reared to day 8 postpartum (PNUM8), and average daily litter gain (ADG) during the first 8 days postpartum. The broad-sense heritability estimate for PNUM8 was 0.32. For average daily gain, the heritability estimate was 0.47. For both phenotypes, strain accounted for a highly significant proportion of the observed variation (all P values <0.0001). In addition both PNUM8 and ADG were highly correlated (P < 0.0001) with a Pearson's r value of 0.85. The distribution strain means for ADG (Fig. 1A) ranged from −0.71 ± 1.37 to 4.59 ± 0.53 g/day (mean ± 95% CI) and an overall mean of 2.57 ± 0.19 g/day (95% CI, 2.37–2.76 g/day). Extreme strains at the upper end of the distribution included QSi5, BTBRT + tf/J, and SEA/GnJ. Strains that were extremes at the lower end of the distribution included RIIIS/J, MA/MyJ, and A/J. Overall, the average PNUM8 was 8.98 ± 0.24 (95% CI: 8.74–9.22). This trait showed a skewed distribution that was the result of limiting the cross-foster litter size to 10. When compared across strains, means for this trait ranged from 6.46 ± 2.38 to 10.0 ± 0.00 (Fig. 1B). Of the 32 strains studied, eight maintained all 10 pups out through day 8 postpartum. Extreme strains at the lower end of the distribution included A/J, MA/MyJ, and CZECHII/EiJ.
Phenotypic distributions in relation to strain relatedness.
To visualize variations in phenotypic performance in relation to genetic relatedness, a heat map illustrating phenotypic performance quintiles for ADG and PNUM8 aligned to a dendrogram based on genetic relatedness (Pearson's distance, d) was constructed (Fig. 2). Inspection of the figure suggested that low performance was not confined to any particular cluster of strains but could be found in at least four different branches and in some cases closely related to strains with high performance. There did appear to be a high-performance cluster (average d = 0.41) that contained NON/LtJ, FVB/NJ, NOD/LtJ, and QSi5. Interestingly, within this cluster there were also strains, BUB/BnJ, SJL/J, and SWR/J, that displayed average to below-average performance on the two traits. In addition, SEA/GnJ and BALB/cByJ are both somewhat closely (average d = 0.39) related to A/J, which is a strain that displayed below-average performance. The two strains BTBRT + tf/J and 129S1/SvImJ, though both above average performance and somewhat closely related (d = 0.42), are distinct (d = 0.6) from the other high performance strains in the set. Although the CZECHII/EiJ and CAST/EiJ strains are quite distinct from the rest of the strains it appears that they may differ in their capacity to rear litters and they are also quite different from each other genetically (d = 0.65). Unfortunately, we were unable to obtain more than a single second-parity dam from the CAST/EiJ strain during the course of the experiment. However, the fact that pup survival was so high suggested that further work with this strain may be informative.
Identification of regions associated with ADG and PNUM8.
The application of Haploview generated a total of 18,124 haplotype blocks with block sizes ranging from 1 bp to 499 kbp with an average of 50.75 ± 0.51 kbp (Supplemental Table S1).1 The distance between haplotype blocks ranged from 0.02 to 20,798.75 with an average of 89.79 ± 1.99 kbp. The density of haplotype blocks across the genome ranged from 0 to 141 with an average of 69 ± 2 blocks per 10 Mbp. The highest average haplotype density was found for chromosome 9 (83 ± 10), while the X chromosome had a density of 8 ± 1 blocks per 10 Mbp.
To focus on only the strongest haplotype associations, a genome-wide Bonferroni-corrected P value of 2.75 × 10−6 was used. This threshold allowed for the detection of 10, and 15 associations for ADG (Fig. 3A), and PNUM8 (Fig. 3B), respectively (Table 1). Although calculating the effect size for each of the blocks in this analysis was not practical because many of the blocks had more than two haplotypes, phenotypic means for each haplotype in the associated blocks (Supplemental Table S2) were used to calculate a percent change as an estimate of relative effect size (Table 1). This allowed for prioritizing the associations on either P value or relative effect size. The initial work described here is based on using P values to prioritize the blocks. Overall, the most significant (P = 10−10) associations were found for PNUM8 on MMU11 (Fig. 4A). On this chromosome there were six blocks exhibiting strong association (Table 1). Of these there were three that contained genes. A block at MMU11:16392211 (P = 2 × 10−6) was 53 kbp and contained the gene Sec61g (Fig. 4B). A single SNP at MMU11:16509601 (P = 2 × 10−9) was located between the Sec61g and Egfr genes. There was also a block at MMU11:1000111136 that was 18 kbp and overlapped with Krt17 and Krt42. For the region adjacent to Egfr, an independent association analysis based on individual SNPs confirmed the significance of the haplotype associations by detecting a cluster of highly significant single SNP beginning at MMU11:156483344 and ending at 15606840 (Fig. 4B). Although none of these SNPs were located within any known genes, a few were present in an 18 kbp region that was 145 kbp upstream of Egfr and 79 kbp downstream of Sec61g. Further analysis using the software package Haploview indicated that although the region between these two genes had a generally high LD (Fig. 4B), there could be as many as five independent blocks across the region. Furthermore, follow-up analysis using the UCSC Genome Browser (http://genome-test.cse.ucsc.edu/index.html) failed to indicate the presence of significant regulatory elements in this region. Analysis of individual SNPs in the block containing the Krt17 and Krt42 genes failed to demonstrate significant associations. Therefore these genes were not studied further.
The second most highly significant (P = 2×10−9) haplotype association was between ADG and a 142 kbp block located at MMU19:16323845 (Table 1, Fig. 5A). This block was also significantly associated with PNUM8 (P = 1×10−7) and contained the gene Gnaq. Association analysis based on single SNPs (Fig. 6B) demonstrated the presence of a cluster of significantly associated SNPs that were primarily located toward the 3′-end of the Gnaq gene. The region containing these SNPs was also found to have a high LD (Fig. 5B). All of the associated SNPs were contained within introns.
The third most highly significant (P = 5×10−9 and P = 1×10−6 for PNUM8 and ADG, respectively) association detected in this analysis was a 64 kbp block at MMU8:79373410 (Table 1, Fig. 6A). This block contained the gene Nr3c2. The analysis of individual SNPs on this region of the chromosome identified three SNPs that approached the genome-wide threshold of P = 10−5 (Fig. 6B). Of these, two were located in the gene for a hypothetical protein, GM10649, and the third was located in the first intron of Nr3c2. The SNPs in this region also had high LD.
Of the remaining haplotype associations that reached the Bonferroni-corrected genome-wide significance threshold (Table 1), there were two adjacent blocks on MMU9 (P = 10−6) that encompassed three genes and reside in a region previously reported to contain Neogq1, a QTL linked to variations in maternal nurturing capacity in the mouse (44). There were also blocks on MMU1, 2, 4, 6, 7, and 16 that encompassed a total of 14 genes including one encoding for the microRNA, namely, miR1902.
To determine if the associations detected for the blocks containing Egfr, Gnaq, Nr3c2, and Sec61g were linked to alterations in mammary gland gene expression, we conducted real-time RT-PCR on a subset of five dams from each strain with mammary tissue that was collected at the end of the study period on day 10 postpartum (Fig. 7). To control for technical variation, the mRNA levels for two reference genes, Ncl and Ppia, were also measured. Comparison of Ct values for these two genes across the sample set demonstrated them to be both stable and correlated. The geometric mean Ct for the reference genes was then employed to calculate a normalized ratio (2−ΔCt) for each gene of interest. Although the mRNAs for all four of these genes were present in mammary RNA from these day 10 lactating dams, the transcript for Sec61g had the greatest average abundance (1.4 ± 09) when normalized to the reference gene mRNAs (Fig. 7D). The average abundance of Gnaq mRNA was 0.03 ± 0.001 (Fig. 7C), while that of Egfr (Fig. 7A) and Nr3c2 (Fig. 7B) was 4 × 10−3 ± 2 × 10−4 and 6.5 × 10−4 ± 3.6 × 10−5, respectively. Abundance for all of these genes varied significantly both across and within strains. However, rather than observing positive correlations between ADG and PNUM8 there were only low negative correlations detected. Significant (P < 0.05) negative Pearson's correlations were detected between ADG and the mRNA levels for Egfr (r = −0.16, P = 0.04) and Nr3c2 (r = −0.20, P = 0.02). A significant negative correlation was also observed for PNUM8 and the mRNA level for Nr3c2 (r = −0.24, P = 0.006). These results support the conclusion that although the haplotype blocks containing these genes are associated with ADG and PNUM8, the effects of these associations are not necessarily mediated through effects on the expression of these genes in the mammary gland during an established lactation.
In the mouse, litter weight gain is a well-documented and robust indicator of milk production (41, 70). The trait is also quite complex with the potential to be influenced not only by the volume of milk produced but also by milk composition, litter size, and maternal behavior. At present, the relative contribution of these factors to litter gain is not well documented. Our own preliminary analysis of these ancillary traits in the MDP suggests that although maternal behavior and milk composition do have some influence on the variability in litter gain, further study of these is necessary to make firm conclusions. A similar case can be made that traits such as maternal food intake and body mass may also contribute to strain-dependent variation in litter gain. Despite this complexity, litter gain in mice is a heritable trait. This fact was illustrated by the analysis of variance results, which demonstrated that strain contributed significantly (P < 0.0001) to the variation in both ADG and PNUM8. Although for PNUM8, the true variation across strains is likely to be larger than that observed in this study. The fact that in eight of the 32 strains studied all of the dams were able to successfully rear all of their cross-foster pups suggests that the upper extreme of the distribution for PNUM8 was not defined. This does not invalidate the trait with regard to the GWA results, but it does suggest that any associations should be viewed as alleles that are linked to decreased maternal rearing ability rather than for higher than average ability.
The heritability of litter weight gain in mice has been reported to be 0.33–0.35 (57, 79). Our own estimates of heritability are not tremendously different from these and suggest a significant genetic contribution to both ADG and PNUM8. In agreement with these observations, studies have demonstrated that pup weight gain during the suckling period, which largely depends on milk production, can be increased through selection (33, 53, 65). As a consequence it was not surprising to observe significant interstrain variation in both pup survival and litter gain. An important point to mention is that three of the strains used in our study, LG/J, SM/J, and KK/HlJ, have also been used for previous work on QTL associated with maternal nurturing ability in mice (71, 79). That these three strains were not in the extremes for ADG illustrates the fact that the additional strains used in this study have documented a broader range of phenotypic diversity than previously appreciated. Interestingly, not all of our highest performing strains came from a similar genetic background. In fact, of the top five strains only three, QSi5, FVB/NJ, and NOD/Ltj have a common origin from the Swiss subgroup of mice (49, 58). The other two, SEA/GnJ and BTBR T + tf/J, have origins that are both distinct from each other and from Swiss mice (58). Of all of these five strains, the QSi5 resulted from direct selection for fecundity. This produced a strain with increased mature body size, larger-than-average litter size, and increased lactation capacity (33, 64, 65, 77). With regard to identifying lactation QTL, the availability of this highly fecund strain, in combination with other strains in the dataset, now provides a great resource from which to expand on previous lactation QTL studies.
Previous lactation QTL studies in mice have revealed the presence of eight QTL intervals. These are on chromosomes 1, 5, 6, 7, and 9. The intervals for these QTL are quite large, ranging in size from 12 to 36 Mbp and containing a total of 1,336 potential candidate genes (44, 57, 71, 72, 79). Consequently the identities of the causal genes underlying these QTL are still unclear. When first introduced in 2001, in silico mapping based on classical inbred strains was suggested to be a computational approach that would markedly accelerate the genetic analysis of murine disease models (24). Subsequent work also suggested that this approach would significantly narrow QTL intervals to <0.5 Mbp (45). A number of studies have also had success with this approach (26, 43, 45, 46, 61, 76). The main advantages of the approach over the traditional linkage mapping are; 1) no requirement for time-consuming matings to generate intercross progeny, 2) no requirement for genotyping, 3) greater resolution due to increased marker density and greater number of historic recombinations, and 4) an ability to capture a broader range of genetic diversity (76). The main disadvantage of the approach is that genetic relatedness among the MDP is high, producing a population structure which can introduce systematic bias leading to false-positive associations if not appropriately accounted for in the analysis (35, 39). A second potential disadvantage is the potential for limited power to detect loci associated with quantitative traits (56). With regard to the power issue, the 32 strains used in our study seem to have provided adequate power to detect potential candidates. With regard to the population structure issue, our use of a similarity matrix as a term in the model should correct for this. There have now been a number of studies using a kinship matrix to correct for population structure in GWAS with classical inbred strains (5, 18, 63, 80). Lastly, the fact that overlap occurred between the current GWAS and that of at least one previously described lactation QTL (44) suggests that the candidate regions discovered here will be well worth pursuing.
Before further discussing the biological significance of the associations detected in this study, is it is important to note that the GWA analysis described here is based only on SNP. Consequently some of the variation in phenotype may have been due to genomic variation that was unaccounted for in the present analysis. In addition, the possibility exists that SNPs detected in this study, though associated with the phenotype may not be causative, but simply in high LD with other genomic variants that are causative. This will be discussed further.
The use of the Bonferroni correction screened out all but the most significant associations producing a list of 25 blocks containing a total of 19 genes. These were prioritized for follow-up based on two criteria, P value and the presence of overlap between the two traits. The most interesting among these was the one at MMU11:16509601, which was in close proximity to the gene for Egfr. Previous work comparing mammary gene expression between QSi5 and CBA suggested that MAPK signaling and EGF may contribute to differences in lactation performance (64). Consequently, the regions identified on MMU11 in close proximity to Egfr are very interesting. The other gene close to this association was Sec61g, which encodes a subunit of the Sec61 complex a major component of the protein translocation apparatus within the endoplasmic reticulum (22, 31). Although not well studied in mammalian systems, variations in Sec61 function within the lactating mammary cell could impact milk production through potential effects on protein trafficking. Egfr is a strong candidate from the standpoint that EGF family members are known to be important to mammary gland development and lactation. Targeting of AR/EGF/TGF-α produced a lactation defect that was linked to both reduced mammary development and decreased secretory cell differentiation (48). The genes for all of the Erbb family members are expressed in the lactating mammary gland and the receptors are phosphorylated, indicating that these pathways are highly active (67). In addition, a natural mutation in the EGF receptor (waved-2) trans-membrane domain has been demonstrated on at least one genetic background to produce a lactation defect, making the region on MMU11 near the Egfr a very strong candidate region for follow up (19).
The associated block on MMU19 was also a very attractive candidate because it was detected for both ADG and PNUM8. The only gene within this block, Gnaq, encodes for an α-subunit in the Gq class that is an integral component of heterotrimer G protein signaling through G protein-coupled receptors. Recently, variations in a haplotype within the human Gnaq promoter have been linked to insulin resistance in mammary adipocytes isolated from obese women with polycystic ovary syndrome (40). In addition, studies in mice have demonstrated that targeted deletion of Gnaq within the forebrain abolishes maternal nurturing behavior in a Gα11-deficient mouse line (78). These observations support the notion that the block on MMU19 could play a significant role in maternal nurturing ability either through potential effect on insulin signaling within the mammary gland or possibly through impacting maternal behavior. This will be addressed in future studies.
The association on MMU8 was also very interesting from the standpoint that Nr3c2, which encodes for the mineralocorticoid receptor, has been implicated as playing a potential role in the regulation of mammary cell differentiation. In this regard both glucocorticoids and mineralocorticoids have been known for some time to play an important role in mammary cell differentiation and lactation (1, 15). More recent work in lactating mice has also illustrated the importance of glucocorticoids as well as the potential for the mineralocorticoid receptor to play a role in lactation (13, 38). These observations suggest that a region on MMU8 near Nr3c2 may also be important with regard to variation in milk production.
An important consideration with all of the above genes is that although the blocks they were contained in had highly significant P values that were also confirmed with individual SNP analysis, our estimates of effect sizes did not rank them at the top. It is also important to note that none of the SNPs detected in our top three regions were present in known regulatory or coding regions. The fact that none were in coding regions was not surprising given that previous work cataloging the locations of allelic variants suggests that these tend to be overrepresented in intergenic regions (25, 36). It was, however, puzzling that mRNA levels for our four most promising candidate genes were not correlated to ADG. This lack of correlation could be interpreted in one of several ways. Firstly, it may be that day 10 of lactation was not the correct time to study the expression of these genes and that differences in expression during earlier stages of mammary gland development may have been more strongly correlated to ADG. For Egfr, this is clearly a possibility, since this receptor family plays a major role in earlier stages of mammary gland development. Secondly, for some of the candidate blocks, it may be that the polymorphism is acting in an organ other than the mammary gland. Although we only collected mammary tissue from the animals in the current study we hope to address this issue in future studies with a subset of animals taken from the extremes of our litter gain distribution. Lastly it is possible that the SNPs identified here are not causative, but in LD with as of yet unidentified polymorphisms. A number of other types of genomic variation have increasingly come under scrutiny with regard to both disease traits and quantitative traits. In this regard recent sequencing work has not only catalogued >12 million SNP in classical inbred mouse strains, but also >2 million structural variants including insertions, deletions, and transposable elements (36). In addition, studies have now linked the presence of copy number variants (CNV) to effects not only on gene expression in various tissues, but also on metabolic traits such as blood metabolites and body composition (9, 55). In this regard, further analysis of the sequence surrounding the regions of interest identified in this study would be helpful as would microarray analysis of mammary tissue gene expression from these inbred strains. A more extensive database of CNV within the inbred mouse strains would also provide a useful tool for follow-up on the results presented here.
In summary, the application of previously existing SNP datasets in conjunction with the inbred mouse mapping panel has allowed for the identification of several new regions within the mouse genome that contain genes of potential importance to the variation observed in lactation performance. Follow-up studies are needed on these regions to confirm this analysis and further our understanding of factors that contribute to variations in lactation and possibly other aspects of mammary gland biology.
This work was funded by grants from the National Institute of Child Health and Human Development Grant #1R21HD-059746-01A1 and the USDA/ARS (cooperative agreement #6250-51000-054).
No conflicts of interest, financial or otherwise, are declared by the author(s).
Author contributions: D.L.H., J.W., and P.W. conception and design of research; D.L.H., J.W., W.O., and L.A.H. performed experiments; D.L.H., J.W., W.O., L.A.H., A.R., P.C.T., and M.S. analyzed data; D.L.H. and P.W. interpreted results of experiments; D.L.H. prepared figures; D.L.H. drafted manuscript; D.L.H., P.C.T., and P.W. edited and revised manuscript; D.L.H., J.W., P.C.T., and P.W. approved final version of manuscript.
The authors thank Dr. Ian Martin for providing the QSi5 strain. The authors thank Dr. Junfeng Tong and Valerie Lee for assistance in collecting the phenotype data. Thanks to Dr. Mathew Pletcher for providing the heritability calculator. Thanks also goes to Dr. Monique Rijnkels for a critical reading of the manuscript and to Linda Kemper for editorial suggestions.
↵1 The online version of this article contains supplemental material.