Our objective was to resequence insulin receptor substrate 2 (IRS2) to identify variants associated with obesity- and diabetes-related traits in Hispanic children. Exonic and intronic segments, 5′ and 3′ flanking regions of IRS2 (∼14.5 kb), were bidirectionally sequenced for single nucleotide polymorphism (SNP) discovery in 934 Hispanic children using 3730XL DNA Sequencers. Additionally, 15 SNPs derived from Illumina HumanOmni1-Quad BeadChips were analyzed. Measured genotype analysis tested associations between SNPs and obesity and diabetes-related traits. Bayesian quantitative trait nucleotide analysis was used to statistically infer the most likely functional polymorphisms. A total of 140 SNPs were identified with minor allele frequencies (MAF) ranging from 0.001 to 0.47. Forty-two of the 70 coding SNPs result in nonsynonymous amino acid substitutions relative to the consensus sequence; 28 SNPs were detected in the promoter, 12 in introns, 28 in the 3′-UTR, and 2 in the 5′-UTR. Two insertion/deletions (indels) were detected. Ten independent rare SNPs (MAF = 0.001–0.009) were associated with obesity-related traits (P = 0.01–0.00002). SNP 10510452_139 in the promoter region was shown to have a high posterior probability (P = 0.77–0.86) of influencing BMI, fat mass, and waist circumference in Hispanic children. SNP 10510452_139 contributed between 2 and 4% of the population variance in body weight and composition. None of the SNPs or indels were associated with diabetes-related traits or accounted for a previously identified quantitative trait locus on chromosome 13 for fasting serum glucose. Rare but not common IRS2 variants may play a role in the regulation of body weight but not an essential role in fasting glucose homeostasis in Hispanic children.
- DNA resequencing
- body mass index
- single nucleotide polymorphisms
- insulin receptor substrate 2
Insulin receptor substrate 2 (IRS2) is a compelling candidate gene for obesity, insulin resistance, and Type 2 diabetes given its role in insulin signal transduction and the manifestation of obesity, insulin resistance, and β-cell failure in Irs2−/− mice (30), and yet human studies of common IRS2 variants have not demonstrated consistent effects (2–4, 34). IRS2 is a cytoplasmic signaling protein that mediates effects of insulin, insulin-like growth factor-1, and cytokines by acting as a relay protein between diverse receptor tyrosine kinases and downstream effectors (35, 36) and thereby accounting for the diverse actions of insulin (18). Structurally, IRS2 has many modified residues (phosphotyrosines, phosphoserines, and phosphothreonines) and a CpG island in the promoter region, which suggests high regulatory potential. Highly expressed in the hypothalamus as well as pancreatic islets, prefrontal cortex, adrenal cortex and adipocytes, IRS2 may play a key role in the integration of central control of energy homeostasis with peripheral insulin action and β-cell function (11, 36).
Human linkage and association studies have not demonstrated a consistent effect of IRS2 variants on insulin resistance or Type 2 diabetes (2–4, 19). Study cohorts, however, have been relatively small, and IRS2 genotyping has been limited. In Caucasian families with early-onset Type 2 diabetes, no linkage or mutations involving IRS2 were identified in 220 diabetic patients and 146 controls (3). In 150 Ashkenazi Jewish families, linkage was not detected between Type 2 diabetes and IRS2 (19). In a Danish cohort, IRS2 SNPs (codons 1057 and 879) were not associated with Type 2 diabetes or insulin secretion (2, 4). In 193 Italians with Type 2 diabetes and 200 controls, the codon 1057 single nucleotide polymorphism (SNP) was protective against Type 2 diabetes in lean individuals (25). In French Caucasian women, the codon 1057 SNP was associated with severe obesity (22). Bottomley et al. (9) sequenced the coding region of IRS2 in 161 European nonobese and obese adults with severe insulin resistance. Eight rare, nonsynonymous variants were identified, but there was no clear evidence that these variants had a pathogenic effect on insulin resistance.
Findings from our Viva La Familia genome-wide scan suggested that variation in IRS2 may contribute to childhood obesity and its comorbidities (13). We identified a highly significant quantitative trait locus (QTL) on chromosome 13q [logarithm of the odds (LOD) = 4.6] for fasting serum glucose and suggestive linkage for sICAM-1 (LOD = 2.5), total T4 (LOD = 1.7), and ghrelin (LOD = 1.2). Pleiotropy was found between fasting serum glucose and insulin and c-peptide, as might be expected given the pivotal role of insulin in glucose homeostasis.
The contribution of IRS2 variants to the high susceptibility for obesity and Type 2 diabetes among Hispanic children has not been investigated. In 2007–2008, 20.9% of Hispanic children, aged 2–19 yr, were classified as obese [≥95th percentile for body mass index (BMI) for age] (27). In 2001, the prevalence of Type 2 diabetes in Hispanic youth, aged 10–19 yr, was 0.48 cases per 1,000 (24). A positive family history was common for Type 2 diabetes (83%), and minorities were more likely to have a positive family history than non-Hispanic whites (odds ratio 1.96, confidence interval 1.69–2.27) (15).
Hence, the specific objectives of our study were: 1) to resequence exonic and intronic segments and 5′ and 3′ flanking regions of IRS2 (∼14.5 kb) to identify SNPs in 934 Hispanic children; 2) to perform measured genotype analysis to test for associations between genotype and the obesity- and diabetes-related traits; and 3) to perform Bayesian quantitative trait nucleotide (BQTN) analysis to test combinations of variants to statistically identify potentially functional genetic variants influencing obesity and diabetes risk in Hispanic children.
MATERIALS AND METHODS
Study Design and Subjects
Subjects (n = 934) were from 319 Hispanic families enrolled in the Viva La Familia Study in 2000–2004. Anthropometric and body composition measurements were performed on parents and children. Blood samples were drawn for biochemistry profiling of the children and genotyping of children and parents. Subjects and study procedures are described in detail in a previous publication (12). All children and their parents gave written informed consent or assent. The protocol was approved by the Institutional Review Board for Human Subject Research for Baylor College of Medicine and Affiliated Hospitals and the Texas Biomedical Research Institute.
In brief, the Viva La Familia Study cohort consisted of a total of 631 parents and 1,030 children. The majority of the parents were either overweight (34%) or obese (57%). Type 2 diabetes was reported in 11 and 8% of the mothers and fathers, respectively. Fifty-one percent of the children were above the 95th BMI percentile with z-scores ranging from 2.3 to 4.5 (21).
Anthropometry and Body Composition
Body weight to the nearest 0.1 kg was measured with a digital balance, and height to the nearest 1 mm was measured with a stadiometer. BMI was calculated as weight/height² (kg/m²). Waist circumference was measured at a level midway between the inferior border of the rib cage and superior border of the iliac crest with a nonstretchable tape measure. Total body estimates of fat-free mass (FFM), fat mass (FM), and percent FM were measured by dual-energy x-ray absorptiometry using a whole-body scanner (Delphi-A; Hologic, Waltham, MA).
A blood sample was drawn in the morning after a 12-h fast. Serum samples were obtained from whole blood after clotting. Fasting serum concentration of glucose was measured by enzymatic-colorimetric techniques using the GM7 Analyzer (Analox Instruments, Lunenburg, MA) [coefficient of variation (CV) = 2.4%]. Commercial radioimmunoassay kits were used to measure fasting serum concentrations of insulin (CV = 7.1%), C-peptide (CV = 4.2%), and leptin (CV = 4.6%) (Linco Research, St. Charles, MO) and ghrelin (Phoenix Pharmaceuticals, Belmont, CA) (CV = 7%). Total and free triiodothyronine (T3) were measured by radioimmunoassay kits (Diagnostic Products, Los Angeles, CA) (CV = 6.85 and 9.5%, respectively). C-reactive protein (CRP) was measured using a quantitative sandwich enzyme immunoassay (American Laboratory Products, Windham, NH) (CV = 11.6%). Plasma soluble intercellular adhesion molecule-1 (sICAM-1) was measured using a quantitative sandwich enzyme immunoassay (CV = 6.1%) (R & D Systems, Minneapolis, MN). Triglycerides were assayed enzymatically using lipase, glycerol kinase, glycerol phosphate oxidase, and peroxidase supplied by Thermo Electron (Louisville, CO). Total cholesterol and high-density lipoprotein (HDL) were determined using cholesterol esterase, cholesterol oxidase, and peroxidase supplied by Thermo Electron. Insulin resistance was estimated by the homeostatic model assessment (HOMA) method, computed as HOMA = [fasting serum insulin (μU/ml) × fasting serum glucose (mmol/l)]/22.5 (26).
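The HOMA computation can be illustrated with a minimal sketch. It assumes insulin is expressed in μU/ml (numerically equal to mU/l), the unit conventionally paired with the 22.5 constant, and glucose in mmol/l. The function names and the mg/dl conversion helper are illustrative and are not part of the study's software.

```python
def homa_ir(fasting_insulin_uU_per_ml: float, fasting_glucose_mmol_per_l: float) -> float:
    """HOMA = [insulin (uU/ml) x glucose (mmol/l)] / 22.5."""
    if fasting_insulin_uU_per_ml < 0 or fasting_glucose_mmol_per_l < 0:
        raise ValueError("concentrations must be non-negative")
    return fasting_insulin_uU_per_ml * fasting_glucose_mmol_per_l / 22.5


def glucose_mg_dl_to_mmol_l(glucose_mg_per_dl: float) -> float:
    """Convert glucose from mg/dl to mmol/l (molar mass of glucose ~180.16 g/mol)."""
    return glucose_mg_per_dl / 18.016


if __name__ == "__main__":
    # Hypothetical values for illustration only, not study data.
    print(homa_ir(10.0, 4.5))
```

If glucose is reported in mg/dl, it would need to be converted first (for example with the helper above) before applying the formula.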
IRS2 is a 1,338-amino acid protein with a well-conserved pleckstrin homology domain at the extreme NH2 terminus, followed by a phosphotyrosine binding domain that binds to phosphorylated NPXY motifs, and a COOH-terminal region with multiple tyrosine phosphorylation sites (35). The IRS2 gene spans 32,731 bp. Exonic and intronic segments, and 5′ and 3′ flanking regions (a total of 29 amplicons) were sequenced in 934 children. IRS2 has one main coding exon ∼4 kb in length and a short coding piece in the 3′ exon. There are three conserved regions extending through 1.5 kb upstream from the IRS2 promoter and a prominent CpG island. Validated primer pairs for this project were used to amplify exonic and intronic segments, and 5′ and 3′ flanking regions from 934 samples of DNA and a negative and positive control per 384-well plate quadrant [8 no-DNA polymerase chain reaction (PCR) control wells included for quality control].
The target sequences were amplified in ∼500 bp fragments. Exon-specific primer pairs were designed using the program Primer3 (http://frodo.wi.mit.edu/). Universal M13 forward or reverse primer tails were attached to each amplicon-specific primer. PCR reactions were established using the Qiagen PCR kit, with individual amplicon size of 270–550 bp. PCR products were analyzed on 2% agarose gels and purified using ExoSap (USB, Cleveland, OH). Bidirectional sequencing of each fragment was performed separately using BigDye Terminator sequencing chemistry on 3730XL DNA Sequencers (Applied Biosystems, Foster City, CA). The average amplicon sequencing success rate was 92%. The sequences were analyzed using the program SNPdetector v. 3.0 (38). Variant bases were annotated with Reference SNP IDs based on dbSNP build 131. SNPs were confirmed manually by viewing raw sequence traces in CONSED (16).
Our resequencing strategy was to interrogate as much of the IRS2 gene region as possible within our budget. It was designed to cover the coding regions, the promoter region, and a modest number of 5′- and 3′-untranslated region (UTR) conserved noncoding regions. Because additional SNP data were available from the Illumina panel, we chose to augment coverage in the 5′- and 3′-UTR regions. Hence, 15 SNPs located in the IRS2 region were selected: three SNPs are just outside the sequenced region, one upstream and two downstream. The other 12 SNPs are located within the intron sequence of IRS2 not captured by the primer design. Although the Illumina chip data were not intended as an additional quality control step, six additional SNPs on the Illumina chip were also identified by sequencing.
These 15 additional SNPs in the intronic and 5′- and 3′-UTR regions were genotyped as part of the Illumina HumanOmni1-Quad v1.0 BeadChips using the Infinium HD single-base extension (SBE) assay (Illumina, San Diego, CA). Genomic DNA was whole genome amplified, fragmented, and hybridized to the BeadChips, where SBE occurred using fluorescence-labeled nucleotides. Fluorescence intensities were detected by scanning the BeadChips on the Illumina BeadStation 500GX and analyzed with the GenomeStudio software from Illumina. Cluster calls were checked for accuracy. For an assessment of consistency in allele calling, replicate samples were included on our genotyping plates. Further quality control assessment was performed using sample-dependent and sample-independent controls included on the Illumina BeadChips.
All traits were transformed to meet the assumption of normality before entering the genetic analyses to avoid inflating type 1 error. Age, age², sex, and their interactions were used as covariates and simultaneously estimated in these models. All genetic analyses were performed using the Sequential Oligogenic Linkage Analysis Routines (SOLAR) computer package (1). Bivariate analyses were conducted to partition the phenotypic correlations (ρP) between two traits into additive genetic (ρG) and environmental correlations (ρE). Genotype frequencies for each SNP were estimated allowing for nonindependence due to kinship (5, 6) and were tested for departures from Hardy-Weinberg equilibrium. Estimates of linkage disequilibrium between SNPs were determined by calculating pair-wise D′ and r² statistics.
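As an illustration of the pair-wise linkage disequilibrium statistics, the following minimal sketch computes D, |D′|, and r² for two biallelic SNPs from a haplotype frequency and the two allele frequencies. It assumes phased haplotype frequencies are already available and ignores relatedness, whereas the study estimated frequencies allowing for kinship; the function name and inputs are hypothetical.

```python
def pairwise_ld(p_ab: float, p_a: float, p_b: float):
    """Return (D, |D'|, r2) for alleles A and B at two biallelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_ab - p_a * p_b
    # D' scales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies: a rare allele found only on one background.
    print(pairwise_ld(0.1, 0.1, 0.5))
```

The example shows why both statistics are reported: a rare allele confined to one haplotype background can give |D′| close to 1 while r² remains small.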
Measured genotype analysis.
As a first step in investigating the association between the SNPs in IRS2 and our obesity- and diabetes-related traits, we employed a measured genotype analysis (8), as implemented in SOLAR (1). This approach extends the classical variance components-based biometrical model to account for both the random effects of kinship and the main effects of SNP genotypes. Variance components are modeled as random effects, e.g., additive genetic effects and random environmental effects, and the effects of covariates such as age and sex are modeled as fixed effects on the main trait. For each SNP, we compared this saturated model with a null model in which the main effect of the SNP is constrained to zero. The test statistic, twice the difference in ln(likelihood) between the saturated model and the SNP-specific null, is distributed as χ² with one degree of freedom.
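The likelihood ratio test described above can be sketched as follows. This is a minimal illustration, assuming the two maximized log-likelihoods have already been obtained from the fitted models; it is not SOLAR's implementation, and the function name is hypothetical. For one degree of freedom, the upper tail of the χ² distribution equals erfc(√(x/2)), so only the standard library is needed.

```python
import math


def likelihood_ratio_test(loglik_saturated: float, loglik_null: float):
    """Return (statistic, p_value) for a 1-df likelihood ratio test.

    statistic = 2 * (lnL_saturated - lnL_null), referred to chi-square(1).
    """
    stat = 2.0 * (loglik_saturated - loglik_null)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimization noise
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods for illustration only.
    print(likelihood_ratio_test(-1000.0, -1005.0))
```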
The measured genotype analysis provides the prior probability that each SNP is associated with a trait of interest. However, it is not known a priori whether a gene influencing a given trait has one common variant affecting the trait in all pedigrees or multiple variants with potentially interacting effects. BQTN analysis was implemented in SOLAR to statistically identify the most likely functional SNPs associated with a trait (5). For a chosen set of SNPs, this approach evaluates the likelihood of all possible association models, while providing a penalty for overparameterization, to identify the subset of SNPs with the highest posterior probability of association with the trait. Although the major purpose of BQTN analysis is to statistically identify the most likely functional variant, it has also proved extremely useful in cases where only a subset of SNPs has been genotyped.
Bayesian model selection identifies the set of variants that optimally predicts the trait. In a Bayesian framework, two competing hypotheses (models) are compared by evaluation of the Bayes factor, which is the ratio of the integrated likelihoods of the competing models (20). For each hypothesis, a Bayesian information criterion (BIC) is defined with reference to a single null model (in our case, the random effects model without the SNPs) and is used to assess whether the BQTN model explains sufficient variation in the trait to justify the number of parameters used (6). The magnitude of the BIC difference provides an estimate of the evidence of support for one model over another. For example, a BIC difference greater than two units provides support for one model over another with a posterior probability of 75%. The BQTN approach also accounts for model uncertainty and thus provides the posterior probability that each variant is associated with the trait of interest. This variant-specific posterior probability is a measure of the evidence that a particular variant is likely to be functional or in high linkage disequilibrium (LD) with a functional variant. This approach has been shown to provide accurate determination of functional variants in conditions where the variants have been identified (5, 6, 14, 31).
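To make the link between BIC differences and posterior probabilities concrete, the sketch below uses the standard approximation 2 ln(Bayes factor) ≈ ΔBIC with equal prior odds, which gives a posterior of about 0.73 for a two-unit difference, in line with the roughly 75% quoted above. It also shows BIC-weighted model averaging to obtain a per-variant posterior. The sign convention (lower BIC is better), the function names, and the SNP labels are assumptions for illustration; SOLAR's BQTN routine may define its BIC differently.

```python
import math


def posterior_from_bic_difference(delta_bic: float) -> float:
    """Posterior probability of the favored model given a BIC advantage of delta_bic.

    Assumes equal prior odds and the approximation 2 ln(BF) ~ delta_bic.
    """
    bayes_factor = math.exp(delta_bic / 2.0)
    return bayes_factor / (1.0 + bayes_factor)


def variant_posteriors(model_bics: dict) -> dict:
    """Model-averaged posterior probability that each SNP is in the model.

    model_bics maps a frozenset of SNP names to that model's BIC
    (lower is better); the empty frozenset is the null model.
    """
    best = min(model_bics.values())
    weights = {m: math.exp(-(b - best) / 2.0) for m, b in model_bics.items()}
    total = sum(weights.values())
    posteriors = {}
    for model, weight in weights.items():
        for snp in model:
            posteriors[snp] = posteriors.get(snp, 0.0) + weight / total
    return posteriors


if __name__ == "__main__":
    # Hypothetical BIC values for a null model and two single-SNP models.
    bics = {frozenset(): 10.0, frozenset({"snp_A"}): 8.0, frozenset({"snp_B"}): 11.0}
    print(posterior_from_bic_difference(2.0))
    print(variant_posteriors(bics))
```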
Obesity- and Diabetes-Related Traits
Obesity- and diabetes-related traits of the 934 nonobese and obese children in the Viva La Familia cohort are summarized in Table 1. The additive genetic correlations between the obesity- and diabetes-related traits are shown in Table 2. The overall phenotypic correlation between fasting glucose and BMI in this cohort was statistically significant (ρP = 0.20, P = 5.71E-09). However, the bivariate analyses indicated that the phenotypic correlation was not driven by the genetic correlation. The additive genetic correlation between fasting glucose and BMI was not statistically significant [ρG = 0.10 (0.13), P = 0.45], whereas the environmental correlation was significant [ρE = 0.30 (0.10), P = 0.004].
Resequencing and Genotyping
A total of 140 SNPs were identified with minor allele frequencies (MAF) ranging from 0.001 to 0.47. Of the 140 SNPs, 35 (25%) are described in the database dbSNP [National Center for Biotechnology Information (NCBI), build 131]. Forty-two of the 70 coding SNPs result in nonsynonymous amino acid substitutions relative to the consensus sequence; 28 SNPs were detected in the promoter, 12 in the intron, 28 in the 3′-UTR, and 2 in the 5′-UTR. The genotypic distributions of all 140 SNPs were in Hardy-Weinberg equilibrium. Thirty-three SNPs were removed from further consideration due to high LD [correlation (ρ) ≥0.9]. The 40 SNPs with MAF ≥0.01 are presented in Table 3; all 140 SNPs are presented online as Supplemental Material.1
Two insertion/deletions (indels) were detected in the promoter region at chromosomal positions 109237583 and 109237738 (NCBI build 131) with MAFs corresponding to 0.317 and 0.016, respectively.
Measured Genotype Analysis
Measured genotype analysis was performed to test the 140 identified SNPs and two indels for association with the obesity- and diabetes-related traits. Ten SNPs were found to be associated with the measured traits at nominal P values ≤0.01, uncorrected for multiple testing (Table 4, Fig. 1). Associations were observed between IRS2 variants and weight, BMI, waist circumference, FFM, FM, and fasting serum levels of HDL, total and free T3, CRP, sICAM-1, leptin, and ghrelin. With respect to our original QTL, three SNPs (9967402_100, rs73606275, and rs35927012) were marginally associated with fasting serum glucose (P values = 0.02–0.05) but did not withstand correction for multiple testing. The indels were not significantly associated with any of the obesity- and diabetes-related traits.
The 10 nominally associated SNPs were analyzed by the BQTN method to estimate the posterior probabilities that any one variant or combination of variants has an effect on the measured traits. SNP 10510452_139 in the promoter region was shown to have a high posterior probability (P = 0.77–0.86) of influencing BMI, fat mass, and waist circumference in Hispanic children. The minor allele for SNP 10510452_139 was detected in two unrelated pedigrees among five obese children, all with BMIs >99th percentile. The proportion of variance contributed by the SNPs is shown in Table 5 for all 10 nominally significant SNPs. SNP 10510452_139 contributed the highest percentage of the variance (between 2 and 4%) in body weight and composition.
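For intuition about how a rare variant can account for a few percent of trait variance, the following sketch uses the textbook additive-model approximation, in which a biallelic SNP in Hardy-Weinberg equilibrium contributes 2p(1 − p)β² to the variance, where p is the minor allele frequency and β the per-allele effect. This is an assumption for illustration and not the variance-components estimate produced by SOLAR; the function name and the numerical values are hypothetical and are not taken from Table 5.

```python
def proportion_variance_explained(maf: float, beta: float,
                                  phenotypic_variance: float = 1.0) -> float:
    """Fraction of trait variance attributable to one additive biallelic SNP.

    maf: minor allele frequency (0 to 0.5)
    beta: per-allele effect, in the same units as the trait
    phenotypic_variance: total trait variance (1.0 if the trait is standardized)
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("maf must lie between 0 and 0.5")
    if phenotypic_variance <= 0.0:
        raise ValueError("phenotypic_variance must be positive")
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / phenotypic_variance


if __name__ == "__main__":
    # Hypothetical rare allele (MAF 0.003) with a 2-SD per-allele effect.
    print(proportion_variance_explained(0.003, 2.0))
```

Under these hypothetical inputs, a very rare allele needs a large per-allele effect to reach the 2 to 4% range, which is consistent with carriers lying at the extreme of the trait distribution.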
Resequencing of IRS2 in 934 Hispanic children predisposed to obesity and Type 2 diabetes revealed multiple, rare variants that may exert an effect on body weight regulation. Our experimental approach to identify likely functional polymorphisms within IRS2 involved enumeration of genetic variants in a large set of resequenced individuals, followed by measured genotype analysis and Bayesian model selection and averaging. For each putative SNP, a posterior probability of effect was obtained to prioritize SNPs for future intensive molecular functional analysis. Our resequencing strategy designed to cover the coding regions, the promoter region, and a modest number of 5′- and 3′-UTR conserved noncoding regions represents the largest effort to date to enumerate IRS2 variants in a human population.
Our major study finding suggests a role of IRS2 in energy homeostasis and body weight regulation in Hispanic children. Strong associations were seen between IRS2 variants and weight, BMI, waist circumference, FFM, and FM. Our BQTN analysis statistically identified a novel variant in the promoter region that may be involved in the regulation of body weight. This SNP (10510452_139) was detected in two unrelated pedigrees among five children with severe obesity (BMIs >99th percentile). Our findings may be specific to the Hispanic population, since frequencies of rare variants in a given genomic region may vary greatly from one ethnic group to another due to different evolutionary histories including genetic drift and bottlenecks (7). Although this rare variant was detected in only a few families, on a population basis it could contribute between 2 and 4% of the variance in body weight and composition.
The mechanism by which genetic heterogeneity in the IRS2 gene may impact energy homeostasis and body weight regulation is unclear. However, evidence from Irs2−/− mice supports a role of Irs2 in the hypothalamic control of energy homeostasis and body weight regulation (11, 37). Female Irs2−/− mice consumed 30% more food, weighed 20% more, and stored twice as much body fat as controls. Mechanistically, leptin-stimulated phosphorylation of STAT-3 in the hypothalamus was inactivated in the Irs2−/− mice, disrupting appetite control and body weight regulation. Acting as an integrator of energy homeostasis, IRS2 could underlie the close relationship between obesity, peripheral insulin resistance, and eventual β-cell failure (36). Our results in Hispanic children are consistent with the possibility that partial dysregulation of IRS2 may contribute to the early development of obesity that predisposes to Type 2 diabetes.
In contrast to findings from murine models (30) but perhaps more consistent with human studies (2–4, 19), our analysis did not reveal any statistically significant associations between IRS2 variants and fasting serum glucose, insulin, C-peptide, or HOMA-IR. Consequently, IRS2 variants did not account for our original QTL on chromosome 13q for fasting serum glucose. The lack of association between IRS2 variants and fasting serum glucose might be attributable to compensatory overlap between IRS1 and IRS2 proteins (10, 18) or mimicking by environmental and physiological factors that affect glucose metabolism (36) or, indeed, another gene responsible for the QTL. This QTL encompasses 27 genes, half of which are hypothetical proteins of unknown function. In fact, the individuals harboring the rare IRS2 variants did not belong to the families contributing to the linkage signal for glucose. Pleiotropy was not seen between fasting glucose and body weight, suggesting that different, independent genetic effects are acting on these two traits. Even though IRS2 was a strong positional candidate gene for our QTL for fasting glucose, the identified genetic variants were not associated with fasting glucose, nor did they account for our linkage signals. This “negative” finding is important in and of itself; furthermore, our investigations led us to identify an IRS2 promoter variant that statistically has a high likelihood of influencing the regulation of body weight and composition.
Our resequencing of IRS2 represents the largest effort to date to enumerate IRS2 variants in a human population. Until now, resequencing of IRS2 had been restricted to the coding region. Our resequencing effort demonstrated considerable within-locus variant heterogeneity of IRS2. In addition to 70 coding SNPs, SNPs were detected in the promoter (28), intron (12), 3′-UTR (28), and 5′-UTR (2). Two indels also were detected in the promoter region. The common alleles identified in IRS2 appeared to be neutral, whereas rare variants were associated with disease susceptibility, supporting the “common disease, rare variant” hypothesis that there is extreme allelic heterogeneity for complex traits and that multiple rare variants with moderate to high penetrance may be responsible for complex diseases (7, 17, 23, 28, 29). A significant proportion of the inherited susceptibility to common chronic diseases may be due to the summation of the effect of a series of low-frequency dominantly and independently acting variants of different genes, each conferring a moderate but detectable increase in relative risk (7).
Our study has demonstrated the utility of deep resequencing of genes for identification of low-frequency alleles contributing to complex diseases. Understanding the genetic architecture of complex diseases such as obesity or diabetes is of great interest to the biomedical community. While whole genome association studies based on genotyping predominantly high-frequency SNPs have identified SNPs associated with obesity and diabetes, the variants account for only a small fraction of observed phenotypic variation (32, 33). In contrast, deep resequencing of putative genes has the potential to reveal low-frequency alleles contributing to complex diseases, such as the novel variant detected in the IRS2 promoter region in this study.
Rare but not common IRS2 variants may play a role in the regulation of body weight but not an essential role in fasting glucose homeostasis in Hispanic children.
This work was supported by the National Institutes of Health (NIH) Grant DK-080457 and the USDA/ARS (Cooperative Agreement 6250-51000-053). Work performed at the Texas Biomedical Research Institute in San Antonio, TX, was conducted in facilities constructed with support from the Research Facilities Improvement Program of the National Center for Research Resources, NIH (C06 RR-013556, C06 RR-017515).
The contents of this publication do not necessarily reflect the views or policies of the USDA, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.
No conflicts of interest, financial or otherwise, are declared by the author(s).
We thank all the families who participated in the Viva La Familia Study. The authors acknowledge the contributions of Mercedes Alejandro and Marilyn Navarrete for study coordination; Sopar Seributra for nursing; and Theresa Wilson, Maurice Puyau, Firoz Vohra, Anne Adolph, Nitesh Mehta, Roman Shypailo, JoAnn Pratt, Maryse Laurent, Grace-Ellen Meixner, Margie Britten, and Maria del Pilar Villegas for technical assistance.
The authors' responsibilities were as follows: N. F. Butte, S. A. Cole, and A. G. Comuzzie participated in the conception, design, analysis, data interpretation, and drafting of the manuscript; D. M. Muzny, D. A. Wheeler, and R. A. Gibbs performed the resequencing; K. Chang and A. Hawes were responsible for the informatics; V. S. Voruganti and K. Haack participated in the genetic statistical analyses. All authors played a role in the data interpretation and writing of the manuscript and approved the final version of the manuscript.
This work is a publication of the U.S. Department of Agriculture/Agricultural Research Service (USDA/ARS) Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine and Texas Children's Hospital, Houston, Texas.
1 The online version of this article contains supplemental material.