Proteomic profiling of nuclei from native renal inner medullary collecting duct cells using LC-MS/MS

Dmitry Tchapyjnikov, Yuedan Li, Trairak Pisitkun, Jason D. Hoffert, Ming-Jiun Yu, Mark A. Knepper


Vasopressin is a peptide hormone that regulates renal water excretion in part through its actions on the collecting duct. The regulation occurs in part via control of transcription of genes coding for the water channels aquaporin-2 (Aqp2) and aquaporin-3 (Aqp3). To identify transcription factors expressed in collecting duct cells, we have carried out LC-MS/MS-based proteomic profiling of nuclei isolated from native rat inner medullary collecting ducts (IMCDs). To maximize the number of proteins identified, we matched spectra to rat amino acid sequences using three different search algorithms (SEQUEST, InsPecT, and OMSSA). All searches were coupled to target-decoy methodology to limit false-discovery identifications to 2% of the total for single-peptide identifications. In addition, we developed a computational tool (ProMatch) to identify and eliminate ambiguous identifications. With this approach, we identified >3,500 proteins, including 154 proteins classified as “transcription factor” proteins (Panther Classification System). Among these are members of CREB, ETS, RXR, NFAT, HOX, GATA, EBOX, EGR, MYT1, KLF, and CP2 families, which were found to have evolutionarily conserved putative binding sites in the 5′-flanking region or first intron of the Aqp2 gene, as well as members of EBOX, NR2, GRE, MAZ, KLF, and SP1 families corresponding to conserved sites in the 5′-flanking region of the Aqp3 gene. In addition, several novel phosphorylation sites in nuclear proteins were identified using the neutral loss-scanning LC-MS3 technique. The newly identified proteins have been incorporated into the IMCD Proteome Database (

  • vasopressin
  • transcription
  • aquaporin

The inner medullary collecting duct (IMCD) is the final portion of the renal collecting duct system. It is responsible for the controlled reabsorption of water from the tubule lumen into the interstitial space of the kidney for eventual return to the bloodstream. The main controlling factor is the peptide hormone vasopressin. Vasopressin mediates rapid regulation of IMCD water permeability by triggering redistribution of the water channel protein aquaporin-2 (Aqp2) from a largely intracellular location to the apical plasma membrane through vesicular trafficking (30). Addition of water channels to the apical plasma membrane increases its permeability to water, allowing accelerated osmotic water transport. In addition to this “classic” mode of regulation, vasopressin has long-term effects on the renal collecting duct to increase the total abundance of the Aqp2 protein (10) as well as that of its basolateral counterpart aquaporin-3 (Aqp3) (13).

Vasopressin has been demonstrated to increase transcription of the Aqp2 gene (22, 28, 52), resulting in increased levels of Aqp2 mRNA (12, 16, 51) and protein (31) in kidney tissue. Water restriction increases, while water loading decreases, Aqp2 mRNA levels in rat kidney (9, 29, 40). Administration of an orally acting vasopressin V2 antagonist decreased Aqp2 mRNA (16), a finding subsequently confirmed by Christensen et al. (9) and Murillo-Carretero et al. (29). Christensen et al. (9) showed in addition that treatment of rats with a vasopressin V2 receptor antagonist caused renal Aqp2 mRNA levels to fall within 30 min.

Aqp3 gene expression is also regulated by vasopressin, with marked increases in levels of Aqp3 mRNA (12, 29) and Aqp3 protein (13, 45). However, the role of transcriptional mechanisms is largely unexplored for Aqp3.

The transcriptional network that governs long-term responses to vasopressin is largely unknown. Recently, we combined mRNA profiling (Affymetrix) of native rat IMCD and mouse mpkCCD collecting duct cells with computational analysis (Genomatix) of conserved transcriptional regulator binding site motifs in the 5′-flanking region of the Aqp2 gene to identify transcriptional regulators (TRs) that potentially regulate Aqp2 gene expression (54). The findings identified SF1, NFAT, FKHD, ETS, RXR, AP2, CREB, GATA, SRF, HOX, and EBOX family TR binding sites as likely components of the transcriptional network responsible for regulation of Aqp2 gene transcription. Although these conserved binding site motifs can predict which TR families may be involved in transcriptional regulation of Aqp2 and other genes, identification of the actual TR proteins expressed in the IMCD is a necessary intermediate step in the design of studies needed to fully resolve the transcriptional regulatory networks involved in regulation of Aqp2 gene expression. We have previously carried out extensive transcriptomic profiling of IMCD cells using Affymetrix arrays to learn what transcripts are expressed in the IMCD (46) (see “IMCD Transcriptome Database”: However, relative transcript levels are not necessarily predictive of the level of the corresponding proteins in cells, and direct detection by mass spectrometry is desirable. The low abundance of many TR proteins relative to other categories of proteins has made them difficult to detect by protein mass spectrometry, and the current IMCD Proteome Database (33) contains only a few TR proteins, such as Stat2 and Pax8, detected in whole cell analysis. Notably, a transcription factor whose role is already established, namely Creb1 (22, 28, 52), was not detected using proteomic methods. 
This problem can be addressed biochemically by isolating nuclei from IMCD cells to enrich nuclear proteins prior to mass spectrometric identification, making it more likely that TRs and other nuclear proteins will be detected. In this study, we have isolated nuclei from native rat IMCDs and carried out proteomic and phosphoproteomic profiling using 1-D SDS-PAGE followed by LC-MS/MS. To increase the number of proteins identified, we utilized a computational protocol that combines three search algorithms (SEQUEST, InsPecT, and OMSSA) for matching mass spectra to rat protein sequences. To limit false-positive identifications, we used target-decoy analysis to set the false discovery rate (FDR) to <2% of the total for single-peptide identifications. In addition, we developed a novel computational tool (ProMatch) to identify ambiguous identifications, i.e., tryptic or chymotryptic peptides that are the same in two or more proteins. Finally, since regulation of transcription is in part mediated by regulated phosphorylation of various nuclear proteins, we also carried out MS-based experiments to identify phosphoproteins in nuclei from native IMCD cells.


IMCD Sample Preparation and Isolation of Nuclei

Twenty male Sprague-Dawley rats were euthanized (National Heart, Lung, and Blood Institute Animal Care and Use Committee-approved protocol H-0110) and their renal inner medullas (IMs) were resected and pooled. The IMs were minced and digested in a solution containing 250 mM sucrose, 2,000 units/ml hyaluronidase, and 3 mg/ml collagenase B for 75 min at 37°C with continuous stirring. IMCDs were sedimented at 70 g for 20 s, and the supernatant containing non-IMCD elements was discarded. The IMCD pellet was resuspended in a homogenization solution [250 mM sucrose solution containing phosphatase inhibitor (Halt phosphatase inhibitor, Thermo Scientific, Waltham, MA) and protease inhibitor (Complete mini, Roche Diagnostics, Indianapolis, IN)]. Previous studies have shown that IMCDs isolated by this technique are at least 85% pure (46), viable, and vasopressin-responsive (35).

Isolated IMCDs were suspended in the homogenization solution and homogenized using a Potter-Elvehjem motor-driven homogenizer on ice for a total of 2 min in 15 s intervals. The homogenized solutions were centrifuged at 1,000 g for 30 min at 4°C. The supernatant from this step, which contained the cytoplasmic fraction of lysed IMCD cells, was kept. The cell lysis solutions Cytoplasmic Extraction Reagent I and II (CERI and CERII) from the NE-PER Nuclear and Cytoplasmic Extraction Reagents kit (Thermo Scientific) were then added, according to the kit instructions, to the pellet now containing nuclei and unbroken IMCD cells, followed by centrifugation for 5 min at 16,000 g. CERI and CERII were again added to the pellet at half the original volumes and the sample was centrifuged as above. All supernatants containing the cytoplasm of lysed cells were pooled (designated “Cyto”). Nuclear extraction solution was then added to the pellet, and the sample was centrifuged according to the kit instructions. The supernatant from the final centrifugation contained soluble nuclear proteins (designated “NE”) and the pellet contained nuclear membrane/ER proteins (designated “NP”).

One-dimensional SDS-PAGE and In-gel Digestion

One-dimensional (1-D) SDS-PAGE was performed on samples from NE, NP, and Cyto fractions (400 μg protein each) using three separate 10% polyacrylamide Ready Gels (Bio-Rad, Hercules, CA). The gels were stained with Imperial Protein Stain (Thermo Scientific). Each gel was sliced into 40 fractions and cut into 1 mm³ blocks. In-gel digestion was performed as described previously (34). Briefly, 25 mM NH4HCO3/50% acetonitrile (ACN) was added to each sample three times for 10 min each to dehydrate the gels and samples were dried via a Speed Vac. 10 mM DTT in 25 mM NH4HCO3 was added to each sample at 56°C for 1 h followed by the addition of 55 mM iodoacetamide in 25 mM NH4HCO3 for 45 min in the dark. The gel blocks were then dried via a Speed Vac. The gel pieces were digested overnight at 37°C with 12.5 ng/μl of Sequencing Grade Modified Trypsin (Promega, Madison, WI). Following the in-gel digestion, peptides were extracted with 50% ACN/0.1% formic acid (FA) then desalted using a ZipTip C18 pipette tip (Millipore, Billerica, MA). In addition, a separate set of 40 NE fractions was in-gel digested overnight at 25°C with 12.5 ng/μl of Sequencing Grade Chymotrypsin (Roche Diagnostics, Indianapolis, IN). In total, 160 samples (three trypsin-digested sets of 40 plus the chymotrypsin-digested set of 40) were analyzed by LC-MS/MS.

Nanospray LC-MS/MS

Each peptide sample was analyzed once via 1-D nanospray LC-MS/MS with a modified ProteomeX 2D LC/MS workstation utilizing a linear ion trap mass spectrometer, LTQ-FT (Thermo, San Jose, CA) (20). In an alternating fashion, two Zorbax 300SB-C18 peptide traps (Agilent Technologies, Wilmington, DE) chromatographically separated the peptides. A nanospray ionization source and a reverse-phase PicoFrit column [BioBasic C18, 75 μm × 10 cm, tip = 15 μm (New Objective, Woburn, MA)] were used. The peptides were eluted via an increasing 0–60% gradient of solvent B in solvent A (A = 0.1% FA, B = 100% ACN/0.1% FA) over a 30 min period at a flow rate of ∼200 nl/min.

Post-LC-MS/MS Analysis

Search algorithms.

To maximize the number of peptide identifications, we analyzed the data using three different search algorithms. In addition to utilizing two peak matching algorithms viz. SEQUEST (15) and the Open Mass Spectrometry Search Algorithm (OMSSA) (17), we also employed InsPecT (44), a so-called “hybrid” algorithm that performs partial de novo sequencing generating 3-amino acid tags that are used to narrow down the possible peptide candidates before performing the final peak matching. A fixed +57 Da modification on cysteine and variable +14 Da cysteine and +16 Da methionine modifications were included as part of the search.

The raw data files were searched, using the three search algorithms, against a concatenated forward and reverse database that included the most recent RefSeq rat protein database from the National Center for Biotechnology Information appended with a list of common contaminants (pig and bovine trypsin as well as human isoforms of keratin). Peptide identifications were sorted from best to worst using the following parameters: XCorr (SEQUEST), P value (InsPecT), and E-value (OMSSA). Target-decoy analysis was performed to limit global FDR for each peptide identified using the following formula: FDR = 2R/(F + R), where R is the number of accumulated peptide hits from reversed “decoy” sequences and F is the number of accumulated peptide hits from forward “target” sequences (14). All peptides passing the 2% FDR filter from the three different search algorithms were merged at the spectral level. For a given spectrum, a valid identification was defined if at least 67% of the passed search algorithms agree on the identification (an agreement of 3 out of 3, 2 out of 3, 2 out of 2, or 1 out of 1). If no agreement among the three search algorithms was found, the single best identification (based on FDR) across all algorithms for a given spectrum was selected. Unresolved spectra were discarded.
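The filtering and merging steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the data layout, and the choice to accept every hit down to the deepest rank that still satisfies the FDR limit are assumptions made for illustration. The fallback rule is also an interpretation: any disagreement is resolved by the lowest FDR, and ties are treated as unresolved.

```python
from collections import Counter

def fdr_cutoff(hits, max_fdr=0.02):
    """Return target peptides accepted under the FDR limit.

    hits: list of (peptide, is_decoy), sorted best-to-worst by the search
    engine's own score. FDR at each rank is 2R / (F + R), where R and F are
    the accumulated decoy and target hits.
    """
    accepted_len = 0
    decoys = targets = 0
    for rank, (_, is_decoy) in enumerate(hits, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if 2 * decoys / (targets + decoys) <= max_fdr:
            accepted_len = rank  # deepest rank still within the limit (assumed convention)
    return [pep for pep, is_decoy in hits[:accepted_len] if not is_decoy]

def consensus(ids):
    """Pick one peptide for a spectrum.

    ids: dict engine -> (peptide, fdr) for engines whose hit passed the filter.
    Agreement means at least 2/3 of the passing engines report the same peptide.
    """
    if not ids:
        return None
    counts = Counter(pep for pep, _ in ids.values())
    peptide, votes = counts.most_common(1)[0]
    if 3 * votes >= 2 * len(ids):  # 3/3, 2/3, 2/2, or 1/1
        return peptide
    # No agreement: fall back to the single best FDR (assumed tie handling: discard)
    best = min(fdr for _, fdr in ids.values())
    winners = {pep for pep, fdr in ids.values() if fdr == best}
    return winners.pop() if len(winners) == 1 else None
```

The integer comparison `3 * votes >= 2 * len(ids)` is used instead of a 0.67 threshold so that a 2-out-of-3 agreement (66.7%) is accepted, as the text specifies.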

Peptide to protein matching.

The same peptide may map to more than one protein. However, peak matching programs like SEQUEST often report a single match even when the peptide could match multiple protein candidates. To detect ambiguous identifications, we have used an in-house program written in Java called ProMatch (executable file available upon request). In this program, each peptide identified is matched against all proteins in the rat RefSeq database, and every protein containing an amino acid subsequence identical to that peptide is extracted.

The following rules were used in reporting the peptide-based protein identifications. A peptide that matched only to a single protein was defined as a “unique peptide” and reported as such. A peptide that matched to multiple proteins was defined as a “nonunique peptide.” If that peptide matched to multiple proteins that are splicing variants or products of alternative transcription start sites in the same gene, it was considered as a unique peptide since it maps to only a single gene. For this determination, we extracted the nucleotide sequences of all the proteins that matched to a particular peptide and then applied the ClustalW2 sequence alignment program (European Bioinformatics Institute) to compare them. Proteins with a similarity score ≥90 were considered to be derived from the same gene and a single RefSeq accession number is reported for all. A protein identification was labeled as “unambiguous” if it was derived from at least one unique peptide. A protein identification was labeled as “ambiguous” if it was derived exclusively from one or more nonunique peptides derived from different genes. Ambiguous protein identifications were not reported.
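The reporting rules above can be sketched in a few lines of Python. This is an illustration only, not ProMatch itself (which is a Java program). In particular, the `gene_of` lookup is a hypothetical stand-in for the ClustalW2 similarity score ≥90 criterion used in the study to decide that several proteins derive from one gene, and the accessions and sequences in the usage example are invented.

```python
def classify_proteins(peptides, proteome, gene_of):
    """Label each matched protein 'unambiguous' or 'ambiguous'.

    peptides: iterable of identified peptide sequences.
    proteome: dict accession -> amino acid sequence.
    gene_of:  dict accession -> gene symbol (hypothetical substitute for the
              ClustalW2-based same-gene grouping described in the text).
    """
    has_unique = {}  # accession -> True once a unique peptide supports it
    for pep in peptides:
        # Exact substring match of the peptide against every protein
        matches = [acc for acc, seq in proteome.items() if pep in seq]
        genes = {gene_of[acc] for acc in matches}
        unique = len(genes) == 1  # one protein, or isoforms of a single gene
        for acc in matches:
            has_unique[acc] = has_unique.get(acc, False) or unique
    return {acc: "unambiguous" if ok else "ambiguous"
            for acc, ok in has_unique.items()}

# Toy example with invented accessions and sequences
proteome = {
    "NP_1": "MKTAYIAKQR",
    "NP_2": "MKTAYIAKQRLL",   # isoform of the same gene as NP_1
    "NP_3": "GGSAYIAKPP",
    "NP_4": "TTTSAYIAKEE",
}
gene_of = {"NP_1": "GeneA", "NP_2": "GeneA", "NP_3": "GeneB", "NP_4": "GeneC"}
labels = classify_proteins(["TAYIAK", "AYIAK", "GGSAYIAK"], proteome, gene_of)
```

Here `NP_4` is supported only by a peptide shared across three genes, so it is labeled ambiguous and would not be reported. Unlike the study, the sketch labels every accession separately rather than reporting a single accession per gene.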

Phosphopeptide Analysis

Sample preparation.

IMCDs were isolated and pooled from 10 male Sprague-Dawley rats. Nuclear isolation was performed as described above. The NE and NP fractions (250 and 500 μg, respectively) were resuspended with 6 M guanidine solution. Protein samples were reduced with 10 mM DTT solution at 56°C for 1 h then alkylated with 40 mM iodoacetamide solution in the dark at room temperature for 1 h. DTT solution (40 mM) was added after alkylation to quench excess iodoacetamide. Samples were diluted in 25 mM NH4HCO3 solution and digested with trypsin (1:30 wt/wt) overnight at 37°C. Peptide samples were desalted with a 1 ml HLB cartridge (Oasis, Milford, MA) and then dried via a Speed Vac.

Enrichment of phosphopeptides.

Dried peptide samples were resuspended in 5% acetic acid and Ga3+-IMAC (Phosphopeptide Isolation Kit, Thermo Scientific) was performed as described previously (20). The IMAC flow-through was subjected to an additional phosphopeptide isolation protocol using TopTips with TiO2 Material (PolyLC, Columbia, MD). Phosphopeptide samples were dried with a Speed Vac, resuspended in 0.5% FA, and desalted with ZipTip C18 pipette tips (Millipore).

LC-MS3 analysis.

The neutral loss scanning LC-MS3 analysis was performed as described (20). Briefly, isolated phosphopeptide samples were analyzed on an Agilent 1100 nanoflow system (Agilent Technologies, Palo Alto, CA) connected to a Finnigan LTQ-FT mass spectrometer (Thermo Electron, San Jose, CA) equipped with a nanoelectrospray ion source. The LTQ was used for the full MS scan and subsequent spectra (MS2 and MS3). MS3 scans were triggered when the presence of a neutral loss peak (−98, −49, or −32.7 m/z from precursor ion) was detected in MS2 scans.
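The three trigger values follow from the loss of phosphoric acid (H3PO4, monoisotopic mass about 97.977 Da) from precursor ions carrying one, two, or three charges. The short Python sketch below reproduces that arithmetic; the function names and the 0.5 m/z matching tolerance are assumptions for illustration and do not describe the instrument's actual trigger settings.

```python
H3PO4_MASS = 97.9769  # monoisotopic mass of phosphoric acid, in Da

def neutral_loss_offsets(max_charge=3):
    """m/z shift below the precursor for each charge state, rounded to 0.1."""
    return {z: round(H3PO4_MASS / z, 1) for z in range(1, max_charge + 1)}

def has_neutral_loss(precursor_mz, charge, ms2_peaks_mz, tol=0.5):
    """True if an MS2 peak sits at precursor m/z minus H3PO4/charge.

    tol is a hypothetical matching tolerance in m/z units.
    """
    expected = precursor_mz - H3PO4_MASS / charge
    return any(abs(mz - expected) <= tol for mz in ms2_peaks_mz)

offsets = neutral_loss_offsets()  # {1: 98.0, 2: 49.0, 3: 32.7}
```

For a doubly charged precursor at m/z 700.0, for example, the neutral loss peak would be expected near m/z 651.0.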

Post LC-MS3 analysis.

SEQUEST, InsPecT, and OMSSA searches were performed against the concatenated forward and reverse database as described above. A fixed +57 Da cysteine modification and variable +16 Da methionine and +80 Da phosphorylation on serine, threonine, and tyrosine modifications were included. In addition, a variable loss of water −18 Da modification was included for serine and threonine when searching MS3 spectra. Identified peptides (both phosphopeptides and nonphosphopeptides) were filtered for a 2% FDR using the target-decoy approach described above. The PhosphoPIC program was used to extract and compile MS2 and MS3 spectra of phosphopeptides (21). Phosphorylation site assignment was performed using Ascore (2), Phosphate Localization Score (PLscore) (1), and PhosphoScore (39).


Samples were solubilized in 1.5% SDS/Tris, pH 6.8. A BCA assay (Thermo Scientific) using BSA as the standard was used to determine total protein concentrations. Samples were then diluted in Laemmli buffer (10 mM Tris, pH 6.8, 1.5% SDS, 6% glycerol, 0.05% bromphenol blue, and 40 mM DTT). Proteins (5 μg of each sample) were separated by 1-D SDS-PAGE and transferred to nitrocellulose membranes. The membranes were blocked with blocking buffer (LI-COR Biotechnology, Lincoln, NE) then exposed overnight to primary antibodies diluted in the blocking buffer. After a series of washes, membranes were incubated with secondary antibodies as previously described (35) prior to image acquisition on an Odyssey Infrared Imaging System (LI-COR Biotechnology).


Antibodies used are listed as follows: aldose reductase (Alr2) (sc-17735, Santa Cruz Biotechnology, Santa Cruz, CA), BRM/SWI2-related gene 1 (Brg1) (sc-17796, Santa Cruz Biotechnology), and cyclic AMP response element binding protein 1 (Creb1) (9197, Cell Signaling, Danvers, MA). Species-specific secondary antibodies were obtained from Rockland Immunochemicals (Gilbertsville, PA).

Transcription Factor Binding Site Analysis

The 5′-flanking region (1,000 nucleotides) and first intron of the Aqp2 and Aqp3 genes of human, rat, and mouse were extracted using UCSC Genome Browser ( To identify conserved transcription factor binding sites (TFBS), the 5′-flanking region and first intron sequences were analyzed using the online Genomatix software suite ( as previously described (54). Sequence matches to a particular TFBS matrix were scored and filtered based on the MatInd algorithm (36).


Confirmation of Nuclear Isolation and Purification from Native Rat IMCD

Common markers for the cytoplasm and nucleus were used to determine the efficacy of the nuclear isolation and enrichment procedure. Figure 1 demonstrates that the nuclear proteins BRM/SWI2-related gene 1 (Brg1) and cyclic-AMP response element binding protein 1 (Creb1) were almost exclusively present in the nuclear fractions. In contrast, the cytosolic marker aldose reductase (Alr2) was far more abundant in the cytoplasmic fractions.

Fig. 1.

Quality control for nuclear isolation. Nuclear marker Brg1 is found only in the nuclear extract (NE) fraction. Another nuclear marker Creb1 is chiefly present in the NE fraction but is also found in the nuclear pellet (NP) fraction. Cytoplasmic marker (Alr2) is predominantly present in the cytoplasmic fractions. (Cytoplasm I = Potter-Elvehjem supernatant, Cytoplasm II = CERI/CERII supernatant.) Note that 5 μg of each sample was loaded. In this study we carried out MS analysis of the pooled Cytoplasm I and II (Cyto) as well as NE and NP fractions.

Proteomic Profiling of Rat IMCD Fractions

LC-MS/MS analysis was performed on three fractions derived from the nuclear isolation protocol: the soluble nuclear fraction (NE), the nuclear membrane pellet (NP), and the pooled cytoplasmic fractions (Cyto). Figure 2 summarizes the proteomic and bioinformatic workflow used for these samples. While the sample preparation protocol is relatively standard, we customized the post-LC-MS/MS data analysis for this study. Specifically, three different search algorithms (SEQUEST, InsPecT, and OMSSA) were used to increase the number of peptides identified from the spectra that were generated. The ProMatch algorithm was then used to check whether identified peptide sequences were unique to a single identified protein to allow ambiguous identifications to be eliminated.

Fig. 2.

Proteomic work flow. Graphical representation of the complete workflow used for this proteomic study. The ovals represent the experimental samples, spectra, or proteomic data. Shaded boxes represent the biochemical and computational methods that were applied to the biological samples and data.

A total of 2,172 proteins were identified in the NE fraction, 1,306 proteins were identified in the NP fraction, and 1,555 proteins were identified in the cytoplasmic fraction. For all three fractions we imposed a single-peptide FDR of 2% or less using a target-decoy algorithm (Fig. 3A). A total of 3,531 proteins were identified in one or more of the three fractions (all protein identifications are listed in Supplementary Table S1). Overall, 45% of protein identifications in this study are based on spectra for 2 or more peptides (Supplementary Table S1). Figure 3B summarizes the number of proteins discovered by each of the three search algorithms (SEQUEST, InsPecT, and OMSSA) in the NE fraction. Also in the NE fraction, 690 proteins were identified from chymotrypsin-digested samples. This yielded 184 proteins that were not found using trypsin and confirmed the presence of 506 proteins found in the NE trypsin dataset (Fig. 3C). It is evident from Fig. 3, B and C, that the use of multiple search algorithms and different proteases is beneficial in increasing the number as well as the confidence of unique peptides and proteins identified. Among the 3,531 protein identifications made in this study, 2,163 (61%) were not identified in the IMCD in previous studies and now have been added to the IMCD Proteome Database (

Fig. 3.

Venn diagram of proteomic results. A: the number of proteins identified in the NE, NP, and cytoplasmic fractions (Cyto) with an estimated single-peptide false discovery rate specified as 2% or less. A total of 3,531 proteins were identified in one or more of the 3 fractions. B: the number of proteins identified in the NE fraction by each searching algorithm (SEQUEST, InsPecT, and OMSSA). Intersections between each circle show the number of proteins identified by a combination of search algorithms. C: the number of proteins identified in the NE fraction using 2 different proteases (trypsin and chymotrypsin).
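The protease comparison in Fig. 3C is plain set arithmetic: the 690 chymotrypsin identifications split into 506 also found with trypsin and 184 found only with chymotrypsin. A minimal Python sketch of that bookkeeping follows; the function name and the toy accession sets are hypothetical and are not taken from the dataset.

```python
def protease_overlap(trypsin_ids, chymotrypsin_ids):
    """Count proteins confirmed by both proteases and those unique to each."""
    shared = trypsin_ids & chymotrypsin_ids
    return {
        "confirmed_by_both": len(shared),
        "chymotrypsin_only": len(chymotrypsin_ids - trypsin_ids),
        "trypsin_only": len(trypsin_ids - chymotrypsin_ids),
    }

# Toy accession sets (invented) to show the call
counts = protease_overlap({"P1", "P2", "P3"}, {"P2", "P3", "P4"})

# Consistency checks on the figures reported in the text
chymotrypsin_total = 184 + 506          # 690 proteins from chymotrypsin digests
novel_percent = round(100 * 2163 / 3531)  # 61% new to the IMCD Proteome Database
```

The same set operations extend to the three-way comparisons of fractions (Fig. 3A) and search algorithms (Fig. 3B).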

Transcription Factors Identified by LC-MS/MS

To enumerate the transcription factors found in the rat IMCD in this study, proteins identified in all the samples were combined into one dataset (Supplementary Table S1). One hundred fifty-four proteins (4.4% of total) were classified as “transcription factor” proteins (Table 1) using the Panther Classification System ( Figure 4 shows a Venn diagram of the distribution of transcription factors identified in the NE, NP, and Cyto fractions, with the majority of these proteins found in the NE fraction. One hundred twenty-eight of these proteins (83%) were new identifications for the IMCD Proteome Database.

Table 1.

Transcription factors

Fig. 4.

Venn diagram of the distribution of transcription factors identified in the NE, NP, and Cyto fractions.

Computational Analysis of Conserved Transcription Factor Binding Elements in Aqp2 and Aqp3 Genes

To relate the transcription factors found (Table 1) to their possible roles in regulation of Aqp2 and Aqp3 gene transcription, we have mapped conserved binding element motifs present in the 5′-flanking region as well as the first intron of rat Aqp2 and Aqp3 genes (Fig. 5), based on analysis using the Genomatix software suite (methods). In the following, we summarize these findings in the context of existing literature.

Fig. 5.

Transcription factor binding site analysis. The 1,000 bp 5′-flanking region and 1st intron of the Aqp2 and Aqp3 genes of human, rat, and mouse were analyzed for conserved transcription factor binding sites (TFBS) using the Genomatix database and software suite. This figure represents only the rat Aqp2 and Aqp3 genes. Conserved TFBS were found in the 1,000 bp 5′-flanking region of the Aqp2 and Aqp3 genes and in the 1st intron of the Aqp2 gene; however, no conserved TFBS were found in the 1st intron of the Aqp3 gene. Transcription factors identified in this study that potentially bind to these TFBS are shown above each TFBS. Abbreviations for Genomatix TFBS family name: V$SF1F = Vertebrate steroidogenic factor; V$NFAT = Nuclear factor of activated T-cells; V$FKHD = Fork head domain factors; V$ETSF = Human and murine ETS1 factors; V$RXRF = RXR heterodimer binding sites; V$AP2F = Activator protein 2; V$CREB = cAMP-responsive element binding proteins; V$GATA = GATA binding factors; V$SRFF = Serum response element binding factor; V$HOXF = Paralog hox genes 1–8 from the four hox clusters A, B, C, D; V$EBOX = E-box binding factors; V$EGRF = EGR/nerve growth factor induced protein C & related factors; V$MYT1 = MYT1 C2HC zinc finger protein; V$KLFS = Krueppel like transcription factors; V$CP2F = CP2-erythrocyte Factor related to drosophila Elf1; V$CEBP = Ccaat/Enhancer Binding Protein; V$PARF = PAR/bZIP family; V$NR2F = Nuclear receptor subfamily 2 factors; V$HAND = Twist subfamily of class B bHLH transcription factors; V$GREF = Glucocorticoid responsive and related elements; V$MAZF = Myc associated zinc fingers; V$ZBPF = Zinc binding protein factors; and V$SP1F = GC-Box factors SP1/GC.

Transcription Factors Potentially Involved in Aqp2 Gene Transcription

Figure 5A shows the transcription factors found in this study (listed in Table 1) that potentially bind conserved transcription factor binding elements in the first 1,000 bp of the 5′-flanking region or first intron of the Aqp2 gene, based on computational analysis using the Genomatix suite. As discussed in the following, some of these have been examined experimentally in prior studies.


A conserved CREB binding motif (aka CRE) is present at −222 bp from the transcription start site. Originally identified by Uchida and colleagues (48), the importance of this site has been thoroughly documented by the subsequent studies of Hozawa et al. (22), Matsumura et al. (28), Yasui et al. (52), and Cai et al. (5). There are several genes in mammalian genomes that code for proteins that could potentially bind to CRE, including the index protein Creb1, which is activated by phosphorylation at Ser-133 by Ca-calmodulin-dependent kinases, ribosomal S6 kinase, or protein kinase A (41). In this study we identify two of these, Creb1 and Crebl1 (also known as “cAMP responsive element binding protein-like 1”). These contain the so-called “bZIP domain,” comprising a basic region and a leucine zipper region. Crebl1 acts in the unfolded protein response (UPR) pathway by activating UPR target genes induced during endoplasmic reticulum (ER) stress (24). It is a single-pass integral membrane protein that undergoes regulated intramembrane proteolysis in the Golgi during the UPR to yield an ∼400 amino acid cytoplasmic product that translocates into the nucleus.


Centered at −214 bp from the transcription start site is a conserved putative GATA sequence, a binding motif for zinc-finger transcription factors of the GATA family. Uchida et al. (47) identified Gata3 mRNA in microdissected collecting ducts and demonstrated that overexpression of Gata3 enhanced Aqp2 promoter-reporter activity. Both Gata2 and Gata3 are expressed in IMCD at many fold above the median signal (46). In the present study, we identified another GATA family member, Trps1 (Tricho-rhino-phalangeal syndrome type I protein), in nuclei from native IMCD cells. This protein is known to bind relatively specifically to GATA sequences and represses GATA-regulated genes (27). Interestingly, Rai et al. (37) identified, in the 5′-flanking region of the Aqp2 gene, a region overlapping this GATA site that contained an unidentified cis-element with an apparent negative regulatory role on transcriptional activity. However, in a later study, overexpression of the Gata3 transcription factor increased Aqp2 transcription, pointing to an enhancer role for this binding element (47), in seeming contradiction to the findings of Rai et al. (37). A possible explanation for this conundrum is that Trps1 normally maintains repression at the GATA site, blocking the enhancer activity of Gata2 and Gata3 that would otherwise occur.


In the present study we also identified by mass spectrometry two transcription factors that potentially bind to a conserved HOX binding element centered at −201 bp from the transcription start site, viz., Hoxa2 and Hoxb7. Homeobox or HOX transcription factors are recognized to be involved in renal tubule segmentation as well as collecting duct development (32). The 5′-flanking region of Hoxb7 has been used to target transgene expression specifically to collecting duct cells in mice (38, 42).


Centered at −469 bp upstream from the transcription start site is a potential NFAT binding element. Several NFAT proteins including TonEBP (Nfat5) (19) and the calcineurin-dependent Nfat proteins (Nfatc1–4) (26) have been implicated in Aqp2 transcriptional regulation. Although we did not identify these transcription factors in the present study, we found a protein (Nf45) with weak homology to them that has been previously implicated in regulation of interleukin-2 transcription in lymphocytes (23). Although mRNA levels do not necessarily correlate with protein expression, Affymetrix microarray studies of native IMCD cells demonstrated that Nfat5 (also called TonEBP) is expressed with a signal 12.5-fold above the median signal for all transcripts (46). Interestingly, the same study showed that another Nfat protein is strongly expressed in rat renal IMCD, viz. Nfatc3, which is regulated by intracellular calcium. A rise in calcium activates the Ca-dependent protein phosphatase, calcineurin, which dephosphorylates the Nfat protein allowing it to translocate into the nucleus. A member of this family has recently been implicated in the transcriptional control of the Aqp2 gene (26). The involvement of Ca2+-calcineurin-sensitive regulation of Aqp2 gene transcription identifies another mechanism by which vasopressin can regulate Aqp2 expression, since several laboratories have demonstrated that activation of the V2 vasopressin receptor is associated with an elevation of intracellular calcium (6, 8, 11, 35, 43, 53).


Centered at −455 bp from the transcriptional start site of the Aqp2 gene is a conserved Ets binding motif that we identified in a previous study (54) as playing an enhancer role in cell-specific expression of Aqp2, and possibly playing a role in the vasopressin response. In the present study, MS analysis identified Etv5 (ets variant gene 5 or PEA3), a member of the ets family. This transcription factor is known to be selectively expressed in ureteric bud (the developmental precursor of the collecting duct system) (7).


Centered at −342 bp from the transcription start site is a conserved putative RXR binding motif. RXR binding elements bind dimers of ligand-activated transcription factors, usually RAR/RXR heterodimers. These transcription factors are known to be involved in development of the ureteric bud (4). In addition, RXR can heterodimerize with vitamin D receptors, peroxisome proliferator-activated receptors, thyroid receptors, and other ligand-activated nuclear receptors. MS analysis in the present paper found one such ligand-activated transcription factor, viz. Nr1h2 (nuclear receptor subfamily 1 group H member 2, also known as “liver X receptor”). Interestingly, this is one of two transcription factor genes (with Elf1) whose mRNA levels correlated negatively with Aqp2 mRNA levels among subcloned mpkCCD collecting duct cell lines (54). Its endogenous ligand is at present unknown.


A conserved EBOX motif is located at −67 bp relative to the transcription start site of Aqp2. Two transcription factors were identified that potentially bind this site, namely Usf1 and Usf2. These are classical basic helix-loop-helix transcription factors known to be involved in transcriptional regulation in a variety of tissues.
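To make the idea of locating a motif relative to the transcription start site concrete, the following is a minimal sketch that scans a 5′-flanking sequence for the canonical E-box consensus CANNTG. It is not the Genomatix analysis used in this study, which relies on weight matrices and cross-species conservation; the function name `find_eboxes` and the toy sequence are illustrative assumptions, not data from the Aqp2 gene.

```python
import re

# Canonical E-box consensus CANNTG (N = any base).
# The lookahead wrapper lets the scan report overlapping matches as well.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(upstream_seq):
    """Return (position, motif) pairs for every E-box match.

    upstream_seq is assumed to be the 5'-flanking sequence written 5'->3'
    and ending immediately before the transcription start site, so its
    last base sits at position -1. The reported position is that of the
    first base of the motif.
    """
    seq = upstream_seq.upper()
    n = len(seq)
    return [(m.start() - n, m.group(1)) for m in EBOX.finditer(seq)]

# Hypothetical 24-bp toy sequence, for illustration only.
toy_flank = "GGATCACGTGACCTTACAGCTGTA"
for position, motif in find_eboxes(toy_flank):
    print(position, motif)
```

A plain consensus match of this kind is far more permissive than a conservation-filtered search, so it would be expected to return many sites that a tool such as Genomatix would not report.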

Potential binding elements in the first intron.

We identified four potential transcription factor binding elements in a conserved region of the first intron of the Aqp2 gene (Fig. 5A), viz. binding sites for EGR, MYT1, KLF, and CP2 family transcription factors. The transcription factors identified in IMCD nuclei that correspond to these sites (Zbtb7a, St18, Klf15, Tcfcp2, and Tcfcp2l1) are all of the zinc-finger or the CP2 classes of transcription factors and typically play repressor roles. Klf15 has been previously implicated in regulation of the ClC-K1 chloride channel of the thin ascending limb of Henle (49). Tcfcp2l1 is associated with normal duct development in both the salivary gland and kidney (50).

Transcription Factors Potentially Involved in Aqp3 Gene Transcription

Figure 5B shows the transcription factors found in this study (Table 1) that potentially bind conserved transcription factor binding elements in the first 1,000 bp of the 5′-flanking region or first intron of the Aqp3 gene, based on computational analysis using the Genomatix suite. No conserved binding motifs were found in the first intron.

The 5′-flanking region of the Aqp3 gene contains several conserved transcription factor binding element motifs seen in the Aqp2 5′-flanking region or first intron including two EBOX motifs (potentially binding Usf1 and Usf2), an NR2 nuclear receptor site (potentially binding the orphan nuclear receptor Nrf2f), and two KLF sites (potentially binding Klf15). These sites are potentially responsible for the coordinate regulation of Aqp2 and Aqp3 seen in response to vasopressin (45). In addition, the 5′-flanking region of the Aqp3 gene contains some interesting unique sites that may provide some of the explanation for the difference in the tissue distribution of Aqp3 expression vs. Aqp2 expression, as well as differences in regulation. Foremost among these unique sites is a putative GRE (glucocorticoid response element), which can bind either the glucocorticoid receptor (Nr3c1) or the mineralocorticoid receptor (Nr3c2). Although only the former was found by MS analysis of the nuclear fraction in the current study, the latter is also known to be strongly expressed in the rat renal IMCD (46), providing a potential explanation for the large increase in Aqp3 protein abundance seen in the renal collecting duct in response to treatment of rats with the mineralocorticoid, aldosterone (25).

In addition, the 5′-flanking region of the Aqp3 gene contains two highly conserved “MAZ” or “MYC-associated zinc finger” sites at −62 and −79 bp relative to the transcriptional start site. Genomatix analysis predicts that one transcription factor from Table 1 would bind to these sites, viz. Zbtb19 (aka Zfp278 or “protein kinase A RI subunit α-associated protein”). Like other zinc-finger transcription factors, this transcription factor is associated with repressor activity. Finally, the 5′-flanking region of the Aqp3 gene contains a highly conserved SP1 binding element motif at −62 bp, which potentially binds another zinc-finger transcription factor found in this study by mass spectrometry, viz. Klf10 (also known as “TGF-β-inducible early growth response protein 1”), that likely manifests repressor function.

Transcriptional Co-regulators and Nucleic Acid Binding Proteins Identified by Proteomic Analysis of IMCD

Transcriptional co-regulators are listed in Table 2. Proteins classified as “nucleic acid binding” proteins by Panther analysis are listed in Supplementary Table S2. This category includes basal transcription factors, helicases, RNA polymerases, and other ubiquitous proteins. Both lists are made up largely of proteins that would be found in any cell type.

Table 2.

Transcriptional co-regulators

Phosphoproteomic Analysis

Figure 6 summarizes the workflow used to discover phosphoproteins in NE and NP fractions from rat IMCD. Three phosphorylation site verification tools [Ascore (2), Phosphate Localization Score (PLscore) (1), and PhosphoScore (39)] were used to increase our confidence in the identified phosphorylation sites. Table 3 summarizes the 122 phosphorylation sites found. Of those, 63 were previously unidentified phosphorylation sites (not present in Phosphosite). Eight phosphorylation sites were found in six proteins classified as “transcription factor” or “transcription co-regulator” by Panther, namely Bclaf1 (S512), Hbxap (S608 and S1352), Lrrfip2 (S133), Ptrf (S169), Safb (S309), and Trim28 (S52 and S474). The presence of phosphorylated COOH-terminal tails of two aquaporins in the nuclear fractions raises the possibility of a role for regulated intramembrane proteolysis (3) of aquaporin proteins in transcriptional regulation in the IMCD, a possibility that needs further investigation. However, we cannot rule out that these aquaporin phosphopeptides are present in nuclear fractions owing to contamination with small amounts of ER membranes, which are essentially extensions of the nuclear envelope.

Fig. 6.

Phosphoproteomic work flow. Graphical representation of the sample preparation, LC-MS3, and bioinformatic steps taken in the nuclear phosphoproteomic study. Ovals represent the sample or proteomic data. Shaded boxes represent the biochemical and computational methods that were applied to the sample or proteomic data. The work flow utilized 2 phosphopeptide isolation kits (IMAC and TiO2 tips) to maximize phosphopeptide recovery and 3 phosphorylation site verification programs (Ascore, PLscore, and PhosphoScore) to ensure phosphosite validity.

Table 3.

Phosphorylation sites identified in nuclear fractions

Kinases in Nuclear Fractions

Table 4 categorizes the kinases found in this study, which include 58 kinases present in the nuclear extract fraction. While all are candidates for regulatory roles in transcription in IMCD, some already have well-documented roles in other tissues. For example, all three of the kinases known to phosphorylate Creb1 at Ser133 are present in either the NE or NP fraction, viz. protein kinase A catalytic subunit α (Prkaca), calcium/calmodulin-dependent protein kinase II δ and γ (Camk2d and Camk2g), and ribosomal protein S6 kinase (Rps6ka5 and Rps6kc1) (41). In addition, several MAP kinases were detected, including Erk1 (Mapk3), Erk2 (Mapk1), Erk3 (Mapk6), and p38α (Mapk14), which are known to phosphorylate transcription factors of the ETS, AP1, and GATA families (Phosphosite). Finally, the Hipk2 protein phosphorylates a number of transcription factors including p53, Pax6, and Zbtb4 (Phosphosite).

Table 4.

Protein kinases in nuclear fractions

Of the 82 kinases listed in Table 4, 36 are classified by Gene Ontology (component) as located in “nucleus.” Among these are kinases that have housekeeping roles in the nucleus, such as DNA repair (Atm and Prkdc), mRNA splicing (Stk3), and cell cycle regulation (Cit, Cdk5, Gsk3β, and Mtor).


Table 5 categorizes the 17 phosphatase proteins found in nuclear fractions in this study. These proteins include receptor and nonreceptor tyrosine phosphatases, serine/threonine phosphatases, and dual-specificity phosphatases. All could potentially play roles in transcriptional regulation in the collecting duct.

Table 5.

Protein phosphatases in nuclear fractions

General Observations

In addition to investigating the regulation of the water channels Aqp2 and Aqp3, a general goal of our studies is the development of tools, including proteomics databases, that can be useful to the kidney research community. In so doing, we have developed several proteomic and transcriptomic databases profiling genes expressed in the IMCD, thick ascending limb, and proximal tubule. Our nuclear proteomics data have been added to an upgraded IMCD Proteome Database website.

An additional goal has been to develop improved computational approaches to increase the yield of peptides that can be identified from high-quality spectra without a high rate of false positive identifications. In this study, the use of multiple search algorithms [SEQUEST (15), InsPecT (44), and OMSSA (17)] gave substantial numbers of identifications unique to each search algorithm. The largest increment in identifications was found with the InsPecT algorithm, a hybrid approach that uses de novo sequencing of a portion of each peptide to narrow the search space for pattern matching. InsPecT allowed the identification of a subset of peptides often missed by strictly peak-matching software such as SEQUEST or OMSSA. Further steps, namely target-decoy analysis and ambiguity checks on identified peptides using the in-house algorithm ProMatch, limited the number of false positive identifications and eliminated ambiguous identifications. We emphasize that target-decoy analysis has been demonstrated to be superior in limiting false positive identifications compared with use of the “two-peptide rule,” which restricts protein identifications to those established by two or more peptides (18).
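The target-decoy step described above can be illustrated with a minimal sketch. It assumes a concatenated target-decoy search in which each peptide-spectrum match carries a score (higher is better) and a flag marking decoy hits, and it estimates the false discovery rate as decoys divided by targets among accepted matches. The function name `score_cutoff`, the input layout, and the estimator are illustrative assumptions rather than the exact procedure applied to the SEQUEST, InsPecT, and OMSSA results; only the 2% default reflects the limit stated in this study.

```python
def score_cutoff(psms, max_fdr=0.02):
    """Return the lowest score cutoff that keeps estimated FDR <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher score = better match.
    FDR is estimated as (#decoys / #targets) among matches scoring at or
    above the cutoff (an assumed estimator; other variants exist).
    Scores are assumed to be distinct. Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the most permissive cutoff seen so far that is acceptable.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff

# Hypothetical matches: ten target hits, one decoy hit, one low-scoring target.
example = [(float(s), False) for s in range(10, 0, -1)] + [(0.5, True), (0.4, False)]
print(score_cutoff(example))               # strict 2% limit
print(score_cutoff(example, max_fdr=0.2))  # looser limit admits more matches
```

Because the threshold is derived from the score distribution of decoy hits, single-peptide identifications can be retained when their scores are high enough, which is the practical advantage over a blanket two-peptide requirement.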

Overall, with the new data presented in this study, we have laid the groundwork for future studies of the role of vasopressin and other hormones in transcriptional regulation in renal collecting duct cells. The data provide a guide to proteins present in the nucleus of the “cells of interest” for such studies, which become candidates for genetic manipulation in cell culture and mouse models. The ultimate goal is to identify the relevant transcriptional regulatory networks involved in the control of Aqp2 and Aqp3 gene transcription. By sharing our data in a publicly accessible database, we encourage other investigators to partake in these investigations.


No conflicts of interest are declared by the authors.


  • 1 The online version of this article contains supplemental material.

