Review
1 Department of Biochemistry
2 McGill Cancer Center, McGill University, Montreal, Quebec, Canada H3G 1Y6
ABSTRACT
gene hunting; complementary DNA cloning; expressed sequence tags; expression validated gene
INTRODUCTION
A set of full-length (FL) cDNA clones has "value-added" features that ESTs do not carry. FL cDNAs define the boundary of the transcriptional unit (and thus identify the immediate upstream basal promoter), provide a record of isoform diversity when posttranscriptional modifications alter the primary pre-mRNA transcript in various ways (such as alternate promoter usage, alternative splicing, alternate polyadenylation, and RNA editing), provide the complete coding region of the encoded protein for functional studies, and provide sequence characterization of two mRNA segments important for posttranscriptional gene regulation, i.e., the 5' and 3' untranslated regions (UTRs).
COMPLEMENTARY DNA CLONING
Given that metabolic labeling experiments in mammalian cells estimate that steady-state levels of heterogeneous nuclear RNA (hnRNA) represent about 7% of total RNA (24), it should be little surprise that a significant fraction of cDNA clones within any given library will contain intron sequences. Indeed, there are many published reports documenting the presence of introns within cDNA clones (e.g., 5, 8, 39, 55, 70, 122, 165; see also Ref. 89 for additional examples). To complicate matters, there is anecdotal data suggesting that 5' introns are excised more slowly than internal introns (89), possibly a consequence of differences in the mechanism by which these introns are recognized (13, 79). The presence of such cDNA clones in libraries can significantly delay the progress of gene characterization due to misleading interpretations on predicted protein structure, incorrect assignment of translation initiation codons, incorrect assignment of 5'-UTRs, and false information on promoter location. The frequency with which this problem is encountered is difficult to estimate, since reports documenting intron-containing cDNAs are generally not published.
Given these issues, it follows that mRNAs for cDNA library generation should, if possible, be prepared from cytoplasmic fractions obtained from cells or tissues. By removing the bulk of nuclear pre-mRNA, this modification significantly decreases the number of intron-containing clones that contaminate the final cDNA library. Alternatively, cDNAs obtained from libraries based on total mRNA (which is the case for the majority of current libraries) should be carefully examined to ensure they lack retained introns. Protocols for isolating cytoplasmic mRNA have been used extensively with cultured cells but much less with tissues, and they produce RNA preparations of variable quality (12). It is reasonable to expect that the procedure may have to be tailored for different cell lines or tissues. Also, it is likely that some interesting nuclear transcripts, such as XIST, a nuclear transcript involved in X-chromosome inactivation, will be selected against by this strategy.
We have modified conventional cytoplasmic RNA purification schemes (12) to isolate cytoplasmic RNA of sufficient quality and purity for cDNA library construction from cell lines and tissues. The approach we utilize is described in the legend to Fig. 2A. When our isolation method was piloted on MDAH cells, an ovarian cancer cell line, the quality of the cytoplasmic RNA was as good as the quality of total RNA obtained in parallel (Fig. 2B, compare lane 2 with 1). We have successfully applied this procedure to a number of cell lines, as well as fresh and frozen tissues. For tissues, the material is first disrupted using a Teflon homogenizer. One tissue source from which we have failed to isolate good quality cytoplasmic RNA is kidney, presumably due to the presence of nucleases in this tissue.
There is potentially valuable information to be gained from properly characterizing cDNAs prepared from cytoplasmic mRNA. Intron retention by cytoplasmic mRNAs may also occur as a mechanism of posttranscriptional regulation in some situations. Libraries generated from cytoplasmic mRNA will make it easier to identify these events above the background generally provided by total cellular mRNA preparations. The best studied example of this is the trans-regulation of the nucleocytoplasmic distribution of unspliced mRNA by the human immunodeficiency virus (HIV) rev product (48, 60, 101). This phenomenon requires a cis-acting Rev-responsive element (RRE) in the env region of HIV-1, within the common intron of Tat and Rev coding sequences. In the presence of rev, the RRE mediates the cytoplasmic appearance of HIV mRNAs containing introns. It appears that rev affects a general process that retains precursor mRNAs in the nucleus (36). Another example of this sort of regulation is found in the Mason-Pfizer monkey virus, where efficient accumulation of unspliced mRNAs appears to depend on the interaction of a cis-acting RNA element with the ATP-dependent RNA helicase A (25, 150) or with other cellular factors (57, 118). Thus some representatives of specific genes within cDNA libraries generated from cytoplasmic mRNA might contain introns. The identification of such clones would be a first step toward cataloging cellular genes potentially utilizing this mechanism of posttranscriptional regulation. Recent descriptions of mRNAs harboring retained introns suggest that in some cases this phenomenon may be at work to regulate gene expression (84) and increase protein isoform diversity (117, 123, 125, 155). This form of regulation of gene expression may function similarly to a recently identified sequence element within the intronless H2a gene, which functions to promote mRNA transport, inhibit splicing, and stimulate polyadenylation (75). 
Similar elements have been identified within HSV-TK (98) and HBV (74, 76). Hence, the presence of related elements within introns has the potential to inhibit splicing and increase accumulation of unspliced transcripts in the cytoplasm.
Oligo (dT) chromatography.
One fractionation procedure that most individuals utilize when synthesizing cDNA libraries is to generate poly(A)+ mRNA by oligo (dT) selection of total RNA. This makes good sense given that RNA PolII transcripts represent only ~5% of the total RNA population within a given cell. However, what is generally not appreciated is that some mRNA transcripts are not polyadenylated (e.g., histones) or may have shortened poly(A) tails that do not permit selection by oligo (dT) chromatography (160). It is interesting to note that although to date, the only class of poly(A)- mRNA transcripts which have been defined are some of the histone mRNAs, estimates have indicated that there could be a large fraction of genes expressed in postnatal brain uniquely as poly(A)- transcripts (156). These reports indicate that a large proportion of the complexity in poly(A)- mRNA comprises sequences distinct from those in poly(A)+ mRNA. Hybridization with 3H-labeled poly(U) indicated that the poly(A)- mRNA fraction was relatively free of poly(A)+ mRNA since the equivalent of only one A tract of 20 residues per 100 average-sized molecules (1,500 nt) was observed (156). The complexities of the poly(A)+ and poly(A)- mRNA classes were also found to be additive, since the sum of their complexities equaled that of polysomal RNA (37, 156). Another interpretation of these data is that the poly(A)- mRNA species are derived from the poly(A)+ mRNA transcripts, and indeed in a number of situations, specific mRNAs have been found in both poly(A)+ and poly(A)- forms, as is the case for some histone species, casein mRNA, and protamine mRNA (52, 72, 130). It is also known that the length of the poly(A) tail can be regulated (41). Indeed, there is some suggestion that this phenomenon may be tissue restricted, since analysis of RNA from kidney indicates extensive homology between poly(A)+ mRNA and purified nominal poly(A)- mRNA (116). For poly(A)+ mRNA preparations to be used for cDNA library construction, it is imperative to assess the quality of the preparation and efficiency of fractionation.
We assess the quality of our cytoplasmic RNA preparations by Northern blotting with probes to three ubiquitously expressed transcripts: histone H4 [a poly(A)- transcript], ß-actin (~1.8 kb), and protein tyrosine phosphatase 1E (mRNA is ~8 kb) (Fig. 3A).
There are two procedures that are amenable to preparative mRNA size fractionation: agarose gel electrophoresis and sucrose gradient centrifugation. Preparative agarose gel electrophoresis is relatively simple and provides sharp size cuts and good resolving power between 500 and 10,000 bases, but the RNA recovery is lower than with sucrose gradients, and some lots of commercial agarose contain contaminants (that inhibit subsequent downstream enzymatic steps) that copurify with the RNA (49). On the other hand, sucrose gradient centrifugation allows the fractionation of large amounts of RNA and allows recovery in high yields, although the size cuts are not as sharp as with agarose gels (61, 90). An example of total RNA fractionated on methylmercury hydroxide sucrose gradients is shown in Fig. 3B. We generally load 100 µg of poly(A)+ mRNA on 10-30% sucrose gradients. Following centrifugation, we collect 30-drop fractions, precipitate, and analyze 10% of the material by Northern blotting, probing for ß-actin transcripts and for the Huntington disease (HD) transcript (~11 kb). The HD gene is ubiquitous but of very low abundance, making it difficult to detect (Fig. 3C).
Correction of Mispriming by the Oligo (dT) Primer During First-Strand Synthesis
Priming from the mRNA poly(A) tract with oligo (dT) is necessary to obtain a copy of the entire 3'-UTR. However, it has been estimated that ~10-15% of clones in cDNA libraries have 3' truncations due to misannealing of the oligo (dT) primer to internal A-rich sites (23). Such clones are identified by the absence of a polyadenylation signal sequence (generally AATAAA) ~30 nucleotides upstream of the oligo (dA) tail of the cDNA. If enhanced discrimination could be achieved between annealing of the oligo (dT) primer to the bona fide poly(A) tail vs. internal A-rich sequences, then the frequency of this mispriming artifact should be significantly reduced. Guo et al. (59) have shown that increased discrimination of single nucleotide polymorphisms by oligonucleotides can be achieved by introducing artificial mismatches into an oligonucleotide probe using the base analog 3-nitropyrrole (Fig. 4A). This base analog acts as a universal nucleoside that hydrogen bonds minimally with all four bases without steric disruption of the DNA duplex. Differences in thermal stability (ΔTm) between hybrids formed with normal and single-nucleotide variant DNA targets are increased by as much as 200% over conventional hybridization (59). We tested whether an oligo (dT) primer in which some of the internal thymidine residues were replaced with 3-nitropyrrole [called oligo (dT)·Z1; 5' d(T)7·Z·d(T)9·Z·d(T)5 3'] significantly decreased the frequency of 3' end mispriming (Fig. 4).
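The sequence criterion described above for recognizing 3' truncated clones (no AATAAA signal within roughly 30 nucleotides upstream of the oligo (dA) tail) can be sketched in a few lines of Python. This is an illustration only and not a tool used in this work: the function name, the 40-nt search window, and the minimum tail length of 10 residues are assumptions chosen for the example, and only the canonical AATAAA signal named in the text is checked.

```python
import re

POLYA_SIGNAL = "AATAAA"  # canonical signal named in the text


def classify_3prime_end(cdna, search_nt=40, min_tail=10):
    """Classify a cDNA 3' end by the presence of a poly(A) signal near the tail.

    search_nt and min_tail are illustrative assumptions, not published values.
    Returns "no_tail", "signal_present", or "possible_mispriming".
    """
    seq = cdna.upper()
    # Locate the terminal run of A residues (the oligo (dA) tail of the clone).
    tail = re.search(r"A{%d,}$" % min_tail, seq)
    if tail is None:
        return "no_tail"
    # Inspect only the window immediately upstream of the tail.
    upstream = seq[max(0, tail.start() - search_nt):tail.start()]
    if POLYA_SIGNAL in upstream:
        return "signal_present"
    return "possible_mispriming"


if __name__ == "__main__":
    clone = "GCGT" * 5 + "AATAAA" + "GCTGC" * 4 + "A" * 20
    print(classify_3prime_end(clone))
```

A clone flagged as "possible_mispriming" is only a candidate; variant signals exist, so such clones would still need to be checked by hand.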
RNA-Dependent DNA Polymerases
Inhibition of reverse transcriptase processivity by secondary structure.
The central enzyme in cDNA library construction has always been RT, since all current cDNA library generation procedures employ an RNA-dependent DNA polymerase to convert mRNA into a cDNA copy. This enzyme performs multiple functions during the life cycle of a retrovirus, including the copying of RNA into DNA, the hydrolysis of RNA from the RNA-DNA hybrid, and the copying of single-stranded DNA into double-stranded DNA. The two most commonly used RTs for cDNA library construction have been those derived from avian myeloblastosis virus (AMV) and Moloney murine leukemia virus (MMLV). The AMV RT is a heterodimeric molecule containing two related subunits, the α-subunit (~63 kDa) and the ß-subunit (~95 kDa) (71). The α-subunit of AMV RT carries both polymerase and RNase H activity. The MMLV RT differs from the avian form in that it is a single 84-kDa polypeptide (159) with the polymerase activity mapping to the amino terminus and the RNase H activity to the carboxy terminus (149). The error rates of MMLV and AMV RT on DNA templates have been estimated to be ~1/30,000 and 1/17,000, respectively (81, 126). On RNA, the error rate of MMLV RT is 1/37,000 (81). The error rate of HIV RT has been found to be significantly higher, ~1/4,600 to 1/7,000 (6, 7, 78, 80, 81), precluding the use of the native HIV enzyme in cDNA library construction. One drawback with current RTs is that they have a very poor rate of processivity, in the range of 5-15 nucleotides per second (77, 100). Reducing the RNase H activity of MMLV RT improves the efficiency of RT (53), suggesting that generation of truncated products by RNase H+ RTs may be the result of pausing by the RT on the RNA template, followed by degradation of the RNA moiety by the RNase H activity. Recombinant MMLV RT engineered to be devoid of RNase H activity has resulted in the generation of several commercial products.
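To put the error rates and synthesis rates quoted above into perspective, the short calculation below estimates the expected number of misincorporations and the minimum copying time for a transcript of a given length. It is a rough sketch under stated assumptions: errors are treated as independent and uniformly distributed along the template, the enzyme is assumed to copy the whole template without pausing or dissociating, and the function names are hypothetical.

```python
def expected_errors(length_nt, error_rate):
    """Expected number of misincorporations in a cDNA of length_nt."""
    return length_nt * error_rate


def prob_error_free(length_nt, error_rate):
    """Probability that a cDNA of length_nt carries no misincorporation,
    assuming independent errors at every position."""
    return (1.0 - error_rate) ** length_nt


def synthesis_time_s(length_nt, rate_nt_per_s):
    """Minimum time (s) to copy length_nt at a constant rate, ignoring pausing."""
    return length_nt / rate_nt_per_s


if __name__ == "__main__":
    length = 8000  # nt, e.g. an ~8-kb transcript
    for name, rate in (("MMLV RT on RNA", 1 / 37000), ("HIV RT", 1 / 4600)):
        print(name, expected_errors(length, rate), prob_error_free(length, rate))
    for nt_per_s in (5, 15):
        print(nt_per_s, "nt/s:", synthesis_time_s(length, nt_per_s), "s")
```

Under these assumptions an ~8-kb template copied by HIV RT would be expected to carry more than one error on average, which illustrates why the native HIV enzyme is unsuitable for library construction.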
To assess pausing by RTs on mRNA templates during first-strand synthesis, we have developed a series of test vectors that have stable stem-loop structures in an NcoI site positioned 918 bp upstream of the WT1 3' end (4) (Fig. 5A). It is well documented that stable hairpin loops can inhibit RT processivity (154). Also, plasmid SP/flWT1 contains 433 bp of the 5'-UTR of WT1 and is ~70% GC rich. Indeed, when cDNA clones for the murine WT1 gene were first isolated, none of the clones were full length, and 5 of 9 clones terminated within 21 nucleotides of each other, 182 bases upstream of the ATG codon, suggesting the presence of a strong RT stop signal in this region. The murine WT1 5' end could only be obtained by 5' RACE and genomic DNA sequencing (J. Pelletier, data not published). We have used in vitro generated WT1 transcripts (ranging in size from ~1.4 to 2.0 kb) to elucidate and optimize conditions most effective in allowing RTs of various sources to proceed through these blocks.
Our results indicate that MMLV RNase H- enzyme appears to be best at negotiating regions of secondary structure. Recently a number of such products have appeared on the market, and although we have not tested all of them, our preliminary results indicate that many of the RNase H- MMLV RTs show similar activities. In our hands, Expand RT (Roche) also produces high-quality first-strand product (data not shown). We have also tested a number of RT RNase H inhibitors such as suramin (147), heparin (113), novobiocin (2), and ilimaquinone (99) for their ability to suppress RNase H activity present in the native MMLV and AMV enzymes. The aim was to reduce the amount of pausing on flWT1/(GNRA)2 templates. However, in our hands, none of these improved the activity of the native enzymes on templates containing structural features inhibitory to RT processivity. Parenthetically, we have noticed variation in quality of some lots of buffers that accompanied Superscript II and that resulted in the synthesis of a major truncated product when the enzyme was assayed on poliovirus mRNA [an ~7.5-kb RNA template with a poly(A)+ tail], compared with the same buffer made in our lab. One should thus be wary about utilizing any reagents from manufacturers unless they are first quality controlled in the investigator's lab.
A number of conditions have been reported in the literature to improve the processivity of RT. These include denaturing the RNA template at 65°C for 5 min before starting the RT reaction, pretreatment of the RNA with methylmercury hydroxide before the RT reaction, addition of DMSO to the RT reaction, and performing the RT reaction at a higher temperature (55°C) in the presence of the thermostabilizer, trehalose (34). However, the usefulness of these conditions has never been tested on controlled test transcripts harboring defined structural features capable of blocking RT activity. RT reactions performed with Superscript II and either WT1 or flWT1 result in exclusive production of FL products as assessed by denaturing alkaline agarose gels (Fig. 5E, lanes 1 and 2). RT reactions on flWT1/(GNRA)2 template show FL product, as well as a truncated product due to a block in processivity at 918 bp by the GNRA stem-loop (denoted by an asterisk) (Fig. 5E, lane 3). None of the methods in common use today to denature RNA templates before the commencement of an RT reaction improves the processivity of Superscript II on the flWT1/(GNRA)2 template (Fig. 5E, lanes 5-11). We interpret these results to indicate that denaturation of local stem-loop structures by any of these conditions is transient, and once the treatment is terminated, structured regions rapidly reform. We do not know why trehalose did not improve the processivity of RT, but it is possible that even at 55°C, the GNRA stem-loop structure is very stable. Additionally, it is not clear how cDNA synthesis at 55°C, in the presence of trehalose, affects the error rate of Superscript II. Also, given the problem of metal-induced hydrolysis of RNA at high temperatures (see below), the utility of this modification remains to be established (108). It is evident that current treatments for enabling RT enzymes to proceed through regions of high secondary structure within RNAs simply are not effective.
Effect of temperature on mRNA template stability.
Many current cDNA protocols aim to perform reactions under conditions in which the temperature of the reaction is maintained as high as possible (without interfering with the activity of the enzyme). The underlying premise is that reduced hydrogen bonding at elevated temperatures will decrease mRNA secondary structure, leading to reduced pausing of the RT on the mRNA template and resulting in a higher proportion of FL product. Examples of this include 1) utilization of Thermoscript (LTI) at 55°C, 2) addition of trehalose to stabilize Superscript II at 60°C (34), and 3) use of Display Thermo-RT (Display Systems Biotech). However, these approaches disregard the fact that at elevated temperatures, mRNA is susceptible to metal catalyzed degradation (108). We have assessed the effect of incubating mRNA with a range of heavy metals at room temperature for 60 min (Fig. 6A). The results demonstrate that under these conditions La3+, Lu3+, and Zn2+ are very effective at hydrolyzing mRNA (Fig. 6A, compare lanes 2, 3, and 6 with lane 1), whereas Mg2+, Mn2+, and Cu2+ did not appreciably degrade the test template (Fig. 6A, compare lanes 4, 5, and 7 with lane 1). Incubation of test mRNA template with 3 mM MgCl2 at a series of temperatures demonstrated that at 55°C, degradation by Mg2+ is significant (Fig. 6B). Given the absolute requirement of RTs for divalent metal cations, our results would suggest that RT reactions performed at or above 55°C will result in significant damage to mRNA templates. As a matter of precaution, we do not perform RTs at temperatures greater than 45°C.
The ability of NC proteins to refold RNA has been demonstrated in a wide variety of assay systems: NC proteins accelerate annealing of complementary strands (40, 43, 93, 153, 163), facilitate transfer of a nucleic acid strand from one hybrid to a more stable hybrid (93, 153), cause unwinding of tRNA (83), and stimulate release of the products of hammerhead ribozyme-mediated RNA cleavage (16, 68, 110). In addition, supplementing HIV RT with recombinant HIV NC protein has been shown to reduce pausing and increase the efficiency of synthesis of FL DNA products (81, 148, 162). This effect is almost certainly due to the ability of the NC protein to transiently eliminate secondary structures in the template RNA that obstruct the polymerization process. Dose-response studies have shown that a threshold concentration of the protein is required to demonstrate nucleic acid chaperone effects (83, 162, 163). This minimum concentration is generally in the range of 1 protein molecule per 7 nucleotides, approximately the ratio at which a nucleic acid becomes saturated with NC protein.
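The saturating ratio of about 1 NC molecule per 7 nucleotides translates into a surprisingly large mass of protein per unit of RNA, as the following back-of-the-envelope calculation suggests. The molecular masses used here are assumptions made for illustration (an average of ~330 Da per nucleotide and ~7 kDa for the NC protein); they are not values reported in this review, and the function name is hypothetical.

```python
def nc_mass_for_saturation_ug(rna_ug, nt_per_protein=7.0,
                              nt_mass_da=330.0, nc_mass_da=7000.0):
    """Mass of NC protein (ug) needed to reach 1 protein per nt_per_protein nt.

    nt_mass_da and nc_mass_da are assumed average masses, not measured values.
    """
    nt_umol = rna_ug / nt_mass_da          # ug / (g/mol) gives umol of nucleotides
    protein_umol = nt_umol / nt_per_protein
    return protein_umol * nc_mass_da       # umol * (g/mol) gives ug


if __name__ == "__main__":
    for rna in (0.1, 0.5, 1.0):
        print(rna, "ug RNA ->", round(nc_mass_for_saturation_ug(rna), 2), "ug NC")
```

With these assumed masses, the protein requirement is roughly three times the mass of the RNA template, so the amount of template in a reaction largely dictates how much NC protein must be added to reach the threshold.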
To determine whether NC can function with MMLV RT, recombinant NCp7 (Fig. 7A) was added to native MMLV RT and Superscript II (Fig. 5D). Addition of NCp7 to MMLV RT did not improve the quality of first-strand product produced with this enzyme (Fig. 5D, compare lane 3 with 1). However, addition of NCp7 to Superscript II dramatically improved the quality of first-strand product obtained with this enzyme (compare lane 4 with lane 2). We noticed two effects of NCp7: 1) a reduction in the amount of truncated first-strand product and 2) a reduction in the amount of undesired hairpin primed second-strand product (Fig. 5D, compare lane 4 with 2). Titrations of NCp7 in RT reactions primed from WT1 or flWT1/(GNRA)2 mRNA templates were performed (Fig. 7B). Addition of NCp7 to Superscript II reactions containing WT1 RNA did not affect the quality of the products, producing only a slight reduction in yield (compare lanes 1-5). Addition of increasing amounts of NCp7 to flWT1/(GNRA)2 showed a significant improvement in the quality of the RT products (compare lanes 6-10). At the highest concentration of NC (1.2 µg), the majority of the RT products are full length (lane 10). These results demonstrate that NCp7 is capable of improving the processivity of Superscript II, and we have incorporated its use to generate better quality first-strand product. Parenthetically, we have found that not all RNase H- RTs are stimulated by NCp7 and that not all preparations of NCp7 are equally effective at providing the effect we have described herein. The reasons for this are currently not well understood.
In a second group of methods, initiation of second-strand synthesis takes place outside the sequence of the first strand. For that purpose, a homopolymeric tract is synthesized at the 3' end of the first-strand product with terminal deoxynucleotidyl transferase (TdT) (9, 51, 128, 129). Ligation of an oligonucleotide at the 3' end was also reported, but the efficiency of this reaction is low (153). In both cases, a complementary oligonucleotide annealed to the extension allows initiation. Another method involves addition of a homopolymeric tract to the first strand and of a complementary homopolymeric tract to the linearized plasmid vector, followed by annealing (115). Depending on the method, the second strand is synthesized with E. coli DNA polymerase I, AMV RT, or thermostable polymerases. This group of methods does not require digestion of the 3' end of the first strand, thereby allowing conservation of the sequences corresponding to the 5' end of the mRNA.
With respect to homopolymeric tailing, generally a stretch of guanosine residues is added to the 3' end of the first strand, since this reaction is self-limiting (~10-15 nt) due to secondary structure formed by poly(dG). Second-strand synthesis is then primed by oligo (dC). Recently a series of thermostable enzymes [Taq DNA polymerase, Ampligase (thermostable ligase), and Hybridase (thermostable RNase H)] have been utilized to prime the second strand with oligo (dC) (34). A lesser known approach has been to utilize TdT to tail the first-strand cDNA product with dTTP, followed by second-strand synthesis utilizing oligo (dA) and T7 DNA polymerase (20, 21), an enzyme which has a higher 3'-exonuclease activity than polymerases previously used for second-strand synthesis. This results in a higher level of fidelity (error rate is ~15 x 10^-6) and allows trimming of the poly d(T) tract during the second-strand synthesis reaction. The size of the tract synthesized with terminal deoxynucleotidyl transferase is therefore not required to be within a given size range, and the resulting clones contain a tract of limited size (~40 nt; see Refs. 20 and 21). Furthermore, T7 DNA polymerase has a much higher processivity (~300 nt/s) than the polymerases previously used for second-strand synthesis, thus making it the enzyme of choice for synthesis of long molecules (120, 146). However, in our hands the efficiency of this approach can be quite low (probably due to inefficient tailing of the cDNA by TdT), resulting in low conversion of first-strand product to second-strand product.
Alternatively, an "oligo capping" approach, also known as reverse ligation-mediated PCR (RLPCR) (15) has been reported and used with some success (104, 145). In this approach, before generation of first-strand product, the 5' end m7G mRNA cap structure is removed using tobacco acid pyrophosphatase. An oligoribonucleotide is then ligated onto the mRNA 5' end using T4 RNA ligase. Following first-strand synthesis, the oligoribonucleotide at the mRNA 5' end is copied into DNA, if the polymerase makes it that far. This unique sequence at the 5' end then serves as an anchor for second-strand reaction, which incorporates an amplification step using thermostable enzymes (145). Unfortunately, T4 RNA ligase is a very inefficient enzyme, although the use of macromolecular crowding agents, such as polyethylene glycol 8000, can improve the efficiency of the reaction (64). The use of an amplification step in the generation of cDNA libraries based on oligo capping is likely the result of the low-efficiency ligation step, and is a cause for concern, given the changes in representation associated with such manipulations. A variation on this approach involves oligodeoxynucleotide ligation to first-strand product with T4 RNA ligase (45). Another method of anchoring defined sequences at the 3' end of first-strand product is based on the observation that Superscript II often adds three to four non-template-derived cytidine residues to the 3' end of cDNAs in the presence of manganese or high magnesium. The presence of these additional residues can be used to either ligate a specific DNA adaptor (132) or to add 5' end sequences during the RT process based on CapFinder technology (Clontech). 
In test assays utilizing flWT1/(GNRA)2, we found that nontemplate cytidine residues appeared to also be added to incomplete first-strand product, suggesting to us that minimal discrimination between FL and incomplete products is achieved with approaches relying on the addition of 3' end nontemplate residues by RTs during first strand.
We evaluated two criteria in choosing a method for second-strand synthesis. First, we avoided any procedures that required an amplification step, due to high mutation rate and alteration of transcript representation. Second, we were interested in a method that provided the highest conversion yield of first-strand into second-strand product. In our hands, the classic method of Okayama and Berg (115) and Gubler and Hoffman (58) provided the highest yield, with minimal amount of sample manipulation. As shown in Fig. 8A, Southern blotting of the second-strand product with a probe against ß-actin and PTP-1E demonstrates that the majority of second-strand product is still full length and that size fractionation of the mRNA template effectively produces fractions that are enriched for the different cDNA size classes.
Propagation Vector, Library Maintenance, and Clone Manipulation
Many methods have been reported in the literature for inserting cDNA fragments into appropriate propagation vectors. We will not attempt to review these methods, since they often are tailored to individual needs or functional applications. In our case, our application calls for a vector that can accommodate large inserts and that can easily be manipulated in a fashion compatible with current high-throughput sequencing technologies (at least with respect to identifying the nature of the ends of the insert).
Until 1983, almost all cDNA libraries were propagated in plasmid vectors, and were usually maintained as a collection of more than 10^5 independently transformed bacterial colonies. Sometimes these colonies were pooled, amplified in liquid culture, and stored at -70°C; more frequently they were maintained on the surfaces of nitrocellulose filters (62). However, these libraries were difficult to maintain without loss of clone viability, and screening by hybridization to multiple radioactive probes required labor-intensive replication of colonies from one nitrocellulose filter to another.
With the advent of bacteriophage cloning vectors, it became possible to take advantage of the high efficiency and reproducibility of packaging bacteriophage λ DNA in vitro into infectious virus particles. The resulting libraries could be amplified and stored indefinitely without loss of clone viability and could be screened with both nucleic acid and antibody probes. Testimony to the usefulness of the λ cloning vectors is that the majority of current cDNA libraries utilize these vectors. However, even the bacteriophage propagation system suffers from serious drawbacks. 1) Amplification of bacteriophage cDNA libraries leads to loss of clone representation due to differential growth rates of different clones. 2) Plaque hybridization methods are cumbersome, labor intensive, time consuming, and thus not adaptable for larger-scale screening efforts. Filter replicas of cDNA libraries have finite lifespans and must be periodically regenerated, resulting in the rapid consumption of unique and valuable resources. 3) Only inserts up to 10 kbp in length can be propagated in λZAP, which imposes an upper size limit on the inserts that can be propagated in this system. 4) Excision of inserts from the vectors is a labor-intensive process, even for λZAP-based libraries.
There are a number of requirements that a vector system designed for library maintenance should fulfill to satisfy the needs of the genomics community. The vector must be easy to manipulate, allow propagation of large inserts, allow excision or manipulation of the intact cDNA, and be compatible with current sequencing technology. A plasmid-based cDNA library has the advantage that clones can be individually manipulated utilizing picking robots and pipetting stations. Clones from a given library can be pooled, and schemes for PCR-based screening can be implemented (111). Although the initial costs and labor in setting up these arrayed libraries is significant, the library pools can be stored indefinitely at -80°C and represent enough material for >10,000 individual screens; these libraries represent an essentially permanent and maintenance-free resource. In addition, PCR screening of arrayed libraries is adaptable to high throughput, and can be completed in a few days (111). For hybridization-based screening, the clones of the libraries can be gridded onto nitrocellulose filters and microarrays.
We have modified a pUC-based vector for library propagation (Fig. 9A). pUC plasmids are high copy due to a point mutation (G to A) immediately preceding the RNA I (antisense RNA) gene in the origin of replication. This mutation alters the conformation of RNA II (RNA primer), preventing the hybridization of RNA I to RNA II and resulting in an increased rate of DNA replication from the ColEI ori. The phenotypic effects of this mutation are suppressed by lowering the growth temperature to 30°C (97). Thus the ColEI origin of replication in pUC is temperature sensitive, and its copy number can be controlled by different growth temperatures for safe propagation of large inserts. We have deleted all nonessential regions from pUC to construct the pMD1 vector. We have confirmed that the new vector is temperature sensitive (Fig. 9B) and allows propagation of inserts as large as 16.8 kbp (Fig. 9C). This vector allows the generation of directionally cloned cDNA libraries utilizing the XhoI and BstXI restriction enzyme sites.
Normalization and Subtraction Strategies
Reassociation-kinetic analysis indicates that the mRNAs of a typical cell are distributed into three frequency classes: a highly abundant class consisting of 10–15 mRNAs that together represent 10–20% of the total mRNA mass, a middle-abundance class consisting of 1,000–2,000 mRNAs making up 30–40% of the total mRNA, and a low-abundance class consisting of 15,000–20,000 mRNAs covering approximately 50% of the total mRNA (18). Hence, sequencing cDNAs from standard libraries is not an efficient approach for novel gene discovery or for identifying rare cDNAs. Thus for gene identification endeavors, normalization of cDNA libraries has been an important tool (140). Reassociation kinetics has been successfully used toward this end (23, 85, 119, 139). Indeed, an evaluation of the extent of normalization has indicated that, from an extreme range of abundance of four orders of magnitude in an original library, the frequency of occurrence of any clone examined in the normalized library can be brought within the narrow range of only one order of magnitude (139). However, many normalization procedures were not developed with the aim of selecting or maintaining large cDNA coding regions and thus involve library amplification, complex in vitro manipulations, and retransformation of normalized clones. Indeed, some cDNAs are noticeably shortened in some normalized libraries (23). Potentially severe problems are associated with the need to use both amplified (rather than primary) libraries and protocols involving exponential amplification of complex pools of inserts.
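The inefficiency of sequencing from a non-normalized library can be made concrete with a rough calculation based on the abundance classes quoted above. The sketch below makes several assumptions that are not in the text: each range is collapsed to its midpoint, mass fraction is treated as clone frequency (that is, all transcripts are taken to be of similar length and equally clonable), and clones are sampled independently. Under those assumptions, the number of clones N needed to recover a species of frequency f at least once with probability P is N = ln(1 - P) / ln(1 - f).

```python
import math

# Abundance classes from the text, collapsed to range midpoints
# (an assumption made only for this illustration).
CLASSES = {
    "high": {"species": 12.5, "mass_fraction": 0.15},
    "middle": {"species": 1500.0, "mass_fraction": 0.35},
    "low": {"species": 17500.0, "mass_fraction": 0.50},
}


def per_species_frequency(species, mass_fraction):
    """Average fraction of the library contributed by one mRNA species."""
    return mass_fraction / species


def clones_needed(frequency, probability=0.99):
    """Clones to sample to see a species at least once with the given probability."""
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - frequency))


def summarize(classes=CLASSES, probability=0.99):
    """Return {class name: (per-species frequency, clones needed)}."""
    rows = {}
    for name, c in classes.items():
        f = per_species_frequency(c["species"], c["mass_fraction"])
        rows[name] = (f, clones_needed(f, probability))
    return rows


if __name__ == "__main__":
    for name, (f, n) in summarize().items():
        print(f"{name:>6}: frequency per species = {f:.2e}, clones for 99% = {n}")
```

With these midpoint values, an average low-abundance species is several hundred times rarer than an average high-abundance species, so recovering it requires correspondingly more clones. Narrowing that spread is what normalization is intended to achieve.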
Recently, Carninci et al. (35) have described a different approach to normalization and subtraction that appears to be compatible with maintaining selection for large cDNA clones. They normalize their cDNA preparations by hybridizing them to an aliquot of biotinylated cellular mRNA (to a Rot of 10) and removing the resulting RNA/cDNA duplexes with streptavidin-coated magnetic beads. Additionally, they have developed a subtraction approach in which nonredundant clones from the RIKEN libraries are used to generate biotinylated RNA by in vitro transcription. This RNA is then hybridized to first-strand product at a Rot of 5, and duplexes are removed utilizing streptavidin-coated magnetic beads (35). They clearly demonstrated the efficacy of their normalization and subtraction approach. One concern that needs to be monitored is the nonspecific removal of rare transcripts during the normalization steps.
| AFFINITY SELECTION OF FULL-LENGTH COMPLEMENTARY DNAS |
|---|
We have developed an affinity chromatography procedure, called CAPture, that allows for the purification of mRNA via the cap structure (44). Before this, several laboratories had used antibodies directed against the cap structure to select and purify eukaryotic mRNAs via their 5' end. Schwer et al. (134) and de Magistris and Stunnenberg (42) used a polyclonal anti-m7G antibody to demonstrate that vaccinia virus late mRNAs are discontinuously synthesized. Muhlrad et al. (109) used this antibody to define an mRNA decay pathway in which deadenylation leads to decapping. However, this antibody resource is limiting, and the efficiency of cap selection is not known. In related studies, a monoclonal antibody directed against 2,2,7-trimethylguanosine was generated and used to purify snRNAs (19) and to demonstrate the presence of a caplike structure at the 5' end of mutant β-globin transcripts (96). Although this antibody is not limiting, its use in purifying capped mRNAs is somewhat restricted due to its 15- to 20-fold lower affinity for m7G cap structures relative to trimethylated caps (19, 96). Nonetheless, these experiments demonstrated the feasibility of selecting mRNA molecules via their cap structure. The use of a bifunctional cap binding protein, such as protein A/meIF-4E, provides a highly efficient and specific method for cap selection (44). Physical immobilization of the eIF-4E cap binding protein produces an affinity column that binds mRNA cap structures with high affinity (44) and that has been used in several analytical procedures (50, 105). In addition, following first-strand synthesis, one can treat the cDNA/mRNA hybrids with single-strand-specific nucleases (such as RNase A) to remove the cap structure from the mRNA in duplexes where the cDNA has not progressed all the way to the 5' end. In FL cDNA/mRNA hybrids, the mRNA is protected, and consequently only these hybrids will retain a cap structure.
Selection by affinity chromatography utilizing immobilized eIF-4E leads to enrichment of FL cDNAs (44).
One shortfall of this approach that we have noticed is that secondary structure immediately adjacent to the cap structure lowers the efficiency of selection (J. Pelletier, data not shown). This is consistent with studies on eukaryotic translation indicating that the affinity of eIF-4E is rate limiting for mRNAs with inaccessible cap structures, compared with those where the cap structure is accessible (94). Hence, the lowered efficiency of CAPture for cDNA/mRNA duplexes may be due in part to a decreased affinity of eIF-4E for the m7G group, caused by steric hindrance from the 3' terminal nucleotide of the DNA strand of the duplex. This possibility is based on data from the cocrystal structure of eIF-4E bound to m7GDP (103). eIF-4E is shaped like a cupped hand, with the base of the m7GDP ligand buried deep in the cup and the phosphate residues oriented toward the outside of the cup (103). It is predicted from the crystal structure of eIF-4E that the last template-derived base of the cDNA would abut against the first β-sheet of the palm of the eIF-4E structure and is likely to interfere with binding. The situation is worsened by the fact that RTs, such as Superscript II, add non-template-derived nucleotides to the 3' ends of cDNAs. The use of other cap binding proteins, as well as mutants of eIF-4E, in the CAPture assay is currently under evaluation.
Recently, a variation on CAPture, called cap trapper, has been described in which a biotin group is introduced into the diol residue of the cap structure (after oxidation of the ribose moiety with NaIO4). Following treatment of the cDNA/mRNA duplex with RNase ONE (a single-strand-specific nuclease), selection of FL cDNAs is undertaken utilizing streptavidin-coated magnetic beads (33). Although this method overcomes the limitations described above for CAPture, in our hands it has not proven to be specific. That is, when we compare the ability of cap trapper to discriminate between cDNA duplexed with capped mRNA (generated in vitro) and cDNA duplexed with uncapped mRNA (generated in vitro), we are unable to obtain specific selection of capped over uncapped transcripts (J. Pelletier, data not shown). This is likely due to the fact that biotin-hydrazide can also react with unoxidized RNA due to incipient reaction of cytosine residues (66, 142). Hence, addition of biotin is not solely directed toward the cap structure. It is also important to note that the oxidation reaction with NaIO4 is difficult to control and that the molar ratio of periodate to substrate is critical; otherwise, base rings are destroyed (142, 151).
Many libraries have been generated with the cap trapper procedure, indicating that this technique has been well adapted to the cDNA cloning program currently in place at RIKEN (35). It is difficult for us to assess whether this approach is truly enriching for FL sequences. A perusal of the average cDNA insert size of cap trapper libraries suggests that many of the clones in these libraries are of the same size as clones from commercially available libraries. An additional concern with these 5' end affinity approaches is that they may actually select against certain genes. Thus genes with exceptionally difficult 5' ends (high GC content, abundant secondary structure) that block RT processivity should not be present in cap trapper libraries. Hence, our experience with procedures aimed at enriching for FL cDNAs indicates that these shortcomings need to be addressed and resolved before such procedures are integrated into cDNA library synthesis protocols.
| AUTOMATION AND MINIATURIZATION OF THE PROCESS |
|---|