Physiol. Genomics 28: 141-145, 2007.
First published November 14, 2006; doi:10.1152/physiolgenomics.00097.2006
1094-8341/07 $8.00
Received 31 May 2006;
accepted in final form 1 November 2006.
Call For Papers: 2nd International Symposium on Animal Functional Genomics
Database for chicken full-length cDNAs
Yong Wang, Zhenggang Wang, Juan Li, Yajun Wang, and Frederick C. C. Leung
Department of Zoology, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
ABSTRACT
The generation of full-length cDNA databases is essential for functional genomics studies as well as for correct annotation of species genomic sequences. Human and mouse full-length cDNA projects have provided the biomedical research community with a large amount of gene information. Recent completion of the chicken genome sequence draft now enables a similar full-length cDNA project to be initiated for this species. In this report, we introduce the development of a chicken full-length cDNA database, which will facilitate future research work in this biological system. In this project, chicken expressed sequence tags (ESTs) were aligned onto human and mouse full-length cDNAs (or open reading frames) on the basis of their similarity. More than 588,000 chicken ESTs were aligned to approximately 170,000 full-length human and mouse templates obtained from the NEDO, RIKEN, and MGC databases. Many of these templates have known biological functions, and their orthologous chicken genes in the EMBL database are also provided in our database, which is available at http://bioinfo.hku.hk/chicken/. We will continue to collect known chicken full-length cDNAs to update the database for public use. The cDNA alignment results presented herein and on our database will be useful for animal science and veterinary researchers wishing to clone and to confirm full-length chicken cDNAs of interest.
cDNA; expressed sequence tag; alignment
THE CHICKEN IS A MODEL ORGANISM of both scientific and economic value, and its genome structure, gene expression, and gene function have been extensively studied in relation to evolutionary and developmental biology and genetic improvement of economically important traits. A draft of the chicken genome sequence was first released in 2004 (5). Isolation of the full-length cDNAs and their splice variants is one of the major tasks in the postgenomic era. The human and mouse full-length cDNA projects have yielded large amounts of valuable information for biomedical researchers who employ functional genomics approaches to study key biological processes (6-8). However, the limited number of characterized chicken full-length cDNAs available in public databases makes it difficult to perform systematic studies on gene expression and function in target tissues of this economically significant agricultural species, highlighting the need to establish a chicken full-length cDNA database.
The usual approach for individual researchers trying to predict the sequence of a full-length cDNA is first to explore the species' genomic DNA sequence and then to use the in silico results for PCR primer design and experimental confirmation. However, the current version of the chicken genome (assembly 6) still has many gaps and assembly errors, making it difficult to perform in silico analysis for PCR primer design and preventing a panoramic view of the structure of genes on the genome.
In this paper we demonstrate an alternative approach for predicting chicken full-length cDNAs: aligning chicken expressed sequence tags (ESTs) onto human and mouse full-length cDNAs and open reading frames (ORFs). All data were then used to build a chicken full-length cDNA database, which currently includes basic local alignment search tool (BLAST) alignment results for >588,000 chicken ESTs against approximately 170,000 full-length templates derived from the New Energy and Industrial Technology Development Organization (NEDO), Rikagaku Kenkyusho (RIKEN), and Mammalian Gene Collection (MGC) databases. Subsequent experimental work will help to fill the gaps and check the quality of the predicted full-length cDNAs in this chicken database.¹
MATERIALS AND METHODS
Alignment of chicken ESTs on human and mouse templates.
Chicken ESTs (588,739) were collected from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/), including 517,727 ESTs derived from the Delaware ChickEST project (3). Over 30,000 human full-length cDNAs were downloaded from the NCBI and NEDO databases (http://cdna.ims.u-tokyo.ac.jp) (8) and >100,000 mouse full-length cDNAs were obtained from the NCBI and RIKEN databases (http://genome.gsc.riken.go.jp) (6). We also obtained approximately 38,000 human and mouse full-length ORFs from the MGC database (http://mgc.nci.nih.gov) (1, 7). Each full-length cDNA or ORF was used as a template to find homologous chicken ESTs through BLAST searches. ESTs with matching sequences >60 bp in length (E-value < threshold) were stored for display. In addition, ESTs whose homologous fragments were shorter than 60 bp were selected if the distance between neighboring fragments on the EST was identical to the distance between the matching pieces on the template (Fig. 1).

Fig. 1. Flow chart illustrating how chicken expressed sequence tag (EST) alignments were obtained using human and mouse templates. The template sequence is a full-length cDNA or open reading frame (ORF). A chicken EST would be selected if matching sequence is longer than 60 bp and the E-value is smaller than threshold. Although many basic local alignment search tool (BLAST) alignment fragments of an EST are shorter than 60 bp, ESTs were also obtained if the matched neighboring portions (E-value<threshold) were separated by the same distance as their alignment gap on the template.
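The two selection rules summarized in Fig. 1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the function name `select_est`, and the E-value cutoff of 1e-5 are assumptions made for illustration (the text only says "threshold"), and all fragments are assumed to lie on the same strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST alignment fragment between an EST and a template (hypothetical record)."""
    est_start: int   # 1-based start on the EST
    est_end: int     # inclusive end on the EST
    tmpl_start: int  # 1-based start on the template
    tmpl_end: int    # inclusive end on the template
    evalue: float

    @property
    def length(self) -> int:
        return self.est_end - self.est_start + 1


def select_est(hits, min_len=60, max_evalue=1e-5):
    """Return True if an EST passes either selection rule.

    max_evalue is an assumed placeholder; the source gives no value.
    """
    # Keep only significant fragments, ordered along the template.
    good = sorted((h for h in hits if h.evalue < max_evalue),
                  key=lambda h: h.tmpl_start)
    # Rule 1: one significant fragment longer than min_len.
    if any(h.length > min_len for h in good):
        return True
    # Rule 2: neighboring short fragments separated by the same
    # distance on the EST as on the template.
    for a, b in zip(good, good[1:]):
        est_gap = b.est_start - a.est_end
        tmpl_gap = b.tmpl_start - a.tmpl_end
        if est_gap == tmpl_gap:
            return True
    return False
```

For example, `select_est([Hit(1, 40, 101, 140, 1e-8), Hit(51, 90, 151, 190, 1e-8)])` accepts an EST whose two 40-bp fragments are spaced identically on the EST and on the template, even though neither fragment alone exceeds 60 bp.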
Orthologous genes to human and mouse templates on the chicken genome.
Chicken-human and chicken-mouse orthologous gene tables were retrieved from the European Molecular Biology Laboratory (EMBL) database (http://www.ensembl.org), including the genomic locations of the genes and the accession numbers of their corresponding protein sequences. The orthologous chicken genes matching the human and mouse templates on the chicken genome were identified with the two-step procedure described in Fig. 2. In the first step, the alignment positions of the human and mouse templates on their own genomes were found in the EMBL. Templates with alignment positions within or overlapping the gene positions listed in the orthologous tables were selected. In the second step, the encoded peptides of the selected templates were collected from their files in GenBank format. These peptides were then compared with the protein sequences at the same genomic locations. Templates with protein similarity >50% to a gene in the orthologous tables were retained. Protein similarity was evaluated using the p-match algorithm (4). With this two-step procedure, EMBL human or mouse genes identical to the templates were identified with evidence of both genomic position and protein similarity. The orthologous EMBL chicken genes that match the templates could then be found using the orthologous tables as references.

Fig. 2. Outline of data collection from the chicken full-length cDNA database. The alignment positions were obtained from the European Molecular Biology Laboratory (EMBL). MGC, Mammalian Gene Collection; NEDO, New Energy and Industrial Technology Development Organization. "Template" means a full-length cDNA or ORF.
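As a rough illustration of the two-step procedure (position overlap, then a protein similarity check), the following Python sketch may help. Everything in it is hypothetical: the table layout, the function names, and in particular `similarity`, which is a toy identity score standing in for a real protein aligner such as p-match. It also assumes that a template qualifies when its similarity exceeds the 50% cutoff.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Genomic interval on the human or mouse genome (inclusive coordinates)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """Step 1: true if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def similarity(p: str, q: str) -> float:
    """Toy stand-in for a protein aligner: percent of identical
    positions, relative to the longer sequence, without gaps."""
    if not p or not q:
        return 0.0
    same = sum(x == y for x, y in zip(p, q))
    return 100.0 * same / max(len(p), len(q))


def chicken_ortholog(template_region, template_peptide,
                     ortholog_table, min_similarity=50.0):
    """Return the chicken gene linked to a template, or None.

    ortholog_table rows are dicts with the hypothetical keys
    'region', 'peptide', and 'chicken_gene'.
    """
    for row in ortholog_table:
        if not overlaps(template_region, row["region"]):
            continue  # fails the position check
        # Step 2: the peptide must also be similar enough (assumed direction).
        if similarity(template_peptide, row["peptide"]) > min_similarity:
            return row["chicken_gene"]
    return None
```

Requiring both kinds of evidence means that a template lying inside a gene's coordinates but encoding an unrelated peptide is not linked to that gene's chicken ortholog.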
RESULTS AND DISCUSSION
The chicken full-length cDNA database developed in this study is publicly available at http://bioinfo.hku.hk/chicken/. Alignments between chicken ESTs and human and mouse templates can be obtained from this database in multiple ways. All full-length cDNAs and ORFs are presented in tables. Some of these have links to alignment information, and a few of them have links to orthologous chicken genes in the EMBL. Using the search engines of the database, users can find templates or ESTs of interest through keyword queries such as accession number, organism, clone, or tissue (Fig. 3). The database also provides a BLAST search engine. Using a sequence of interest, one can find chicken full-length cDNAs or ORFs and ESTs. Because it is difficult to obtain a target full-length cDNA or ORF by searching gene names, the BLAST search approach is recommended when the DNA sequence of a gene of interest is known.
An example of a graphic display of alignments between chicken ESTs and human full-length templates is shown in Fig. 4. Alignments with different E-values are distinguished by colors. The positions and sequences of the similar parts between ESTs and templates are displayed in a frame.

Fig. 4. Alignments of chicken ESTs on a full-length cDNA. The colors of the bars designate E-values of the sequence alignments.
In addition, the database houses information on 2,004 experimentally confirmed chicken genes. In the EMBL, gene information is found under the "description" icon. This information derives from experimental evidence published in scientific papers. In our database, known chicken full-length cDNAs that currently include 2,327 identified by RIKEN groups (2) are available for download.
We found 4,658 and 7,324 EMBL chicken genes that could be linked to human full-length templates from NEDO and MGC, respectively. High similarity between the chicken genes and the templates was not expected. In this study, we used protein coverage percentage in p-match alignment to assess the homology between the chicken genes and the full-length templates. More than 40% protein coverage was observed in 58% of the identified EMBL chicken genes in alignments to their human templates in NEDO. A much higher percentage (74%) of the chicken genes showed >40% protein coverage in alignments to human templates from MGC (Fig. 5). The ORFs for NEDO full-length cDNAs were probably predicted by software tools and therefore contained some incorrect predictions. The presence of incorrect ORFs can explain the lower protein coverage in alignments of NEDO templates. Therefore, our findings on the chicken genome for the full-length templates can serve as references for experimental design, but further experimental confirmation is still required.

Fig. 5. Chicken gene coverage percentages for protein alignments on orthologous human templates. The pie charts show percentages of chicken genes that fall into 5 ranges of coverage among results of peptide alignments on orthologous human templates (A: NEDO human full-length cDNAs; B: MGC human full-length ORFs). The coverage percentage was measured as the ratio of the length of the homologous parts to that of the whole peptide sequence.
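The coverage measure used for Fig. 5 (length of the homologous parts divided by the length of the whole peptide) can be sketched in Python as follows. The function names are hypothetical, and the four bin edges (20, 40, 60, 80%) are an assumption chosen only to produce five ranges; the source does not state the actual boundaries.

```python
def coverage_percent(aligned_segments, peptide_length):
    """Percent of a peptide covered by aligned segments.

    aligned_segments: (start, end) pairs, 1-based and inclusive.
    Overlapping segments are counted once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_segments):
        start = max(start, last_end + 1)  # skip residues already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / peptide_length


def bin_coverages(values, edges=(20, 40, 60, 80)):
    """Count coverage values per range; the edges are assumed, not from the source."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(v > e for e in edges)] += 1
    return counts


def fraction_above(values, cutoff=40.0):
    """Fraction of genes whose coverage exceeds the cutoff."""
    return sum(v > cutoff for v in values) / len(values)
```

Applied to the per-gene coverage values of one template set, `fraction_above(values, 40.0)` gives the kind of summary reported in the text (58% for NEDO and 74% for MGC templates).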
The database described in this study has been used to predict full-length cDNAs of chicken genes. The predictions for two such cDNAs, chicken STAT3 (AY641397) and chicken SMAD1 (AY953143), were then subjected to experimental verification. Two human MGC full-length ORFs, BC000627 and BC001878, were identified by a search of the gene names "STAT3" and "SMAD1". Figures 6 and 7 show that many chicken ESTs align with the full-length ORFs of human STAT3 and SMAD1. In Fig. 6, ESTs 1-5 cover the whole ORF region of human STAT3, and thus the chicken full-length cDNA of STAT3 can be predicted directly from our alignment results. After further experiments on gene characterization, we found that the experimentally determined full-length cDNA of chicken STAT3 is 92% similar to the predicted one. In the case of SMAD1, two ESTs were not sufficient to predict the full-length cDNA (Fig. 7). However, this alignment information from the database allowed the design of specific primers to amplify the full-length chicken SMAD1 cDNA.

Fig. 6. Chicken ESTs on human MGC full-length ORF BC000627. BC000627 is the full-length ORF for the human STAT3 gene. Chicken ESTs with GenBank IDs of 25752517, 25486940, 53895356, 53895346, and 25924503 were labeled with numbers 1-5, respectively. Alignment and linkage of these ESTs allowed prediction of the corresponding full-length cDNA for chicken STAT3.

Fig. 7. Chicken ESTs on human MGC full-length ORF BC001878. BC001878 is the full-length ORF for the human SMAD1 gene. Chicken ESTs 1 (green) and 2 (red) were used to design primers for amplifying the full-length cDNA for chicken SMAD1.
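The difference between the STAT3 and SMAD1 cases comes down to whether the aligned ESTs tile the whole template ORF. A minimal Python sketch of that check is given below. The function names are hypothetical and the coordinates in the usage note are made up for illustration; they are not taken from the actual alignments.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def uncovered_gaps(est_intervals, orf_length):
    """Return template regions (1..orf_length) that no EST alignment covers.

    An empty list means the ESTs tile the whole ORF, so a full-length
    sequence could be assembled directly; otherwise the gaps mark the
    regions that would have to be bridged experimentally.
    """
    gaps = []
    next_needed = 1  # first template position not yet covered
    for start, end in merge_intervals(est_intervals):
        if start > next_needed:
            gaps.append((next_needed, start - 1))
        next_needed = max(next_needed, end + 1)
    if next_needed <= orf_length:
        gaps.append((next_needed, orf_length))
    return gaps
```

With made-up coordinates, `uncovered_gaps([(1, 600), (550, 1200), (1150, 2000)], 2000)` returns an empty list (a STAT3-like case), whereas `uncovered_gaps([(1, 300), (1100, 1400)], 1400)` returns `[(301, 1099)]`, a SMAD1-like case in which primers anchored in the two covered ends would be needed to amplify the missing middle.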
To conclude, the chicken full-length cDNA database described herein was developed with the aim of providing a platform for researchers that bridges bioinformatics and biology. The main sources of data in the database are two chicken EST databases, BBSRC (http://chick.umist.ac.uk/) and Delaware Chick EST (http://www.chickest.udel.edu/), which are continually updated with EST information derived from different tissues and developmental stages of chickens. In the Delaware database, the chicken ESTs from different tissues are also assembled on the basis of similarity, which helps to annotate the chicken genome accurately and to develop ESTs for microarray spotting (3). Biological researchers will now be responsible for examining the predicted cDNAs and returning feedback on their accuracy to the database. Full-length cDNAs confirmed in this way can then be used to detect gene expression differences and splice variants in different chicken tissues and over different developmental stages. In this way, the database will ultimately provide reliable, accurate gene annotations for the chicken genome. With these virtues, databases such as this could attract more researchers to select the chicken as a model species for their studies.
At present, 34 eukaryotic genome-sequencing projects have been finished (http://www.genomesonline.org/). Apart from the human and mouse genomes, which were sequenced redundantly, most of the genomes remain at assembly 5 or 6 (5x-6x coverage), including the chicken genome. Such draft genomes have nevertheless proven extremely informative, allowing us to profile most of the basic molecular features they contain. Even so, draft genomes are not sufficient for comprehensive studies on genes, and it will be economically unfeasible to fully sequence these genomes as was done for human and mouse. Thus, our full-length cDNA database contributes a valuable resource to chicken researchers that may also serve as a model for other economically and societally valuable animal species.
GRANTS
This work was partly funded and supported by Research Grant Council of the Hong Kong Government, HKU7345/03M; HKU Faculty of Science Research Development Fund; and The University of Hong Kong Research Development Fund for Strategic Research Theme on Genomics, Proteomics and Bio-informatics Grant 10206152-11222-21700-302-01.
ACKNOWLEDGMENTS
We thank Tommy Lam and the staff of the computer centre of the University of Hong Kong for technical support in database construction. We appreciate critical reading of this manuscript by Drs. Rajesh Jeewon, Jeanne Burton, and Guilherme Rosa.
FOOTNOTES
Address for reprint requests and other correspondence: Frederick C. C. Leung, Dept. of Zoology, Univ. of Hong Kong, Pokfulam, Hong Kong SAR, China (e-mail fcleung{at}hkucc.hku.hk).
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
¹ The 2nd International Symposium on Animal Functional Genomics was held May 16-19, 2006 at Michigan State University in East Lansing, MI, and was organized by Jeanne Burton of Michigan State University and Guilherme J. M. Rosa of University of Wisconsin-Madison (see meeting report by Drs. Burton and Rosa, Physiol Genomics 28: 1-4, 2006).
REFERENCES
1. Baross A, Butterfield YSN, Coughlin SM, Zeng T, Griffith M, Griffith OL, Petrescu AS, Smailus DE, Khattra J, McDonald HL, McKay SJ, Moksa M, Holt RA, Marra MA. Systematic recovery and analysis of full-ORF human cDNA clones. Genome Res 14: 2083-2092, 2004.
2. Caldwell R, Kierzek A, Arakawa H, Bezzubov Y, Zaim J, Fiedler P, Kutter S, Blagodatski A, Kostovska D, Koter M, Plachy J, Carninci P, Hayashizaki Y, Buerstedde JM. Full-length cDNAs from chicken bursal lymphocytes to facilitate gene function analysis. Genome Biol 6: R6, 2004.
3. Carre W, Wang X, Porter TE, Nys Y, Tang J, Bernberg E, Morgan R, Burnside J, Aggrey SE, Simon J, Cogburn LA. Chicken genomics resource: sequencing and annotation of 35,407 ESTs from single and multiple tissue cDNA libraries and CAP3 assembly of a chicken gene index. Physiol Genomics 25: 514-524, 2006.
4. Curwen V, Eyras E, Andrews TD, Clarke L, Mongin E, Searle SMJ, Clamp M. The Ensembl automatic gene annotation system. Genome Res 14: 942-950, 2004.
5. International Chicken Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432: 695-716, 2004.
6. Kawai J, Shinagawa A, Shibata K, Yoshino M, Itoh M, Ishii Y, Arakawa T, Hara A, Fukunishi Y, Konno H, Adachi J, Fukuda S, Aizawa K, Izawa M, Nishi K, Kiyosawa H, Kondo S, Yamanaka I, Saito T. Functional annotation of a full-length mouse cDNA collection. Nature 409: 685, 2001.
7. Mammalian Gene Collection (MGC) Program Team. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci USA 99: 16899-16903, 2002.
8. Ota T, Suzuki Y, Nishikawa T, Otsuki T, Sugiyama T, Irie R, Wakamatsu A, Hayashi K, Sato H, Nagai K, Kimura K, Makita H, Sekine M, Obayashi M, Nishi T, Shibahara T, Tanaka T, Ishii S, Yamamoto Ji, Saito K, Kawai Y, Isono Y, Nakamura Y, Nagahari K, Murakami K, Yasuda T, Iwayanagi T, Wagatsuma M, Shiratori A, Sudo H, Hosoiri T, Kaku Y, Kodaira H, Kondo H, Sugawara M, Takahashi M, Kanda K, Yokoi T, Furuya T, Kikkawa E, Omura Y, Abe K, Kamihara K, Katsuta N, Sato K, Tanikawa M, Yamazaki M, Ninomiya K, Ishibashi T, Yamashita H, Murakawa K, Fujimori K, Tanai H, Kimata M, Watanabe M, Hiraoka S, Chiba Y, Ishida S, Ono Y, Takiguchi S, Watanabe S, Yosida M, Hotuta T, Kusano J, Kanehori K, Takahashi-Fujii A, Hara H, Tanase To, Nomura Y, Togiya S, Komai F, Hara R, Takeuchi K, Arita M, Imose N, Musashino K, Yuuki H, Oshima A, Sasaki N, Aotsuka S, Yoshikawa Y, Matsunawa H, Ichihara T, Shiohata N, Sano S, Moriya S, Momiyama H, Satoh N, Takami S, Terashima Y, Suzuki O, Nakagawa S, Senoh A, Mizoguchi H, Goto Y, Shimizu F, Wakebe H, Hishigaki H, Watanabe T, Sugiyama A, Takemoto M, Kawakami B, Yamazaki M, Watanabe K, Kumagai A, Itakura S, Fukuzumi Y, Fujimori Y, Komiyama M, Tashiro H, Tanigami A, Fujiwara T, Ono T, Yamada K, Fujii Y, Ozaki K, Hirao M, Ohmori Y, Kawabata A, Hikiji T, Kobatake N, Inagaki H, Ikema Y, Okamoto S, Okitani R, Kawakami T, Noguchi S, Itoh T, Shigeta K, Senba T, Matsumura K, Nakajima Y, Mizuno T, Morinaga M, Sasaki M, Togashi T, Oyama M, Hata H, Watanabe M, Komatsu T, Mizushima-Sugano J, Satoh T, Shirai Y, Takahashi Y, Nakagawa K, Okumura K, Nagase T, Nomura N, Kikuchi H, Masuho Y, Yamashita R, Nakai K, Yada T, Nakamura Y, Ohara O, Isogai T, Sugano S. Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat Genet 36: 40, 2004.
Copyright © 2007 by the American Physiological Society.