1 Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama City, Kanagawa 230-0045, Japan
2 Department of Biotechnology, University of Tokyo, Tokyo 113-8657, Japan
3 Core Research for Evolutional Science and Technology, Japan Science and Technology Corporation, Ibaraki 305-0074
4 Cooperative Graduate School of Medicine, Tsukuba University, Ibaraki 305-8575, Japan
| ABSTRACT |
|---|
expression profiling; data processing; cluster analysis
| INTRODUCTION |
|---|
| MATERIALS AND METHODS |
|---|
Experimental conditions.
mRNA extracted from the 49 tissues was labeled by incorporating Cy3 during random-primed reverse transcription. cDNA derived from whole embryonic day 17.5 (E17.5) embryos, which we labeled with Cy5, was used as the expression reference for all tissues. The labeling was carried out at 42°C for 1 h in a total volume of 30 µl containing 400 U SuperScript II (GIBCO BRL); 0.5 mM each dATP, dCTP, and dGTP; 0.2 mM dTTP; 10 mM DTT; 6 µl of 5x first-strand buffer; and 6 µg random primers.
To remove unincorporated nucleotide, we mixed the labeled cDNA with 500 µl of binding buffer [5 M guanidine-SCN, 10 mM Tris (pH 7.0), 0.1 mM EDTA, 0.03% gelatin, and 2 ng/µl tRNA] and 50 µl of silica matrix buffer [10% matrix, 3.5 M guanidine chloride, 20% glycerol, 0.1 mM EDTA, and 200 mM sodium acetate (pH 4.8–5.0)], transferred the mixture to a GFX column (Amersham Pharmacia), and centrifuged it at 15,000 rpm for 30 s. The flow-through material was discarded, and the column was washed with 500 µl of wash buffer. The adsorbed probe was eluted with distilled water into a final volume of 17 µl.
This labeled probe was mixed with blocking solution containing 3 µl of 10 µg/µl oligo-dA, 3 µl of 20 µg/µl yeast tRNA, 1 µl of 20 µg/µl mouse Cot1 DNA, 5.1 µl of 20x SSC, and 0.9 µl of 10% SDS. The RIKEN full-length mouse cDNA that comprised the target was hybridized in a final volume of 30 µl: the entire array consisted of three multi-blocks, and each multi-block required 10 µl of hybridization solution. Before hybridization, probe aliquots were heated at 95°C for 1 min and cooled to room temperature. Slides were hybridized under coverslips overnight at 65°C in a Hybricasette. After hybridization, slides were washed in 2x SSC, 0.1% SDS until the coverslips dropped off. The slides were then transferred into 1x SSC, shaken gently for 2 min, and rinsed with 0.1x SSC for 2 min. After the slides were washed, they were spun in a centrifuge.
These slides were scanned on a ScanArray 5000 confocal laser scanner, and the images were analyzed with ImaGene (BioDiscovery). Each spot was defined by manually positioning a grid of circles over the array image, and spots deemed unsuitable for accurate quantitation because of array artifacts were flagged and excluded from analysis. cDNA clones not amplified by PCR were also excluded. Duplicate experiments used the same template mRNA, but labeling and hybridization were carried out separately for each experiment.
Filtering procedure.
The filtering procedure, which starts from the ImaGene output file, was applied to the spots that were correctly amplified by PCR (about 14,000 genes). It consists of three steps (Fig. 1): 1) eliminate spots that were flagged manually; 2) eliminate spots whose signal intensity is less than µbg + xσbg (x ≥ 0) in both channels, where µbg and σbg are the mean and standard deviation of the background signal intensity; 3) eliminate spots lying more than yσ from the least-mean-square line. The parameters x and y were chosen so that the product NR reaches its maximum score (Smax) at each step (see the figure in the Supplemental Material (footnote 1) for this article, published online at the Physiological Genomics web site).
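The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' program: the spot layout (a dict with `flagged`, `a`, and `b` keys, where `a` and `b` hold the (Cy3, Cy5) intensities of the two replicates), the function names, the use of log2(Cy3/Cy5) as the expression ratio, the decision to drop a spot when either replicate is weak in both channels, and the reading of σ in step 3 as the standard deviation of the vertical residuals are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def pearson(u, v):
    """Pearson correlation coefficient R of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv) if suu > 0 and svv > 0 else 0.0


def prim_filter(spots, mu_bg, sd_bg, x, y):
    """Apply the three filtering steps for fixed x and y.

    spots: list of dicts {'flagged': bool, 'a': (cy3, cy5), 'b': (cy3, cy5)}
           with positive intensities (hypothetical layout).
    Returns the surviving (log-ratio replicate A, log-ratio replicate B) pairs.
    """
    # Step 1: drop manually flagged spots.
    kept = [s for s in spots if not s["flagged"]]

    # Step 2: drop spots below mu_bg + x*sd_bg in both channels
    # (assumption: a spot is dropped if either replicate is weak).
    threshold = mu_bg + x * sd_bg

    def weak(channels):
        return channels[0] < threshold and channels[1] < threshold

    kept = [s for s in kept if not (weak(s["a"]) or weak(s["b"]))]
    pairs = [(math.log2(s["a"][0] / s["a"][1]),
              math.log2(s["b"][0] / s["b"][1])) for s in kept]

    # Step 3: drop spots farther than y*sigma from the least-squares line
    # (assumption: sigma is the standard deviation of vertical residuals).
    if len(pairs) < 3:
        return pairs
    u, v = zip(*pairs)
    mu, mv = mean(u), mean(v)
    suu = sum((a - mu) ** 2 for a in u)
    if suu == 0:
        return pairs
    slope = sum((a - mu) * (b - mv) for a, b in pairs) / suu
    intercept = mv - slope * mu
    residuals = [b - (intercept + slope * a) for a, b in pairs]
    sigma = pstdev(residuals)
    return [p for p, r in zip(pairs, residuals) if abs(r) <= y * sigma]
```

Calling `prim_filter(spots, mu_bg, sd_bg, x, y)` and then `pearson(*zip(*result))` gives the N (the number of surviving pairs) and R for one choice of x and y.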
| RESULTS |
|---|
As shown in Fig. 1, PRIM is composed of three steps. The flagged data are excluded first, and then spots having a signal intensity less than µbg + xσbg are excluded. Finally, the spots that lie farther than yσ from the best-fit line are excluded. PRIM is designed to choose x and y so that the product of N and R is as large as possible.
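The parameter choice can be illustrated with a small grid search. The sketch below assumes that N is the number of spots that survive a filtering step and that R is the Pearson correlation between the duplicate measurements of those spots; `nr_score`, `best_parameter`, and the `apply_filter` callback are hypothetical names introduced for this example, and the candidate grid is whatever the caller supplies.

```python
import math
from statistics import mean


def pearson(u, v):
    """Pearson correlation coefficient R of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv) if suu > 0 and svv > 0 else 0.0


def nr_score(pairs):
    """Score N * R for a list of (replicate A, replicate B) value pairs."""
    if len(pairs) < 3:  # too few spots for a meaningful correlation
        return 0.0
    u, v = zip(*pairs)
    return len(pairs) * pearson(u, v)


def best_parameter(candidates, apply_filter):
    """Return (parameter, score, pairs) with the largest N * R.

    apply_filter(p) must return the pairs that survive when the filtering
    step is run with parameter value p (for example x in step 2, y in step 3).
    """
    best = None
    for p in candidates:
        pairs = apply_filter(p)
        score = nr_score(pairs)
        if best is None or score > best[1]:
            best = (p, score, pairs)
    return best
```

Running `best_parameter` once over candidate x values for step 2 and then once over candidate y values for step 3, each time on the output of the previous step, mirrors the step-by-step maximization described above. A stricter cutoff raises R but lowers N, so the product peaks at an intermediate value.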
For each of the 49 tissues, the increase in the correlation coefficient R at each step can be seen by comparing the three curves in Fig. 2. The lower the R value of the initial data (just after the removal of manually flagged spots), the larger the increase in R tends to be. In all cases the R value after the PRIM filtration was greater than that after the first filtration: the average R value was 0.707 after the first filtration and 0.780 after the third. The number of duplicate sets that form the same terminal branch in hierarchical clustering is a good index of the reproducibility of the duplicate experiments. Figure 3 shows dendrograms of the initial data (Fig. 3A) and the filtered data (Fig. 3B) obtained with complete-linkage hierarchical clustering. The filtered data formed the same terminal branch in 47 of the 49 sets, whereas the initial data did so in only 44. These results show that PRIM filtration improves the reproducibility of the clustering.
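The "same terminal branch" index can be computed as follows. This is a minimal sketch under stated assumptions: the distance between two expression profiles is taken to be 1 − R (the source does not say which distance was used), the clustering is a naive complete-linkage agglomeration written out by hand, and `terminal_pairs` and `count_same_terminal` are hypothetical helper names.

```python
import math
from statistics import mean


def pearson(u, v):
    """Pearson correlation coefficient R of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv) if suu > 0 and svv > 0 else 0.0


def terminal_pairs(profiles):
    """Return the leaf pairs joined directly to each other in the dendrogram.

    profiles: dict mapping experiment name -> list of expression values.
    Uses complete linkage with distance 1 - R (an assumption).
    """
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(n,) for n in names]
    joined = set()
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance of the farthest member pair.
                d = max(dist[frozenset((a, b))]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        if len(clusters[i]) == 1 and len(clusters[j]) == 1:
            joined.add(frozenset(clusters[i] + clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return joined


def count_same_terminal(profiles, replicate_pairs):
    """Count duplicate sets whose two experiments share a terminal branch."""
    joined = terminal_pairs(profiles)
    return sum(frozenset(pair) in joined for pair in replicate_pairs)
```

Applied to the initial data and to the filtered data, `count_same_terminal` would yield the kind of index reported above. The cubic-time loop is adequate for about a hundred experiments, but a library implementation would be preferable for larger sets.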
| DISCUSSION |
|---|
The feasibility of our method was demonstrated in two ways: R was higher for the filtered data than for the initial data (Fig. 2), and more duplicate sets shared a terminal branch after filtering than before. In Fig. 2 the increase in the correlation coefficient after PRIM filtration tends to be larger when the initial correlation coefficient is lower, which means that our filtration method is especially effective when the reproducibility of the initial data set is relatively low. In these data sets, the minimum value of R at which a duplicate experiment forms the same terminal branch is about 0.7 (Fig. 3B); in other words, most duplicate experiments cluster in the same terminal branch if R is greater than 0.7. Our method is useful in this respect, because in most cases it extracts data sets for which R is greater than 0.7. This is important because tumor samples from a patient are sometimes scarce, and the experiment cannot always be repeated (2). The extraction of useful information from data sets having relatively low R values is especially valuable in these cases. The bone tissue in the present work is a good example: the R value was only 0.4382 for the initial bone tissue data but was 0.7503 after the third filtration (Fig. 3). The bone duplicates did not cluster in the same terminal branch before the filtration but did so after it.
The formation of the same terminal branches in duplicate experiments is a good criterion for evaluating the usefulness of a filtering method (5). We used this criterion to compare our method with previously reported filtering methods. As shown by the values listed in Table 1, with the method of White et al. (9), the R is highest, but the N is lowest. With the method of Alizadeh et al. (1), the N value is the highest, but the R value is not high, and the number of same terminal branches was lower than it was with our method. With regard to having as many same terminal branches as possible and a high N value, our method was the best. The two sets of tissues that did not form the same terminal branches were E11 (11-day-old embryo) and cortex (Fig. 3B). The E11 was clustered in a group containing E11 head and olfactory brain. Also, the cortex was clustered in a group of cerebellum and adult brain. These tissues share many cell types in common, so they probably need a higher R score to distinguish them from each other.
Duplicate sets of initial data and data after PRIM filtration were also compared from the biological point of view. The duplicate sets of data were averaged and used for this analysis (Fig. 4, A and B). The clusters obtained after PRIM filtration (Fig. 4B) seem more reasonable than those obtained from the initial data (Fig. 4A). For example, in Fig. 4B the small intestine is clustered close to the colon and cecum tree, which is more reasonable than the clustering of stomach with colon and cecum shown in Fig. 4A. Other results that are more reasonable after data filtration are that the N0 (neonatal day 0) head is clustered next to the tree of N10 and N6 head, that the liver and liver tumor are clustered in the same tree, and that lung and N0 lung are in the same terminal branch. All of these results obtained after filtration are, from a biological point of view, more reasonable than the results obtained from the initial data.
Finally, PRIM is the first method that applies the concept of extracting highly reproducible data while also keeping the N value high. The feasibility was demonstrated with a large set of data. The program was designed for use with duplicate data sets but is in principle applicable to data sets made up of triplicate or more highly replicated samples. In fact, we confirmed that PRIM filtration of any combination of two experiments within a triplicate data set (E14 liver, skeletal muscle, and eyeball) produced high R values, and the three averaged data sets of each tissue were clustered in the same group (data not shown). The program is also applicable, with a few modifications, to oligo DNA chip data. We strongly recommend that the PRIM strategy be widely used for data filtration.
| ACKNOWLEDGMENTS |
|---|
This study was supported in part by Special Coordination Funds for Promoting Science and Technology from the Science and Technology Agency of the Japanese Government to Y. Okazaki. This study was also supported by Special Coordination Funds and a Research Grant for the RIKEN Genome Exploration Research Project, Core Research for Evolutional Science and Technology, and Research and Development for Applying Advanced Computational Science and Technology of Japan Science and Technology Corporation to Y. Hayashizaki. This work was also supported by a Grant-in-Aid for Scientific Research on Priority Areas and Human Genome Program, from the Ministry of Education, Science and Culture, and by a Grant-in-Aid for a Second Term Comprehensive 10-Year Strategy for Cancer Control from the Ministry of Health and Welfare to Y. Hayashizaki.
| FOOTNOTES |
|---|
Address for reprint requests and other correspondence: Y. Okazaki, Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan (E-mail: okazaki{at}gsc.riken.go.jp and rgscerg{at}gsc.riken.go.jp).
1 Supplemental material to this article is available online at http://physiolgenomics.physiology.org/cgi/content/full/4/3/183/DC1.
| REFERENCES |
|---|