Basic usage of the RTCGA package in R

The RTCGA package for R is used to download and analyze data from TCGA. To see example usage of RTCGA, open the "RTCGA package workflow" vignette with the following command:

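> browseVignettes("RTCGA")
# After it runs, open this page: [RTCGA package workflow](http://127.0.0.1:14929/session/Rvig.2eb41b5b1ec3.html) (a local address that is specific to the R session)
# To see the examples for RTCGA.rnaseq, use the following command:
> browseVignettes("RTCGA.rnaseq")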

1. Listing the available datasets

> library(RTCGA)
> x <- infoTCGA()
> class(x)
[1] "data.frame"
> nrow(x)
[1] 38
> x
Cohort BCR Clinical CN LowP Methylation mRNA mRNASeq miR miRSeq
ACC-counts ACC 92 92 90 0 80 0 79 0 80
BLCA-counts BLCA 412 412 410 112 412 0 408 0 409
BRCA-counts BRCA 1098 1097 1089 19 1097 526 1093 0 1078
CESC-counts CESC 307 307 295 50 307 0 304 0 307
CHOL-counts CHOL 51 45 36 0 36 0 36 0 36
COAD-counts COAD 460 458 451 69 457 153 457 0 406
COADREAD-counts COADREAD 631 629 616 104 622 222 623 0 549
DLBC-counts DLBC 58 48 48 0 48 0 48 0 47
ESCA-counts ESCA 185 185 184 51 185 0 184 0 184
FPPP-counts FPPP 38 38 0 0 0 0 0 0 23
GBM-counts GBM 613 595 577 0 420 540 160 565 0
GBMLGG-counts GBMLGG 1129 1110 1090 52 936 567 676 565 512
HNSC-counts HNSC 528 528 522 108 528 0 520 0 523
KICH-counts KICH 113 113 66 0 66 0 66 0 66
KIPAN-counts KIPAN 973 941 883 0 892 88 889 0 873
KIRC-counts KIRC 537 537 528 0 535 72 533 0 516
KIRP-counts KIRP 323 291 289 0 291 16 290 0 291
LAML-counts LAML 200 200 197 0 194 0 179 0 188
LGG-counts LGG 516 515 513 52 516 27 516 0 512
LIHC-counts LIHC 377 377 370 0 377 0 371 0 372
LUAD-counts LUAD 585 522 516 120 578 32 515 0 513
LUSC-counts LUSC 504 504 501 0 503 154 501 0 478
MESO-counts MESO 87 87 87 0 87 0 87 0 87
OV-counts OV 602 591 586 0 594 574 304 570 453
PAAD-counts PAAD 185 185 184 0 184 0 178 0 178
PCPG-counts PCPG 179 179 175 0 179 0 179 0 179
PRAD-counts PRAD 499 499 492 115 498 0 497 0 494
READ-counts READ 171 171 165 35 165 69 166 0 143
SARC-counts SARC 261 261 257 0 261 0 259 0 259
SKCM-counts SKCM 470 470 469 118 470 0 469 0 448
STAD-counts STAD 443 443 442 107 443 0 415 0 436
STES-counts STES 628 628 626 158 628 0 599 0 620
TGCT-counts TGCT 150 134 150 0 150 0 150 0 150
THCA-counts THCA 503 503 499 98 503 0 501 0 502
THYM-counts THYM 124 124 123 0 124 0 120 0 124
UCEC-counts UCEC 560 548 540 106 547 54 545 0 538
UCS-counts UCS 57 57 56 0 57 0 57 0 56
UVM-counts UVM 80 80 80 51 80 0 80 0 80
RPPA MAF rawMAF
ACC-counts 46 90 0
BLCA-counts 344 130 395
BRCA-counts 887 977 0
CESC-counts 173 194 0
CHOL-counts 30 35 0
COAD-counts 360 154 367
COADREAD-counts 491 223 489
DLBC-counts 33 48 0
ESCA-counts 126 185 0
FPPP-counts 0 0 0
GBM-counts 238 290 290
GBMLGG-counts 668 576 806
HNSC-counts 212 279 510
KICH-counts 63 66 66
KIPAN-counts 756 644 799
KIRC-counts 478 417 451
KIRP-counts 215 161 282
LAML-counts 0 197 0
LGG-counts 430 286 516
LIHC-counts 63 198 373
LUAD-counts 365 230 542
LUSC-counts 328 178 0
MESO-counts 63 0 0
OV-counts 426 316 469
PAAD-counts 123 150 184
PCPG-counts 80 179 0
PRAD-counts 352 332 498
READ-counts 131 69 122
SARC-counts 223 247 0
SKCM-counts 353 343 366
STAD-counts 357 289 395
STES-counts 483 474 395
TGCT-counts 118 149 0
THCA-counts 222 402 496
THYM-counts 90 123 0
UCEC-counts 440 248 0
UCS-counts 48 57 0
UVM-counts 12 80 0

The same information can also be found on the Broad GDAC Firehose website.
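Since `infoTCGA()` returns an ordinary data frame, the table can be filtered like any other. The sketch below rebuilds four rows by hand (values copied from the output above) so that it runs without network access. The 100-sample threshold is an arbitrary assumption, and the `as.numeric(as.character(...))` conversion is only a precaution in case the counts arrive as text rather than numbers.

```r
# Hand-copied subset of the infoTCGA() table (values taken from the output above).
x_small <- data.frame(
  Cohort   = c("ACC", "BRCA", "CHOL", "GBM"),
  Clinical = c("92", "1097", "45", "595"),
  mRNASeq  = c("79", "1093", "36", "160"),
  row.names = c("ACC-counts", "BRCA-counts", "CHOL-counts", "GBM-counts"),
  stringsAsFactors = FALSE
)

# Convert a count column to numeric, whether it is stored as text or as a factor.
to_num <- function(col) as.numeric(as.character(col))

# Keep cohorts with at least `min_samples` mRNASeq samples (illustrative threshold).
min_samples <- 100
big_cohorts <- x_small$Cohort[to_num(x_small$mRNASeq) >= min_samples]
print(big_cohorts)
```

With the full table, replacing `x_small` by the result of `infoTCGA()` should give the same kind of answer for all 38 cohorts.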

If you only need the tumor type (cohort) names, extract the row names with the following syntax; there are 38 in total.

> library(magrittr)  # provides %>%; needed unless an attached package already supplies it
> cohorts <- infoTCGA() %>% 
+ rownames() %>%
+ sub('-counts', '', x=.)
> cohorts
[1] "ACC" "BLCA" "BRCA" "CESC" "CHOL" "COAD" "COADREAD"
[8] "DLBC" "ESCA" "FPPP" "GBM" "GBMLGG" "HNSC" "KICH"
[15] "KIPAN" "KIRC" "KIRP" "LAML" "LGG" "LIHC" "LUAD"
[22] "LUSC" "MESO" "OV" "PAAD" "PCPG" "PRAD" "READ"
[29] "SARC" "SKCM" "STAD" "STES" "TGCT" "THCA" "THYM"
[36] "UCEC" "UCS" "UVM"
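The same clean-up can be done without a pipe. This is a minimal base-R sketch on a hand-copied subset of the row names; anchoring the pattern with `$` is my own tweak (not in the original command) so that only a trailing "-counts" is removed.

```r
# Row names as printed by infoTCGA() (a hand-copied subset for illustration).
rn <- c("ACC-counts", "BLCA-counts", "BRCA-counts", "COADREAD-counts")

# Strip the trailing "-counts" suffix to get plain cohort names.
cohorts <- sub("-counts$", "", rn)
print(cohorts)

# Check whether a cohort of interest is available before requesting its data.
has_brca <- "BRCA" %in% cohorts
print(has_brca)
```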

2. Dataset release dates

> checkTCGA('Dates')
[1] "2011-10-26" "2011-11-15" "2011-11-28" "2011-12-06" "2011-12-30" "2012-01-10"
[7] "2012-01-24" "2012-02-17" "2012-03-06" "2012-03-21" "2012-04-12" "2012-04-25"
[13] "2012-05-15" "2012-05-25" "2012-06-06" "2012-06-23" "2012-07-07" "2012-07-25"
[19] "2012-08-04" "2012-08-25" "2012-09-13" "2012-10-04" "2012-10-18" "2012-10-20"
[25] "2012-10-24" "2012-11-02" "2012-11-14" "2012-12-06" "2012-12-21" "2013-01-16"
[31] "2013-02-03" "2013-02-22" "2013-03-09" "2013-03-26" "2013-04-06" "2013-04-21"
[37] "2013-05-08" "2013-05-23" "2013-06-06" "2013-06-23" "2013-07-15" "2013-08-09"
[43] "2013-09-23" "2013-10-10" "2013-11-14" "2013-12-10" "2014-01-15" "2014-02-15"
[49] "2014-03-16" "2014-04-16" "2014-05-18" "2014-06-14" "2014-07-15" "2014-09-02"
[55] "2014-10-17" "2014-12-06" "2015-02-02" "2015-02-04" "2015-04-02" "2015-06-01"
[61] "2015-08-21" "2015-11-01" "2016-01-28"
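The dates come back as plain character strings in `YYYY-MM-DD` form, which is also the form passed as the `date` argument of `checkTCGA('DataSets', ...)`. As a small sketch, here is one way to pick the most recent release; only the last four dates from the output above are copied in, so that the example runs offline.

```r
# Last few release dates, copied from the checkTCGA('Dates') output above.
dates <- c("2015-06-01", "2015-08-21", "2015-11-01", "2016-01-28")

# Parse to Date so that comparisons are chronological.
parsed <- as.Date(dates, format = "%Y-%m-%d")

# Most recent release, converted back to the string form used as a `date` argument.
latest <- format(max(parsed), "%Y-%m-%d")
print(latest)

# Releases on or after an (arbitrary) cut-off date.
recent <- dates[parsed >= as.Date("2015-01-01")]
print(recent)
```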

3. Listing dataset names and sizes for a specific tumor type

> checkTCGA('DataSets',
+ cancerType = 'BRCA',
+ date = '2016-01-28') %>% dim
[1] 43 2
> checkTCGA('DataSets',
+ cancerType = 'BRCA',
+ date = '2016-01-28')
Size
1 37M
2 50K
3 723K
4 135K
5 135K
6 77K
7 78K
8 1.5M
9 57K
10 1.2K
11 160K
12 3.4M
13 86K
14 83M
15 3.2G
16 1.1M
17 15M
18 2.9M
19 44M
20 1.3M
21 2.6G
22 277M
23 195M
24 298M
25 93M
26 869M
27 249M
28 2.8G
29 243M
30 18M
31 18M
32 5.3M
33 4.6M
34 37M
35 399M
36 10M
37 1.1G
38 81M
39 1.9M
40 37M
41 1.5G
42 6.6M
43 7.4M
Name
1 BRCA-FFPE.Merge_methylation__humanmethylation450__jhu_usc_edu__Level_3__within_bioassay_data_set_function__data.Level_3.2016012800.0.0.tar.gz
2 BRCA-FFPE.Merge_mirnaseq__illuminahiseq_mirnaseq__bcgsc_ca__Level_3__miR_gene_expression__data.Level_3.2016012800.0.0.tar.gz
3 BRCA-FFPE.Merge_mirnaseq__illuminahiseq_mirnaseq__bcgsc_ca__Level_3__miR_isoform_expression__data.Level_3.2016012800.0.0.tar.gz
4 BRCA-FFPE.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_hg18__seg.Level_3.2016012800.0.0.tar.gz
5 BRCA-FFPE.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_hg19__seg.Level_3.2016012800.0.0.tar.gz
6 BRCA-FFPE.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg18__seg.Level_3.2016012800.0.0.tar.gz
7 BRCA-FFPE.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg19__seg.Level_3.2016012800.0.0.tar.gz
8 BRCA-FFPE.Methylation_Preprocess.Level_3.2016012800.0.0.tar.gz
9 BRCA-FFPE.miRseq_Mature_Preprocess.Level_3.2016012800.0.0.tar.gz
10 BRCA-FFPE.miRseq_Preprocess.Level_3.2016012800.0.0.tar.gz
11 BRCA.Clinical_Pick_Tier1.Level_4.2016012800.0.0.tar.gz
12 BRCA.Merge_Clinical.Level_1.2016012800.0.0.tar.gz
13 BRCA.Merge_cna__illuminahiseq_dnaseqc__hms_harvard_edu__Level_3__segmentation__seg.Level_3.2016012800.0.0.tar.gz
14 BRCA.Merge_methylation__humanmethylation27__jhu_usc_edu__Level_3__within_bioassay_data_set_function__data.Level_3.2016012800.0.0.tar.gz
15 BRCA.Merge_methylation__humanmethylation450__jhu_usc_edu__Level_3__within_bioassay_data_set_function__data.Level_3.2016012800.0.0.tar.gz
16 BRCA.Merge_mirnaseq__illuminaga_mirnaseq__bcgsc_ca__Level_3__miR_gene_expression__data.Level_3.2016012800.0.0.tar.gz
17 BRCA.Merge_mirnaseq__illuminaga_mirnaseq__bcgsc_ca__Level_3__miR_isoform_expression__data.Level_3.2016012800.0.0.tar.gz
18 BRCA.Merge_mirnaseq__illuminahiseq_mirnaseq__bcgsc_ca__Level_3__miR_gene_expression__data.Level_3.2016012800.0.0.tar.gz
19 BRCA.Merge_mirnaseq__illuminahiseq_mirnaseq__bcgsc_ca__Level_3__miR_isoform_expression__data.Level_3.2016012800.0.0.tar.gz
20 BRCA.Merge_protein_exp__mda_rppa_core__mdanderson_org__Level_3__protein_normalization__data.Level_3.2016012800.0.0.tar.gz
21 BRCA.Merge_rnaseq__illuminahiseq_rnaseq__unc_edu__Level_3__exon_expression__data.Level_3.2016012800.0.0.tar.gz
22 BRCA.Merge_rnaseq__illuminahiseq_rnaseq__unc_edu__Level_3__gene_expression__data.Level_3.2016012800.0.0.tar.gz
23 BRCA.Merge_rnaseq__illuminahiseq_rnaseq__unc_edu__Level_3__splice_junction_expression__data.Level_3.2016012800.0.0.tar.gz
24 BRCA.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes__data.Level_3.2016012800.0.0.tar.gz
25 BRCA.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2016012800.0.0.tar.gz
26 BRCA.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_isoforms__data.Level_3.2016012800.0.0.tar.gz
27 BRCA.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_isoforms_normalized__data.Level_3.2016012800.0.0.tar.gz
28 BRCA.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__exon_quantification__data.Level_3.2016012800.0.0.tar.gz
29 BRCA.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__junction_quantification__data.Level_3.2016012800.0.0.tar.gz
30 BRCA.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_hg18__seg.Level_3.2016012800.0.0.tar.gz
31 BRCA.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_hg19__seg.Level_3.2016012800.0.0.tar.gz
32 BRCA.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg18__seg.Level_3.2016012800.0.0.tar.gz
33 BRCA.Merge_snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg19__seg.Level_3.2016012800.0.0.tar.gz
34 BRCA.Merge_transcriptome__agilentg4502a_07_3__unc_edu__Level_3__unc_lowess_normalization_gene_level__data.Level_3.2016012800.0.0.tar.gz
35 BRCA.Methylation_Preprocess.Level_3.2016012800.0.0.tar.gz
36 BRCA.Mutation_Packager_Calls.Level_3.2016012800.0.0.tar.gz
37 BRCA.Mutation_Packager_Coverage.Level_3.2016012800.0.0.tar.gz
38 BRCA.Mutation_Packager_Oncotated_Calls.Level_3.2016012800.0.0.tar.gz
39 BRCA.RPPA_AnnotateWithGene.Level_3.2016012800.0.0.tar.gz
40 BRCA.mRNA_Preprocess_Median.Level_3.2016012800.0.0.tar.gz
41 BRCA.mRNAseq_Preprocess.Level_3.2016012800.0.0.tar.gz
42 BRCA.miRseq_Mature_Preprocess.Level_3.2016012800.0.0.tar.gz
43 BRCA.miRseq_Preprocess.Level_3.2016012800.0.0.tar.gz
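With 43 entries, it helps to filter the listing by name and to compare sizes numerically. The sketch below works on five rows copied from the output above. The helper `size_to_bytes()` is hypothetical (it is not part of RTCGA), and it assumes the K/M/G suffixes are binary multiples of 1024, which the listing itself does not state.

```r
# A few rows copied from the checkTCGA('DataSets', ...) output above.
ds <- data.frame(
  Size = c("3.4M", "3.2G", "298M", "93M", "1.2K"),
  Name = c(
    "BRCA.Merge_Clinical.Level_1.2016012800.0.0.tar.gz",
    "BRCA.Merge_methylation__humanmethylation450__jhu_usc_edu__Level_3__within_bioassay_data_set_function__data.Level_3.2016012800.0.0.tar.gz",
    "BRCA.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes__data.Level_3.2016012800.0.0.tar.gz",
    "BRCA.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2016012800.0.0.tar.gz",
    "BRCA-FFPE.miRseq_Preprocess.Level_3.2016012800.0.0.tar.gz"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "37M" or "3.2G" into a byte count (1024-based, an assumption).
size_to_bytes <- function(size) {
  units <- c(K = 1024, M = 1024^2, G = 1024^3)
  suffix <- sub("^[0-9.]+", "", size)            # unit letter, "" if absent
  value <- as.numeric(sub("[A-Za-z]+$", "", size))
  mult <- ifelse(suffix == "", 1, units[suffix])
  unname(value * mult)
}

# Datasets whose name mentions the RNA-seq v2 pipeline.
rnaseq <- ds[grepl("rnaseqv2", ds$Name, fixed = TRUE), ]
print(rnaseq$Size)

# Name of the largest archive in the subset.
largest <- ds$Name[which.max(size_to_bytes(ds$Size))]
print(largest)
```

Checking the size first is worthwhile here, since several BRCA archives in the listing are larger than 1 GB.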

4. Downloading data

> downloadTCGA(
+ cancerTypes = "BRCA",
+ dataSet = "Merge_Clinical.Level_1",
+ destDir = "./"
+ )
trying URL 'http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/BRCA/20160128/gdac.broadinstitute.org_BRCA.Merge_Clinical.Level_1.2016012800.0.0.tar.gz'
Content type 'application/x-gzip' length 3615793 bytes (3.4 MB)
downloaded 3.4 MB
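# Here the data was downloaded into the current working directory. Any other folder works as well, but the destination directory must be created first, e.g. with dir.create("download_folder")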
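Downloading into a dedicated folder can be sketched as follows. The folder location (under `tempdir()`) and the `run_download` switch are my own additions for illustration; the `date` argument is, as far as I know, accepted by `downloadTCGA()` in the same `YYYY-MM-DD` form that `checkTCGA('Dates')` returns, but do check `?downloadTCGA` for your installed version.

```r
# Hypothetical destination folder; tempdir() keeps the sketch free of side effects.
dest <- file.path(tempdir(), "download_folder")

# The destination must exist before downloading, so create it if needed.
if (!dir.exists(dest)) {
  dir.create(dest, recursive = TRUE)
}

# Set to TRUE to actually fetch the archive (needs RTCGA and network access).
run_download <- FALSE
if (run_download && requireNamespace("RTCGA", quietly = TRUE)) {
  RTCGA::downloadTCGA(
    cancerTypes = "BRCA",
    dataSet = "Merge_Clinical.Level_1",
    destDir = dest,
    date = "2016-01-28"   # pin the release so reruns fetch the same data
  )
}
```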

5. Reading data

readTCGA(path, dataType, ...)
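As a rough sketch of how this call might be used on the clinical data from the previous section: `path` points at the extracted data file and `dataType` names the kind of data (here `"clinical"`). The helper `find_clinical_file()` is hypothetical, and the `clin.merged.txt` file-name suffix is an assumption about what the extracted Merge_Clinical archive contains, so adjust the pattern to whatever you find on disk.

```r
# Hypothetical helper: locate the merged clinical table under a download folder.
# The "clin.merged.txt" suffix is an assumption about the archive contents.
find_clinical_file <- function(dir) {
  hits <- list.files(dir, pattern = "clin\\.merged\\.txt$",
                     recursive = TRUE, full.names = TRUE)
  if (length(hits) == 0) {
    stop("no clin.merged.txt file found under ", dir)
  }
  hits[1]
}

# Set to TRUE once the data has been downloaded and extracted.
run_read <- FALSE
if (run_read && requireNamespace("RTCGA", quietly = TRUE)) {
  path <- find_clinical_file("./")
  clinical <- RTCGA::readTCGA(path = path, dataType = "clinical")
  print(dim(clinical))
}
```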
  • Author: 括囊无誉
  • Link: TCGA/RTCGAbasic/
  • Copyright: All posts on this blog are original work. Please credit the source when reposting.