TCGA学习笔记5-差异表达分析

前面,我们已经对KIRC的611个CASE的表达数据进行了预处理,现在,我们可以开始进行基因的差异表达分析了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
> library(DESeq2)
> d.des <- DESeqDataSetFromMatrix(data_select,
+ colData = colData,
+ design = ~group)
> View(d.des)
> res <- DESeq(d.des)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 795 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing

> result <- results(res)
> class(result)
[1] "DESeqResults"
attr(,"package")
[1] "DESeq2"
> dim(result)
[1] 18129 6
> head(result)
log2 fold change (MLE): group cancer vs normal
Wald test p-value: group cancer vs normal
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE
<numeric> <numeric> <numeric>
ENSG00000000419 1451.58517917849 -0.0533634770734555 0.0523193279028319
ENSG00000000457 758.009368176777 -0.360582717778673 0.0418677088475957
ENSG00000000460 221.763750558644 0.61271651131726 0.0607242024231328
ENSG00000000938 1056.54189255002 1.50798734334441 0.0999113699084657
ENSG00000000971 5355.66857772629 0.701185856702199 0.172116557494952
ENSG00000001036 4735.26426651376 0.0468730754150524 0.0584416346104715
stat pvalue padj
<numeric> <numeric> <numeric>
ENSG00000000419 -1.01995723592938 0.307748742642885 0.339965691022659
ENSG00000000457 -8.61243014494163 7.1527882737753e-18 2.24580704217652e-17
ENSG00000000460 10.0901532974906 6.10740651462378e-24 2.50173271685147e-23
ENSG00000000938 15.0932505952622 1.79382718013598e-51 2.14797179317603e-50
ENSG00000000971 4.07390123825107 4.62320932058293e-05 7.25789416114027e-05
ENSG00000001036 0.802049356207667 0.422524408432647 0.45690098422162
# 上面生成的result,就是这18129个基因的表达数据经差异分析后的结果,下面需要进一步的筛选

> result_select <- result[result$pvalue<0.05&abs(result$log2FoldChange)>1,] # 筛选pvalue<0.05且FoldChange>2的基因
> class(result_select)
[1] "DESeqResults"
attr(,"package")
[1] "DESeq2"
> dim(result_select)
[1] 4913 6 # 筛选之后,基因数目从18129个减少至4913个
> head(result_select)
log2 fold change (MLE): group cancer vs normal
Wald test p-value: group cancer vs normal
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat
<numeric> <numeric> <numeric> <numeric>
ENSG00000000938 1056.54189255002 1.50798734334441 0.0999113699084657 15.0932505952622
ENSG00000001617 6403.79587078303 1.13804574850297 0.0897072677370336 12.6862157014861
ENSG00000001630 218.029634030269 -1.15446295077517 0.120053363145966 -9.61624831260688
ENSG00000002586 9712.04932748132 1.26513626330331 0.0853396109411807 14.8247249940627
ENSG00000002746 240.436190572231 -3.72546669707377 0.224130353161834 -16.6218749246506
ENSG00000002933 29346.9573200702 1.27343989833302 0.125910512841406 10.1138488724688
pvalue padj
<numeric> <numeric>
ENSG00000000938 1.79382718013598e-51 2.14797179317603e-50
ENSG00000001617 7.05112262109708e-37 4.97005451002601e-36
ENSG00000001630 6.82760868085803e-22 2.56374726129402e-21
ENSG00000002586 1.01394301829956e-49 1.13889547575916e-48
ENSG00000002746 4.83951527252971e-62 8.09368748853239e-61
ENSG00000002933 4.7962181780376e-24 1.97615089431008e-23

# 为了作图方便,可以进一步计算出上调与下调的基因
> result_up <- result[result$pvalue<0.05&result$log2FoldChange>1,]
> dim(result_up)
[1] 3432 6 # 上调的基因有3432个
> result_down <- result[result$pvalue<0.05&result$log2FoldChange< -1,]
> dim(result_down)
[1] 1481 6 # 下调的基因有1481个

# 下面开始作火山图
plot(result$log2FoldChange,-log10(result$pvalue),xlab="Log2FoldChange",ylab="-Log10(pvalue)",pch=16,cex=0.6)
points(result_up$log2FoldChange,-log10(result_up$pvalue),col="red",pch=16,cex=0.6)
points(result_down$log2FoldChange,-log10(result_down$pvalue),col="blue",pch=16,cex=0.6)
abline(v=c(-1,1),col="#636363",lwd=2,lty=2)
abline(h=1.3,col="#636363",lwd=2,lty=2)
text(-7.5,260,"n=1481")
text(7.5,260,"n=3432")

  • 本文作者:括囊无誉
  • 本文链接: TCGA/TCGA5Diff/
  • 版权声明: 本博客所有文章均为原创作品,转载请注明出处!
------ 本文结束 ------
坚持原创文章分享,您的支持将鼓励我继续创作!