The R package regioneR can statistically test whether two sets of genomic features are associated

I came across this R package in the following paper:
Variation in abundance of predicted resistance genes in the Brassica oleracea pangenome
The methods section says:
The R-package regioneR v1.8 (Gel et al., 2016; R Core Team, 2016) was used to test resistance genes and genes exhibiting PAV and transposable elements for association using 500 permutations. For PAV association, the evaluation function numOverlaps was used to check whether the number of gene overlaps is higher than expected. For TE association, the evaluation function meanDistance was used since we do not expect TEs to overlap with RGA candidates due to repeats having been masked during the annotation process
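To make the difference between the two evaluation functions concrete, here is a minimal base-R sketch on made-up coordinates. It does not call regioneR itself: `interval_gap`, `genes` and `tes` are hypothetical names of my own, and the distance definition (bases between two intervals, 0 when they touch or overlap) is an assumption that may not match `meanDistance` exactly. It shows why counting overlaps tells you nothing when the two sets can never overlap, while a mean distance to the nearest feature still does.

```r
# Toy coordinates on one chromosome (made-up numbers, not from the paper).
genes <- data.frame(start = c(100, 1000, 5000), end = c(300, 1400, 5600))
tes   <- data.frame(start = c(350, 2000, 9000), end = c(500, 2500, 9500))

# Number of bases between two intervals; 0 when they touch or overlap.
interval_gap <- function(s1, e1, s2, e2) {
  pmax(0, pmax(s1, s2) - pmin(e1, e2) - 1)
}

# Statistic 1: how many genes overlap at least one TE.
n_overlaps <- sum(vapply(seq_len(nrow(genes)), function(i) {
  any(genes$start[i] <= tes$end & genes$end[i] >= tes$start)
}, logical(1)))

# Statistic 2: mean distance from each gene to its nearest TE.
nearest_dist <- vapply(seq_len(nrow(genes)), function(i) {
  min(interval_gap(genes$start[i], genes$end[i], tes$start, tes$end))
}, numeric(1))
mean_dist <- mean(nearest_dist)
```

In this toy set no gene overlaps a TE, so the overlap count is uninformative, but the distances to the nearest TE still differ from gene to gene and can be compared against randomized placements.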
Documentation (vignette):
https://bioconductor.org/packages/release/bioc/vignettes/regioneR/inst/doc/regioneR.html
The first small example in the vignette:
Gene promoter regions are GC rich and there are many CpG islands that lie inside promoters. However, is there a statistically significant association between them? Do CpG islands overlap with promoters more than one would expect by chance?
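As I understand it, the question is this: CpG islands and gene promoter regions overlap, but do they overlap more often than they would by chance? The method answers with a P value for that overlap.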
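The idea behind such a test can be sketched in base R without regioneR. Everything below is a toy: one made-up chromosome, made-up intervals, and a hypothetical helper `count_overlaps`. The randomization (uniform relocation of the intervals) is a simplification, not necessarily what regioneR does internally. The point is only how an empirical P value falls out of comparing the observed overlap count with counts from randomized placements.

```r
# Toy permutation test on a single 1-D "chromosome" (all values are made up).
set.seed(1)
genome_len <- 1e6

# Hypothetical feature set B ("promoters"): 50 intervals of 500 bp.
b_start <- sort(sample.int(genome_len - 500, 50))
b_end <- b_start + 499

# Hypothetical feature set A ("CpG islands"): 30 intervals of 200 bp,
# 20 deliberately placed inside B intervals and 10 placed at random.
a_width <- 200
a_start <- c(b_start[1:20] + 100, sample.int(genome_len - a_width, 10))

# Count how many A intervals overlap at least one B interval.
count_overlaps <- function(starts, width) {
  ends <- starts + width - 1
  sum(vapply(seq_along(starts), function(i) {
    any(starts[i] <= b_end & ends[i] >= b_start)
  }, logical(1)))
}

observed <- count_overlaps(a_start, a_width)

# Null distribution: relocate the A intervals uniformly at random.
ntimes <- 1000
permuted <- replicate(ntimes, {
  count_overlaps(sample.int(genome_len - a_width, length(a_start)), a_width)
})

# One-sided empirical P value with a +1 correction, so it is never exactly 0.
p_value <- (sum(permuted >= observed) + 1) / (ntimes + 1)
```

Because 20 of the 30 A intervals were planted inside B on purpose, the observed count sits far above the randomized counts and the P value comes out small. With real data the planted part is of course unknown, which is what the test is for.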
Input data you need to prepare
A BED file of CpG islands (this file must not contain a header line; remove the header first)

A BED file of promoter regions

A BED file of chromosome lengths
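If your BED files still carry a header line, something like the following base-R sketch can strip it. The file contents and the chromosome lengths here are toy values written to temporary files, and the three-column layout (chromosome, start, end) for the chromosome-length file is my assumption about what such a file looks like, not something taken from the original data.

```r
# Write a tiny BED-like file that still has a header line (toy data).
bed_with_header <- tempfile(fileext = ".bed")
writeLines(c("chr\tstart\tend",
             "chr1\t100\t500",
             "chr1\t900\t1500",
             "chr2\t200\t700"), bed_with_header)

# Read it, letting R consume the header, then write it back without one.
bed <- read.table(bed_with_header, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
bed_no_header <- tempfile(fileext = ".bed")
write.table(bed, bed_no_header, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# A chromosome-length table in the same three-column layout
# (hypothetical lengths; each chromosome runs from 1 to its length).
chrom_sizes <- data.frame(chr = c("chr1", "chr2"), start = 1,
                          end = c(2000, 1000))

# Check that the first line is now a data row, not a header.
first_line <- readLines(bed_no_header, n = 1)
```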

Code

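# Install regioneR from Bioconductor (assumes BiocManager is already installed)
BiocManager::install("regioneR")

library(regioneR)

# sample() below is random; fix the seed so the run is reproducible
set.seed(12345)

# Load the CpG islands and the promoters as GRanges objects
cpgHMM <- toGRanges("hg19.cpg01.bed")
cpgHMM
promoters <- toGRanges("hg19_promoters.bed")
promoters

# Keep only the canonical human chromosomes
cpgHMM <- filterChromosomes(cpgHMM, organism="hg", chr.type="canonical")
promoters <- filterChromosomes(promoters, organism="hg", chr.type="canonical")

# Chromosome lengths, used as the genome when randomizing regions
hg19 <- toGRanges("hg19.tsv")

# Use a random subset of 2000 CpG islands to keep the test fast
cpgHMM_2K <- sample(cpgHMM, 2000)

# Permutation test: do the CpG islands overlap promoters more than expected by chance?
pt <- overlapPermTest(cpgHMM_2K, promoters,
                      ntimes=1000,
                      genome=hg19,
                      count.once=TRUE)
pt

plot(pt)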
This prints a P value and then draws a plot.

I have not yet worked out how to read this plot.
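Going by the regioneR vignette, a tentative reading is: the gray histogram is the number of overlaps obtained in each randomization, the black line is their mean, the green line is the number of overlaps observed with the real regions, and the red line is the significance limit. The base-R sketch below rebuilds those pieces from simulated numbers so each line can be tied to a value. The counts are invented, `plot_sketch` is a hypothetical helper, and taking the red limit as the 95th percentile of the permuted values is my assumption about how it is derived.

```r
set.seed(42)
# Simulated stand-ins for what a permutation test returns (not real output).
permuted <- rpois(1000, lambda = 60)  # overlaps seen in 1000 random placements
observed <- 150                       # overlaps seen with the real regions

perm_mean <- mean(permuted)                    # black line: centre of the histogram
sig_limit <- unname(quantile(permuted, 0.95))  # red line: assumed one-sided 0.05 limit
z_score   <- (observed - perm_mean) / sd(permuted)
p_value   <- (sum(permuted >= observed) + 1) / (length(permuted) + 1)

# Draw a similar picture; call plot_sketch() in an interactive session.
plot_sketch <- function() {
  hist(permuted, col = "gray", xlim = range(c(permuted, observed)),
       main = "Permuted vs observed overlaps", xlab = "Number of overlaps")
  abline(v = perm_mean, col = "black", lwd = 2)
  abline(v = sig_limit, col = "red", lwd = 2)
  abline(v = observed, col = "green", lwd = 2)
}
```

Read this way, the further the green line lies to the right of the red one, the less plausible it is that the observed overlap arose by chance; the z-score expresses the same gap in units of the spread of the randomized counts.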
