引言

❝
上次推文 rgt-hint 转录因子预测结果可视化留下个问题, 就是绘制 footprint 曲线时,无法把下面的 motif 绘制出来, 因为软件没有输出类似的位置权重矩阵(pwm) 。仔细想了一下, 这次我们来解决这个问题。
测试

结果解读:
我们在做完 rgt-motifanalysis matching 分析后,产生的文件在 match 文件夹下,内容如下,第四列就是预测到的转录因子:
(chip) [laojunjun@cloud 6.footprint-data]$ head ./match/CTL_mpbs.bed
chr1 756871 756882 MA1628.1.Zic1::Zic2 11.801793739070824 +
chr1 756887 756899 MA1602.1.ZSCAN29 13.982648767774405 +
chr1 756922 756934 MA1602.1.ZSCAN29 13.982648767774405 +
chr1 762540 762552 MA0039.4.KLF4 14.937741792469735 –
chr1 762541 762552 MA0493.1.Klf1 10.875777015291094 –
chr1 762541 762551 MA0599.1.KLF5 13.508886663843134 –
chr1 762687 762697 MA0048.2.NHLH1 10.565938430193414 +
chr1 762687 762697 MA0048.2.NHLH1 11.529766662529898 –
chr1 762711 762722 MA0076.2.ELK4 13.336245664270239 +
chr1 762682 762692 MA0092.1.Hand1::Tcf3 11.40795144811544 +
既然结果文件是预测到的转录因子,那么前面的位置肯定就是对应的转录因子结合的序列信息,我们就可以自己提取序列然后绘制 motif 了。下面我来做这样的事情。
提取序列:
library(BSgenome.Hsapiens.UCSC.hg19)
library(ChIPpeakAnno)
library(ggseqlogo)
library(aplot)
library(ggplot2)
library(ggforce)

mtf <- data.table::fread(“../match/CTL_mpbs.bed”)

mtf <- rtracklayer::import.bed(“../match/CTL_mpbs.bed”)

filter target TF

my_TF <- “MA0039.4.KLF4”

TFtarget <- subset(mtf,name == my_TF)

TFtarget_ed <- resize(TFtarget,width = 200,fix = “center”)

subtract sequence for TF

mtf_seq <- getAllPeakSequence(myPeakList = TFtarget_ed,
upstream = 0,downstream = 0,
genome = BSgenome.Hsapiens.UCSC.hg19)
我们先把 footprinting 折线图绘制出来:

footprinting line plot

footprint <- ggplot(ft_df %>%
dplyr::filter(TF == “MA0039.4.KLF4”),
aes(x = pos,y = Occupancy,color = group)) +
geom_line() +
theme_bw() +
theme(panel.grid = element_blank(),
strip.background = element_rect(fill = “grey90”),
legend.background = element_blank(),
strip.text = element_text(face = “bold.italic”,size = rel(1)),
legend.position = c(0.1,0.8),
axis.text = element_text(colour = “black”)) +
facet_wrap(~TF)

footprint

现在根据提取的序列绘制 motif:

plot sequence(motif)

denovo_mt <-
ggseqlogo(mtf_seq$sequence,seq_type = “dna”) +
theme_bw() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank())

denovo_mt

可以看到和软件输出的默认结果是一样的。

我们拼个图:

combine plot

cowplot::plot_grid(plotlist = list(denovo_mt,footprint),ncol = 1)

也许你会发现中间结合主要的 motif 显得非常拥挤,我们可以进行放大来更好的查看是哪些 motif:

zoom for center motif

denovo_mt <-
ggseqlogo(mtf_seq$sequence,seq_type = “dna”) +
theme_bw() +
theme(panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
facet_zoom(xlim = c(100 – 10,100 + 10),
zoom.size = 0.5)

combine plot

cowplot::plot_grid(plotlist = list(denovo_mt,footprint),ncol = 1)

这个图就完美多了!
结尾

❝
路漫漫其修远兮,吾将上下而求索。
欢迎加入生信交流群。加我微信我也拉你进微信群聊老俊俊生信交流群 (微信交流群需收取 20 元入群费用,一旦交费,拒不退还!(防止骗子和便于管理)) 。QQ 群可免费加入, 记得进群按格式修改备注哦。

声明：文中观点不代表本站立场。本文传送门：https://eyangzhen.com/415591.html

rgt-hint 转录因子预测结果可视化(续)

mtf <- data.table::fread(“../match/CTL_mpbs.bed”)

filter target TF

subtract sequence for TF

footprinting line plot

plot sequence(motif)

combine plot

zoom for center motif

combine plot

作者专栏

老俊俊的生信笔记