引言
❝
在 Molecular Cell 文章部分结果复现(续) 中我们计算了一些在 eIF5A 缺失条件下一些核糖体占据明显增多的三肽基序。
❝
接下来我们计算核糖体的 P/E 位点距离这些三肽基序密度的分布情况,下面是图中结果:
安装
install.packages(“devtools”)
devtools::install_github(“junjunlab/RiboProfiler”)
or
remotes::install_github(“junjunlab/RiboProfiler”)
library(RiboProfiler)
测试
adjust_offset 函数默认矫正后是 P 位点的位置:
test
offset <- PsiteOffsetCheck(qc_data = qc_df,read_length = c(28,28))
offset$plot
offset$summary_offset
sample readLengths Offsets bamFiles bamLegends
1 eIF5Ad 28 -12 eIF5Ad eIF5Ad
2 WT 28 -12 WT WT
offset_df <- adjust_offset(offset_df = offset$summary_offset,
qc_df = qc_df)
norm_df <- get_nomalized_counts(longest_trans_file = “longest_info_extend.txt”,
qc_df = offset_df)
❝
计算距离 PPP 基序的相对距离的密度分布, min_counts 筛选 counts 数大于等于 64 的,保持和文章中一致,motif 指定我们的基序,upstream/downstream 则是基序上下游的距离 :
rel_dist_motif(amino_file = “./sac_cds_aa.fa”,
longest_trans_file = “longest_info_extend.txt”,
normed_file = norm_df,
min_counts = 64,
motif = “PPP”,
upstream = -50,
downstream = 50)
筛选原理就是匹配指定的氨基酸并返回位置,筛选在指定位置范围里的目标,然后把 density 储存起来,下面是代码:
#
2_load longtest transcript anno file
#
with open(normed_file,’r’) as input:
for line in input:
# fileds = line.split()
# print(line.split())
,,,,,,trans_pos,trans_id,counts,,,exp = line.split()
if trans_id in gene_info:
seq = str(amino[trans_id])
# search motif patterns for amino acids
# pattern = re.compile("PPP",flags=re.IGNORECASE)
pattern = re.compile(motif, flags=re.IGNORECASE)
pos_res = pattern.finditer(seq)
# motif codon position
codon_motif_pos = [pos.start() + 1 for pos in pos_res]
# check whether has target motif
if len(codon_motif_pos) > 0:
# motif codon position to nt position
motif_nt_pos = [pos*3- 2 for pos in codon_motif_pos]
# calculate relative distance to motif
utr5_len = gene_info[trans_id][0]
rel_pos_to_motif = [int(trans_pos) - utr5_len - nt_pos for nt_pos in motif_nt_pos]
# filter rel_pos in limited range (-50,50)
rel_pos_to_motif_in_range = [i for i in rel_pos_to_motif if i >= int(upstream) and i <= int(downstream)]
# save in dict
for j in rel_pos_to_motif_in_range:
if j in rel_dist:
rel_dist[j][0] += 1
rel_dist[j][1] += float(exp)
# rel_dist[j][1] += int(counts)
normalize density | Average ribosome occupancy
sum_counts = sum([val[1] for key,val in rel_dist.items()])
average_exp = sum_counts/len(range(int(upstream),int(downstream + 1)))
rel_dist_norm = {key:val[1]/average_exp for key,val in rel_dist.items()}
然后会在 relative_distance_motif 文件夹下产生这些文件:
然后画图:
library(ggplot2)
plot
files <- c(“relative_distance_motif/WT_PPP_motif_occupancy.txt”,
“relative_distance_motif/eIF5Ad_PPP_motif_occupancy.txt”)
samples <- c(“WT”,”eIF5Ad”)
rel_dist_motif_plot(motif_occupancy_file = files,
sample_name = samples,
motif = “PPP”,
site = “P”)
❝
如果我们要算 E site 的话,adjust_offset 函数指定一下 shift 参数再向左移动 3nt 即可,A site 则向右移动 3nt:
E site
offset_df <- adjust_offset(offset_df = offset$summary_offset,
qc_df = qc_df,shift = -3)
A site
offset_df <- adjust_offset(offset_df = offset$summary_offset,
qc_df = qc_df,shift = 3)
同样的我们计算 PP 二肽基序的看看:
rel_dist_motif(amino_file = “./sac_cds_aa.fa”,
longest_trans_file = “longest_info_extend.txt”,
normed_file = norm_df,
min_counts = 64,
motif = “PP”,
upstream = -50,
downstream = 50)
files <- c(“relative_distance_motif/WT_PP_motif_occupancy.txt”,
“relative_distance_motif/eIF5Ad_PP_motif_occupancy.txt”)
samples <- c(“WT”,”eIF5Ad”)
rel_dist_motif_plot(motif_occupancy_file = files,
sample_name = samples,
motif = “PP”,
site = “P”)
tri-peptide 计算
前面我讲过这个问题:
❝
仔细想想,我们海油一种更综合的算法,就是争对特定的一个氨基酸位置,可分为向两边取,向前取两位和向后取两位三种方式,我们都纳入进来,就可以把氨基酸长度不是 3 的倍数的序列都容纳进来了。
with open(codon_exp_file,’r’) as input:
for line in input:
trans_id,codon_pos,exp = line.split()
# # get tripeptide position
# if int(codon_pos)%3 == 0:
# pos = int(codon_pos)//3
# else:
# pos = int(int(codon_pos)//3) + 1
# # check transid in AA fasta file
# if trans_id in amino.keys():
# tripeptide_motif = amino.fetch(trans_id,(pos*3 - 2,pos*3))
# # make sure length is 3
# if len(tripeptide_motif) == 3:
# # get tripeptide and save in dict
# if tripeptide_motif in tripeptide_seq:
# tmp_exp,tmp_count = tripeptide_seq[tripeptide_motif]
# tripeptide_seq[tripeptide_motif] = [float(exp) + tmp_exp,tmp_count + 1]
# else:
# tripeptide_seq[tripeptide_motif] = [float(exp),1]
# check transid in AA fasta file (revised code)!
if trans_id in amino.keys():
# 3nt window to fetch tri-peptide motif
type1 = amino.fetch(trans_id,(int(codon_pos) - 1,int(codon_pos) + 1))
type2 = amino.fetch(trans_id,(int(codon_pos) - 2,int(codon_pos)))
type3 = amino.fetch(trans_id,(int(codon_pos),int(codon_pos) + 2))
tripeptide_motif_list = [type1, type2, type3]
# filter motif length is 3
tripeptide_motif_list_filterd = [i for i in tripeptide_motif_list if len(i) == 3]
# get tripeptide and save in dict
for motif in tripeptide_motif_list_filterd:
if motif in tripeptide_seq:
tmp_exp,tmp_count = tripeptide_seq[motif]
tripeptide_seq[motif] = [float(exp) + tmp_exp,tmp_count + 1]
else:
tripeptide_seq[motif] = [float(exp),1]
再分析看看:
calculate
peptide_motif_score(amino_file = “sac_cds_aa.fa”,
codon_exp_file = c(“codon_occupancy/WT_codon_pos_exp.txt”,
“codon_occupancy/eIF5Ad_codon_pos_exp.txt”),
sample_name = c(“WT”,”eIF5Ad”),
occurrence_threshold = 100)
plot
triAmino_scater_plot(occupancy_file = c(“peptide_motif/WT_tripeptide_occupancy.txt”,
“peptide_motif/eIF5Ad_tripeptide_occupancy.txt”),
sampe_name = c(“WT”,”eIF5Ad”),
mark_motif = c(“DDP”,”DPP”,”PPP”,”PDK”,”PDG”,
“GPP”,”PDP”,”DNP”,”DGP”,”SPP”))
结尾
❝
路漫漫其修远兮,吾将上下而求索。
欢迎加入生信交流群。加我微信我也拉你进 微信群聊 老俊俊生信交流群 (微信交流群需收取 20 元入群费用,一旦交费,拒不退还!(防止骗子和便于管理)) 。QQ 群可免费加入, 记得进群按格式修改备注哦。
声明:文中观点不代表本站立场。本文传送门:https://eyangzhen.com/420878.html