Cell Reports 文献结果复现

引言


摘要: 这项研究调查了泛素结合酶 Rad6 在调节氧化应激期间翻译的作用。研究人员使用核糖体测序(Ribo-seq)和 RNA 测序来表征野生型(WT)和 rad6Δ 酵母细胞在氧化应激条件下的翻译景观。他们发现,氧化应激会导致核糖体在特定的氨基酸序列上暂停,特别是那些含有 异亮氨酸-脯氨酸(XIP)序列的序列, 这种情况需要 Rad6 的参与。这种 “氧化还原暂停” 特征在缺乏 Rad6 的情况下基本消失,表明 Rad6 介导的 泛素化 是这种翻译调节所需的。进一步分析表明,Rad6 影响翻译的方式独立于核糖体相关质量控制(RQC)通路,并导致整合应激反应(ISR)通路的激活发生变化。总的来说,这项研究揭示了 Rad6 如何在应对氧化应激时重塑翻译景观的关键机制。

机制图:


我们尝试分析一下这篇文章的数据。
安装

install.packages(“devtools”)

devtools::install_github(“junjunlab/RiboProfiler”)

or

remotes::install_github(“junjunlab/RiboProfiler”)

library(RiboProfiler)
下载数据

直接 ascp 下载,下载后修改一下文件名称,sra-exploer 下载一下放后台就行了:

!/usr/bin/env bash

ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR236/023/SRR23617123/SRR23617123.fastq.gz . && mv SRR23617123.fastq.gz SRR23617123_GSM7062915_SUB280_WT_rep1_SM061F_Saccharomyces_cerevisiae_OTHER.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR236/021/SRR23617121/SRR23617121.fastq.gz . && mv SRR23617121.fastq.gz SRR23617121_GSM7062917_SUB280_WT_peroxide_rep1_SM062F_Saccharomyces_cerevisiae_OTHER.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR236/020/SRR23617120/SRR23617120.fastq.gz . && mv SRR23617120.fastq.gz SRR23617120_GSM7062918_SUB280_WT_peroxide_rep2_SM072F_Saccharomyces_cerevisiae_OTHER.fastq.gz

去接头

文章上游数据处理部分:

他这里是含有 UMI 序列的,文章没有给代码,找了该作者以前发的文章才了解测序结构:

下面是文章里提供的信息:

注意作者上传 GEO 里面是这样描写的,注意红框内容:

接着就可以去接头了:
for i in SUB280_WT_rep1_F SUB280_WT_rep2_F SUB280_WT_peroxide_rep1_F SUB280_WT_peroxide_rep2_F SUB280_rad6_rep1_F SUB280_rad6_rep2_F SUB280_rad6_peroxide_rep1_F SUB280_rad6_peroxide_rep2_F SUB280_hel2_F SUB280_hel2_peroxide_F SUB280_rad6_RAD6_F SUB280_rad6_RAD6_peroxide_F SUB280_rad6_RAD6_C88A_F SUB280_rad6_RAD6_C88A_peroxide_F S288C_WT_F S288C_WT_peroxide_F S288C_rad6_F S288C_rad6_peroxide_F SUB280_WT_rep1_Fd SUB280_WT_rep2_Fd SUB280_WT_peroxide_rep1_Fd SUB280_WT_peroxide_rep2_Fd SUB280_rad6_Fd SUB280_rad6_peroxide_Fd
do
cutadapt -j 20 -u 2 -u -5 -o ./${i}_trim.fq.gz ../1.raw-data/${i}.fastq.gz
done
去除 rRNA

去 NCBI 等相关网站下载酵母的 rRNA 序列,然后用 bowtie2 建索引,然后就可以去除了:

trim rRNA

for i in SUB280_WT_rep1_F SUB280_WT_rep2_F SUB280_WT_peroxide_rep1_F SUB280_WT_peroxide_rep2_F SUB280_rad6_rep1_F SUB280_rad6_rep2_F SUB280_rad6_peroxide_rep1_F SUB280_rad6_peroxide_rep2_F SUB280_hel2_F SUB280_hel2_peroxide_F SUB280_rad6_RAD6_F SUB280_rad6_RAD6_peroxide_F SUB280_rad6_RAD6_C88A_F SUB280_rad6_RAD6_C88A_peroxide_F S288C_WT_F S288C_WT_peroxide_F S288C_rad6_F S288C_rad6_peroxide_F SUB280_WT_rep1_Fd SUB280_WT_rep2_Fd SUB280_WT_peroxide_rep1_Fd SUB280_WT_peroxide_rep2_Fd SUB280_rad6_Fd SUB280_rad6_peroxide_Fd
do
bowtie2 -p 20 -x ../../index-data/sac-rRNA-index/Saccharomyces-cerevisiae-rRNA \
–un-gz ${i}.rmrRNA.fq.gz \
-U ../2.trim-data/${i}_trim.fq.gz \
-S ./null
done
比对到基因组

去 ensembl 网站下载酵母的基因组和注释文件建立索引:
STAR –runThreadN 15 \
–runMode genomeGenerate \
–genomeDir sac-star-genome-index/ \
–genomeFastaFiles ./Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa \
–sjdbGTFfile Saccharomyces_cerevisiae.R64-1-1.112.gtf
你也可以去专门酵母的网站( https://www.yeastgenome.org/ ):

选择 download:

比对:

map to genome

for i in SUB280_WT_rep1_F SUB280_WT_rep2_F SUB280_WT_peroxide_rep1_F SUB280_WT_peroxide_rep2_F SUB280_rad6_rep1_F SUB280_rad6_rep2_F SUB280_rad6_peroxide_rep1_F SUB280_rad6_peroxide_rep2_F SUB280_hel2_F SUB280_hel2_peroxide_F SUB280_rad6_RAD6_F SUB280_rad6_RAD6_peroxide_F SUB280_rad6_RAD6_C88A_F SUB280_rad6_RAD6_C88A_peroxide_F S288C_WT_F S288C_WT_peroxide_F S288C_rad6_F S288C_rad6_peroxide_F SUB280_WT_rep1_Fd SUB280_WT_rep2_Fd SUB280_WT_peroxide_rep1_Fd SUB280_WT_peroxide_rep2_Fd SUB280_rad6_Fd SUB280_rad6_peroxide_Fd
do
STAR –runThreadN 20 \
–outFilterType Normal \
–alignEndsType EndToEnd \
–outFilterMismatchNmax 1 \
–outFilterMultimapNmax 1 \
–genomeDir ../../index-data/sac-star-genome-index \
–readFilesCommand gunzip -c \
–readFilesIn ../3.rmrRNA-data/${i}.rmrRNA.fq.gz \
–outFileNamePrefix ./${i} \
–outSAMtype BAM SortedByCoordinate \
–quantMode GeneCounts \
–outSAMattributes All
done
riboseq 质量控制分析


和我们之前介绍的 Ribo-SEQ 数据质量评估 一样,好的数据质量才好往下进行分析。整体上看,这篇文章的测序质量还是非常不错的,非常符合 ribo-seq 的特征指标。
准备注释文件

酵母UTR很少,对 CDS 两端延长 50nt 的长度作为 UTR:

devtools::install_github(“junjunlab/RiboProfiler”,force = T)

library(RiboProfiler)

==============================================================================

1_prepare gene annotation file

==============================================================================

pre_longest_trans_info(gtf_file = “../../index-data/Saccharomyces_cerevisiae.R64-1-1.112.gtf”,
out_file = “longest_info.txt”)

extend CDS

df_extend <- gene_anno_extend(longest_trans_file = “longest_info.txt”,
upstream_extend = 50,
downstream_extend = 50,
output_file = “longest_info_extend.txt”)
获得质控数据

这里我们保持和文章一致,使用 read 的 3‘end 来进行分析,因为文章测了 monosome(单核糖体) 和 disome(二核糖体) 两种测序,我们把数据分开来保存一下:

==============================================================================

2_prepare Ribo QC data

==============================================================================

bam_file = list.files(“../4.map-data/”,pattern = “.bam$”,full.names = TRUE)
bam_file

sample_name <- sapply(strsplit(list.files(“../4.map-data/”,pattern = “.bam$”),
split = “Aligned.sortedByCoord.out.bam”), “[“,1)
sample_name

[1] “S288C_rad6_F” “S288C_rad6_peroxide_F”

[3] “S288C_WT_F” “S288C_WT_peroxide_F”

[5] “SUB280_hel2_F” “SUB280_hel2_peroxide_F”

error files

16 “SUB280_rad6_rep2_F” 21 “SUB280_WT_rep1_F”

perform QC analysis

pre_qc_data(longest_trans_file = “longest_info_extend.txt”,
bam_file = bam_file,
out_file = paste(sample_name,”.qc.txt”,sep = “”),
mapping_type = “genome”,
assignType = “end3”,
seq_type = “singleEnd”)

load qc data

qc_df <- load_qc_data()

qc_df$length <- factor(qc_df$length)

filter monosome and disome

disome <- qc_df %>%
dplyr::filter(endsWith(sample,”_Fd”))

save data

save(disome,file = “disome.rda”)

monosome <- qc_df %>%
dplyr::filter(!endsWith(sample,”_Fd”))

save data

save(monosome,file = “monosome.rda”)
质控指标

monosome 和 disome 的我们都看看:

length distribution 8×20

qc_plot(qc_data = monosome,type = “length”,
facet_wrap_list = list(ncol = 6))

5×12

qc_plot(qc_data = disome,type = “length”,
facet_wrap_list = list(ncol = 3))

length with frame 8×20

qc_plot(qc_data = monosome,
type = “length_frame”,
facet_wrap_list = list(ncol = 6))

5×12

qc_plot(qc_data = disome,
type = “length_frame”,
facet_wrap_list = list(ncol = 3))

region features 10×20

qc_plot(qc_data = monosome,
type = “feature”,
facet_wrap_list = list(ncol = 6))

5×20

qc_plot(qc_data = disome,
type = “feature”,
facet_wrap_list = list(ncol = 6))

relative to start/stop site 8×20

rel_to_start_stop(qc_data = monosome,
type = “relst”,
facet_wrap_list = list(ncol = 6,scales = “free”))

5×12

rel_to_start_stop(qc_data = disome,
type = “relst”,
dist_range = c(-50,250),
facet_wrap_list = list(ncol = 3,scales = “free”))

8×20

rel_to_start_stop(qc_data = monosome,
type = “relsp”,
facet_wrap_list = list(ncol = 6,scales = “free”))

5×12

rel_to_start_stop(qc_data = disome,
type = “relsp”,
dist_range = c(-250,50),
facet_wrap_list = list(ncol = 3,scales = “free”))

offset 检测

看看不同长度片段的 offset 情况:

==============================================================================

offset check

==============================================================================

offset check

offset_momo <- PsiteOffsetCheck(qc_data = monosome,
relative_distance = c(-25,25),
read_length = c(25,34))
offset_momo$plot
offset_momo$summary_offset

sample readLengths Offsets

1 S288C_rad6_F 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 15, 15, 15, 15, 15, 15, 15, 21, 23, 25

11 S288C_rad6_peroxide_F 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 15, 15, 15, 15, 15, 15, 25, 17, 23, 25

21 S288C_WT_F 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 15, 15, 15, 15, 15, 15, 25, 21, 23, 25

offset_disome <- PsiteOffsetCheck(qc_data = disome,
relative_distance = c(-10,70),
read_length = c(57,63))
offset_disome$plot

计算校正和标准化的数据


根据文章说明,对特定长度的 reads 保留分析,使用 15nt 的 offset 来校正:

==============================================================================

offset_momo <- PsiteOffsetCheck(qc_data = monosome,
relative_distance = c(-25,25),
read_length = c(25,34))

offset_momo_shift <- offset_momo$summary_offset
offset_momo_shift$Offsets <- c(“15,15,15,15,15,15,15,15,15,15”)

adjust offset

adjusted_offset_mono <- adjust_offset(offset_df = offset_momo_shift,
qc_df = monosome)

get normalized data

mono_norm_df <- get_nomalized_counts(longest_trans_file = “longest_info_extend.txt”,
qc_df = adjusted_offset_mono,
exclude_upstreamCodon = 5,
exclude_downstreamCodon = 5)

save data

save(mono_norm_df,file = “mono_norm_df.rda”)

==============================================================================

ajust offset with reads for disosome

load(“disome.rda”)

offset_disome <- PsiteOffsetCheck(qc_data = disome,
relative_distance = c(-10,70),
read_length = c(57,63))

offset_disome_shift <- offset_disome$summary_offset
offset_disome_shift$Offsets <- c(“15,15,15,15,15,15,15”)

adjust offset

adjusted_offset_diso <- adjust_offset(offset_df = offset_disome_shift,
qc_df = disome)

get normalized data

diso_norm_df <- get_nomalized_counts(longest_trans_file = “longest_info_extend.txt”,
qc_df = adjusted_offset_diso,
exclude_upstreamCodon = 5,
exclude_downstreamCodon = 5)

save data

save(diso_norm_df,file = “diso_norm_df.rda”)
metagene plot


作者发现缺失 RAD6 以后在氧化应激的条件下并不明显影响核糖体在UTR 区域的堆积。文章里的图可以看到 reads 应该是没有 shift 的,所以应该用没有 shift 的数据来分析。

Because oxidative stress induced by peroxide was previously associated with increased translation of 5′ and 3′ untranslated regions (UTRs),7,13 we tested whether Rad6 impacts translation outside of main open reading frames. Metagene analysis, performed by averaging data from genes aligned by their start or stop codons, showed modest changes in the occupancy of 5’and 3‘ UTRs with peroxide treatment in WT cells, as expected (Figures S1F–S1G). The absence of Rad6 did not affect these trends (Figures S1F–S1G), suggesting that Rad6 does not strongly impact translation of UTRs under oxidative stress

此外计算 ribosome occupancy 的时候需要排除前后 15nt 的影响,所以参数 exclude_upstreamCodon 和 exclude_downstreamCodon 设置为 3:

计算:
qc_df <- load_qc_data()

filter monosome and disome

mono <- qc_df %>%
dplyr::filter(sample %in% c(“SUB280_WT_rep1_F”,”SUB280_WT_rep2_F”,
“SUB280_WT_peroxide_rep1_F”,”SUB280_WT_peroxide_rep2_F”,
“SUB280_rad6_rep1_F”,”SUB280_rad6_rep2_F”,
“SUB280_rad6_peroxide_rep1_F”,”SUB280_rad6_peroxide_rep2_F”)) %>%
dplyr::filter(length %in% 25:34)

get normalized data

mono_norm_df <- get_nomalized_counts(longest_trans_file = “longest_info_extend.txt”,
qc_df = mono,
exclude_upstreamCodon = 3,
exclude_downstreamCodon = 3)

add group name

mono_norm_df <- mono_norm_df %>%
dplyr::mutate(group = dplyr::case_when(sample %in% c(“SUB280_WT_rep1_F”,”SUB280_WT_rep2_F”) ~ “WT”,
sample %in% c(“SUB280_WT_peroxide_rep1_F”,”SUB280_WT_peroxide_rep2_F”) ~ “WT_peroxide”,
sample %in% c(“SUB280_rad6_rep1_F”,”SUB280_rad6_rep2_F”) ~ “Δrad6”,
sample %in% c(“SUB280_rad6_peroxide_rep1_F”,”SUB280_rad6_peroxide_rep2_F”) ~ “Δrad6_peroxide”))

save(mono_norm_df,file = “mono_norm_df_noshift.rda”)
load(“mono_norm_df_noshift.rda”)
relative to start codon

plot WT WT_peroxide 3×6

metagene_plot(longest_trans_file = “longest_info_extend.txt”,
normed_file = mono_norm_df %>%
dplyr::filter(group %in% c(“WT”,”WT_peroxide”)),
min_counts = 64,
min_cds_length = 600,
relative_distance = c(-20,80),
collapse = T,
mode = “nt”) +
ggplot2::scale_color_manual(values = c(“WT” = “black”,”WT_peroxide” = “#0099CC”))

plot Δrad6 Δrad6_peroxide 3×6

metagene_plot(longest_trans_file = “longest_info_extend.txt”,
normed_file = mono_norm_df %>%
dplyr::filter(group %in% c(“Δrad6″,”Δrad6_peroxide”)),
min_counts = 64,
min_cds_length = 600,
relative_distance = c(-20,80),
collapse = T,
mode = “nt”) +
ggplot2::scale_color_manual(values = c(“Δrad6” = “black”,”Δrad6_peroxide” = “#993399”))

relative to stop codon
metagene_plot(longest_trans_file = “longest_info_extend.txt”,
normed_file = mono_norm_df %>%
dplyr::filter(group %in% c(“WT”,”WT_peroxide”)),
min_counts = 64,
min_cds_length = 600,
type = “sp”,
relative_distance = c(-60,40),
collapse = T,
mode = “nt”) +
ggplot2::scale_color_manual(values = c(“WT” = “black”,”WT_peroxide” = “#0099CC”)) +
ggplot2::ylim(0,15)

metagene_plot(longest_trans_file = “longest_info_extend.txt”,
normed_file = mono_norm_df %>%
dplyr::filter(group %in% c(“Δrad6″,”Δrad6_peroxide”)),
min_counts = 64,
min_cds_length = 600,
type = “sp”,
relative_distance = c(-60,40),
collapse = T,
mode = “nt”) +
ggplot2::scale_color_manual(values = c(“Δrad6” = “black”,”Δrad6_peroxide” = “#993399”)) +
ggplot2::ylim(0,15)

gene track plot


作者发现在 RAD6 缺失后,氧化应激下在某些特定三肽基序 XIP 位点的核糖体停滞现象发生了丢失,文章举例了 SKI2 和 PCF11 两个基因,图里标注了暂停的基序:

track plot:
load(“mono_norm_df.rda”)

filter samples

track_data <- mono_norm_df %>%
dplyr::filter(sample %in% c(“SUB280_WT_peroxide_rep1_F”,”SUB280_WT_peroxide_rep2_F”,
“SUB280_rad6_peroxide_rep1_F”,”SUB280_rad6_peroxide_rep2_F”))

track_df <- get_track_df(longest_trans_file = “longest_info_extend.txt”,
select_gene = c(“SKI2″,”PCF11”),
normed_file = track_data)

5×10

track_plot(signal_data = track_df,
show_ribo_only = T,
gene_anno = “longest_info_extend.txt”)


我们可以检查一下最高的位置处的基序:
tmp <- track_df %>% dplyr::filter(gene_name == “PCF11”)


最高的位置处为 1140 位置,减去 50nt 5’UTR,然后 除以 3 等于 364 处的 codon,然后加上末尾 8 个空字符,最终是 372,可以看到刚好是 KIP 的 I 基序处:

tri amino acid pause score


这里我尝试使用之前的方式来计算 计算 Ribosome 距离二肽/三肽基序距离的密度 发现结果和文章相差还挺大的,仔细看了文章,发现算的方法不一样,只能自己按照文章的思路计算看看了。思路大概是先计算每个位置处的 rpm,然后计算这个位置上下游 50nt 左右的 average reads,这个位置处相邻的 EPA 位置所代表的基序就是 pause score。然后筛选次数大于 100 次的来分析,去除噪音。

(D) Schematics for calculation of pause score, computed by dividing reads at a motif by the average reads in a window around the motif of interest (±50 nt). We calculated pause score at tri-amino acid motifs that map to the ribosomal E (penultimate position of the nascent peptide), P, and A sites.


这也是文章的第一张主图,通过 Ribo-seq 测序发现缺失 RAD6 后,在某些 XIP 基序氧化应激条件下的核糖体停滞现象消失或者明显减弱了。

大概是这样的:

#

4_get motif pause score in window

#

pause_score = {}
with open(input_file,’r’) as input:
for line in input:
,,,,,,trans_pos,trans_id,counts,,,exp = line.split()
utr5,cds,_ = gene_info[trans_id]
cds_pos = int(trans_pos) – utr5

    # exclude first and last 50nt
    if cds_pos >= window + 1 and cds_pos <= cds - window:
        nt_exp = exp_whole_trans[trans_id]
        window_exp = nt_exp[(cds_pos - window - 1):(cds_pos + window)]

        # normalize average reads in window
        average_mean = sum(window_exp)/(window + window + 1)
        normalized_window_exp = [i/average_mean for i in window_exp]

        # get EPA tri amino acid score
        average_sum_epa = sum(normalized_window_exp[(window - 3 - 1):(window + 5)])

        # calculate codon position and get tri-amino acid motif
        if cds_pos%3 == 0:
            codon_pos = int(cds_pos/3)
        else:
            codon_pos = cds_pos//3 + 1

        motif = amino.fetch(trans_id,(codon_pos - 1,codon_pos + 1))

        # save in dict
        if motif in pause_score:
            tmp_exp,tmp_count = pause_score[motif]
            pause_score[motif] = [average_sum_epa + tmp_exp,tmp_count + 1]
        else:
            pause_score[motif] = [average_sum_epa,1]

准备氨基酸序列:

==============================================================================

get sequence

==============================================================================

library(RiboProfiler)

extract amino acids sequnce

FetchSequence(gene_file = “longest_info.txt”,
genome_file = “Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa”,
output_file = “sac_amino_acid.fa”,
type = “cds”,
coding_type = “AA”)

extract cds sequnce

FetchSequence(gene_file = “longest_info.txt”,
genome_file = “Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa”,
output_file = “sac_cds.fa”,
type = “cds”,
coding_type = “NT”)
然后我们使用 peptide_motif_score2 函数来计算:
load(“mono_norm_df.rda”)

unique(mono_norm_df$sample)

filter samples

ft_data <- mono_norm_df %>%
dplyr::filter(sample %in% c(“SUB280_WT_rep1_F”,”SUB280_WT_rep2_F”,
“SUB280_WT_peroxide_rep1_F”,”SUB280_WT_peroxide_rep2_F”,
“SUB280_rad6_rep1_F”,”SUB280_rad6_rep2_F”,
“SUB280_rad6_peroxide_rep1_F”,”SUB280_rad6_peroxide_rep2_F”,
“SUB280_rad6_RAD6_F”,”SUB280_rad6_RAD6_peroxide_F”,
“SUB280_rad6_RAD6_C88A_F”,”SUB280_rad6_RAD6_C88A_peroxide_F”,
“S288C_WT_F”,”S288C_WT_peroxide_F”,
“S288C_rad6_F”,”S288C_rad6_peroxide_F”))

calculate pause score for motif

peptide_motif_score2(amino_file = “sac_amino_acid.fa”,
normed_exp_file = ft_data,
longest_trans_file = “longest_info_extend.txt”,
norm_type = “normed_count”,
window = 50,
occurrence_threshold = 100)

S288C_rad6_F has been processed!

S288C_rad6_peroxide_F has been processed!


然后绘图可视化(感觉没有文章里的那么明显和点那么分散):

plot

library(patchwork)

p1 <-
triAmino_scater_plot(occupancy_file = c(“normalized_count_data/SUB280_WT_rep1_F_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_WT_rep2_F_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_WT_peroxide_rep1_F_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_WT_peroxide_rep2_F_tripeptide_occupancy.txt”),
sample_name = c(“WT-rep1″,”WT-rep2″,”WT-peroxide-rep1″,”WT-peroxide-rep2”),
group_name = c(“WT”,”WT”,”WT + peroxide”,”WT + peroxide”),
mark_motif = c(“KIP”,”MIP”,”QIP”,”KIH”,”EIH”,
“RKK”,”PPD”,”PPE”),
pcol = “#0099CC”,
mark_color = “black”)

p2 <-
triAmino_scater_plot(occupancy_file = c(“normalized_count_data/SUB280_rad6_rep1_F_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_rad6_rep2_F_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_rad6_peroxide_rep1_F_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_rad6_peroxide_rep2_F_tripeptide_occupancy.txt”),
sample_name = c(“rad6-rep1″,”rad6-rep2″,”rad6-peroxide-rep1″,”rad6-peroxide-rep2”),
group_name = c(“rad6″,”rad6″,”rad6 + peroxide”,”rad6 + peroxide”),
mark_motif = c(“KIP”,”MIP”,”QIP”,”KIH”,”EIH”,
“RKK”,”PPD”,”PPE”),
pcol = “#993399”,
mark_color = “black”)

4×8

p1 | p2

回补型

RAD6 是泛素连接酶,为了验证氧化应激下的核糖体暂停是否与它的泛素连接功能相关,作者对 RAD6 泛素功能区域进行了突变,发现在缺失的条件下,只有回补野生型的 RAD6 而非突变型 才能恢复氧化应激导致的核糖体停滞现象。作者还在不同菌种(S288C)同样发现了类似的现象,表面现象的普适性。

p1 <-
triAmino_scater_plot(occupancy_file = c(“normalized_count_data/SUB280_rad6_RAD6_F_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_rad6_RAD6_peroxide_F_tripeptide_occupancy.txt”),
sample_name = c(“rad6_RAD6″,”rad6_RAD6_peroxide”),
mark_motif = c(“KIP”),
pcol = “#003366”,
mark_color = “black”)

p2 <-
triAmino_scater_plot(occupancy_file = c(“normalized_count_data/SUB280_rad6_RAD6_C88A_F_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_rad6_RAD6_C88A_peroxide_F_tripeptide_occupancy.txt”),
sample_name = c(“rad6_RAD6_C88A”,”rad6_RAD6_C88A_peroxide”),
mark_motif = c(“KIP”),
pcol = “#993399”,
mark_color = “black”)

4×8

p1 | p2


D 图在 S288C 里标记了 Pro 氨基酸结尾的基序(即 A 位点)橙色点,自定义一下绘图函数:
triAmino_scater_plot2 <- function(occupancy_file = NULL, sample_name = NULL, group_name = NULL, mark_motif = NULL, pcol = “grey”, mark_color = “orange”){ … df_plot_pro <- df_plot_wide %>%
dplyr::filter(endsWith(motif,”P”))

# plot
ggplot(df_plot_wide) +

geom_point(data = df_plot_pro,aes(x = .data[[var[1]]],y = .data[[var[2]]]),
color = “orange”) +

}

p1 <-
triAmino_scater_plot2(occupancy_file = c(“normalized_count_data/S288C_WT_F_tripeptide_occupancy.txt”,
“normalized_count_data/S288C_WT_peroxide_F_tripeptide_occupancy.txt”),
sample_name = c(“S288C_WT”,”S288C_WT_peroxide”),
mark_motif = c(“KIP”),
pcol = “#3399CC”,
mark_color = “black”)

p2 <-
triAmino_scater_plot2(occupancy_file = c(“normalized_count_data/S288C_rad6_F_tripeptide_occupancy.txt”,
“normalized_count_data/S288C_rad6_peroxide_F_tripeptide_occupancy.txt”),
sample_name = c(“S288C_rad6″,”S288C_rad6_peroxide”),
mark_motif = c(“KIP”),
pcol = “#FF33CC”,
mark_color = “black”)

4×8

p1 | p2

relative distance to tri-peptide motif


作者发现 PA 位点主要是 异亮氨酸-脯氨酸,E 位点则不保守,所以定义了 XIP 基序,X 为任意氨基酸。然后分析了 reads 在 XIP 富集的富集情况,包括野生型,RAD6 缺失型,及回补 RAD6 和突变的 RAD6 的分析。观察图形可以发现,reads 也是没有经过 offset 校正的,分析的时候要注意一下。

对数据进行标准化:
load(“monosome.rda”)

monosome <- monosome %>%
dplyr::mutate(length = as.numeric(as.character(length))) %>%
dplyr::filter(length %in% 25:34)

get normalized data

mono_norm_df <- get_nomalized_counts(longest_trans_file = “longest_info_extend.txt”, qc_df = monosome, exclude_upstreamCodon = 5, exclude_downstreamCodon = 5) ❝ 计算 reads 和 XIP 的相对距离,注意 X 是任意氨基酸,但是我们也不能写 20 种分别去寻找然后整合数据,这时候可以使用正则表达式的通配符,指定 motif = “.IP” 即可匹配到所有类型的 IP 结尾的基序: rel_dist_motif(amino_file = “sac_amino_acid.fa”, longest_trans_file = “longest_info_extend.txt”, normed_file = mono_norm_df %>%
dplyr::filter(sample %in% c(“SUB280_WT_rep1_F”,”SUB280_WT_rep2_F”,
“SUB280_WT_peroxide_rep1_F”,”SUB280_WT_peroxide_rep2_F”,
“SUB280_rad6_rep1_F”,”SUB280_rad6_rep2_F”,
“SUB280_rad6_peroxide_rep1_F”,”SUB280_rad6_peroxide_rep2_F”,
“SUB280_rad6_RAD6_F”,”SUB280_rad6_RAD6_peroxide_F”,
“SUB280_rad6_RAD6_C88A_F”,”SUB280_rad6_RAD6_C88A_peroxide_F”)),
min_counts = 64,
motif = “.IP”,
upstream = -50,
downstream = 50)
拿到数据了就可以绘图可视化了:

plot

library(patchwork)

p1 <-
rel_dist_motif_plot(motif_occupancy_file = c(“relative_distance_motif/SUB280_WT_rep1_F_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_WT_rep2_F_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_WT_peroxide_rep1_F_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_WT_peroxide_rep2_F_.IP_motif_occupancy.txt”),
sample_name = c(“WT-rep1″,”WT-rep2”,
“WT-peroxide-rep1″,”WT-peroxide-rep2”),
group_name = c(“WT”,”WT”,”WT + peroxide”,”WT + peroxide”),
motif = “XIP”,
site = “3′ end”) +
scale_color_manual(values = c(“WT” = “black”,”WT + peroxide” = “#3399CC”))

p2 <-
rel_dist_motif_plot(motif_occupancy_file = c(“relative_distance_motif/SUB280_rad6_rep1_F_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_rad6_rep2_F_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_rad6_peroxide_rep1_F_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_rad6_peroxide_rep2_F_.IP_motif_occupancy.txt”),
sample_name = c(“rad6-rep1″,”rad6-rep2”,
“rad6-peroxide-rep1″,”rad6-peroxide-rep2”),
group_name = c(“rad6″,”rad6″,”rad6 + peroxide”,”rad6 + peroxide”),
motif = “XIP”,
site = “3′ end”) +
scale_color_manual(values = c(“rad6” = “black”,”rad6 + peroxide” = “#FF33CC”)) +
ylim(0,10)

p1 / p2

看看回补野生型和突变型 RAD6 的情况:
p1 <-
rel_dist_motif_plot(motif_occupancy_file = c(“relative_distance_motif/SUB280_rad6_RAD6_F_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_rad6_RAD6_peroxide_F_.IP_motif_occupancy.txt”),
sample_name = c(“rad6_RAD6″,”rad6_RAD6_peroxide”),
motif = “XIP”,
site = “3′ end”) +
scale_color_manual(values = c(“rad6_RAD6” = “black”,”rad6_RAD6_peroxide” = “#3399CC”)) +
ylim(0,8)

p2 <-
rel_dist_motif_plot(motif_occupancy_file = c(“relative_distance_motif/SUB280_rad6_RAD6_C88A_F_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_rad6_RAD6_C88A_peroxide_F_.IP_motif_occupancy.txt”),
sample_name = c(“rad6_RAD6_C88A”,”rad6_RAD6_C88A_peroxide”),
motif = “XIP”,
site = “3′ end”) +
scale_color_manual(values = c(“rad6_RAD6_C88A” = “black”,”rad6_RAD6_C88A_peroxide” = “#FF33CC”)) +
ylim(0,8)

p1 / p2

motif logo


文章第一张主图对氧化应激处理上升的 motif 还进行了可视化,使用的是 pLogo(https://plogo.uconn.edu/#) 工具,作者图注里写了选择 1.5 倍的,然后输入 foreground 和 backgroud 可视化。这里我们的复现结果和文章还是有区别的,这里就不分析了,注意介绍一下怎么使用。进入网页后你可以选择上传的 foreground 和 backgroud 序列,中间部分有简短的操作教程,点击 generate pLogo 即可生成图形。

disome-seq analysis


前面的分析都是基于 monsome 的,作者还测了 disome 的数据,同样的方法我们也来分析看看。先看看文章为什么要做这个东西。之前的研究表明,E3 泛素连接酶 Hel2 可以通过泛素化碰撞的核糖体来触发 RQC 通路,从而促进核糖体的解离和重新利用。作者进一步在 hel2Δ 突变株中进行了 Ribo-seq 实验,发现在氧化应激条件下,XIP 序列的”氧化应激诱导的核糖体暂停”现象仍然存在。这表明,Rad6 介导的核糖体暂停并不依赖于 RQC 通路,而是通过一种独立的机制来调控转录过程。

注意左下那个图还是 monosome 的,做了一个 hel2 缺失的实验。
Δhel2 pause score

hel2 monosome data

load(“mono_norm_df.rda”)

filter samples

ft_data <- mono_norm_df %>%
dplyr::filter(sample %in% c(“SUB280_hel2_F”,”SUB280_hel2_peroxide_F”))

calculate pause score for motif

peptide_motif_score2(amino_file = “sac_amino_acid.fa”,
normed_exp_file = ft_data,
longest_trans_file = “longest_info_extend.txt”,
norm_type = “normed_count”,
window = 50,
occurrence_threshold = 100)

SUB280_hel2_F has been processed!

SUB280_hel2_peroxide_F has been processed!

plot

triAmino_scater_plot(occupancy_file = c(“normalized_count_data/SUB280_hel2_F_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_hel2_peroxide_F_tripeptide_occupancy.txt”),
sample_name = c(“hel2″,”hel2_peroxide”),
mark_motif = c(“KIP”,”MIP”,”QIP”,”KIH”,”EIH”,”RKK”,”PPD”,”PPE”),
pcol = “#006600”,
mark_color = “black”)

disome-seq pause score
先计算 pause score:
load(“diso_norm_df.rda”)

unique(diso_norm_df$sample)

calculate pause score for motif

peptide_motif_score2(amino_file = “sac_amino_acid.fa”,
normed_exp_file = diso_norm_df,
longest_trans_file = “longest_info_extend.txt”,
norm_type = “normed_count”,
window = 50,
occurrence_threshold = 100)
然后绘图,可以看到有些基序没有找到:

plot

library(patchwork)

p1 <-
triAmino_scater_plot(occupancy_file = c(“normalized_count_data/SUB280_WT_rep1_Fd_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_WT_rep2_Fd_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_WT_peroxide_rep1_Fd_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_WT_peroxide_rep2_Fd_tripeptide_occupancy.txt”),
sample_name = c(“WT-rep1″,”WT-rep2″,”WT-peroxide-rep1″,”WT-peroxide-rep2”),
group_name = c(“WT”,”WT”,”WT + peroxide”,”WT + peroxide”),
mark_motif = c(“KIP”,”MIP”,”QIP”,”KIH”,”EIH”,”RKK”,”PPD”,”PPE”),
pcol = “#0099CC”,
mark_color = “black”)

p2 <-
triAmino_scater_plot(occupancy_file = c(“normalized_count_data/SUB280_rad6_Fd_tripeptide_occupancy.txt”,
“normalized_count_data/SUB280_rad6_peroxide_Fd_tripeptide_occupancy.txt”),
sample_name = c(“rad6″,”rad6_peroxide”),
mark_motif = c(“KIP”,”MIP”,”QIP”,”KIH”,”EIH”,”RKK”,”PPD”,”PPE”),
pcol = “#9933CC”,
mark_color = “black”)

4×8

p1 | p2

relative to XIP motif
计算:

relative distance

load(“disome.rda”)

disome <- disome %>%
dplyr::mutate(length = as.numeric(as.character(length))) %>%
dplyr::filter(length %in% 57:63)

get normalized data

disome_norm_df <- get_nomalized_counts(longest_trans_file = “longest_info_extend.txt”,
qc_df = disome,
exclude_upstreamCodon = 5,
exclude_downstreamCodon = 5)

calculate relative distance density

rel_dist_motif(amino_file = “sac_amino_acid.fa”,
longest_trans_file = “longest_info_extend.txt”,
normed_file = disome_norm_df,
min_counts = 64,
motif = “.IP”,
upstream = -50,
downstream = 50)
绘图:

plot

library(patchwork)

p1 <-
rel_dist_motif_plot(motif_occupancy_file = c(“relative_distance_motif/SUB280_WT_rep1_Fd_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_WT_rep2_Fd_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_WT_peroxide_rep1_Fd_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_WT_peroxide_rep2_Fd_.IP_motif_occupancy.txt”),
sample_name = c(“WT-rep1″,”WT-rep2”,
“WT-peroxide-rep1″,”WT-peroxide-rep2”),
group_name = c(“WT”,”WT”,”WT + peroxide”,”WT + peroxide”),
motif = “XIP”,
site = “3′ end”) +
scale_color_manual(values = c(“WT” = “black”,”WT + peroxide” = “#3399CC”)) +
labs(title = “Disome seq”)

p2 <-
rel_dist_motif_plot(motif_occupancy_file = c(“relative_distance_motif/SUB280_rad6_Fd_.IP_motif_occupancy.txt”,
“relative_distance_motif/SUB280_rad6_peroxide_Fd_.IP_motif_occupancy.txt”),
sample_name = c(“rad6″,”rad6-peroxide”),
motif = “XIP”,
site = “3′ end”) +
scale_color_manual(values = c(“rad6” = “black”,”rad6-peroxide” = “#FF33CC”)) +
labs(title = “Disome seq”) +
ylim(0,13)

5X7

p1 / p2


最后就差不多结束了,有些地方其实并不是那么满意。
结尾


路漫漫其修远兮,吾将上下而求索。
欢迎加入生信交流群。加我微信我也拉你进 微信群聊 老俊俊生信交流群 (微信交流群需收取 20 元入群费用,一旦交费,拒不退还!(防止骗子和便于管理)) 。

声明:文中观点不代表本站立场。本文传送门:https://eyangzhen.com/421321.html

(0)
联系我们
联系我们
分享本页
返回顶部