引言

❝
复现 Molecular Cell 文章的部分结果。
文章背景：
论文探讨了翻译因子 eIF5A 在翻译延长和终止中的作用。通过利用核糖体剖析和体外生化实验,作者发现 eIF5A 的功能比之前理解的要广泛得多。eIF5A 的耗竭会导致延长和终止过程中的全局性缺陷,核糖体会在许多序列上停滞,而不仅仅是聚脯氨酸结构。作者表明,eIF5A 可以刺激终止过程中的肽基-tRNA 水解,并促进他们在剖析实验中鉴定出的易停滞序列的翻译。这些发现有助于解释 eIF5A 在真核细胞中的必需性和高丰度。

文章数据- GSE89704
研究的是酵母物种。

建立 rRNA 和基因组索引

rRNA 序列可以从 NCBI 等公共数据库中获得，注释文件和基因组从 ensembl 数据库下载：
wget https://ftp.ensembl.org/pub/release-112/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_ceree.R64-1-1.dna.toplevel.fa.gz
wget https://ftp.ensembl.org/pub/release-112/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.112.gtf.gz

make rRNA and genome index

mkdir sac-rRNA-index
bowtie2-build Saccharomyces-cerevisiae-rRNA.fasta sac-rRNA-index/Saccharomyces-cerevisiae-rRNA

gunzip Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
gunzip Saccharomyces_cerevisiae.R64-1-1.112.gtf.gz

mkdir sac-star-genome-index

STAR –runThreadN 15 \
–runMode genomeGenerate \
–genomeDir sac-star-genome-index/ \
–genomeFastaFiles ./Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa \
–sjdbGTFfile Saccharomyces_cerevisiae.R64-1-1.112.gtf
测序数据下载

❝
用 ascp 快速下载，样本分为 WT，siEIF5A 和 WT，siEIF5A 的高盐处理，每个样本两个生物学重复，共八个样本，单端测序。便于方便可以自己手动修改一下文件名。

!/usr/bin/env bash

ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR500/004/SRR5008134/SRR5008134.fastq.gz . && mv SRR5008134.fastq.gz SRR5008134_GSM2387080_WT.1_Saccharomyces_cerevisiae_RNA-Seq.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR500/005/SRR5008135/SRR5008135.fastq.gz . && mv SRR5008135.fastq.gz SRR5008135_GSM2387081_WT.2_Saccharomyces_cerevisiae_RNA-Seq.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR500/006/SRR5008136/SRR5008136.fastq.gz . && mv SRR5008136.fastq.gz SRR5008136_GSM2387082_eIF5Ad.1_Saccharomyces_cerevisiae_RNA-Seq.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR500/007/SRR5008137/SRR5008137.fastq.gz . && mv SRR5008137.fastq.gz SRR5008137_GSM2387083_eIF5Ad.2_Saccharomyces_cerevisiae_RNA-Seq.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR500/008/SRR5008138/SRR5008138.fastq.gz . && mv SRR5008138.fastq.gz SRR5008138_GSM2387084_WT.1HS_Saccharomyces_cerevisiae_RNA-Seq.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR533/004/SRR5335874/SRR5335874.fastq.gz . && mv SRR5335874.fastq.gz SRR5335874_GSM2532891_WT.2HS_Saccharomyces_cerevisiae_RNA-Seq.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR533/005/SRR5335875/SRR5335875.fastq.gz . && mv SRR5335875.fastq.gz SRR5335875_GSM2532892_eIF5Ad.2HS_Saccharomyces_cerevisiae_RNA-Seq.fastq.gz
ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR500/009/SRR5008139/SRR5008139.fastq.gz . && mv SRR5008139.fastq.gz SRR5008139_GSM2387085_eIF5Ad.1HS_Saccharomyces_cerevisiae_RNA-Seq.fastq.gz
去接头

trim

for i in WT.1 WT.2 eIF5Ad.1 eIF5Ad.2 WT.1HS WT.2HS eIF5Ad.1HS eIF5Ad.2HS
do
cutadapt -j 10 -m 15 -M 35 –match-read-wildcards \
-a CTGTAGGCACCATCAAT \
-o 1.trim-data/${i}.trim.fq.gz 0.raw-data/${i}.fastq.gz
done
去除 rRNA

trim rRNA

for i in WT.1 WT.2 eIF5Ad.1 eIF5Ad.2 WT.1HS WT.2HS eIF5Ad.1HS eIF5Ad.2HS
do
bowtie2 -p 20 -x ../index-data/sac-rRNA-index/Saccharomyces-cerevisiae-rRNA \
–un-gz 2.rmrRNA-data/${i}.rmrRNA.fq.gz \
-U 1.trim-data/${i}.trim.fq.gz \
-S 2.rmrRNA-data/null
done
合并生物学重复

为了简化分析，我这里合并了生物学重复样本：

combine data

cat WT.1.rmrRNA.fq.gz WT.2.rmrRNA.fq.gz > WT.rmrRNA.fq.gz
cat eIF5Ad.1.rmrRNA.fq.gz eIF5Ad.2.rmrRNA.fq.gz > eIF5Ad.rmrRNA.fq.gz
cat WT.1HS.rmrRNA.fq.gz WT.2HS.rmrRNA.fq.gz > WTHS.rmrRNA.fq.gz
cat eIF5Ad.1HS.rmrRNA.fq.gz eIF5Ad.2HS.rmrRNA.fq.gz > eIF5AdHS.rmrRNA.fq.gz
比对

使用 STAR 软件比对至基因组：

mapping

for i in WT eIF5Ad WTHS eIF5AdHS
do
STAR –runThreadN 15 \
–outFilterType Normal \
–outWigType wiggle \
–outWigStrand Stranded \
–outWigNorm RPM \
–alignEndsType EndToEnd \
–outFilterMismatchNmax 1 \
–outFilterMultimapNmax 1 \
–genomeDir ../index-data/sac-star-genome-index \
–readFilesCommand gunzip -c \
–readFilesIn 2.rmrRNA-data/${i}.rmrRNA.fq.gz \
–outFileNamePrefix 3.map-data/${i} \
–outSAMtype BAM SortedByCoordinate \
–quantMode TranscriptomeSAM GeneCounts \
–outSAMattributes All
done

sort and index

cd 3.map-data/
for i in WT eIF5Ad WTHS eIF5AdHS
do
samtools sort -@ 15 ${i}Aligned.toTranscriptome.out.bam -o ${i}.sorted.trans.bam
samtools sort -@ 15 ${i}Aligned.sortedByCoord.out.bam -o ${i}.sorted.genome.bam
samtools index -@ 15 ${i}.sorted.trans.bam
done
QC 质检

提取最长转录本信息，并拓展酵母的 UTR 区域：
devtools::install_github(“junjunlab/RiboProfiler”,force = T)

setwd(“junjun_proj/GSE89704_ribo_proj/4.QC-data/”)

library(RiboProfiler)

==============================================================================

1_prepare gene annotation file

==============================================================================

pre_longest_trans_info(gtf_file = “Saccharomyces_cerevisiae.R64-1-1.112.gtf”,
out_file = “longest_info.txt”)

extend CDS

df_extend <- gene_anno_extend(longest_trans_file = “longest_info.txt”,
upstream_extend = 50,
downstream_extend = 50,
output_file = “longest_info_extend.txt”)
提取 reads 的 QC 信息：

==============================================================================

2_prepare Ribo QC data

==============================================================================

bam_file = list.files(“../3.map-data”,pattern = “.genome.bam$”,full.names = TRUE)
bam_file

1] “../3.map-data/eIF5Ad.sorted.genome.bam” “../3.map-data/eIF5AdHS.sorted.genome.bam”

[3] “../3.map-data/WT.sorted.genome.bam” “../3.map-data/WTHS.sorted.genome.bam”

sample_name <- sapply(strsplit(list.files(“../3.map-data/”,pattern = “.sorted.genome.bam$”),
split = “.sorted.genome.bam”), “[“,1)
sample_name

[1] “eIF5Ad” “eIF5AdHS” “WT” “WTHS”

perform QC analysis

pre_qc_data(longest_trans_file = “longest_info_extend.txt”,
bam_file = bam_file,
out_file = paste(sample_name,”.qc.txt”,sep = “”),
mapping_type = “genome”,
seq_type = “singleEnd”)

load qc data

qc_df <- load_qc_data()
片段大小分布情况：
qc_df$length <- factor(qc_df$length)

length distribution 9×9

qc_plot(qc_data = qc_df,type = “length”,
facet_wrap_list = list(ncol = 2))

对框情况：

length with frame

qc_plot(qc_data = qc_df,type = “length_frame”,
facet_wrap_list = list(ncol = 2))

功能分区占比：

region features 7×9

qc_plot(qc_data = qc_df,type = “feature”,
facet_wrap_list = list(ncol = 2))

整体的对框占比：

all frames

qc_plot(qc_data = qc_df,type = “frame”,
facet_wrap_list = list(ncol = 2))

28nt 的片段对框不错，展示 metaplot：

relative to start/stop site 5×10

rel_to_start_stop(qc_data = qc_df %>% dplyr::filter(length == 28),
type = “relst”,
shift = 0,
facet_wrap_list = list(ncol = 2,scales = “free”))

rel_to_start_stop(qc_data = qc_df %>% dplyr::filter(length == 28),
type = “relsp”,
shift = 0,
facet_wrap_list = list(ncol = 2,scales = “free”))

检测片段的 offset

offset check

PsiteOffsetCheck(qc_data = qc_df)
$summary_offset
sample readLengths Offsets bamFiles bamLegends
1 eIF5Ad 27, 28, 29, 30, 31, 32 -8, -12, -10, -11, -15, -10 eIF5Ad eIF5Ad
7 eIF5AdHS 27, 28, 29, 30, 31, 32 -8, 0, -10, -11, 0, -10 eIF5AdHS eIF5AdHS
13 WT 27, 28, 29, 30, 31, 32 -11, -12, -10, 4, 0, -10 WT WT
19 WTHS 27, 28, 29, 30, 31, 32 -11, -12, -10, 4, 0, -10 WTHS WTHS

$plot

metaplot

这是文章里的：

提取数据并绘图，结果和上图类似：

==============================================================================

calculate Ribo density

==============================================================================

pre_ribo_density_data(bam_file = bam_file,
out_file = paste(sample_name,”.density.txt”,sep = “”))

==============================================================================

Coordinate transformation

transform genomic coordinate into transcriotomic coordinate

==============================================================================

pre_gene_trans_density(gene_anno = “longest_info_extend.txt”,
density_file = c(paste(sample_name,”.density.txt”,sep = “”)),
out_file = c(paste(sample_name,”.trans.txt”,sep = “”)))

==============================================================================

metagene plot

==============================================================================

prepare data

pre_metagene_data(gene_anno = “longest_info_extend.txt”,
density_file = paste(sample_name,”.trans.txt”,sep = “”),
out_file = paste(sample_name,”.st.meta.txt”,sep = “”),
mapping_type = “genome”,
mode = “st”)

pre_metagene_data(gene_anno = “longest_info_extend.txt”,
density_file = paste(sample_name,”.trans.txt”,sep = “”),
out_file = paste(sample_name,”.sp.meta.txt”,sep = “”),
mapping_type = “genome”,
mode = “sp”)

load data

meta_df <- load_metagene_data(group_name = rep(c(“sp”,”st”),4))
meta_df$group <- factor(meta_df$group,levels = c(“st”,”sp”))
meta_df$sample <- factor(meta_df$sample,levels = c(“WT”,”eIF5Ad”,”WTHS”,”eIF5AdHS”))

plot

library(ggplot2)

7×9

ggplot(meta_df) +
geom_line(aes(x = pos,y = density),linewidth = 1,color = “#663366”) +
geom_hline(yintercept = 1,lty = “dashed”,color = “red”,linewidth = 0.75) +
theme_bw() +
theme(axis.text = element_text(colour = “black”),
panel.grid = element_blank(),
strip.text = element_text(face = “bold.italic”,size = rel(1))) +
ggh4x::facet_grid2(sample~group,scales = “free_x”,independent = “x”) +
ylab(“Relative footprint density (AU)”) +
xlab(“Distance from start/stop codon (codon)”)

计算 polarity score

这是文章里的：

计算并绘图：
density_file <- list.files(“2.density-data/”,pattern = “.trans.txt$”,full.names = TRUE)

calculate polarity score

df <- calculatePolarity(gene_anno_file = “longest_info_extend.txt”,
gtf_file = “Saccharomyces_cerevisiae.R64-1-1.112.gtf”,
input_file = bam_file,
density_file = density_file,
group = c(“Normal”,”HS”,”Normal”,”HS”))

plot

polarity_score_plot(polarity_score_df = df,
facet = T) +
scale_color_brewer(palette = “Set2”)

gene track

一些基因发生了核糖体的停滞，在消减 EIF5A 的条件下：

提取基因并绘图：

==============================================================================

5_load denisty and coverage data for specific gene

==============================================================================

track_df <- load_track_data(ribo_file = paste(sample_name,”.trans.txt”,sep = “”),
sample_name = sample_name,
gene_list = c(“CHC1″,”THO1″,”PCL6”))

track plot

track_plot(signal_data = track_df,
gene_anno = “longest_info_extend.txt”,
sample_order = c(“WT”,”eIF5Ad”,”WTHS”,”eIF5AdHS”),
remove_trans_panel_border = T,
show_ribo_only = T)

在特定三肽基序位置的影响

❝
含有 polyproline motifs 的基因，这类基因中包含连续的 2 个或 3 个脯氨酸密码子(Pro-Pro 或 Pro-Pro-Pro) 。在这些基因中,当 eIF5A 被耗竭时,会出现强烈的翻译暂停,并在暂停位点上游积累大量核糖体。下面是文章里的：

提取 CDS 序列：
FetchSequence(gene_file = “longest_info_extend.txt”,
genome_file = “./Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa”,
output_file = “sac_cds_sequences.fa”,
type = “cds”,
coding_type = “NT”)
把注释文件转换成 RiboMiner 格式的：
gene_anno2Ribominer(longest_trans_file = “longest_info_extend.txt”,
output_file = “longest_info_extend.new.txt”)
计算：
pip install RiboMiner

ribosome density at each tri-AA motif

mkdir 5.motif_density
RiboDensityAroundTripleteAAMotifs -f attributes.txt \
-c longest_info_extend.new.txt \
-o 5.motif_density/ -M RPKM \
-l 100 -n 10 –table 1 \
-F sac_cds_sequences.fa \
–type2 PPP –type1 PP
读取输出数据画图：
library(tidyverse)
library(ggplot2)

motif <- read.delim(“./5.motif_density/_motifDensity_dataframe.txt”)
motif$pos <- c(c(-50:50),c(-50:50))

motif <- motif %>%
reshape2::melt(id.vars = c(“motif”,”pos”),
value.name = “density”,
variable.name = “sample”) %>%
mutate(group = ifelse(sample %in% c(“si.WT”,”si.eIF5Ad”),”non-HS”,”HS”))

ggplot(motif) +
geom_line(aes(x = pos,y = density,color = sample)) +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_grid(motif~group,scales = “free”)

结尾

❝
路漫漫其修远兮,吾将上下而求索。
欢迎加入生信交流群。加我微信我也拉你进微信群聊老俊俊生信交流群 (微信交流群需收取 20 元入群费用,一旦交费,拒不退还!(防止骗子和便于管理)) 。QQ 群可免费加入, 记得进群按格式修改备注哦。

声明：文中观点不代表本站立场。本文传送门：https://eyangzhen.com/420459.html

Molecular Cell 文章部分结果复现

make rRNA and genome index

!/usr/bin/env bash

trim

trim rRNA

combine data

mapping

sort and index

setwd(“junjun_proj/GSE89704_ribo_proj/4.QC-data/”)

==============================================================================

1_prepare gene annotation file

==============================================================================

extend CDS

==============================================================================

2_prepare Ribo QC data

==============================================================================

1] “../3.map-data/eIF5Ad.sorted.genome.bam” “../3.map-data/eIF5AdHS.sorted.genome.bam”

[3] “../3.map-data/WT.sorted.genome.bam” “../3.map-data/WTHS.sorted.genome.bam”

[1] “eIF5Ad” “eIF5AdHS” “WT” “WTHS”

perform QC analysis

load qc data

length distribution 9×9

length with frame

region features 7×9

all frames

relative to start/stop site 5×10

offset check

==============================================================================

calculate Ribo density

==============================================================================

==============================================================================

Coordinate transformation

transform genomic coordinate into transcriotomic coordinate

==============================================================================

==============================================================================

metagene plot

==============================================================================

prepare data

load data

plot

7×9

calculate polarity score

plot

==============================================================================

5_load denisty and coverage data for specific gene

==============================================================================

track plot

ribosome density at each tri-AA motif

作者专栏