学习Nature正刊论文中eQTL分析前对基因表达量的预处理

论文

Graph pangenome captures missing heritability and empowers tomato breeding
https://www.nature.com/articles/s41586-022-04808-9
论文中的方法部分写到
RNA-seq data were quantified as transcripts per million(TPM). Genes with TPM values of > 0.5 were defined as expressed. Only genes expressed in at least 100 accessions were retained for the downstream analyses. Raw expression levels were normalized with quantile-quantile normalization. To remove potential batch effects and confounding factors affecting gene expression, the probabilistic estimation residuals method were applied with the top four factors as covariates.
表达量数据可以在论文中提供的链接处下载
读取数据,这里表达量文件还挺大的,可以用data.table这个R包的fread函数读取
library(data.table)
library(tidyverse)

dat<-fread(“D:/Jupyter/practice/WGCNA_nature_tomato/expression_raw_332.txt”) dim(dat) dat[1:6,1:4] 这里的数据行是样本,列是基因,首先做个转置 dat %>%
column_to_rownames(“target_id”) %>%
t() %>%
as.data.frame() -> dat.t
第一步

依据表达量对数据进行过滤,至少在100个样本中TPM值大于0.5
dat.t[rowSums(dat.t > 0.5) >= 100,] -> dat.t.filter
dim(dat.t.filter)
第二步

标准化
preprocessCore::normalize.quantiles(dat.t.filter %>% as.matrix()) %>%
as.data.frame() -> dat.t.filter.norm

colnames(dat.t.filter.norm)<-colnames(dat.t.filter)
rownames(dat.t.filter.norm)<-rownames(dat.t.filter)

dat.t.filter.norm[1:5,1:5]

dat.t.filter.norm %>%
rownames_to_column(“geneid”) %>%
write_tsv(“D:/Jupyter/practice/WGCNA_nature_tomato/dat.t.filter.norm.tsv”)
第三步

运行peer
这个需要用到R 3.5版本的环境
代码内容
library(peer)

args <- commandArgs(trailingOnly = TRUE)

expr<-read.table(args[1],row.names=1,header=TRUE,sep=”\t”)
model = PEER()
PEER_setPhenoMean(model,as.matrix(t(expr)))
PEER_setNk(model,20)
PEER_getNk(model)
PEER_update(model)
factors = as.data.frame(t(PEER_getX(model)))
colnames(factors)<-colnames(expr)
rownames(factors)<-paste0(“peer”,1:20)
write.table(factors,args[2],quote=F,row.names=T,sep=”\t”,col.names=TRUE)
time Rscript run_peer.R dat.t.filter.norm.tsv exp.peer.covar.tsv
这一步需要的时间比较长,运行了167分钟
论文中用top4 factors作为协变量,应该是选择输出结果的前5行就可以了
欢迎大家关注我的公众号
小明的数据分析笔记本

声明:文中观点不代表本站立场。本文传送门:https://eyangzhen.com/416774.html

(0)
联系我们
联系我们
分享本页
返回顶部