https://www.sciencedirect.com/science/article/pii/S0092867420306188
Pan-Genome of Wild and Cultivated Soybeans
大豆基因组数据下载链接
library(tidyverse)
read_tsv(“cell.soybean.PanGenome/Orthogroups.GeneCount.tsv”) %>%
dplyr::select(-Total) %>%
column_to_rownames(“Orthogroup”) %>%
mutate(across(everything(),~ifelse(.>0,1,0))) %>%
write_delim(file = “cell.soybean.PanGenome/cell.soybean.PanGP.input”,
delim = “”,
col_names = FALSE)
PanGP输入数据的部分截图
每行是一个基因家族,每列是一个样本,1代表这个样本里有这个基因家族,0代表这个样本里没有这个基因家族
输入数据类型选择0/1矩阵
算法选择 total random
sample size选择200
sample repeat 选择4
这个最快,如果样本量比较大的话运行相对也比较慢
直接出一个图
在file选项里可以把作图数据导出
在 analyze选项里可以拟合方程
它的拟合泛基因家族用到的是幂函数,核心基因家族用到的是指数函数拟合
导出数据自己用R语言作图
导出的数据格式截图
先拟合一下数据
核心和泛基因家族都用指数函数来拟合
library(tidyverse)
read_tsv(“cell.soybean.PanGenome/Orthogroups.GeneCount.tsv”) %>%
dplyr::select(-Total) %>%
column_to_rownames(“Orthogroup”) %>%
mutate(across(everything(),~ifelse(.>0,1,0))) %>%
write_delim(file = “cell.soybean.PanGenome/cell.soybean.PanGP.input”,
delim = “”,
col_names = FALSE)
library(ggtrendline)
PanGP.dat<-read_tsv(“cell.soybean.PanGenome/PanGenomeData.txt”) PanGP.dat %>% colnames()
sample.size.x<-PanGP.dat %>% pull(GenomeNum)
pan.y<-PanGP.dat %>% pull(Pan-Genome Size
)
core.y<-PanGP.dat %>% pull(Core Genome Size
)
ggtrendline(sample.size.x,pan.y,
model = “exp3P”,eSize = 5,
eq.x = 20,
eq.y = 52000,
rrp.x = 20,
rrp.y = 50000)+
theme_bw()+
theme(panel.grid = element_blank())+
labs(x=”Sample size”,y=”Pan gene family”) -> p1
ggtrendline(sample.size.x,core.y,
model = “exp3P”,eSize = 5,
eq.x = 20,
eq.y = 32000,
rrp.x = 20,
rrp.y = 30000)+
theme_bw()+
theme(panel.grid = element_blank())+
labs(x=”Sample size”,y=”Core gene family”) -> p2
library(patchwork)
p1+p2
做箱线图的代码
PanGP.dat %>%
pivot_longer(!GenomeNum) %>%
mutate(GenomeNum=factor(GenomeNum,levels=1:27)) %>%
ggplot(aes(x=GenomeNum,y=value))+
geom_boxplot(aes(fill=name,color=name),
outliers = FALSE)+
theme_bw(base_size = 15)+
theme(panel.grid = element_blank(),
legend.position = c(0.7,0.5))+
scale_fill_manual(values = c(“#f07571″,”#4ea5d6”),
name=NULL)+
scale_color_manual(values = c(“#f07571″,”#4ea5d6”),
name=NULL)+
geom_line(data = data.frame(x=seq(1,27,by=0.1)) %>%
mutate(y=-14916exp(-0.152x)+58428),
aes(x=x,y=y),
lty=”dashed”,
color=”grey”)+
geom_line(data = data.frame(x=seq(1,27,by=0.1)) %>%
mutate(y=18675exp(-0.0865x)+22056),
aes(x=x,y=y),
lty=”dashed”,
color=”grey”)+
scale_x_discrete(breaks=c(seq(1,27,3),27))+
labs(x=”Sample number”,
y=”Family number”)
欢迎大家关注我的公众号
小明的数据分析笔记本
声明:文中观点不代表本站立场。本文传送门:https://eyangzhen.com/423799.html