Learning data analysis from Nature: building a phylogenetic tree (NJ tree) from SV data

Paper
From genotype to phenotype with 1,086 near telomere-to-telomere yeast genomes

https://www.nature.com/articles/s41586-025-09637-0

Data and code
https://zenodo.org/records/15698884

https://github.com/HaploTeam/1086YeastGenomes/tree/main

How the paper describes this part of the methods

Neighbour-joining trees were constructed independently from SNPs and SV matrices (1,474,884 and 6,587 markers, respectively, for 1,086 isolates) using the R packages ape and SNPRelate.

Analysis code
I use smoove_filtered.vcf as the example data. If you need this file, leave a comment.

The first step is to convert the VCF file to PLINK binary (bed) format with plink

plink --vcf smoove_filtered.vcf --out sv.plink --make-bed
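Before moving on, it can help to confirm how many markers and samples ended up in the PLINK files. The sketch below relies only on the fact that a .bim file has one line per variant and a .fam file has one line per sample. The helper name count_plink and the toy files are made up for illustration; they are not part of the paper's code.

```r
# Count markers and samples from the PLINK text files that sit next to the .bed.
# count_plink is a hypothetical helper name, not part of plink or SNPRelate.
count_plink <- function(prefix) {
  bim <- paste0(prefix, ".bim")
  fam <- paste0(prefix, ".fam")
  if (!file.exists(bim) || !file.exists(fam)) {
    stop("Missing ", prefix, ".bim or ", prefix, ".fam")
  }
  c(markers = length(readLines(bim)), samples = length(readLines(fam)))
}

# Demo on tiny made-up files written to a temporary directory
demo_prefix <- file.path(tempdir(), "toy.plink")
writeLines(c("1 sv1 0 100 A T",
             "1 sv2 0 200 G C",
             "2 sv3 0 300 A G"),
           paste0(demo_prefix, ".bim"))
writeLines(c("iso1 iso1 0 0 0 -9",
             "iso2 iso2 0 0 0 -9"),
           paste0(demo_prefix, ".fam"))

counts <- count_plink(demo_prefix)
print(counts)
```

On the real data, count_plink("sv.plink") should agree with the variant and sample counts that plink reports in its log.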
The next steps are done in R

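library(SNPRelate)
library(ape)

prefix <- "sv.plink"

# Convert the PLINK bed/fam/bim files to a GDS file and open it
snpgdsBED2GDS(paste0(prefix, ".bed"), paste0(prefix, ".fam"), paste0(prefix, ".bim"), paste0(prefix, ".gds"))
genofile <- snpgdsOpen(paste0(prefix, ".gds"))
snpgdsSummary(genofile)

# Pairwise dissimilarity between all samples, using all markers (no MAF or missing-rate filter)
dissMatrix <- snpgdsDiss(genofile, sample.id = NULL, snp.id = NULL, autosome.only = FALSE, remove.monosnp = TRUE, maf = NaN, missing.rate = NaN, num.thread = 4, verbose = TRUE)
saveRDS(dissMatrix, paste0(prefix, ".rds"))
snpgdsClose(genofile)

# Label rows and columns with sample IDs so the tree tips carry sample names
dimnames(dissMatrix$diss) <- list(dissMatrix$sample.id, dissMatrix$sample.id)

# Build the tree with BIONJ* (ape::bionjs) and write it out in Newick format
tr <- bionjs(dissMatrix$diss)
write.tree(tr, file = paste0(prefix, ".newick"), append = FALSE, digits = 10, tree.names = FALSE)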
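To try the matrix-to-tree step without the VCF, here is a minimal sketch on a made-up genotype matrix that needs only ape. The isolate names, the genotype values, and the mean-absolute-difference dissimilarity are all illustrative assumptions. In particular, this is not the estimator that snpgdsDiss implements; it is a stand-in so that bionjs and write.tree can be run on their own.

```r
library(ape)

# Hypothetical genotypes (0/1/2 allele counts): 5 isolates x 8 SV markers
geno <- rbind(
  iso1 = c(0, 0, 0, 0, 1, 1, 2, 2),
  iso2 = c(0, 0, 0, 1, 1, 1, 2, 2),
  iso3 = c(2, 2, 0, 0, 0, 1, 0, 0),
  iso4 = c(2, 2, 1, 0, 0, 1, 0, 0),
  iso5 = c(1, 0, 2, 2, 2, 0, 1, 0)
)

# Stand-in dissimilarity: mean absolute genotype difference per marker
# (an assumption for illustration, NOT the snpgdsDiss estimator)
diss <- as.matrix(dist(geno, method = "manhattan")) / ncol(geno)

# as.matrix() on a dist object keeps the row names, so tips get isolate IDs
tr <- bionjs(diss)

# With no file argument, write.tree() returns the Newick string
newick <- write.tree(tr)
print(newick)
```

If the matrix has no row or column names, the tips are labelled with plain indices instead of sample IDs, which is why the main script assigns the sample IDs before building the tree.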
Visualizing the tree with ggtree

library(ggtree)

# Circular cladogram: branch.length = "none" ignores branch lengths and shows topology only
read.tree("C:/Users/lenovo/Desktop/sv.plink.newick") %>%
  ggtree(layout = "circular", branch.length = "none")
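If ggtree is not installed, ape's own plot method can draw a broadly similar picture. The sketch below is an alternative, not what the post uses: it draws a fan-shaped tree with branch lengths ignored and saves it to a PDF. The random tree from rtree and the output file name are stand-ins chosen for illustration; with the real data you would use read.tree on the Newick file instead.

```r
library(ape)

set.seed(42)
tr <- rtree(20)  # random stand-in tree, not the yeast data

# Hypothetical output location for this demo
out_file <- file.path(tempdir(), "sv_tree_fan.pdf")

pdf(out_file, width = 7, height = 7)
# type = "fan" gives a circular layout; use.edge.length = FALSE ignores branch lengths
plot(tr, type = "fan", use.edge.length = FALSE, cex = 0.6, no.margin = TRUE)
invisible(dev.off())
```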

You are welcome to follow my WeChat official account

小明的数据分析笔记本

Statement: from 小明的数据分析笔记本; this represents only the creator's views. Link: https://eyangzhen.com/7524.html
