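Learning data analysis from Nature Genetics: using GEC to calculate the effective number of independent variants and set the GWAS significance threshold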

The paper

Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species

https://www.nature.com/articles/s41588-023-01340-y


Data analysis code

https://github.com/HongboDoll/TomatoSuperPanGenome

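The paper provides most of its data-processing code, which makes it excellent learning material. In today's post we look at how the paper sets the significance threshold for its GWAS. The paper states: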

The genome-wide significance thresholds (7.58 × 10^-7) were determined using a uniform threshold of 1/n, where n is the effective number of independent SNPs and SVs calculated using the Genetic type 1 Error Calculator (v.0.2)

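However, I could not find the code for this step in the paper's repository.

Another paper, on cucumber and published in Nature Communications, mentions the same method: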

Graph-based pan-genome reveals structural and sequence variations related to agronomic traits and domestication in cucumber

Its Methods section states:

The genome-wide significance threshold (3.46 × 10^-5) was determined by a uniform threshold of 1/n, where n was the effective number of independent SVs calculated using Genetic type 1 Error Calculator (v0.2)
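The 1/n rule in both quotes is easy to check by hand. The sketch below inverts the two published thresholds to see roughly what n each one implies. The function names are my own, and the implied n is only approximate because the papers report rounded thresholds.

```shell
#!/usr/bin/env bash
# threshold_from_n N: genome-wide threshold under the uniform 1/n rule.
threshold_from_n() {
    awk -v n="$1" 'BEGIN { printf "%.2e\n", 1 / n }'
}

# n_from_threshold P: effective number of independent tests implied by
# a reported threshold P (approximate, since P is rounded in the papers).
n_from_threshold() {
    awk -v p="$1" 'BEGIN { printf "%.0f\n", 1 / p }'
}

n_from_threshold 7.58e-7   # tomato paper (SNPs and SVs)
n_from_threshold 3.46e-5   # cucumber paper (SVs only)
```

This gives about 1.32 million effective tests for the tomato threshold and about 28,900 for the cucumber one. Going the other way, threshold_from_n 1319261 prints 7.58e-07.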

GEC home page

http://pmglab.top/gec/#/download

User manual

http://pmglab.top/gec/data/archive/v0.2/UserManualV0.2.pdf

The paper describing the software

Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets

https://link.springer.com/article/10.1007/s00439-011-1118-2

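After downloading, just unzip the archive and it is ready to use.

The software's full name is GEC: The Genetic Type I error calculator.

First, use PLINK to convert the VCF file into PLINK binary (bed) format: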

/biotools/plink19/plink --vcf input.vcf --make-bed --out abc

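abc is the prefix of the output files, so PLINK writes abc.bed, abc.bim and abc.fam. I placed these files in a folder named outputfolder, which is why the next command refers to them as outputfolder/abc.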
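Before moving on, it can help to confirm that the three files GEC will read are all in place. This is a minimal sketch, assuming the prefix outputfolder/abc used in this post. check_plink_fileset is a helper name I made up; it is not part of PLINK or GEC.

```shell
#!/usr/bin/env bash
# check_plink_fileset PREFIX
# Returns 0 if PREFIX.bed, PREFIX.bim and PREFIX.fam all exist and are
# non-empty; otherwise reports each problem file on stderr and returns 1.
check_plink_fileset() {
    local prefix="$1" ext status=0
    for ext in bed bim fam; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (commented out so the snippet has no side effects):
# the .bim file has one line per variant, so its line count is the
# number of variants in the fileset.
# check_plink_fileset outputfolder/abc && wc -l < outputfolder/abc.bim
```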

Calculate the effective number of independent variants:

java -jar -Xmx8g ~/biotools/GEC/gec/gec.jar --effect-number --plink-binary outputfolder/abc --genome --out test1

This produces a file named test1.sum, which contains the results.
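To turn that file into the 1/n threshold the papers describe, something like the following could work. It rests on an assumption about the file layout: that test1.sum is a whitespace-delimited table whose header row has a column named Effective_Number. Please check your own file first. gwas_threshold_from_sum is a hypothetical helper, not something GEC ships.

```shell
#!/usr/bin/env bash
# gwas_threshold_from_sum FILE
# ASSUMPTION: FILE is a whitespace-delimited table whose first row is a
# header containing a column named Effective_Number. Verify this against
# your own .sum file before relying on the output.
# Prints the effective number n and the uniform threshold 1/n.
gwas_threshold_from_sum() {
    awk '
        NR == 1 {
            # Locate the Effective_Number column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "Effective_Number") col = i
            next
        }
        col && NF >= col {
            printf "n=%s\tthreshold=%.2e\n", $col, 1 / $col
            found = 1
            exit
        }
        END {
            if (!found) {
                print "Effective_Number column not found" > "/dev/stderr"
                exit 1
            }
        }
    ' "$1"
}

# Usage (commented out; needs a real GEC output file):
# gwas_threshold_from_sum test1.sum
```

Looking the column up by name rather than by position keeps the sketch from silently reading the wrong field if the column order differs from what I assume.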

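I do get a result, but I am not sure the whole procedure is correct. If anyone with expertise spots a mistake, please leave a comment!

The tool seems to have been built for human data, and it prints log messages while it runs; I am not sure whether any parameters need to be changed for other species.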

Feel free to follow my WeChat official account.


Author's column

小明的数据分析笔记本