Genotyping known structural variants (SVs) from short-read sequencing data with Paragraph

The paper describing Paragraph:
Paragraph: a graph-based structural variant genotyper for short-read sequence data
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1909-7
The software's GitHub page:
https://github.com/Illumina/paragraph
The software can be installed directly with conda.
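Before starting, it can help to confirm that the three command-line tools called in this walkthrough (bwa, samtools, multigrmpy.py) are on PATH. The following is a minimal sketch: the helper name `missing_tools` is hypothetical, and the conda package name and channel mentioned in the comment are an assumption that should be checked before use.

```shell
# Hypothetical helper: print each tool from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools used in this walkthrough. Anything printed here still needs installing,
# for example with conda (assumed command: conda install -c bioconda paragraph).
missing_tools bwa samtools multigrmpy.py
```

An empty result means all three tools were found.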
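The reference genome, the variant VCF file, and the short-read sequencing data were downloaded from:
https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=publications/vgsv2019/simulation/
These data come from the paper:
Genotyping structural variants in pangenome graphs using the vg toolkit
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1941-7
The reference code also comes from that paper:
https://github.com/vgteam/sv-genotyping-paper/blob/master/simulation/genotype-other-methods.sh
Given a reference genome, a VCF file of known structural variants, and short-read sequencing data for some samples, the goal is to genotype those known structural variants in each sample.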
Step 1: align the short reads to the reference genome. The -p flag tells bwa mem that each FASTQ file contains interleaved paired-end reads.
bwa index ref.fa
bwa mem -p -t 8 -R "@RG\tID:s2\tSM:s2" ref.fa s2.fastq.gz | samtools sort - > s2.bam
bwa mem -p -t 8 -R "@RG\tID:s1\tSM:s1" ref.fa s1.fastq.gz | samtools sort - > s1.bam

samtools index s2.bam
samtools index s1.bam
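With more than a couple of samples, the align, sort, and index commands become repetitive. The sketch below generates them per sample instead of typing each one. The helper name `align_cmd` is hypothetical, and it assumes each sample's reads are in an interleaved file named <sample>.fastq.gz, as in this walkthrough. It only prints the commands (a dry run), so they can be inspected first.

```shell
# Hypothetical helper: print (not run) the align, sort, and index pipeline for one sample.
# Arguments: sample name, reference FASTA (default ref.fa), threads (default 8).
align_cmd() {
  sample=$1
  ref=${2:-ref.fa}
  threads=${3:-8}
  # \\t in the format string prints a literal \t, which bwa expands in the read group.
  printf 'bwa mem -p -t %s -R "@RG\\tID:%s\\tSM:%s" %s %s.fastq.gz | samtools sort - > %s.bam && samtools index %s.bam\n' \
    "$threads" "$sample" "$sample" "$ref" "$sample" "$sample" "$sample"
}

# Dry run for the two samples used here.
for s in s1 s2; do
  align_cmd "$s"
done
```

Piping the printed lines to `sh` would execute them once bwa and samtools are installed and `bwa index ref.fa` has been run.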
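Step 2: prepare the configuration (manifest) file, saved here as samples_for_paragraph.txt.
Its contents are as follows: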
id,path,depth,read length
s10,s1.bam,20,150
s20,s2.bam,20,150
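The manifest can also be generated rather than written by hand. The following is a minimal sketch with a hypothetical helper, `make_manifest`. It assumes every sample shares the same depth and read length, that each BAM is named <sample>.bam, and it uses the sample name itself as the id (so it writes s1 and s2 rather than the s10 and s20 ids in the listing above).

```shell
# Hypothetical helper: write a comma-separated Paragraph manifest to stdout.
# Arguments: depth, read length, then one or more sample names.
# Assumes each sample's BAM is <sample>.bam and all samples share depth and read length.
make_manifest() {
  depth=$1
  read_len=$2
  shift 2
  echo "id,path,depth,read length"
  for sample in "$@"; do
    echo "$sample,$sample.bam,$depth,$read_len"
  done
}

# Same depth (20) and read length (150) as in the manifest shown in this post.
make_manifest 20 150 s1 s2 > samples_for_paragraph.txt
```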
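Run Paragraph. Here -m is the manifest, -i the VCF of known SVs, -r the reference genome, -o the output directory, and -t the number of threads: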
multigrmpy.py -m samples_for_paragraph.txt -i truth.vcf -r ref.fa -o paragraph.output -t 8
Contents of the output file (shown in the original post as a screenshot, image.png).
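Since the output appears only as a screenshot, a quick way to summarise the calls is to tally the genotypes in the genotyped VCF. The sketch below is illustrative: the helper name `count_genotypes` is hypothetical, the demo records are made up and are not real Paragraph output, and the helper only looks at the first sample column.

```shell
# Hypothetical helper: read an uncompressed VCF on stdin and count the GT value
# of each record for the first sample column (column 10).
# Relies on the VCF convention that GT, when present, is the first FORMAT key.
count_genotypes() {
  awk -F '\t' '
    /^#/ { next }                      # skip header lines
    { split($10, f, ":"); n[f[1]]++ }  # GT is the first colon-separated subfield
    END { for (g in n) print g, n[g] }
  ' | sort
}

# Tiny made-up VCF to demonstrate the helper (not real Paragraph output).
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\n1\t100\t.\tA\t<DEL>\t.\tPASS\t.\tGT\t0/1\n1\t200\t.\tA\t<INS>\t.\tPASS\t.\tGT\t1/1\n1\t300\t.\tA\t<DEL>\t.\tPASS\t.\tGT\t0/1\n' | count_genotypes
```

On real results, the same helper could be fed with something like `gzip -dc paragraph.output/genotypes.vcf.gz | count_genotypes`; the file name inside the output directory is an assumption here and should be checked against what Paragraph actually wrote.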
You are welcome to follow my WeChat official account:
小明的数据分析笔记本
