Learning Data Analysis from Nature: Normalizing Phenotype Data Before GWAS with a Rank-Based Inverse Normal Transformation

Paper
From genotype to phenotype with 1,086 near telomere-to-telomere yeast genomes

https://www.nature.com/articles/s41586-025-09637-0

The Methods section of the paper states:

LDAK v.4.2 was used for the computation. Phenotypes were normalized using a rank-based inverse normal transformation. We ran GWAS using a linear mixed model implemented in FaST-LMM v.0.4.6. Phenotypes were normalized in the same way as for heritability estimates.

I asked a large language model about this normalization method.
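For reference, the rank-based inverse normal transformation is commonly written as below. The symbols are my own notation rather than the paper's; the offset c = 3/8 is the Blom value used by the R function in this post.

```latex
\[
  y_i = \Phi^{-1}\!\left( \frac{r_i - c}{N - 2c + 1} \right), \qquad c = \tfrac{3}{8}
\]
```

Here r_i is the rank of the i-th observation among the N non-missing values, and \Phi^{-1} is the standard normal quantile function (qnorm in R).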

The R code provided with the paper:

rank.based.INT <- function(x, c = 3/8, method = "average")
{
  # This function performs the rank-based inverse normal transformation (INT).
  # If method is "average", ties share the same average rank.
  # If method is "random", ties are ranked in random order.
  # Formula from Beasley 2009, with an offset of c = 3/8 as recommended in Blom 1958.

  r <- rank(x, ties.method = method)
  r[is.na(x)] <- NA  # make sure NA inputs stay NA in the ranks
  N <- length(x[!is.na(x)])  # number of non-missing observations
  qnorm((r - c) / (N - 2 * c + 1))
}
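A quick way to see how the function treats ties and missing values is to run it on a tiny hand-made vector. The sketch below restates the function with plain ASCII quotes so that it runs on its own; the vector `x` is invented for illustration and does not come from the paper's data.

```r
# Restated from the function above so this block is self-contained
rank.based.INT <- function(x, c = 3/8, method = "average") {
  r <- rank(x, ties.method = method)
  r[is.na(x)] <- NA                  # make sure NA inputs stay NA
  N <- sum(!is.na(x))                # number of non-missing observations
  qnorm((r - c) / (N - 2 * c + 1))
}

# Hypothetical toy vector: one tie (2.5 appears twice) and one missing value
x <- c(10.2, 2.5, 2.5, NA, 7.1)
z <- rank.based.INT(x)

print(z)                             # transformed values, NA preserved
print(isTRUE(all.equal(z[2], z[3]))) # tied inputs receive the same score
print(order(x) == order(z))          # the ordering of the inputs is preserved
```

With method = "average" the two tied values share one score, so they stay indistinguishable after the transformation; method = "random" would break the tie arbitrarily and give a different result on each run unless a seed is set.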

Let's test it on the phenotype data provided with the paper.

Distribution of the raw expression data:

library(tidyverse)  # provides read_tsv(), %>% and ggplot2

read_tsv("D:/1086YeastGenomes-main/Phenotypes_8391Traits/ExpressionTraits/YAL001C.RNASeq.phen") %>%
  ggplot(aes(x = YAL001C.RNASeq)) +
  geom_histogram(color = "white", fill = "#d55e00") +
  theme_bw(base_size = 15)

Distribution of the phenotype after normalization:

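# pheno is not defined in the original post; it is assumed here to be the YAL001C.RNASeq column of the file read above
pheno <- read_tsv("D:/1086YeastGenomes-main/Phenotypes_8391Traits/ExpressionTraits/YAL001C.RNASeq.phen")$YAL001C.RNASeq

data.frame(x = rank.based.INT(pheno)) %>%
  ggplot(aes(x = x)) +
  geom_histogram(color = "grey", fill = "#009f71") +
  theme_bw(base_size = 15)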

After the transformation, the histogram is much closer to a symmetric, bell-shaped normal distribution.
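The visual impression can also be backed up with numbers. The sketch below is only an illustration on synthetic data: `raw` is a simulated right-skewed variable standing in for a phenotype (it is not the YAL001C data), and `int`, `n` and the `skewness` helper are names introduced here. It compares the sample skewness and the Shapiro-Wilk p-value before and after the transformation.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
raw <- rlnorm(500)          # hypothetical right-skewed stand-in for a phenotype

# Rank-based INT with the Blom offset c = 3/8; there are no NAs, so N = length(raw)
n <- length(raw)
int <- qnorm((rank(raw) - 3/8) / (n - 2 * 3/8 + 1))

# Sample skewness: close to 0 for a symmetric distribution
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

cat("skewness before:", skewness(raw), "\n")
cat("skewness after: ", skewness(int), "\n")

# Shapiro-Wilk test from the stats package:
# a larger p-value means less evidence against normality
cat("Shapiro-Wilk p before:", shapiro.test(raw)$p.value, "\n")
cat("Shapiro-Wilk p after: ", shapiro.test(int)$p.value, "\n")
```

Because the transformation depends only on ranks, the result has the same near-normal shape whatever the input distribution, as long as there are few ties. The flip side is that information about the original scale and the spacing between values is discarded.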


Disclaimer: from 小明的数据分析笔记本 (Xiaoming's Data Analysis Notebook); this represents the author's views only. Link: https://eyangzhen.com/6959.html
