Paper
From genotype to phenotype with 1,086 near telomere-to-telomere yeast genomes
https://www.nature.com/articles/s41586-025-09637-0
The Methods section of the paper states:
LDAK v.4.2 was used for the computation. Phenotypes were normalized using a rank-based inverse normal transformation. We ran GWAS using a linear mixed model implemented in FaST-LMM v.0.4.6. Phenotypes were normalized in the same way as for heritability estimates.
I asked a large language model about this normalization method.
The R code provided with the paper:
rank.based.INT <- function(x, c=3/8, method="average")
{
  # This function performs the rank-based inverse normal transformation (INT)
  # If method is "average" ties will share the same average value.
  # If method is "random", ties are given rank randomly
  # Formula found in Beasley 2009 with an offset of c=3/8 as recommended in Blom 1958
  r <- rank(x, ties.method = method)
  r[is.na(x)] <- NA # reput NA values in the vector because rank() gives a rank to NAs
  N <- length(x[!is.na(x)])
  qnorm((r-c)/(N-2*c+1))
}
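To see what the function does, here is a minimal sketch on a made-up vector (the values are illustrative, not taken from the paper's data). Each non-missing value is replaced by qnorm((r - c)/(N - 2c + 1)), where r is its rank and N is the number of non-missing values. The function is repeated so that the block runs on its own.

```r
# Same function as in the paper's code, repeated so this example is self-contained
rank.based.INT <- function(x, c = 3/8, method = "average") {
  r <- rank(x, ties.method = method)
  r[is.na(x)] <- NA
  N <- length(x[!is.na(x)])
  qnorm((r - c) / (N - 2 * c + 1))
}

# Hypothetical toy vector: one missing value and one tie (2.5 appears twice)
x <- c(2.5, 1.0, NA, 4.0, 2.5)
z <- rank.based.INT(x)

# Ranks of the non-missing values are 2.5, 1, 4, 2.5 and N = 4,
# so the tied values map to qnorm(0.5) = 0 and the NA stays NA
print(z)
```

The two tied values share the average rank and therefore get the same transformed value, while the smallest and largest values land symmetrically around zero.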
Let's test it on the phenotype data provided with the paper.
Distribution of the raw expression data:
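library(tidyverse) # provides read_tsv(), %>% and ggplot2

# Histogram of the raw expression values for YAL001C
read_tsv("D:/1086YeastGenomes-main/Phenotypes_8391Traits/ExpressionTraits/YAL001C.RNASeq.phen") %>%
  ggplot(aes(x=YAL001C.RNASeq))+
  geom_histogram(color="white",fill="#d55e00")+
  theme_bw(base_size = 15)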
Phenotype distribution after normalization:
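# pheno is assumed to be the YAL001C.RNASeq column of the file read above
pheno <- read_tsv("D:/1086YeastGenomes-main/Phenotypes_8391Traits/ExpressionTraits/YAL001C.RNASeq.phen") %>%
  pull(YAL001C.RNASeq)

# Histogram of the values after the rank-based inverse normal transformation
data.frame(x=rank.based.INT(pheno)) %>%
  ggplot(aes(x=x))+
  geom_histogram(color="grey",fill="#009f71")+
  theme_bw(base_size = 15)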
After normalization, the distribution looks much closer to a clean normal distribution.
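The visual impression can also be checked numerically. The sketch below is not from the paper: it uses simulated right-skewed values (drawn with rexp()) instead of the real phenotype file, and the Shapiro-Wilk test is just one possible choice of normality check. The transformation function is repeated so that the block runs on its own.

```r
# Same function as in the paper's code, repeated so this example is self-contained
rank.based.INT <- function(x, c = 3/8, method = "average") {
  r <- rank(x, ties.method = method)
  r[is.na(x)] <- NA
  N <- length(x[!is.na(x)])
  qnorm((r - c) / (N - 2 * c + 1))
}

set.seed(1)
raw <- rexp(500)            # simulated skewed values, NOT the paper's data
z <- rank.based.INT(raw)    # transformed values

# Shapiro-Wilk normality test: a small p-value indicates departure from normality
p_raw <- shapiro.test(raw)$p.value
p_int <- shapiro.test(z)$p.value

cat("p-value before INT:", p_raw, "\n")
cat("p-value after INT: ", p_int, "\n")
```

With continuous data and no ties, the transformed values are exact normal quantiles, so the test no longer rejects normality. The transformation keeps only the ordering of the original values; the distances between them are discarded.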
Disclaimer: from 小明的数据分析笔记本 (Xiaoming's Data Analysis Notebook); this represents the author's views only. Link: https://eyangzhen.com/6959.html