Introduction

❝
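Ambient RNA contamination is pervasive in single-cell and single-nucleus RNA sequencing. It biases measured gene expression levels and can distort downstream analysis. Ambient RNA refers to RNA molecules present in the extracellular solution during single-cell RNA sequencing. These molecules mainly come from three sources: RNA released by dead or ruptured cells, RNA released when cells break during sample preparation, and RNA naturally secreted by cells. There are still relatively few tools for single-cell denoising; the available ones are introduced below.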
decontX

decontX was published in Genome Biology in March 2020.

GitHub repository: https://github.com/campbio/decontX

SoupX

SoupX was published in GigaScience in December 2020.

GitHub repository: https://github.com/constantAmateur/SoupX

scAR

scAR was released as a preprint in March 2022.

GitHub repository: https://github.com/Novartis/scAR

CellBender

CellBender was published in Nature Methods in August 2023.

GitHub repository: https://github.com/broadinstitute/CellBender

How it works: [figure not shown]

Results on different datasets: [figure not shown]

Installation:
conda create -n cellbender python=3.7
conda activate cellbender
pip install cellbender
Usage:
❝
The input must be the unfiltered (raw) matrix produced by cellranger count:
cellbender remove-background \
    --cuda \
    --input raw_feature_bc_matrix.h5 \
    --output output.h5
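
With several samples, the same command can be repeated once per Cell Ranger output folder. The following is a minimal sketch, not part of CellBender itself. The `run_cellbender` helper, the `DRY_RUN` variable, the sample names, and the output file name `cellbender_output.h5` are all assumptions made for illustration, as is the `<sample>/outs/raw_feature_bc_matrix.h5` directory layout. Only the `--cuda`, `--input`, and `--output` flags are taken from the command above.

```shell
#!/bin/sh
# Sketch: run CellBender once per sample directory.
# Assumed layout (adjust to your data): <sample_dir>/outs/raw_feature_bc_matrix.h5
set -eu

# Build and run the cellbender command for one sample directory.
# With DRY_RUN=1 (the default here) the command is only printed.
run_cellbender() {
    sample_dir=$1
    input="$sample_dir/outs/raw_feature_bc_matrix.h5"
    # Hypothetical output name, chosen for this example.
    output="$sample_dir/cellbender_output.h5"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "cellbender remove-background --cuda --input $input --output $output"
    else
        # Fail early if the raw matrix is missing.
        [ -f "$input" ] || { echo "missing input: $input" >&2; return 1; }
        cellbender remove-background --cuda --input "$input" --output "$output"
    fi
}

# Example with hypothetical sample names; prints one command per sample.
for sample in sample1 sample2; do
    run_cellbender "$sample"
done
```

Setting `DRY_RUN=0` in the environment runs the commands for real. Printing them first is a cheap way to check the paths before committing GPU time.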

DecontPro

DecontPro was published in Nucleic Acids Research in January 2024.

GitHub repository: https://github.com/campbio/decontX

scCDC

scCDC was published in Genome Biology in May 2024.

GitHub repository: https://github.com/ZJU-UoE-CCW-LAB/scCDC

Method overview: [figure not shown]

Comparison with earlier tools:

❝
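Existing decontamination methods have limitations: DecontX and CellBender under-correct highly contaminating genes, whereas SoupX and scAR over-correct lowly contaminating or non-contaminating genes.
The authors propose a new method, scCDC, which detects the genes that cause contamination (global contamination-causing genes, GCGs) and corrects the expression levels of only those genes, avoiding over-correction of the others.
Compared with existing methods, scCDC corrects highly contaminating genes while avoiding over-correction of lowly contaminating or non-contaminating genes, which improves the accuracy of marker gene identification and gene network construction.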
Installation:
# Install devtools first if it is not available
if (!require("devtools", quietly = TRUE)) {
    install.packages("devtools")
}

library(devtools)
install_github("ZJU-UoE-CCW-LAB/scCDC")
Test code. First run the standard Seurat workflow:
library(Seurat)
library(scCDC)

# Load the example count matrix shipped with scCDC
data(mislet_before, package = "scCDC")

mislet_seuratobj <- CreateSeuratObject(mislet_before)
mislet_seuratobj <- NormalizeData(mislet_seuratobj,
    normalization.method = "LogNormalize", scale.factor = 10000)
mislet_seuratobj <- FindVariableFeatures(mislet_seuratobj,
    selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(mislet_seuratobj)
mislet_seuratobj <- ScaleData(mislet_seuratobj, features = all.genes)
mislet_seuratobj <- RunPCA(mislet_seuratobj,
    features = VariableFeatures(object = mislet_seuratobj))

# We use pre-defined clustering annotation here.
# If no pre-defined clustering information is available, the standard clustering
# procedure in Seurat should be applied.
# mislet_annotation is assumed to be available in the session already.
mislet_seuratobj@active.ident <- mislet_annotation
mislet_seuratobj <- RunUMAP(mislet_seuratobj, dims = 1:20)
1. Contamination detection: identify the global contamination-causing genes (GCGs):
GCGs <- ContaminationDetection(mislet_seuratobj)
rownames(GCGs)
## [1] "Sst" "Gcg" "Ppy" "Ins2" "Pyy" "Ins1" "Ttr"
## [8] "Malat1" "Iapp" "Gm42418" "Rbp4" "Fth1"

2. Contamination quantification: if the contamination ratio is above 0.0003, scCDC is strongly recommended; below that ratio, other tools are also suitable:
mislet_cont_ratio <- ContaminationQuantification(mislet_seuratobj, rownames(GCGs))
mislet_cont_ratio
## [1] 0.02847532

3. Contamination correction: the result is stored in the "Corrected" assay of the returned object, which is also a Seurat object:
mislet_seuratobj_corrected <- ContaminationCorrection(mislet_seuratobj, rownames(GCGs))
Extract the corrected matrix:
corrected_count_matrix <- data.frame(mislet_seuratobj_corrected@assays[["Corrected"]]@layers$counts)
❝
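If you have multiple samples, run this workflow on each sample separately and collect the corrected counts matrix from each. Then create new Seurat objects from those matrices, integrate the samples, and continue with the standard downstream analysis.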
Closing

❝
The road ahead is long and far; I shall search high and low.

An Overview of Single-Cell Denoising Tools