%0 Journal Article
%T Pseudo-Time Analysis of Single-Cell Transcriptome Data Based on Natural Language Processing
%A 卢雨儿
%A 胡桓
%A 陈玲玲
%A 程烽
%A 帅建伟
%A 林海
%J Biophysics
%P 31-38
%@ 2330-1694
%D 2022
%I Hans Publishing
%R 10.12677/BIPHY.2022.102004
%X
Various powerful analytical models and processing algorithms have been proposed for single-cell transcriptome sequencing data, covering cell clustering, cell type identification, pseudo-time trajectory inference, cellular RNA dynamics, gene regulatory network inference, and RNA velocity analysis. This paper proposes an approach that introduces natural language processing techniques into single-cell transcriptome data analysis. The algorithm first uses TF-IDF to represent the degree to which gene expression intensity in the transcriptome influences cell function. It then treats the gene expression changes that arise during cell evolution and development as sentences in a natural language, so that natural language text analysis can be applied to the evolution and development of single-cell transcriptomes. Random walks on the gene network generate gene sequence texts, from which embedded word-vector representations of genes and of cells in the gene space are obtained, enabling pseudo-time visualization analysis of single-cell transcriptome data. The results indicate that the model is an effective method for pseudo-time analysis of cell development in single-cell data.
%K Single-Cell Sequencing
%K Pseudo-Time Trajectory Inference
%K Natural Language Processing
%K Genomics
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=52057