%0 Journal Article
%T Research on Statistical Analysis of Gene Splicing Sites
%A 李宏彬
%A 赫光中
%J Hans Journal of Computational Biology
%P 41-49
%@ 2164-5434
%D 2016
%I Hans Publishing
%R 10.12677/HJCB.2016.63006
%X
Eukaryotic genes consist of alternating exons and introns. After transcription, exon sequences are retained, while intron sequences are spliced out. A large body of molecular biology experiments has confirmed that splice sites follow the GT-AG rule; however, only a small fraction of sequences containing GT or AG are true splice sites, and prediction accuracy still needs to be improved. In this study, the HS3D splice site training dataset was downloaded, and a statistical analysis was carried out on the sequences near the promoter splice sites. When the true and false sequences extend more than seven positions on both the left and right sides of the splice site, they show high specificity, and these specific sequences can be used as features for training in order to accurately distinguish true from false splice sites.
%K Gene
%K Splice Site
%K Statistical Analysis
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=18411