%0 Journal Article
%T API based sequence and statistical features in a combined malware detection architecture
%A 芦效峰
%A 蒋方朔
%A 周箫
%A 崔宝江
%A 伊胜伟
%A 沙晶
%J 清华大学学报(自然科学版)
%D 2018
%R 10.16511/j.cnki.qhdxxb.2018.25.020
%X This paper presents a combined machine learning framework for malware behavior analysis. The first component analyzes the dependencies among calls in the application programming interface (API) call sequence at the functional level, extracts features from them, and uses a random forest for detection. The second component removes redundant information from the sequence in a preprocessing step, then uses a recurrent neural network (RNN), which is suited to time-series data, to learn from and classify the sequence directly. Finally, the two methods are combined. Experiments on malware samples show that both methods detect malware effectively, but the combined framework performs better, with an AUC (area under the ROC curve) of 99.3%, which exceeds the results of existing similar studies.
%K computer virus and prevention
%K malware classification
%K machine learning
%K deep learning
%K call sequence
%U http://jst.tsinghuajournals.com/CN/Y2018/V58/I5/500