%0 Journal Article
%T Research on Class Imbalance in Encrypted Traffic Datasets
%A 王晓
%J Pure Mathematics
%P 23-33
%@ 2160-7605
%D 2024
%I Hans Publishing
%R 10.12677/PM.2024.141004
%X With the rapid development of deep learning in recent years, network security researchers have begun to apply it to encrypted traffic classification. However, the publicly available encrypted traffic datasets suffer from severe class imbalance, which degrades the performance of deep learning classification methods, and building a complete encrypted traffic dataset from scratch is time-consuming and expensive. To address this problem, this paper proposes an encrypted traffic generation model based on an improved generative adversarial network (GAN). The model adds the statistical features of packets and the class labels of network flows to the GAN as conditional constraints, generating realistic traffic data that is then used to expand the dataset. Experimental results show that, when using a dataset augmented by this method, a deep learning-based encrypted traffic classifier performs better than when using random oversampling (ROS), the synthetic minority oversampling technique (SMOTE), or a traditional GAN.
%K Encrypted Traffic Classification
%K Balanced Dataset
%K Deep Learning
%K Generative Adversarial Networks
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=79054