%0 Journal Article
%T 基于保险反欺诈任务的跨表特征工程方法
Cross Feature Engineering for Anti-Fraud Task in Insurance
%A 董今妮
%A 邓潇
%A 那崇宁
%A 杨耀
%A 陈奎
%J Artificial Intelligence and Robotics Research
%P 467-477
%@ 2326-3423
%D 2024
%I Hans Publishing
%R 10.12677/AIRR.2024.132048
%X Feature engineering is the core step in applying machine learning to a practical task, and its quality sets the upper bound on model performance. This paper focuses on the anti-fraud task in auto insurance and studies cross-table feature engineering, addressing table aggregation and efficient feature mining in order to support downstream anti-fraud modeling. Feature engineering on a single table is relatively mature, whereas algorithms for feature engineering across related tables are comparatively few. Compared with the single-table case, deriving features across multiple tables involves more features and is more prone to feature explosion. To address this, we propose the xDFS method, an optimization of DFS (Deep Feature Synthesis). xDFS introduces a groupby-based statistical analysis of each single table to obtain statistical features between different entities in the same table, which avoids the entity extraction and feature splitting that DFS performs during data preprocessing. It also uses an xgboost model to select the best combinations of derived features, which avoids the exponential growth in the number of features (feature explosion) as synthesis depth increases. Experiments on two public datasets and one auto insurance dataset show that DFS suffers feature explosion when the derived feature depth is large, whereas xDFS does not.
%K Insurance
%K Anti-Fraud
%K AI
%K Feature Exponential Growth
%K Cross Feature Engineering
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=88956