Cross Feature Engineering for Anti-Fraud Task in Insurance

DOI: 10.12677/AIRR.2024.132048, PP. 467-477

Keywords: Insurance, Anti-Fraud, AI, Feature Exponential Growth, Cross Feature Engineering


Abstract:

Feature engineering is a core step in applying machine learning to real-world tasks, and its quality sets the upper bound on model performance. This paper focuses on anti-fraud tasks in auto insurance and studies cross-table feature engineering, addressing table aggregation and efficient feature mining in order to support downstream anti-fraud modeling. Feature engineering algorithms for a single table are relatively mature, whereas algorithms for multiple related tables are comparatively scarce. Compared with the single-table case, deriving features across tables involves many more candidate features and is more prone to feature explosion. To address this, we propose xDFS, an optimization of DFS (Deep Feature Synthesis). xDFS introduces groupby-based statistical analysis on a single table to obtain statistical features between different entities in the same table, which avoids the entity extraction and feature aggregation that DFS performs during data preprocessing. It also uses an xgboost model to compute the best feature combinations, which avoids exponential growth in the number of features as synthesis depth increases. In experiments on two public datasets and one auto insurance dataset, DFS exhibits feature explosion at larger derivation depths, whereas xDFS does not on any of the datasets.
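The groupby step described in the abstract can be illustrated with a minimal pandas sketch. This is not the authors' implementation: the table, its column names (`policy_id`, `garage_id`, `amount`), and the helper `groupby_stats` are hypothetical and chosen only for illustration. The sketch shows how per-entity statistics can be attached to each row of a single table without first splitting the entity out into its own table. The xgboost-based selection of feature combinations is not shown.

```python
import pandas as pd

# Hypothetical single claims table; all column names are illustrative.
claims = pd.DataFrame({
    "claim_id": [1, 2, 3, 4, 5],
    "policy_id": ["P1", "P1", "P2", "P2", "P2"],
    "garage_id": ["G1", "G2", "G1", "G1", "G2"],
    "amount": [1000.0, 3000.0, 500.0, 1500.0, 4000.0],
})


def groupby_stats(df, key, value, aggs=("mean", "max", "count")):
    """Attach per-key statistics of `value` to every row.

    The key column is treated as an implicit entity, so no separate
    entity table has to be extracted before aggregating.
    """
    out = df.copy()
    grouped = df.groupby(key)[value]
    for agg in aggs:
        # transform() broadcasts the group statistic back to each row.
        out[f"{value}_{agg}_by_{key}"] = grouped.transform(agg)
    return out


# Derive statistics of the claim amount per policy and per garage.
features = claims
for key in ("policy_id", "garage_id"):
    features = groupby_stats(features, key, "amount")

print(features.columns.tolist())
```

Each key and aggregation pair adds exactly one column, so the number of derived features in this sketch grows linearly with the number of keys and aggregations chosen.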

