Evaluating Privacy Leakage and Memorization Attacks on Large Language Models (LLMs) in Generative AI Applications

DOI: 10.4236/jsea.2024.175023, pp. 421-447

Keywords: Large Language Models, PII Leakage, Privacy, Memorization, Overfitting, Membership Inference Attack (MIA)

Abstract:

The recent interest in deploying Generative AI applications that use large language models (LLMs) has brought significant privacy concerns to the forefront, notably the leakage of Personally Identifiable Information (PII) and other confidential or protected information that may have been memorized during training, particularly during fine-tuning or customization. We describe different black-box attacks available to potential adversaries and study their impact on the amount and type of information that can be recovered from commonly used and deployed LLMs. Our research investigates the relationship between PII leakage, memorization, and factors such as model size, architecture, and the nature of the attacks employed. The study uses two broad categories of attacks: PII leakage-focused attacks (auto-completion and extraction attacks) and memorization-focused attacks (various membership inference attacks). The findings from these investigations are quantified using an array of evaluative metrics, providing a detailed understanding of LLM vulnerabilities and the effectiveness of the different attacks.
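
To give context for the memorization-focused attacks mentioned above, the sketch below shows the simplest form of a black-box membership inference attack: score each candidate text with the model's own loss and flag unusually low-loss candidates as likely training members. This is an illustrative sketch only, not the paper's method; the model name ("gpt2"), the threshold value, and the candidate strings are placeholder assumptions.

# Minimal sketch of a loss-threshold membership inference attack (MIA)
# against a causal language model. Model name, threshold, and candidate
# texts are illustrative placeholders, not the paper's configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for a fine-tuned model under audit
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sequence_loss(text: str) -> float:
    """Average token-level cross-entropy; lower means the text is more 'familiar' to the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def is_likely_member(loss: float, threshold: float = 3.0) -> bool:
    """Flag a candidate as a probable training member when its loss falls below a threshold.
    In practice the threshold is calibrated on known non-member (held-out) text."""
    return loss < threshold

candidates = [
    "John Doe's social security number is 123-45-6789.",  # synthetic example
    "The weather in Paris is usually mild in spring.",
]
for text in candidates:
    loss = sequence_loss(text)
    print(f"member={is_likely_member(loss)}  loss={loss:.2f}  {text[:60]}")

Stronger variants of this idea replace the fixed threshold with a calibrated score, for example by comparing the target model's loss against a reference model or against perturbed "neighbour" versions of the candidate text, which is the general direction taken by the membership inference attacks evaluated in the paper.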
