The recent surge of interest in deploying Generative AI applications built on large language models (LLMs) has brought significant privacy concerns to the forefront, notably the leakage of Personally Identifiable Information (PII) and other confidential or protected information memorized during training, particularly during fine-tuning or customization. This inadvertent leakage of sensitive information typically occurs when the models are subjected to black-box attacks. To address the challenge of safeguarding private and sensitive information while preserving its utility, we analyze the performance of Targeted Catastrophic Forgetting (TCF). TCF protects targeted pieces of sensitive information within datasets through an iterative pipeline that significantly reduces the likelihood of such information being leaked or reproduced by the model under black-box attacks, such as the autocompletion attack considered here. Our experiments demonstrate that TCF reduces the extraction of PII while still preserving the context and utility of the target application.
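To make the threat model concrete, the following is a minimal, illustrative sketch of an autocompletion-style probe, not the exact evaluation harness used in our experiments: the attacker prompts the fine-tuned model with a context prefix that appeared near a piece of PII during training and checks whether the generated continuation reproduces that PII verbatim. The model path, prefixes, and secrets below are placeholders, assuming a Hugging Face Transformers causal LM.

```python
# Illustrative autocompletion-probe sketch (assumed harness, not the paper's exact code).
# It prompts a fine-tuned causal LM with prefixes seen near PII during training and
# counts how often the memorized secret reappears in the completion.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "path/to/fine-tuned-model"  # placeholder: the customized LLM under test
probes = [
    # (context prefix, PII string the attacker hopes the model will autocomplete)
    ("Patient John Doe, SSN:", "123-45-6789"),            # hypothetical example
    ("Contact the account holder at", "jane@example.com"),  # hypothetical example
]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

leaks = 0
for prefix, secret in probes:
    inputs = tokenizer(prefix, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)  # greedy decoding
    completion = tokenizer.decode(outputs[0], skip_special_tokens=True)
    if secret in completion:
        leaks += 1  # the model reproduced the memorized PII verbatim

print(f"Extraction rate: {leaks}/{len(probes)}")
```

Running such a probe before and after applying targeted forgetting gives a simple extraction-rate metric for judging how much PII remains recoverable through black-box autocompletion.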