Deep neural networks (DNNs) have achieved great
success in tasks such as image
classification, speech recognition, and natural language processing. However,
they are susceptible to false predictions caused by adversarial examples,
which are normal inputs with imperceptible perturbations. Adversarial examples
have been widely studied in image classification, but not as much in text
classification. Current textual attack methods often rely on heuristic
replacement strategies at the character or word level, which have low success
rates and cannot search for optimal perturbations while preserving semantic
consistency and linguistic fluency. Our framework, FastAttacker, generates
natural adversarial
text efficiently and effectively by constructing different semantic
perturbation functions. It optimizes perturbations constrained in generic
semantic spaces, such as the typo space, knowledge space, contextualized
semantic space, or a combination. As a
result, the generated adversarial texts are semantically close to the
original inputs. Experiments show that FastAttacker generates adversarial
texts under different levels of semantic-space constraints, casting synonym
selection as an optimization problem. Our approach is robust not only in
attack generation but also against adversarial defenses: experiments show
that state-of-the-art language models and defense strategies remain
vulnerable to FastAttacker attacks.
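The optimization over a semantic perturbation space described above can be sketched as a greedy word-substitution loop: each candidate replacement drawn from a synonym (knowledge-space) table is scored by how much it lowers the victim classifier's confidence, and the best candidate is committed at each step. This is a minimal illustrative sketch, not FastAttacker's actual algorithm; the toy keyword classifier, the `SYNONYMS` table, and the `greedy_attack` helper are all hypothetical stand-ins.

```python
# Hypothetical knowledge-space: candidate synonym replacements per word.
SYNONYMS = {
    "great": ["fine", "grand", "terrific"],
    "terrible": ["awful", "dreadful", "poor"],
}

POSITIVE_WORDS = {"great", "grand", "terrific"}


def victim_confidence(tokens):
    """Toy stand-in for a sentiment classifier: P(positive) via keyword counts."""
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    return pos / max(len(tokens), 1)


def greedy_attack(tokens, max_edits=2):
    """Greedily substitute synonyms that most reduce the victim's confidence.

    Constraining substitutions to a synonym table keeps the adversarial text
    semantically close to the original input, mirroring the semantic-space
    constraint in the abstract.
    """
    tokens = list(tokens)
    for _ in range(max_edits):
        base = victim_confidence(tokens)
        best = None  # (new_confidence, position, candidate)
        for i, tok in enumerate(tokens):
            for cand in SYNONYMS.get(tok, []):
                trial = tokens[:i] + [cand] + tokens[i + 1:]
                conf = victim_confidence(trial)
                if conf < base and (best is None or conf < best[0]):
                    best = (conf, i, cand)
        if best is None:  # no substitution lowers confidence further
            break
        _, i, cand = best
        tokens[i] = cand
    return tokens


original = ["this", "movie", "is", "great"]
adv = greedy_attack(original)
```

In this toy setup, replacing "great" with the non-keyword synonym "fine" drops the classifier's positive score while leaving the sentence fluent; a real attack would score candidates with the target model's loss and a semantic-similarity constraint instead.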