DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. B. Wang, W. Chen, H. Pei, C. Xie, M. Kang, C. Zhang, C. Xu, Z. Xiong, R. Dutta, et al. arXiv preprint arXiv:2306.11698, 2023.
UMD: Unsupervised Model Detection for X2X Backdoor Attacks. Z. Xiang, Z. Xiong, B. Li. International Conference on Machine Learning, pp. 38013-38038, 2023.
BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models. Z. Xiang, F. Jiang, Z. Xiong, B. Ramasubramanian, R. Poovendran, B. Li. arXiv preprint arXiv:2401.12242, 2024.
Label-Smoothed Backdoor Attack. M. Peng, Z. Xiong, M. Sun, P. Li. arXiv preprint arXiv:2202.11203, 2022.
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content. Z. Yuan, Z. Xiong, Y. Zeng, N. Yu, R. Jia, D. Song, B. Li. arXiv preprint arXiv:2403.13031, 2024.
CBD: A Certified Backdoor Detector Based on Local Dominant Probability. Z. Xiang, Z. Xiong, B. Li. Advances in Neural Information Processing Systems 36, 2024.
Rethinking the Necessity of Labels in Backdoor Removal. Z. Xiong, D. Wu, Y. Wang, Y. Wang. ICLR 2023 Workshop on Backdoor Attacks and Defenses in Machine Learning, 2023.