Jailbreak attacks and defenses against large language models: A survey; S Yi, Y Liu, Z Sun, T Cong, X He, J Song, K Xu, Q Li; arXiv preprint arXiv:2407.04295, 2024 | 39 | 2024 |
On the Generalization Ability of Machine-Generated Text Detectors; Y Liu, Z Zhong, Y Liao, Z Sun, J Zheng, J Wei, Q Gong, F Tong, Y Chen, ...; arXiv preprint arXiv:2412.17242, 2024 | 1 | 2024 |
Quantized Delta Weight Is Safety Keeper; Y Liu, Z Sun, X He, X Huang; arXiv preprint arXiv:2411.19530, 2024 | 1 | 2024 |
The Rising Threat to Emerging AI-Powered Search Engines; Z Luo, Z Peng, Y Liu, Z Sun, M Li, J Zheng, X He; arXiv preprint arXiv:2502.04951, 2025 | | 2025 |
Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media; Z Sun, Z Zhang, X Shen, Z Zhang, Y Liu, M Backes, Y Zhang, X He; arXiv preprint arXiv:2412.18148, 2024 | | 2024 |
PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning; Z Sun, T Cong, Y Liu, C Lin, X He, R Chen, X Han, X Huang; arXiv preprint arXiv:2411.17453, 2024 | | 2024 |