Xiao Wang

引用次数

	总计	2019 年至今
引用	480	480
h 指数	8	8
i10 指数	7	7

280

140

210

202120222023202411 42 161 265

开放获取的出版物数量

查看全部

1 篇文章

0 篇文章

可查看的文章

无法查看的文章

根据资助方的强制性开放获取政策

合著作者

Qi Zhang (张奇)Professor of Computer Science, Fudan University在 fudan.edu.cn 的电子邮件经过验证
Tao Gui （桂韬）复旦大学在 fudan.edu.cn 的电子邮件经过验证
Huang Xuanjing (黄萱菁)Professor of Computer Science, Fudan University在 fudan.edu.cn 的电子邮件经过验证
Xianjun YangUCSB在 ucsb.edu 的电子邮件经过验证
Yicheng ZouShanghai AI Laboratory在 pjlab.org.cn 的电子邮件经过验证
Dahua LinThe Chinese University of Hong Kong在 ie.cuhk.edu.hk 的电子邮件经过验证
Rui ZhengFudan University在 fudan.edu.cn 的电子邮件经过验证
Hang YanComputer Science, Fudan University在 fudan.edu.cn 的电子邮件经过验证
Jie ZhouEast China Normal University在 cs.ecnu.edu.cn 的电子邮件经过验证
Xipeng Qiu（邱锡鹏）Professor of Computer Science, Fudan University在 fudan.edu.cn 的电子邮件经过验证

关注

Xiao Wang

Fudan University

在 fudan.edu.cn 的电子邮件经过验证 - 首页

LLM Alignment Security of LLMs


标题按引用次数排序按年份排序按标题排序	引用次数引用次数	年份
The rise and potential of large language model based agents: A survey Z Xi, W Chen, X Guo, W He, Y Ding, B Hong, M Zhang, J Wang, S Jin, ... arXiv preprint arXiv:2309.07864, 2023	248	2023
Textflint: Unified multilingual robustness evaluation toolkit for natural language processing X Wang, Q Liu, T Gui, Q Zhang, Y Zou, X Zhou, J Ye, Y Zhang, R Zheng, ... Proceedings of the 59th Annual Meeting of the Association for Computational …, 2021	111*	2021
Shadow alignment: The ease of subverting safely-aligned language models X Yang, X Wang, Q Zhang, L Petzold, WY Wang, X Zhao, D Lin arXiv preprint arXiv:2310.02949, 2023	40	2023
MINER: Improving out-of-vocabulary named entity recognition from an information theoretic perspective X Wang, S Dou, L Xiong, Y Zou, Q Zhang, T Gui, L Qiao, Z Cheng, ... Proceedings of the 60th Annual Meeting of the Association for Computational …, 2022	24	2022
Orthogonal Subspace Learning for Language Model Continual Learning X Wang, T Chen, Q Ge, H Xia, R Bao, R Zheng, Q Zhang, T Gui, X Huang EMNLP 2023 findings, 2023	12	2023
InstructUIE: multi-task instruction tuning for unified information extraction X Wang, W Zhou, C Zu, H Xia, T Chen, Y Zhang, R Zheng, J Ye, Q Zhang, ... arXiv preprint arXiv:2304.08085, 2023	12	2023
Secrets of rlhf in large language models part ii: Reward modeling B Wang, R Zheng, L Chen, Y Liu, S Dou, C Huang, W Shen, S Jin, E Zhou, ... arXiv preprint arXiv:2401.06080, 2024	10	2024
LoRAMoE: Revolutionizing mixture of experts for maintaining world knowledge in language model alignment S Dou, E Zhou, Y Liu, S Gao, J Zhao, W Shen, Y Zhou, Z Xi, X Wang, ... arXiv preprint arXiv:2312.09979, 2023	8	2023
Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model X Wang, W Zhou, Q Zhang, J Zhou, S Gao, J Wang, M Zhang, X Gao, ... ACL 2023 findings, 2023	4	2023
Navigating the OverKill in Large Language Models C Shi, X Wang, Q Ge, S Gao, X Yang, T Gui, Q Zhang, X Huang, X Zhao, ... arXiv preprint arXiv:2401.17633, 2024	2	2024
Improving generalization of alignment with human preferences through group invariant learning R Zheng, W Shen, Y Hua, W Lai, S Dou, Y Zhou, Z Xi, X Wang, H Huang, ... ICLR 2024, 2023	2	2023
DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization S Gao, S Dou, Y Liu, X Wang, Q Zhang, Z Wei, J Ma, Y Shan ACL 2023, 2023	2	2023
CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models H Lv, X Wang, Y Zhang, C Huang, S Dou, J Ye, T Gui, Q Zhang, ... arXiv preprint arXiv:2402.16717, 2024	1	2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning Z Xi, W Chen, B Hong, S Jin, R Zheng, W He, Y Ding, S Liu, X Guo, ... ICML 2024, 2024	1	2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback S Dou, Y Liu, H Jia, L Xiong, E Zhou, J Shan, C Huang, X Wang, W Shen, ... arXiv preprint arXiv:2402.01391, 2024	1	2024
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback S Gao, Q Ge, W Shen, S Dou, J Ye, X Wang, R Zheng, Y Zou, Z Chen, ... ICML 2024, 2024	1	2024
TRACE: A comprehensive benchmark for continual learning in large language models X Wang, Y Zhang, T Chen, S Gao, S Jin, X Yang, Z Xi, R Zheng, Y Zou, ... arXiv preprint arXiv:2310.06762, 2023	1	2023
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models W Zhou, X Wang, L Xiong, H Xia, Y Gu, M Chai, F Zhu, C Huang, S Dou, ... arXiv preprint arXiv:2403.12171, 2024		2024
RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions Y Zhang, X Wang, Z Xi, H Xia, T Gui, Q Zhang, X Huang COLING 2024, 2024		2024
A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition L Xiong, J Zhou, Q Zhu, X Wang, Y Wu, Q Zhang, T Gui, X Huang, J Ma, ... ACL 2023 findings, 2023		2023

系统目前无法执行此操作，请稍后再试。

文章 1–20

每年引用数

重复的引用

合并的引用

添加合著者合著作者

关注

引用次数

合著作者