Jiaxin Wen
Verified email at mails.tsinghua.edu.cn
Title
Cited by
Year
Unveiling the implicit toxicity in large language models
J Wen, P Ke, H Sun, Z Zhang, C Li, J Bai, M Huang
arXiv preprint arXiv:2311.17391, 2023
Cited by: 54; Year: 2023
Robustness testing of language understanding in task-oriented dialog
J Liu, R Takanobu, J Wen, D Wan, H Li, W Nie, C Li, W Peng, M Huang
arXiv preprint arXiv:2012.15262, 2020
Cited by: 53; Year: 2020
A chatbot for mental health support: exploring the impact of Emohaa on reducing mental distress in China
S Sabour, W Zhang, X Xiao, Y Zhang, Y Zheng, J Wen, J Zhao, M Huang
Frontiers in digital health 5, 1133987, 2023
Cited by: 47; Year: 2023
EVA2.0: Investigating open-domain Chinese dialogue systems with large-scale pre-training
Y Gu, J Wen, H Sun, Y Song, P Ke, C Zheng, Z Zhang, J Yao, L Liu, X Zhu, ...
Machine Intelligence Research 20 (2), 207-219, 2023
Cited by: 45; Year: 2023
Augesc: Dialogue augmentation with large language models for emotional support conversation
C Zheng, S Sabour, J Wen, Z Zhang, M Huang
arXiv preprint arXiv:2202.13047, 2022
Cited by: 44; Year: 2022
Ethicist: Targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation
Z Zhang, J Wen, M Huang
arXiv preprint arXiv:2307.04401, 2023
Cited by: 27; Year: 2023
Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation
Z Zhang, J Wen, J Guan, M Huang
arXiv preprint arXiv:2204.10703, 2022
Cited by: 23; Year: 2022
Augesc: Large-scale data augmentation for emotional support conversation with pre-trained language models
C Zheng, S Sabour, J Wen, M Huang
arXiv preprint arXiv:2202.13047, 2022
Cited by: 18; Year: 2022
Language models learn to mislead humans via RLHF
J Wen, R Zhong, A Khan, E Perez, J Steinhardt, M Huang, SR Bowman, ...
arXiv preprint arXiv:2409.12822, 2024
Cited by: 12; Year: 2024
Autocad: Automatically generating counterfactuals for mitigating shortcut learning
J Wen, Y Zhu, J Zhang, J Zhou, M Huang
arXiv preprint arXiv:2211.16202, 2022
Cited by: 12; Year: 2022
Learning Task Decomposition to Assist Humans in Competitive Programming
J Wen, R Zhong, P Ke, Z Shao, H Wang, M Huang
arXiv preprint arXiv:2406.04604, 2024
Cited by: 4; Year: 2024
Robustness testing of language understanding in dialog systems
J Liu, R Takanobu, J Wen, D Wan, W Nie, H Li, C Li, W Peng, M Huang
CoRR, abs, 2012
Cited by: 3; Year: 2012
Codeplan: Unlocking reasoning potential in large language models by scaling code-form planning
J Wen, J Guan, H Wang, W Wu, M Huang
CoRR, 2024
Cited by: 2; Year: 2024
Adaptivebackdoor: Backdoored language model agents that detect human overseers
H Wang, R Zhong, J Wen, J Steinhardt
ICML 2024 Next Generation of AI Safety Workshop
Cited by: 2
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
J Wen, V Hebbar, C Larson, A Bhatt, A Radhakrishnan, M Sharma, ...
arXiv preprint arXiv:2411.17693, 2024
Year: 2024
Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
J Wen, J Guan, H Wang, W Wu, M Huang
arXiv preprint arXiv:2409.12452, 2024
Year: 2024
Re3Dial: Retrieve, Reorganize and Rescale Conversations for Long-Turn Open-Domain Dialogue Pre-training
J Wen, H Zhou, J Guan, J Zhou, M Huang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language …, 2023
Year: 2023
SmartBackdoor: Malicious Language Model Agents that Avoid Being Caught
H Wang, R Zhong, J Wen, J Steinhardt
Articles 1–18