Kaivalya Hariharan
MIT CSAIL
Verified email at mit.edu
Title
Cited by
Year
Red teaming deep neural networks with feature synthesis tools
S Casper, T Bu, Y Li, J Li, K Zhang, K Hariharan, D Hadfield-Menell
Advances in Neural Information Processing Systems 36, 80470-80516, 2023
Cited by: 10, Year: 2023
Diagnostics for deep neural networks with automated copy/paste attacks
S Casper, K Hariharan, D Hadfield-Menell
arXiv preprint arXiv:2211.10024, 2022
Cited by: 9, Year: 2022
Forbidden Facts: An Investigation of Competing Objectives in Llama-2
TT Wang, M Wang, K Hariharan, N Shavit
arXiv preprint arXiv:2312.08793, 2023
Year: 2023