1. S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, et al. "Open problems and fundamental limitations of reinforcement learning from human feedback." arXiv preprint arXiv:2307.15217, 2023. Cited by 166.
2. T Korbak, K Shi, A Chen, RV Bhalerao, C Buckley, J Phang, SR Bowman, et al. "Pretraining language models with human preferences." International Conference on Machine Learning, 17506-17533, 2023. Cited by 83.
3. L Berglund, M Tong, M Kaufmann, M Balesni, AC Stickland, T Korbak, et al. "The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'." arXiv preprint arXiv:2309.12288, 2023. Cited by 82*.
4. IR McKenzie, A Lyzhov, M Pieler, A Parrish, A Mueller, A Prabhu, et al. "Inverse Scaling: When Bigger Isn't Better." arXiv preprint arXiv:2306.09479, 2023. Cited by 69*.
5. J Scheurer, JA Campos, T Korbak, JS Chan, A Chen, K Cho, E Perez. "Training language models with language feedback at scale." arXiv preprint arXiv:2303.16755, 2023. Cited by 63.
6. M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, et al. "Towards understanding sycophancy in language models." arXiv preprint arXiv:2310.13548, 2023. Cited by 43.
7. A Chen, J Scheurer, T Korbak, JA Campos, JS Chan, SR Bowman, K Cho, et al. "Improving code generation by training with natural language feedback." arXiv preprint arXiv:2303.16749, 2023. Cited by 36.
8. D Go, T Korbak, G Kruszewski, J Rozen, N Ryu, M Dymetman. "Aligning language models with preferences through f-divergence minimization." arXiv preprint arXiv:2302.08215, 2023. Cited by 30.
9. T Korbak. "Computational enactivism under the free energy principle." Synthese 198 (3), 2743-2763, 2021. Cited by 30.
10. T Korbak, E Perez, CL Buckley. "RL with KL penalties is better viewed as Bayesian inference." arXiv preprint arXiv:2205.11275, 2022. Cited by 29*.
11. T Korbak, H Elsahar, G Kruszewski, M Dymetman. "On reinforcement learning and distribution matching for fine-tuning language models with no catastrophic forgetting." Advances in Neural Information Processing Systems 35, 16203-16220, 2022. Cited by 23.
12. L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, et al. "Taken out of context: On measuring situational awareness in LLMs." arXiv preprint arXiv:2309.00667, 2023. Cited by 20*.
13. T Korbak, H Elsahar, G Kruszewski, M Dymetman. "Controlling conditional language models without catastrophic forgetting." International Conference on Machine Learning, 11499-11528, 2022. Cited by 19.
14. T Korbak, J Zubek, Ł Kuciński, P Miłoś, J Rączaszek-Leonardi. "Interaction history as a source of compositionality in emergent communication." Interaction Studies 22 (2), 212-243, 2021. Cited by 17*.
15. Ł Kuciński, T Korbak, P Kołodziej, P Miłoś. "Catalytic role of noise and necessity of inductive biases in the emergence of compositional communication." Advances in Neural Information Processing Systems 34, 23075-23088, 2021. Cited by 14.
16. T Korbak. "Scaffolded minds and the evolution of content in signaling pathways." Studies in Logic, Grammar and Rhetoric 41 (1), 89-103, 2015. Cited by 11.
17. T Korbak, J Zubek, J Rączaszek-Leonardi. "Measuring non-trivial compositionality in emergent communication." arXiv preprint arXiv:2010.15058, 2020. Cited by 8.
18. T Korbak, H Elsahar, M Dymetman, G Kruszewski. "Energy-based models for code generation under compilability constraints." arXiv preprint arXiv:2106.04985, 2021. Cited by 7.
19. R Korzeniowski, R Rolczyński, P Sadownik, T Korbak, M Możejko. "Exploiting unsupervised pre-training and automated feature engineering for low-resource hate speech detection in Polish." arXiv preprint arXiv:1906.09325, 2019. Cited by 7.
20. T Korbak, P Żak. "Fine-tuning Tree-LSTM for phrase-level sentiment classification on a Polish dependency treebank." Language and Technology Conference, 31-42, 2017. Cited by 4.