A broad-coverage challenge corpus for sentence understanding through inference A Williams, N Nangia, SR Bowman arXiv preprint arXiv:1704.05426, 2017 | 2856 | 2017 |
SuperGLUE: A stickier benchmark for general-purpose language understanding systems A Wang, Y Pruksachatkun, N Nangia, A Singh, J Michael, F Hill, O Levy, ... Advances in neural information processing systems 32, 2019 | 1193 | 2019 |
CrowS-Pairs: A challenge dataset for measuring social biases in masked language models N Nangia, C Vania, R Bhalerao, SR Bowman arXiv preprint arXiv:2010.00133, 2020 | 173 | 2020 |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022 | 99 | 2022 |
The RepEval 2017 shared task: Multi-genre natural language inference with sentence representations N Nangia, A Williams, A Lazaridou, SR Bowman arXiv preprint arXiv:1707.08172, 2017 | 97 | 2017 |
Human vs. Muppet: A conservative estimate of human performance on the GLUE benchmark N Nangia, SR Bowman arXiv preprint arXiv:1905.10425, 2019 | 75 | 2019 |
ListOps: A diagnostic dataset for latent tree learning N Nangia, SR Bowman arXiv preprint arXiv:1804.06028, 2018 | 75 | 2018 |
jiant 1.2: A software toolkit for research on general-purpose text understanding models A Wang, IF Tenney, Y Pruksachatkun, K Yu, J Hula, P Xia, R Pappagari, ... http://jiant.info/, 2019 | 46 | 2019 |
QuALITY: Question Answering with Long Input Texts, Yes! RY Pang, A Parrish, N Joshi, N Nangia, J Phang, A Chen, V Padmakumar, ... arXiv preprint arXiv:2112.08608, 2021 | 18 | 2021 |
BBQ: A hand-built bias benchmark for question answering A Parrish, A Chen, N Nangia, V Padmakumar, J Phang, J Thompson, ... arXiv preprint arXiv:2110.08193, 2021 | 15 | 2021 |
What ingredients make for an effective crowdsourcing protocol for difficult NLU data collection tasks? N Nangia, S Sugawara, H Trivedi, A Warstadt, C Vania, SR Bowman arXiv preprint arXiv:2106.00794, 2021 | 15 | 2021 |
The multi-genre NLI corpus A Williams, N Nangia, SR Bowman | 11 | 2018 |
Does Putting a Linguist in the Loop Improve NLU Data Collection? A Parrish, W Huang, O Agha, SH Lee, N Nangia, A Warstadt, K Aggarwal, ... arXiv preprint arXiv:2104.07179, 2021 | 8 | 2021 |
Latent structure models for natural language processing AFT Martins, T Mihaylova, N Nangia, V Niculae Proceedings of the 57th Annual Meeting of the Association for Computational …, 2019 | 6 | 2019 |
Single-turn debate does not help humans answer hard reading-comprehension questions A Parrish, H Trivedi, E Perez, A Chen, N Nangia, J Phang, SR Bowman arXiv preprint arXiv:2204.05212, 2022 | 4 | 2022 |
What Makes Reading Comprehension Questions Difficult? S Sugawara, N Nangia, A Warstadt, SR Bowman arXiv preprint arXiv:2203.06342, 2022 | 4 | 2022 |
What do NLP researchers believe? Results of the NLP community metasurvey J Michael, A Holtzman, A Parrish, A Mueller, A Wang, A Chen, D Madaan, ... arXiv preprint arXiv:2208.12852, 2022 | 3 | 2022 |
Crowdsourcing Beyond Annotation: Case Studies in Benchmark Data Collection A Suhr, C Vania, N Nangia, M Sap, M Yatskar, S Bowman, Y Artzi Proceedings of the 2021 Conference on Empirical Methods in Natural Language …, 2021 | 3 | 2021 |
Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions A Parrish, H Trivedi, N Nangia, V Padmakumar, J Phang, AS Saimbhi, ... arXiv preprint arXiv:2210.10860, 2022 | 2 | 2022 |