Flamingo: a visual language model for few-shot learning JB Alayrac, J Donahue, P Luc, A Miech, I Barr, Y Hasson, K Lenc, ... Advances in Neural Information Processing Systems 35, 23716-23736, 2022 | 3313 | 2022 |
Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 2192 | 2023 |
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips A Miech, D Zhukov, JB Alayrac, M Tapaswi, I Laptev, J Sivic Proceedings of the IEEE International Conference on Computer Vision, 2630-2640, 2019 | 1254 | 2019 |
End-to-end learning of visual representations from uncurated instructional videos A Miech, JB Alayrac, L Smaira, I Laptev, J Sivic, A Zisserman Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 810 | 2020 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 689 | 2024 |
Learnable pooling with context gating for video classification A Miech, I Laptev, J Sivic arXiv preprint arXiv:1706.06905, 2017 | 403 | 2017 |
Just ask: Learning to answer questions from millions of narrated videos A Yang, A Miech, J Sivic, I Laptev, C Schmid Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 300 | 2021 |
Learning a text-video embedding from incomplete and heterogeneous data A Miech, I Laptev, J Sivic arXiv preprint arXiv:1804.02516, 2018 | 261 | 2018 |
Zero-shot video question answering via frozen bidirectional language models A Yang, A Miech, J Sivic, I Laptev, C Schmid Advances in Neural Information Processing Systems 35, 124-141, 2022 | 212 | 2022 |
Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning A Yang, A Nagrani, PH Seo, A Miech, J Pont-Tuset, I Laptev, J Sivic, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 208 | 2023 |
Thinking fast and slow: Efficient text-to-visual retrieval with transformers A Miech, JB Alayrac, I Laptev, J Sivic, A Zisserman Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 150 | 2021 |
TubeDETR: Spatio-temporal video grounding with transformers A Yang, A Miech, J Sivic, I Laptev, C Schmid Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 99 | 2022 |
Leveraging the present to anticipate the future in videos A Miech, I Laptev, J Sivic, H Wang, L Torresani, D Tran Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 84 | 2019 |
Mikołaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, and Karén Simonyan. Flamingo: a visual language model for few-shot learning JB Alayrac, J Donahue, P Luc, A Miech, I Barr, Y Hasson, K Lenc, ... Advances in Neural Information Processing Systems 35, 23716-23736, 2022 | 78 | 2022 |
Perception test: A diagnostic benchmark for multimodal video models V Patraucean, L Smaira, A Gupta, A Recasens, L Markeeva, D Banarse, ... Advances in Neural Information Processing Systems 36, 2024 | 58 | 2024 |
Learning from video and text via large-scale discriminative clustering A Miech, JB Alayrac, P Bojanowski, I Laptev, J Sivic Proceedings of the IEEE International Conference on Computer Vision, 5257-5266, 2017 | 50 | 2017 |
Learning to answer visual questions from web videos A Yang, A Miech, J Sivic, I Laptev, C Schmid arXiv preprint arXiv:2205.05019, 2022 | 39 | 2022 |
Look for the change: Learning object states and state-modifying actions from untrimmed web videos T Souček, JB Alayrac, A Miech, I Laptev, J Sivic Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 31 | 2022 |
RareAct: A video dataset of unusual interactions A Miech, JB Alayrac, I Laptev, J Sivic, A Zisserman arXiv preprint arXiv:2008.01018, 2020 | 27 | 2020 |
Zorro: the masked multimodal transformer A Recasens, J Lin, J Carreira, D Jaegle, L Wang, J Alayrac, P Luc, ... arXiv preprint arXiv:2301.09595, 2023 | 22 | 2023 |