MMDetection: Open mmlab detection toolbox and benchmark K Chen, J Wang, J Pang, Y Cao, Y Xiong, X Li, S Sun, W Feng, Z Liu, J Xu, ... arXiv preprint arXiv:1906.07155, 2019 | 3503 | 2019 |
Seesaw loss for long-tailed instance segmentation J Wang, W Zhang, Y Zang, Y Cao, J Pang, T Gong, K Chen, Z Liu, CC Loy, ... Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 307 | 2021 |
Prime sample attention in object detection Y Cao, K Chen, CC Loy, D Lin Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 277 | 2020 |
Internlm-xcomposer2: Mastering free-form text-image composition and comprehension in vision-language large model X Dong, P Zhang, Y Zang, Y Cao, B Wang, L Ouyang, X Wei, S Zhang, ... arXiv preprint arXiv:2401.16420, 2024 | 216 | 2024 |
Internlm-xcomposer: A vision-language large model for advanced text-image comprehension and composition P Zhang, X Dong, B Wang, Y Cao, C Xu, L Ouyang, Z Zhao, H Duan, ... arXiv preprint arXiv:2309.15112, 2023 | 188 | 2023 |
Side-aware boundary localization for more precise object detection J Wang, W Zhang, Y Cao, K Chen, J Pang, T Gong, J Shi, CC Loy, D Lin Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020 | 181 | 2020 |
MMDetection: open mmlab detection toolbox and benchmark (2019) K Chen, J Wang, J Pang, Y Cao, Y Xiong, X Li, S Sun, W Feng, Z Liu, J Xu, ... arXiv preprint arXiv:1906.07155, 2019 | 139 | 2019 |
Few-shot object detection via association and discrimination Y Cao, J Wang, Y Jin, T Wu, K Chen, Z Liu, D Lin Advances in neural information processing systems 34, 16570-16581, 2021 | 110 | 2021 |
Internlm-xcomposer2-4khd: A pioneering large vision-language model handling resolutions from 336 pixels to 4k hd X Dong, P Zhang, Y Zang, Y Cao, B Wang, L Ouyang, S Zhang, H Duan, ... arXiv preprint arXiv:2404.06512, 2024 | 109 | 2024 |
Internlm-xcomposer-2.5: A versatile large vision language model supporting long-contextual input and output P Zhang, X Dong, Y Zang, Y Cao, R Qian, L Chen, Q Guo, H Duan, ... arXiv preprint arXiv:2407.03320, 2024 | 69 | 2024 |
Feature pyramid grids K Chen, Y Cao, CC Loy, D Lin, C Feichtenhofer arXiv preprint arXiv:2004.03580, 2020 | 66 | 2020 |
V3det: Vast vocabulary visual detection dataset J Wang, P Zhang, T Chu, Y Cao, Y Zhou, T Wu, B Wang, C He, D Lin Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 56 | 2023 |
Wssod: A new pipeline for weakly-and semi-supervised object detection S Fang, Y Cao, X Wang, K Chen, D Lin, W Zhang arXiv preprint arXiv:2105.11293, 2021 | 13 | 2021 |
Mini: Mining implicit novel instances for few-shot object detection Y Cao, J Wang, Y Lin, D Lin arXiv preprint arXiv:2205.03381, 2022 | 11 | 2022 |
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models Y Cao, P Zhang, X Dong, D Lin, J Wang arXiv preprint arXiv:2402.14767, 2024 | 10 | 2024 |
MMDetection: Open mmlab detection toolbox and benchmark K Chen, J Wang, J Pang, Y Cao, Y Xiong, X Li, S Sun, W Feng, Z Liu, J Xu, ... arXiv preprint arXiv:1906.07155, 2019 | 10 | 2019 |
Pyramiddrop: Accelerating your large vision-language models via pyramid visual redundancy reduction L Xing, Q Huang, X Dong, J Lu, P Zhang, Y Zang, Y Cao, C He, J Wang, ... arXiv preprint arXiv:2410.17247, 2024 | 9 | 2024 |
Mia-dpo: Multi-image augmented direct preference optimization for large vision-language models Z Liu, Y Zang, X Dong, P Zhang, Y Cao, H Duan, C He, Y Xiong, D Lin, ... arXiv preprint arXiv:2410.17637, 2024 | 5 | 2024 |
Internlm-xcomposer2.5-omnilive: A comprehensive multimodal system for long-term streaming video and audio interactions P Zhang, X Dong, Y Cao, Y Zang, R Qian, X Wei, L Chen, Y Li, J Niu, ... arXiv preprint arXiv:2412.09596, 2024 | 3 | 2024 |
Sam2long: Enhancing sam 2 for long video segmentation with a training-free memory tree S Ding, R Qian, X Dong, P Zhang, Y Zang, Y Cao, Y Guo, D Lin, J Wang arXiv preprint arXiv:2410.16268, 2024 | 3 | 2024 |