Multi-task collaborative network for joint referring expression comprehension and segmentation G Luo, Y Zhou, X Sun, L Cao, C Wu, C Deng, R Ji Proceedings of the IEEE/CVF Conference on computer vision and pattern …, 2020 | 135 | 2020 |
Improving image captioning by leveraging intra- and inter-layer global representation in transformer network J Ji, Y Luo, X Sun, F Chen, G Luo, Y Wu, Y Gao, R Ji Proceedings of the AAAI conference on artificial intelligence 35 (2), 1655-1663, 2021 | 94 | 2021 |
Cascade grouped attention network for referring expression segmentation G Luo, Y Zhou, R Ji, X Sun, J Su, CW Lin, Q Tian Proceedings of the 28th ACM International Conference on Multimedia, 1274-1282, 2020 | 54 | 2020 |
A real-time global inference network for one-stage referring expression comprehension Y Zhou, R Ji, G Luo, X Sun, J Su, X Ding, CW Lin, Q Tian IEEE Transactions on Neural Networks and Learning Systems, 2021 | 25 | 2021 |
Seqtr: A simple yet universal network for visual grounding C Zhu, Y Zhou, Y Shen, G Luo, X Pan, M Lin, C Chen, L Cao, X Sun, R Ji Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel …, 2022 | 22 | 2022 |
Active teacher for semi-supervised object detection P Mi, J Lin, Y Zhou, Y Shen, G Luo, X Sun, L Cao, R Fu, Q Xu, R Ji Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 12 | 2022 |
Towards lightweight transformer via group-wise transformation for vision-and-language tasks G Luo, Y Zhou, X Sun, Y Wang, L Cao, Y Wu, F Huang, R Ji IEEE Transactions on Image Processing 31, 3386-3398, 2022 | 11 | 2022 |
K-armed bandit based multi-modal network architecture search for visual question answering Y Zhou, R Ji, X Sun, G Luo, X Hong, J Su, X Ding, L Shao Proceedings of the 28th ACM international conference on multimedia, 1245-1254, 2020 | 9 | 2020 |
What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study G Luo, Y Zhou, J Sun, S Huang, X Sun, Q Ye, Y Wu, R Ji arXiv preprint arXiv:2204.07913, 2022 | 4 | 2022 |
Towards language-guided visual recognition via dynamic convolutions G Luo, Y Zhou, X Sun, X Ding, Y Wu, F Huang, Y Gao, R Ji arXiv preprint arXiv:2110.08797, 2021 | 4 | 2021 |
Towards efficient visual adaption via structural re-parameterization G Luo, M Huang, Y Zhou, X Sun, G Jiang, Z Wang, R Ji arXiv preprint arXiv:2302.08106, 2023 | 3 | 2023 |
Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning J Ji, X Huang, X Sun, Y Zhou, G Luo, L Cao, J Liu, L Shao, R Ji IEEE Transactions on Multimedia, 2022 | 3 | 2022 |
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models G Luo, Y Zhou, T Ren, S Chen, X Sun, R Ji arXiv preprint arXiv:2305.15023, 2023 | | 2023 |
Towards End-to-end Semi-supervised Learning for One-stage Object Detection G Luo, Y Zhou, L Jin, X Sun, R Ji arXiv preprint arXiv:2302.11299, 2023 | | 2023 |
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension J Sun, G Luo, Y Zhou, X Sun, G Jiang, Z Wang, R Ji Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | | 2023 |
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension L Jin, G Luo, Y Zhou, X Sun, G Jiang, A Shu, R Ji Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | | 2023 |