A fast and accurate one-stage approach to visual grounding Z Yang, B Gong, L Wang, W Huang, D Yu, J Luo IEEE International Conference on Computer Vision (ICCV), 4683-4693, 2019 | 201 | 2019 |
Action recognition with spatio–temporal visual attention on skeleton image sequences Z Yang, Y Li, J Yang, J Luo IEEE Transactions on Circuits and Systems for Video Technology 29 (8), 2405-2415, 2018 | 156 | 2018 |
End-to-end multi-modal multi-task vehicle control for self-driving cars with visual perceptions Z Yang, Y Zhang, J Yu, J Cai, J Luo 2018 24th International Conference on Pattern Recognition (ICPR), 2289-2294, 2018 | 155 | 2018 |
Attentive relational networks for mapping images to scene graphs M Qi, W Li, Z Yang, Y Wang, J Luo IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3957-3966, 2019 | 137 | 2019 |
TransVG: End-to-End Visual Grounding with Transformers J Deng, Z Yang, T Chen, W Zhou, H Li IEEE International Conference on Computer Vision (ICCV), 2021 | 117 | 2021 |
Improving One-stage Visual Grounding by Recursive Sub-query Construction Z Yang, T Chen, L Wang, J Luo European Conference on Computer Vision (ECCV), 2020 | 114 | 2020 |
Scaling up vision-language pre-training for image captioning X Hu, Z Gan, J Wang, Z Yang, Z Liu, Y Lu, L Wang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 100 | 2022 |
An empirical study of GPT-3 for few-shot knowledge-based VQA Z Yang, Z Gan, J Wang, X Hu, Y Lu, Z Liu, L Wang Proceedings of the AAAI Conference on Artificial Intelligence 36 (3), 3081-3089, 2022 | 99 | 2022 |
GIT: A generative image-to-text transformer for vision and language J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu, C Liu, L Wang Transactions on Machine Learning Research (TMLR), 2022 | 82 | 2022 |
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation Y Yin, F Meng, J Su, C Zhou, Z Yang, J Zhou, J Luo Annual Meeting of the Association for Computational Linguistics (ACL), 2020 | 82 | 2020 |
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption Z Yang, Y Lu, J Wang, X Yin, D Florencio, L Wang, C Zhang, L Zhang, ... IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 | 81 | 2021 |
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling Z Yang, Z Gan, J Wang, X Hu, F Ahmed, Z Liu, Y Lu, L Wang European Conference on Computer Vision (ECCV), 521-539, 2022 | 42* | 2022 |
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation L Wang, J Huang, Y Li, K Xu, Z Yang, D Yu IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 | 42 | 2021 |
Dynamic context-guided capsule network for multimodal machine translation H Lin, F Meng, J Su, Y Yin, Z Yang, Y Ge, J Zhou, J Luo Proceedings of the 28th ACM International Conference on Multimedia, 1320-1329, 2020 | 41 | 2020 |
UFO: A unified transformer for vision-language representation learning J Wang, X Hu, Z Gan, Z Yang, X Dai, Z Liu, Y Lu, L Wang arXiv preprint arXiv:2111.10023, 2021 | 36 | 2021 |
SAT: 2D Semantics Assisted Training for 3D Visual Grounding Z Yang, S Zhang, L Wang, J Luo IEEE International Conference on Computer Vision (ICCV), 2021 | 30 | 2021 |
MM-REACT: Prompting ChatGPT for multimodal reasoning and action Z Yang, L Li, J Wang, K Lin, E Azarnasab, F Ahmed, Z Liu, C Liu, M Zeng, ... arXiv preprint arXiv:2303.11381, 2023 | 25 | 2023 |
Prompting GPT-3 to be reliable C Si, Z Gan, Z Yang, S Wang, J Wang, J Boyd-Graber, L Wang International Conference on Learning Representations (ICLR 2023), 2022 | 24 | 2022 |
Grounding-tracking-integration Z Yang, T Kumar, T Chen, J Su, J Luo IEEE Transactions on Circuits and Systems for Video Technology 31 (9), 3433-3443, 2020 | 18 | 2020 |
Human-centered emotion recognition in animated GIFs Z Yang, Y Zhang, J Luo 2019 IEEE International Conference on Multimedia and Expo (ICME), 1090-1095, 2019 | 17 | 2019 |