Predicting visual features from text for image and video caption retrieval J Dong, X Li, CGM Snoek IEEE Transactions on Multimedia 20 (12), 3377-3388, 2018 | 299* | 2018 |
Dual encoding for zero-example video retrieval J Dong, X Li, C Xu, S Ji, Y He, G Yang, X Wang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 263 | 2019 |
Dual encoding for video retrieval by text J Dong, X Li, C Xu, X Yang, G Yang, X Wang, M Wang IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (8), 4065-4080, 2021 | 143 | 2021 |
W2VV++: Fully Deep Learning for Ad-hoc Video Search X Li, C Xu, G Yang, Z Chen, J Dong Proceedings of the 27th ACM International Conference on Multimedia, 1786-1794, 2019 | 120 | 2019 |
Context-aware biaffine localizing network for temporal sentence grounding D Liu, X Qu, J Dong, P Zhou, Y Cheng, W Wei, Z Xu, Y Xie Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 104 | 2021 |
Exploring human-like attention supervision in visual question answering T Qiao, J Dong, D Xu Proceedings of the AAAI Conference on Artificial Intelligence 32 (1), 2018 | 99 | 2018 |
Jointly cross- and self-modal graph attention network for query-based moment localization D Liu, X Qu, XY Liu, J Dong, P Zhou, Z Xu Proceedings of the 28th ACM International Conference on Multimedia, 4070-4078, 2020 | 97 | 2020 |
Tree-augmented cross-modal encoding for complex-query video retrieval X Yang, J Dong, Y Cao, X Wang, M Wang, TS Chua Proceedings of the 43rd International ACM SIGIR Conference on Research and …, 2020 | 94 | 2020 |
Early embedding and late reranking for video captioning J Dong, X Li, W Lan, Y Huo, CGM Snoek Proceedings of the 24th ACM International Conference on Multimedia, 1082-1086, 2016 | 89 | 2016 |
Adding Chinese captions to images X Li, W Lan, J Dong, H Liu Proceedings of the 2016 ACM on International Conference on Multimedia …, 2016 | 86 | 2016 |
Fluency-guided cross-lingual image captioning W Lan, X Li, J Dong Proceedings of the 25th ACM International Conference on Multimedia, 1549-1557, 2017 | 78 | 2017 |
Fine-grained iterative attention network for temporal language localization in videos X Qu, P Tang, Z Zou, Y Cheng, J Dong, P Zhou, Z Xu Proceedings of the 28th ACM International Conference on Multimedia, 4280-4288, 2020 | 63 | 2020 |
Video moment retrieval with cross-modal neural architecture search X Yang, S Wang, J Dong, J Dong, M Wang, TS Chua IEEE Transactions on Image Processing 31, 1204-1216, 2022 | 45 | 2022 |
Fine-grained fashion similarity learning by attribute-specific embedding network Z Ma, J Dong, Z Long, Y Zhang, Y He, H Xue, S Ji Proceedings of the AAAI Conference on Artificial Intelligence 34 (07), 11741 …, 2020 | 40 | 2020 |
Adaptive proposal generation network for temporal sentence localization in videos D Liu, X Qu, J Dong, P Zhou arXiv preprint arXiv:2109.06398, 2021 | 34 | 2021 |
Which is plagiarism: Fashion image retrieval based on regional representation for design protection Y Lang, Y He, F Yang, J Dong, H Xue Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020 | 33 | 2020 |
Reading-strategy inspired visual representation learning for text-to-video retrieval J Dong, Y Wang, X Chen, X Qu, X Li, Y He, X Wang IEEE Transactions on Circuits and Systems for Video Technology 32 (8), 5680-5694, 2022 | 27 | 2022 |
University of Amsterdam and Renmin University at TRECVID 2016: Searching Video, Detecting Events and Describing Video CGM Snoek, J Dong, X Li, X Wang, Q Wei, W Lan, E Gavves, N Hussein, ... TRECVID 2016 Workshop, 2016 | 27 | 2016 |
Reasoning step-by-step: Temporal sentence localization in videos via deep rectification-modulation network D Liu, X Qu, J Dong, P Zhou Proceedings of the 28th International Conference on Computational …, 2020 | 26 | 2020 |
Fine-grained fashion similarity prediction by attribute-specific embedding learning J Dong, Z Ma, X Mao, X Yang, Y He, R Hong, S Ji IEEE Transactions on Image Processing 30, 8410-8425, 2021 | 25 | 2021 |