Multi-interaction network with object relation for video question answering W Jin, Z Zhao, M Gu, J Yu, J Xiao, Y Zhuang Proceedings of the 27th ACM international conference on multimedia, 1193-1201, 2019 | 71 | 2019 |
Gloss attention for gloss-free sign language translation A Yin, T Zhong, L Tang, W Jin, T Jin, Z Zhao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023 | 51 | 2023 |
MLSLT: Towards multilingual sign language translation A Yin, Z Zhao, W Jin, M Zhang, X Zeng, X He Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 49 | 2022 |
Hierarchical cross-modal graph consistency learning for video-text retrieval W Jin, Z Zhao, P Zhang, J Zhu, X He, Y Zhuang Proceedings of the 44th International ACM SIGIR Conference on Research and …, 2021 | 46 | 2021 |
Graph-based multi-interaction network for video question answering M Gu, Z Zhao, W Jin, R Hong, F Wu IEEE Transactions on Image Processing 30, 2758-2770, 2021 | 42 | 2021 |
SimulSLT: End-to-end simultaneous sign language translation A Yin, Z Zhao, J Liu, W Jin, M Zhang, X Zeng, X He Proceedings of the 29th ACM International Conference on Multimedia, 4118-4127, 2021 | 34 | 2021 |
Adaptive spatio-temporal graph enhanced vision-language representation for video QA W Jin, Z Zhao, X Cao, J Zhu, X He, Y Zhuang IEEE Transactions on Image Processing 30, 5477-5489, 2021 | 32 | 2021 |
Video question answering via knowledge-based progressive spatial-temporal attention network W Jin, Z Zhao, Y Li, J Li, J Xiao, Y Zhuang ACM Transactions on Multimedia Computing, Communications, and Applications …, 2019 | 20 | 2019 |
VLAD-VSA: cross-domain face presentation attack detection with vocabulary separation and adaptation J Wang, Z Zhao, W Jin, X Duan, Z Lei, B Huai, Y Wu, X He Proceedings of the 29th ACM International Conference on Multimedia, 1497-1506, 2021 | 16 | 2021 |
Multi-turn video question generation via reinforced multi-choice attention network Z Guo, Z Zhao, W Jin, Z Wei, M Yang, N Wang, NJ Yuan IEEE Transactions on Circuits and Systems for Video Technology 31 (5), 1697-1710, 2020 | 14 | 2020 |
TaoHighlight: Commodity-aware multi-modal video highlight detection in e-commerce Z Guo, Z Zhao, W Jin, D Wang, R Liu, J Yu IEEE Transactions on Multimedia 24, 2606-2616, 2021 | 11 | 2021 |
Video dialog via multi-grained convolutional self-attention context multi-modal networks M Gu, Z Zhao, W Jin, D Cai, F Wu IEEE Transactions on Circuits and Systems for Video Technology 30 (12), 4453 …, 2019 | 11 | 2019 |
Video dialog via multi-grained convolutional self-attention context networks W Jin, Z Zhao, M Gu, J Yu, J Xiao, Y Zhuang Proceedings of the 42nd International ACM SIGIR Conference on Research and …, 2019 | 7 | 2019 |
Video dialog via progressive inference and cross-transformer W Jin, Z Zhao, M Gu, J Xiao, F Wei, Y Zhuang Proceedings of the 2019 Conference on Empirical Methods in Natural Language …, 2019 | 4 | 2019 |
Frame-subtitle self-supervision for multi-modal video question answering J Wang, Z Zhao, W Jin arXiv preprint arXiv:2209.03609, 2022 | | 2022 |