VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | Z Tong, Y Song, J Wang, L Wang | 36th Conference on Neural Information Processing Systems (NeurIPS), 2022 | 1037 | 2022 |
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition | S Chen, C Ge, Z Tong, J Wang, Y Song, J Wang, P Luo | 36th Conference on Neural Information Processing Systems (NeurIPS), 2022 | 550 | 2022 |
TDN: Temporal Difference Networks for Efficient Action Recognition | L Wang, Z Tong, B Ji, G Wu | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1895-1904, 2021 | 482 | 2021 |
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | L Wang, B Huang, Z Zhao, Z Tong, Y He, Y Wang, Y Wang, Y Qiao | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023 | 332 | 2023 |
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations | Y Liang, C Ge, Z Tong, Y Song, J Wang, P Xie | International Conference on Learning Representations (ICLR), 2022 | 312 | 2022 |
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark | MMA Contributors | https://github.com/open-mmlab/mmaction2, 2020 | 202 | 2020 |
MGSampler: An Explainable Sampling Strategy for Video Action Recognition | Y Zhi, Z Tong, L Wang, G Wu | IEEE/CVF International Conference on Computer Vision (ICCV), 1513-1522, 2021 | 78 | 2021 |
Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning | C Ge, J Wang, Z Tong, S Chen, Y Song, P Luo | International Conference on Learning Representations (ICLR), 2023 | 30 | 2023 |
Advancing Vision Transformers with Group-Mix Attention | C Ge, X Ding, Z Tong, L Yuan, J Wang, Y Song, P Luo | arXiv preprint arXiv:2311.15157, 2023 | 16 | 2023 |
Efficient Video Action Detection with Token Dropout and Context Refinement | L Chen, Z Tong, Y Song, G Wu, L Wang | IEEE/CVF International Conference on Computer Vision (ICCV), 2023 | 16 | 2023 |
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale | Z Zeng, Z Tong, X Liu, B Chen, ST Xia, Y Ge | arXiv preprint arXiv:2305.14173, 2023 | 8 | 2023 |
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens | Z Gao, Z Tong, L Wang, MZ Shou | International Conference on Learning Representations (ICLR), 2024 | 7 | 2024 |
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection | L Chen, Z Tong, Y Song, G Wu, L Wang | arXiv preprint arXiv:2303.16118, 2023 | 5 | 2023 |
Contextual AD Narration with Interleaved Multimodal Sequence | H Wang, Z Tong, K Zheng, Y Shen, L Wang | arXiv preprint arXiv:2403.12922, 2024 | 3 | 2024 |
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | Q Liu, K Zheng, W Wu, Z Tong, Y Liu, W Chen, Z Wang, Y Shen | arXiv preprint arXiv:2312.14149, 2023 | 3 | 2023 |
Bootstrapping SparseFormers from Vision Foundation Models | Z Gao, Z Tong, KQ Lin, J Chen, MZ Shou | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 | | 2024 |
SpeedAug: A Simple Co-Augmentation Method for Unsupervised Audio-Visual Pre-training | J Wang, J Jiao, Y Song, S James, Z Tong, C Ge, P Abbeel, YH Liu | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Sight …, 2023 | | 2023 |