MVPTR: Multi-level semantic alignment for vision-language pre-training via multi-stage learning. Z Li, Z Fan, H Tou, J Chen, Z Wei, X Huang. Proceedings of the 30th ACM International Conference on Multimedia, 4395-4405, 2022. (Cited by 32*)
TCIC: Theme concepts learning cross language and vision for image captioning. Z Fan, Z Wei, S Wang, R Wang, Z Li, H Shan, X Huang. arXiv preprint arXiv:2106.10936, 2021. (Cited by 30)
Unifying cross-lingual and cross-modal modeling towards weakly supervised multilingual vision-language pre-training. Z Li, Z Fan, J Chen, Q Zhang, X Huang, Z Wei. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023. (Cited by 15)
Unifying local and global knowledge: Empowering large language models as political experts with knowledge graphs. X Mou, Z Li, H Lyu, J Luo, Z Wei. Proceedings of the ACM Web Conference 2024, 2603-2614, 2024. (Cited by 11)
Constructing phrase-level semantic labels to form multi-grained supervision for image-text retrieval. Z Fan, Z Wei, Z Li, S Wang, H Shan, X Huang, J Fan. Proceedings of the 2022 International Conference on Multimedia Retrieval, 2022. (Cited by 11)
Negative sample is negative in its own way: Tailoring negative sentences for image-text retrieval. Z Fan, Z Wei, Z Li, S Wang, J Fan. arXiv preprint arXiv:2111.03349, 2021. (Cited by 9)
EmbSpatial-Bench: Benchmarking spatial understanding for embodied tasks with large vision-language models. M Du, B Wu, Z Li, X Huang, Z Wei. arXiv preprint arXiv:2406.05756, 2024. (Cited by 8)
VoCoT: Unleashing visually grounded multi-step reasoning in large multi-modal models. Z Li, R Luo, J Zhang, M Qiu, Z Wei. arXiv preprint arXiv:2405.16919, 2024. (Cited by 7)
ReForm-Eval: Evaluating large vision language models via unified re-formulation of task-oriented benchmarks. Z Li, Y Wang, M Du, Q Liu, B Wu, J Zhang, C Zhou, Z Fan, J Fu, J Chen, et al. Proceedings of the 32nd ACM International Conference on Multimedia, 1971-1980, 2024. (Cited by 5)
An unsupervised sampling approach for image-sentence matching using document-level structural information. Z Li, Z Wei, Z Fan, H Shan, X Huang. Proceedings of the AAAI Conference on Artificial Intelligence, 35 (15), 13324 …, 2021. (Cited by 5)
A unified continuous learning framework for multi-modal knowledge discovery and pre-training. Z Fan, Z Wei, J Chen, S Wang, Z Li, J Xu, X Huang. arXiv preprint arXiv:2206.05555, 2022. (Cited by 4)
DELAN: Dual-level alignment for vision-and-language navigation by cross-modal contrastive learning. M Du, B Wu, J Zhang, Z Fan, Z Li, R Luo, X Huang, Z Wei. arXiv preprint arXiv:2404.01994, 2024. (Cited by 3)
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference. S Wang, D Wang, C Zhou, Z Li, Z Fan, X Huang, Z Wei. arXiv preprint arXiv:2412.12785, 2024.
An Iterative Framework for Document-Level Event Argument Extraction Assisted by Long Short-Term Memory. T You, Z Li, Z Fan, C Yin, Y He, J Cai, JH Fu, Z Wei. CCF International Conference on Natural Language Processing and Chinese Computing, 2024.
Graph Interpretation of Image-Text Matching: Link Prediction on Concept-Enhanced Cross-Modal Graph. Z Fan, Z Li, S Wang, Z Wei, H Shan. CCF International Conference on Natural Language Processing and Chinese Computing, 2024.
Continuous or Discrete, That Is the Question: A Survey on Large Multi-Modal Models from the Perspective of Input-Output Space Extension. Z Li, J Zhang, D Wang, Y Wang, X Huang, Z Wei. 2024.