MathVerse: Does your multi-modal LLM truly see the diagrams in visual math problems? R Zhang, D Jiang, Y Zhang, H Lin, Z Guo, P Qiu, A Zhou, P Lu, KW Chang, ... European Conference on Computer Vision, 169-186, 2025 | 97 | 2025 |
Temporal enhanced training of multi-view 3D object detector via historical object prediction Z Zong, D Jiang, G Song, Z Xue, J Su, H Li, Y Liu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 34 | 2023 |
MoVA: Adapting mixture of vision experts to multimodal context Z Zong, B Ma, D Shen, G Song, H Shao, D Jiang, H Li, Y Liu arXiv preprint arXiv:2404.13046, 2024 | 27 | 2024 |
MAVIS: Mathematical visual instruction tuning R Zhang, X Wei, D Jiang, Y Zhang, Z Guo, C Tong, J Liu, A Zhou, B Wei, ... arXiv e-prints, arXiv:2407.08739, 2024 | 25 | 2024 |
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching D Jiang, G Song, X Wu, R Zhang, D Shen, Z Zong, Y Liu, H Li arXiv preprint arXiv:2404.03653, 2024 | 9 | 2024 |
MAVIS: Mathematical visual instruction tuning with an automatic data engine R Zhang, X Wei, D Jiang, Z Guo, S Li, Y Zhang, C Tong, J Liu, A Zhou, ... arXiv preprint arXiv:2407.08739, 2024 | 2 | 2024 |
MMSearch: Benchmarking the potential of large models as multi-modal search engines D Jiang, R Zhang, Z Guo, Y Wu, J Lei, P Qiu, P Lu, Z Chen, G Song, ... arXiv preprint arXiv:2409.12959, 2024 | 1 | 2024 |
EasyRef: Omni-generalized group image reference for diffusion models via multimodal LLM Z Zong, D Jiang, B Ma, G Song, H Shao, D Shen, Y Liu, H Li arXiv preprint arXiv:2412.09618, 2024 | | 2024 |