Sihan Chen
Title | Cited by | Year
CPTR: Full transformer network for image captioning
W Liu, S Chen, L Guo, X Zhu, J Liu
arXiv preprint arXiv:2101.10804, 2021
Cited by 214 · 2021
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
J Liu, S Chen, X He, L Guo, X Zhu, W Wang, J Tang
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
Cited by 99* · 2024
VAST: A vision-audio-subtitle-text omni-modality foundation model and dataset
S Chen, H Li, Q Wang, Z Zhao, M Sun, X Zhu, J Liu
Advances in Neural Information Processing Systems 36, 72842-72866, 2023
Cited by 98 · 2023
ChatBridge: Bridging modalities with large language model as a language catalyst
Z Zhao, L Guo, T Yue, S Chen, S Shao, X Zhu, Z Yuan, J Liu
arXiv preprint arXiv:2305.16103, 2023
Cited by 53 · 2023
VL-Mamba: Exploring state space models for multimodal learning
Y Qiao, Z Yu, L Guo, S Chen, Z Zhao, M Sun, Q Wu, J Liu
arXiv preprint arXiv:2403.13600, 2024
Cited by 49 · 2024
Global-local propagation network for RGB-D semantic segmentation
S Chen, X Zhu, W Liu, X He, J Liu
arXiv preprint arXiv:2101.10801, 2021
Cited by 24 · 2021
VLAB: Enhancing video language pre-training by feature adapting and blending
X He, S Chen, F Ma, Z Huang, X Jin, Z Liu, D Fu, Y Yang, J Liu, J Feng
IEEE Transactions on Multimedia, 2023
Cited by 20 · 2023
Sounding video generator: A unified framework for text-guided sounding video generation
J Liu, W Wang, S Chen, X Zhu, J Liu
IEEE Transactions on Multimedia 26, 141-153, 2023
Cited by 9 · 2023
COSA: Concatenated sample pretrained vision-language foundation model
S Chen, X He, H Li, X Jin, J Feng, J Liu
The Twelfth International Conference on Learning Representations, 2023
Cited by 8 · 2023
MM21 pre-training for video understanding challenge: Video captioning with pretraining techniques
S Chen, X Zhu, D Hao, W Liu, J Liu, Z Zhao, L Guo, J Liu
Proceedings of the 29th ACM International Conference on Multimedia, 4853-4857, 2021
Cited by 8 · 2021
GLOBER: coherent non-autoregressive video generation via global guided video decoder
M Sun, W Wang, Z Qin, J Sun, S Chen, J Liu
Advances in Neural Information Processing Systems 36, 2024
Cited by 4 · 2024
EAVL: Explicitly Align Vision and Language for Referring Image Segmentation
Y Yan, X He, W Wang, S Chen, J Liu
arXiv preprint arXiv:2308.09779, 2023
Cited by 2 · 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Z Liu, S Chen, L Guo, H Li, X He, J Liu
Proceedings of the 31st ACM International Conference on Multimedia, 5120-5131, 2023
Cited by 1 · 2023
Cross-modal data processing method and apparatus, device, medium, and program product
X Jin, S Chen, J Feng, X He, H Li, J Liu
US Patent App. 18/744,418, 2024
2024
Video processing method, apparatus, device, medium, and program product
X Jin, X He, S Chen, F Ma, Z Huang, J Liu, J Feng
US Patent App. 18/671,708, 2024
2024
Fuse and Calibrate: A Bi-directional Vision-Language Guided Framework for Referring Image Segmentation
Y Yan, X He, S Chen, S Lu, J Liu
International Conference on Intelligent Computing, 313-324, 2024
2024
Calibration & Reconstruction: Deeply Integrated Language for Referring Image Segmentation
Y Yan, X He, S Chen, J Liu
Proceedings of the 2024 International Conference on Multimedia Retrieval …, 2024
2024