Kunchang Li
Title
Cited by
Year
Tip-adapter: Training-free clip-adapter for better vision-language modeling
R Zhang, R Fang, P Gao, W Zhang, K Li, J Dai, Y Qiao, H Li
ECCV2022, 2021
Cited by 663* (2021)
VideoChat: Chat-Centric Video Understanding
K Li, Y He, Y Wang, Y Li, W Wang, P Luo, Y Wang, L Wang, Y Qiao
SCIENCE CHINA Information Sciences, 2023
Cited by 502 (2023)
PointCLIP: Point Cloud Understanding by CLIP
R Zhang, Z Guo, W Zhang, K Li, X Miao, B Cui, Y Qiao, P Gao, H Li
CVPR2022, 2021
Cited by 428 (2021)
UniFormer: Unifying convolution and self-attention for visual recognition
K Li, Y Wang, J Zhang, P Gao, G Song, Y Liu, H Li, Y Qiao
TPAMI, 2022
Cited by 376 (2022)
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Y Wang*, K Li*, Y Li*, Y He*, B Huang*, Z Zhao*, H Zhang, J Xu, Y Liu, ...
arXiv preprint arXiv:2212.03191, 2022
Cited by 299 (2022)
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
K Li, Y Wang, P Gao, G Song, Y Liu, H Li, Y Qiao
ICLR2022, 2022
Cited by 291 (2022)
Grounded sam: Assembling open-world models for diverse visual tasks
T Ren, S Liu, A Zeng, J Lin, K Li, H Cao, J Chen, X Huang, Y Chen, F Yan, ...
ICCV2023 Demo, 2024
Cited by 196 (2024)
Mvbench: A comprehensive multi-modal video understanding benchmark
K Li, Y Wang, Y He, Y Li, Y Wang, Y Liu, Z Wang, J Xu, G Chen, P Luo, ...
CVPR2024, 2023
Cited by 184 (2023)
Illumination Adaptive Transformer
Z Cui, K Li, L Gu, S Su, P Gao, Z Jiang, Y Qiao, T Harada
BMVC2022, 2022
Cited by 182* (2022)
Internvid: A large-scale video-text dataset for multimodal understanding and generation
Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ...
ICLR2024, 2023
Cited by 180 (2023)
Uniformerv2: Spatiotemporal learning by arming image vits with video uniformer
K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao
ICCV2023, 2022
Cited by 162* (2022)
Unmasked teacher: Towards training-efficient video foundation models
K Li, Y Wang, Y Li, Y Wang, Y He, L Wang, Y Qiao
ICCV2023 Oral, 2023
Cited by 135 (2023)
Videomamba: State space model for efficient video understanding
K Li, X Li, Y Wang, Y He, Y Wang, L Wang, Y Qiao
ECCV2024, 2024
Cited by 115 (2024)
Interngpt: Solving vision-centric tasks by interacting with chatgpt beyond language
Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen, Q Zhang, Z Lai, Y Yang, ...
arXiv preprint arXiv:2305.05662, 2023
Cited by 81 (2023)
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei, R Zheng, Z Wang, Y Shi, ...
ECCV2024, 2024
Cited by 77* (2024)
CT-Net: Channel tensorization network for video classification
K Li, X Li, Y Wang, J Wang, Y Qiao
ICLR2021, 2021
Cited by 72 (2021)
MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video
DJ Zhang*, K Li*, Y Chen, Y Wang, S Chandra, Y Qiao, L Liu, MZ Shou
ECCV2022, 2021
Cited by 57* (2021)
Video mamba suite: State space model as a versatile alternative for video understanding
G Chen, Y Huang, J Xu, B Pei, Z Chen, Z Li, J Wang, K Li, T Lu, L Wang
arXiv preprint arXiv:2403.09626, 2024
Cited by 47 (2024)
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
G Chen, S Xing, Z Chen, Y Wang, K Li, Y Li, Y Liu, J Wang, YD Zheng, ...
ECCVW2022, 2022
Cited by 42 (2022)
Self-slimmed vision transformer
Z Zong*, K Li*, G Song, Y Wang, Y Qiao, B Leng, Y Liu
ECCV2022, 2022
Cited by 25 (2022)
Articles 1–20