Siyuan Huang
Shanghai AI Lab && SJTU && MMLab CUHK
Verified email at sjtu.edu.cn
Title
Cited by
Year
Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
R Zhang, X Hu, B Li, S Huang, H Deng, Y Qiao, P Gao, H Li
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
Cited by: 84, Year: 2023
Multi-modal sensor fusion for auto driving perception: A survey
K Huang, B Shi, X Li, X Li, S Huang, Y Li
arXiv preprint arXiv:2202.02703, 2022
Cited by: 84, Year: 2022
LVLM-eHub: A comprehensive evaluation benchmark for large vision-language models
P Xu, W Shao, K Zhang, P Gao, S Liu, M Lei, F Meng, S Huang, Y Qiao, ...
arXiv preprint arXiv:2306.09265, 2023
Cited by: 76, Year: 2023
SPHINX: The joint mixing of weights, tasks, and visual embeddings for multi-modal large language models
Z Lin, C Liu, R Zhang, P Gao, L Qiu, H Xiao, H Qiu, C Lin, W Shao, ...
arXiv preprint arXiv:2311.07575, 2023
Cited by: 58, Year: 2023
Instruct2Act: Mapping multi-modality instructions to robotic actions with large language model
S Huang, Z Jiang, H Dong, Y Qiao, P Gao, H Li
arXiv preprint arXiv:2305.11176, 2023
Cited by: 52, Year: 2023
Tiny LVLM-eHub: Early multimodal experiments with Bard
W Shao, Y Hu, P Gao, M Lei, K Zhang, F Meng, P Xu, S Huang, H Li, ...
arXiv preprint arXiv:2308.03729, 2023
Cited by: 16, Year: 2023
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
P Gao, R Zhang, C Liu, L Qiu, S Huang, W Lin, S Zhao, S Geng, Z Lin, ...
arXiv preprint arXiv:2402.05935, 2024
Cited by: 11, Year: 2024
Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill
W Cai, S Huang, G Cheng, Y Long, P Gao, C Sun, H Dong
arXiv preprint arXiv:2309.10309, 2023
Cited by: 7, Year: 2023
SUG: Single-dataset unified generalization for 3D point cloud classification
S Huang, B Zhang, B Shi, H Li, Y Li, P Gao
Proceedings of the 31st ACM International Conference on Multimedia, 8644-8652, 2023
Cited by: 5, Year: 2023
ADAS: A simple active-and-adaptive baseline for cross-domain 3D semantic segmentation
B Fei, S Huang, J Yuan, B Shi, B Zhang, T Chen, M Dou, Y Qiao
arXiv preprint arXiv 2212, 2, 2022
Cited by: 4, Year: 2022
Model-based multiple object tracking using capacitive proximity sensors
S Huang, H Alagi, B Hein
2nd Full-day Workshop on Progress in Ergonomic Physical Human-Robot …
Cited by: 1
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
W Lin, X Wei, R An, P Gao, B Zou, Y Luo, S Huang, S Zhang, H Li
arXiv preprint arXiv:2403.20271, 2024
Year: 2024
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
S Huang, I Ponomarenko, Z Jiang, X Li, X Hu, P Gao, H Li, H Dong
arXiv preprint arXiv:2403.11289, 2024
Year: 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
X Lu, Q Liu, Y Xu, A Zhou, S Huang, B Zhang, J Yan, H Li
arXiv preprint arXiv:2402.14800, 2024
Year: 2024