| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models | T Wu, C Tao, J Wang, R Yang, Z Zhao, N Wong | arXiv preprint arXiv:2404.02657, 2024 | 23 | 2024 |
| LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models | R Yang, T Wu, J Wang, P Hu, YC Wu, N Wong, Y Yang | arXiv preprint arXiv:2411.06839, 2024 | 1 | 2024 |
| Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning | Z Li, Y Su, R Yang, C Xie, Z Wang, Z Xie, N Wong, H Yang | arXiv preprint arXiv:2501.03035, 2025 | | 2025 |
| LoCa: Logit Calibration for Knowledge Distillation | R Yang, T Wu, Y Yang | arXiv preprint arXiv:2409.04778, 2024 | | 2024 |