Yan Shu (舒言)

I'm a second-year PhD student at the University of Trento, advised by Nicu Sebe and Paolo Rota in the MHUG Group.

Previously, I was a research intern at the Beijing Academy of Artificial Intelligence (BAAI), supervised by Bo Zhao and Zheng Liu.

Before that, I received my Master's degree from Harbin Institute of Technology, supervised by Shaohui Liu. I also worked as a research assistant at the Institute of Information Engineering, Chinese Academy of Sciences, advised by Yu Zhou.

Email / Scholar / Twitter / GitHub / Xiaohongshu (小红书)

News

12/2024: 😇😇 Started my PhD journey at the University of Trento, fighting on step by step.
11/2024: 🎉🎉 Finished my journey at BAAI. Many thanks to my advisors Zheng Liu and Bo Zhao.
12/2023: 😄😄 Finished my research assistantship at CAS. Many thanks to my advisor Yu Zhou.
06/2023: 🎉🎉 Received my Master's degree from HIT. Many thanks to my advisor Shaohui Liu.

Research

I'm interested in computer vision, multimodal learning, video understanding, remote sensing, and OCR. Below are some selected publications. (* indicates equal contribution.)

	When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding. Yan Shu, Hangui Lin, Yexin Liu, Yan Zhang, Gangyan Zeng, Yan Li, Yu Zhou, Ser-Nam Lim, Harry Yang, Nicu Sebe. NeurIPS*, 2025. project page / arXiv. Exploring and mitigating semantic hallucination in MLLMs.
	Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding. Yan Shu, Peitian Zhang, Zheng Liu, Minghao Qin, Junjie Zhou, Tiejun Huang, Bo Zhao. CVPR, 2025 (Oral). project page / arXiv. First-ever hour-scale video understanding models.
	MLVU: Multi-task Long Video Understanding Benchmark. Junjie Zhou, Yan Shu, Bo Zhao, Boya Wu, Shitao Xiao, Xi Yang, Yongping Xiong, Bo Zhang, Tiejun Huang, Zheng Liu. CVPR*, 2025. project page / arXiv. First-ever comprehensive long video benchmark.
	TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control. Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, Yu Zhou. NeurIPS, 2024 (Spotlight). project page / arXiv. A diffusion-based scene text editing model as well as a real-world scene text editing benchmark.
	Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector. Yan Shu, Wei Wang, Yu Zhou, Shaohui Liu, Aoting Zhang, Dongbao Yang, Weiping Wang. ACM MM, 2023 (Oral). project page / arXiv. A model designed for ambiguous scene text detection.

Talks

12/2025: Talk on semantic hallucination in MLLMs at CSIG (China Society of Image and Graphics, 中国图像图形学会)
12/2024: Talk on Video-XL at the BAAI Scholars Forum (智源学者论坛)
11/2024: Talk on Video LLMs at Renmin University, invited by Prof. Ruihua Song

Education and Working Experience

	PhD student at UniTN (2024.12-2027.12, expected)
	Research intern at BAAI (2024.03-2024.11)
	Research assistant at the Chinese Academy of Sciences (2023.06-2023.12)
	Master's degree, Harbin Institute of Technology (2021.09-2023.06)
	Bachelor's degree, University of International Relations (2017.09-2021.06)

Services

Reviewer for TPAMI, TIP, CVPR, ICLR, NeurIPS, ICCV...