Yan Shu (舒言)

I'm an incoming PhD candidate at the University of Trento, where I will join the MHUG group supervised by Paolo Rota and Nicu Sebe.

Previously, I was a research intern at the Beijing Academy of Artificial Intelligence (BAAI), supervised by Bo Zhao and Zheng Liu.

Before that, I received my Master's degree from Harbin Institute of Technology, supervised by Shaohui Liu. I also worked as a research assistant at the Institute of Information Engineering, Chinese Academy of Sciences, advised by Yu Zhou.

Email / Scholar / Twitter / GitHub / Xiaohongshu (小红书)


Research

I'm interested in computer vision, multimodal learning, video understanding and OCR. Below are some selected publications. (* indicates equal contribution.)

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu, Peitian Zhang, Zheng Liu, Minghao Qin, Junjie Zhou, Tiejun Huang, Bo Zhao
arXiv, 2024
project page / arXiv

A first-of-its-kind vision-language model for hour-scale video understanding.

MLVU: Multi-task Long Video Understanding Benchmark
Junjie Zhou*, Yan Shu*, Bo Zhao*, Boya Wu, Shitao Xiao, Xi Yang, Yongping Xiong, Bo Zhang, Tiejun Huang, Zheng Liu
arXiv, 2024
project page / arXiv

The first comprehensive multi-task benchmark for long video understanding.

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, Yu Zhou
NeurIPS, 2024 (Spotlight)
project page / arXiv

A diffusion-based scene text editing model as well as a real-world scene text editing benchmark.

First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending
Zhenhang Li, Yan Shu, Weichao Zeng, Dongbao Yang, Yu Zhou
ECAI, 2024  
project page / arXiv

A diffusion-based scene text generation model as well as a synthetic scene text detection dataset.

CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings
Yachun Mi, Yan Shu, Yu Li, Chen Hui, Puchao Zhou, Shaohui Liu
ACM MM, 2024  
project page / arXiv

A video quality assessment framework based on CLIP.

Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing
Yan Shu, Weichao Zeng, Zhenhang Li, Fangmin Zhao, Yu Zhou
arXiv, 2024
project page / arXiv

A survey of low-level scene text processing methods.

Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector
Yan Shu, Wei Wang, Yu Zhou, Shaohui Liu, Aoting Zhang, Dongbao Yang, Weiping Wang
ACM MM, 2023 (Oral)
project page / arXiv

A model designed for ambiguous scene text detection.

Education and Working Experience

Research Intern at BAAI (2024.03-2024.11)
Research Assistant at the Chinese Academy of Sciences (2023.06-2023.12)
Master's degree, Harbin Institute of Technology (2021.09-2023.06)
Bachelor's degree, University of International Relations (2017.09-2021.06)

Services

Reviewer for ACM MM 2024 and ICLR 2025.