News
- 12/2024: 😇😇Started my PhD journey at the University of Trento, fighting step by step.
- 11/2024: 🎉🎉Finished my journey at BAAI. Many thanks to my advisors Zheng Liu and Bo Zhao.
- 12/2023: 😄😄Ended my RA at CAS. Many thanks to my advisor Yu Zhou.
- 06/2023: 🎉🎉Got my Master's degree at HIT. Many thanks to my advisor Shaohui Liu.
|
Research
I'm interested in computer vision, multimodal learning, video understanding, remote sensing, and OCR. Below are some selected publications. (* indicates equal contribution.)
|
|
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu,
Peitian Zhang,
Zheng Liu,
Minghao Qin,
Junjie Zhou,
Tiejun Huang,
Bo Zhao
CVPR, 2025  
project page /
arXiv
First-ever hour-scale video understanding models.
|
|
MLVU: Multi-task Long Video Understanding Benchmark
Junjie Zhou*,
Yan Shu*,
Bo Zhao*,
Boya Wu,
Shitao Xiao,
Xi Yang,
Yongping Xiong,
Bo Zhang,
Tiejun Huang,
Zheng Liu
CVPR, 2025  
project page /
arXiv
First-ever comprehensive long video benchmark.
|
|
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
Weichao Zeng,
Yan Shu,
Zhenhang Li,
Dongbao Yang,
Yu Zhou
NeurIPS, 2024  
(Spotlight)
project page /
arXiv
A diffusion-based scene text editing model as well as a real-world scene text editing benchmark.
|
|
First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending
Zhenhang Li,
Yan Shu,
Weichao Zeng,
Dongbao Yang,
Yu Zhou
ECAI, 2024  
project page /
arXiv
A diffusion-based scene text generation model as well as a synthetic scene text detection dataset.
|
|
CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings
Yachun Mi,
Yan Shu,
Yu Li,
Chen Hui,
Puchao Zhou,
Shaohui Liu
ACM MM, 2024  
project page /
arXiv
Video quality assessment framework based on CLIP.
|
|
Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing
Yan Shu,
Weichao Zeng,
Zhenhang Li,
Fangmin Zhao,
Yu Zhou
arXiv, 2024  
project page /
arXiv
Survey on low-level scene text processing methods.
|
|
Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector
Yan Shu,
Wei Wang,
Yu Zhou,
Shaohui Liu,
Aoting Zhang,
Dongbao Yang,
Weiping Wang
ACM MM, 2023  
(Oral)
project page /
arXiv
A model designed for ambiguous scene text detection.
|
Talks
- 12/2024: Gave a talk on Video-XL at the BAAI Scholars Forum (智源学者论坛)
- 11/2024: Gave a talk on Video LLMs at Renmin University, invited by Prof. Ruihua Song
|
Education and Working Experience
|
|