Qi She

EvoVid: Temporal-Centric Self-Evolution for Video Large Language Models

May 21, 2026
Via arXiv

TextSculptor: Training and Benchmarking Scene Text Editing

May 20, 2026
Via arXiv

AD-MIR: Bridging the Gap from Perception to Persuasion in Advertising Video Understanding via Structured Reasoning

Feb 07, 2026
Via arXiv

Video-KTR: Reinforcing Video Reasoning via Key Token Attribution

Jan 27, 2026
Via arXiv

ThinkGen: Generalized Thinking for Visual Generation

Dec 29, 2025
Via arXiv

CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning

Dec 19, 2025
Via arXiv

TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning

Nov 07, 2025
Via arXiv

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Jun 12, 2025
Via arXiv

TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding

Apr 02, 2025
Via arXiv

ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance

Dec 09, 2024
Via arXiv