Qi She

ThinkGen: Generalized Thinking for Visual Generation

Dec 29, 2025
Via arXiv

CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning

Dec 19, 2025
Via arXiv

TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning

Nov 07, 2025
Via arXiv

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Jun 12, 2025
Via arXiv

TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding

Apr 02, 2025
Via arXiv

ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance

Dec 09, 2024
Via arXiv

[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster

Dec 02, 2024
Via arXiv

MC-LLaVA: Multi-Concept Personalized Vision-Language Model

Nov 18, 2024
Via arXiv

MammothModa: Multi-Modal Large Language Model

Jun 26, 2024
Via arXiv

PDO-s3DCNNs: Partial Differential Operator Based Steerable 3D CNNs

Aug 07, 2022
Via arXiv