Jianlong Wu

The 1st Winner for 5th PVUW MeViS-Text Challenge: Strong MLLMs Meet SAM3 for Referring Video Object Segmentation

Apr 01, 2026
Via arXiv

Advancing Complex Video Object Segmentation via Tracking-Enhanced Prompt: The 1st Winner for 5th PVUW MOSE Challenge

Apr 01, 2026
Via arXiv

StructAlign: Structured Cross-Modal Alignment for Continual Text-to-Video Retrieval

Jan 28, 2026
Via arXiv

AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding

Mar 16, 2025
Via arXiv

MegaSR: Mining Customized Semantics and Expressive Guidance for Image Super-Resolution

Mar 11, 2025
Via arXiv

HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models

Feb 28, 2025
Via arXiv

Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Jan 09, 2025
Via arXiv

LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition

Jan 08, 2025
Via arXiv

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

Dec 29, 2024
Via arXiv

Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization

Dec 13, 2024
Via arXiv