Jie Qin

History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation

Dec 17, 2025

STAR: STacked AutoRegressive Scheme for Unified Multimodal Learning

Dec 15, 2025

LoFA: Learning to Predict Personalized Priors for Fast Adaptation of Visual Generative Models

Dec 09, 2025

MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection

Oct 24, 2025

Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions

Aug 06, 2025

HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking

Jul 10, 2025

Uncertainty Guided Refinement for Fine-Grained Salient Object Detection

Apr 13, 2025

Uni4D: A Unified Self-Supervised Learning Framework for Point Cloud Videos

Apr 07, 2025

UniViTAR: Unified Vision Transformer with Native Resolution

Apr 02, 2025

WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models

Jul 14, 2024