Jinyu Yang

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

Jul 18, 2024

PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

Jun 24, 2024

1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

Jun 11, 2024

Place Anything into Any Video

Feb 22, 2024

On the impact of robot personalization on human-robot interaction: A review

Jan 22, 2024

Track Anything: Segment Anything Meets Videos

Apr 28, 2023

Learning Dual-Fused Modality-Aware Representations for RGBD Tracking

Nov 15, 2022

Prompting for Multi-Modal Tracking

Aug 01, 2022

Vision-Language Pre-Training with Triple Contrastive Learning

Mar 28, 2022

Multi-modal Alignment using Representation Codebook

Mar 28, 2022