Xiyang Dai

LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following

Oct 18, 2023
Via arXiv

Image is First-order Norm+Linear Autoregressive

May 25, 2023
Via arXiv

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

Apr 29, 2023
Via arXiv

OmniTracker: Unifying Object Tracking by Tracking-with-Detection

Mar 21, 2023
Via arXiv

Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations

Feb 27, 2023
Via arXiv

Generalized Decoding for Pixel, Image, and Language

Dec 21, 2022
Via arXiv

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

Dec 13, 2022
Via arXiv

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

Dec 08, 2022
Via arXiv

Self-Supervised Learning based on Heat Equation

Nov 23, 2022
Via arXiv

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling

Aug 25, 2022
Via arXiv