Xiyang Dai

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

Jun 06, 2024
Via arXiv

Rewrite the Stars

Mar 29, 2024
Via arXiv

Efficient Modulation for Vision Networks

Mar 29, 2024
Via arXiv

Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search

Mar 15, 2024
Via arXiv

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Nov 10, 2023
Via arXiv

On the Hidden Waves of Image

Oct 19, 2023
Via arXiv

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

Oct 18, 2023
Via arXiv

LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following

Oct 18, 2023
Via arXiv

Image is First-order Norm+Linear Autoregressive

May 25, 2023
Via arXiv

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

Apr 29, 2023
Via arXiv