Xiatian Zhu

Source-Free Domain Adaptation with Frozen Multimodal Foundation Model

Nov 27, 2023

Adaptive-Labeling for Enhancing Remote Sensing Cloud Understanding

Nov 09, 2023

Recognize Any Regions

Nov 02, 2023

Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping

Oct 19, 2023

Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

Oct 16, 2023

Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection

Sep 29, 2023

Leveraging Foundation models for Unsupervised Audio-Visual Segmentation

Sep 13, 2023

DiffSED: Sound Event Detection with Denoising Diffusion

Aug 16, 2023

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

Aug 08, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models

Jun 15, 2023