Rui Shao

Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding

Jul 19, 2024

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

Jul 17, 2024

RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

Apr 07, 2024

CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Mar 07, 2024

Enhancing the Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought

Jan 12, 2024

LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

Nov 26, 2023

Robust Sequential DeepFake Detection

Sep 26, 2023

Detecting and Grounding Multi-Modal Media Manipulation and Beyond

Sep 25, 2023

DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection

Jun 01, 2023

Detecting and Grounding Multi-Modal Media Manipulation

Apr 05, 2023