
Peng Gao


An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape
Apr 24, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Apr 24, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
Apr 05, 2024

Multi-Robot Collaborative Navigation with Formation Adaptation
Apr 02, 2024

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Apr 01, 2024

CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation
Mar 26, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Mar 21, 2024

ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
Mar 17, 2024

Masked AutoDecoder is Effective Multi-Task Vision Generalist
Mar 14, 2024

In Defense and Revival of Bayesian Filtering for Thermal Infrared Object Tracking
Feb 27, 2024