Ping Luo

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

Jun 12, 2024
Via arXiv

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

Jun 12, 2024
Via arXiv

Needle In A Multimodal Haystack

Jun 11, 2024
Via arXiv

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Jun 10, 2024
Via arXiv

Uncovering Limitations of Large Language Models in Information Seeking from Tables

Jun 06, 2024
Via arXiv

Learning Manipulation by Predicting Interaction

Jun 01, 2024
Via arXiv

Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View

May 27, 2024
Via arXiv

Part123: Part-aware 3D Reconstruction from a Single-view Image

May 27, 2024
Via arXiv

UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge

May 23, 2024
Via arXiv

AnalogCoder: Analog Circuit Design via Training-Free Code Generation

May 23, 2024
Via arXiv