Ping Luo

Tiny LVLM-eHub: Early Multimodal Experiments with Bard

Aug 07, 2023
Via arXiv

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

Jul 13, 2023
Via arXiv

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Jul 07, 2023
Via arXiv

ChiPFormer: Transferable Chip Placement via Offline Decision Transformer

Jun 26, 2023
Via arXiv

Align, Adapt and Inject: Sound-guided Unified Image Generation

Jun 20, 2023
Via arXiv

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models

Jun 15, 2023
Via arXiv

Scene as Occupancy

Jun 06, 2023
Via arXiv

RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths

May 29, 2023
Via arXiv

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

May 29, 2023
Via arXiv

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

May 25, 2023
Via arXiv