Ping Luo

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Dec 25, 2023

DriveLM: Driving with Graph Visual Question Answering

Dec 21, 2023

Cached Transformers: Improving Transformers with Differentiable Memory Cache

Dec 20, 2023

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Dec 18, 2023

You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception

Dec 09, 2023

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation

Dec 07, 2023

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Dec 06, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Dec 03, 2023

MLLMs-Augmented Visual-Language Representation Learning

Dec 01, 2023

Advancing Vision Transformers with Group-Mix Attention

Nov 26, 2023