Ping Luo

Low-Latency Privacy-Preserving Deep Learning Design via Secure MPC

Jul 24, 2024
Via arXiv

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Jul 24, 2024
Via arXiv

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Jul 18, 2024
Via arXiv

TCFormer: Visual Recognition via Token Clustering Transformer

Jul 16, 2024
Via arXiv

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

Jul 16, 2024
Via arXiv

When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

Jul 14, 2024
Via arXiv

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

Jul 10, 2024
Via arXiv

PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

Jun 17, 2024
Via arXiv

DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning

Jun 14, 2024
Via arXiv

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

Jun 13, 2024
Via arXiv