Yiheng Li

Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation

Jun 06, 2025
Via arXiv

Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility

May 24, 2025
Via arXiv

Benchmarking Chest X-ray Diagnosis Models Across Multinational Datasets

May 21, 2025
Via arXiv

CoreNet: Conflict Resolution Network for Point-Pixel Misalignment and Sub-Task Suppression of 3D LiDAR-Camera Object Detection

Jan 11, 2025
Via arXiv

RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection

Dec 17, 2024
Via arXiv

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Nov 25, 2024
Via arXiv

Mitigating Object Hallucination via Concentric Causal Attention

Oct 21, 2024
Via arXiv

QuadBEV: An Efficient Quadruple-Task Perception Framework via Bird's-Eye-View Representation

Oct 09, 2024
Via arXiv

Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation

Oct 09, 2024
Via arXiv

WOMD-Reasoning: A Large-Scale Language Dataset for Interaction and Driving Intentions Reasoning

Jul 05, 2024
Via arXiv