Yiheng Li

Multi-Modal Multi-Behavior Sequential Recommendation with Conditional Diffusion-Based Feature Denoising

Aug 07, 2025

Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation

Jun 06, 2025

Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility

May 24, 2025

Benchmarking Chest X-ray Diagnosis Models Across Multinational Datasets

May 21, 2025

CoreNet: Conflict Resolution Network for Point-Pixel Misalignment and Sub-Task Suppression of 3D LiDAR-Camera Object Detection

Jan 11, 2025

RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection

Dec 17, 2024

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Nov 25, 2024

Mitigating Object Hallucination via Concentric Causal Attention

Oct 21, 2024

Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation

Oct 09, 2024

QuadBEV: An Efficient Quadruple-Task Perception Framework via Bird's-Eye-View Representation

Oct 09, 2024