Yiheng Li

Fine-Grained Representation for Lane Topology Reasoning

Nov 18, 2025
Via arXiv

Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction

Nov 13, 2025
Via arXiv

Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition

Nov 06, 2025
Via arXiv

Multi-Modal Multi-Behavior Sequential Recommendation with Conditional Diffusion-Based Feature Denoising

Aug 07, 2025
Via arXiv

Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation

Jun 06, 2025
Via arXiv

Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility

May 24, 2025
Via arXiv

Benchmarking Chest X-ray Diagnosis Models Across Multinational Datasets

May 21, 2025
Via arXiv

CoreNet: Conflict Resolution Network for Point-Pixel Misalignment and Sub-Task Suppression of 3D LiDAR-Camera Object Detection

Jan 11, 2025
Via arXiv

RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection

Dec 17, 2024
Via arXiv

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Nov 25, 2024
Via arXiv