Di Wu

School of Optoelectronic Science and Engineering, Soochow University

RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence

Dec 31, 2025
via arXiv

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Dec 23, 2025
via arXiv

StereoMV2D: A Sparse Temporal Stereo-Enhanced Framework for Robust Multi-View 3D Object Detection

Dec 19, 2025
via arXiv

Structure-Aware Decoding Mechanisms for Complex Entity Extraction with Large-Scale Language Models

Dec 16, 2025
via arXiv

Multi-Intent Spoken Language Understanding: Methods, Trends, and Challenges

Dec 12, 2025
via arXiv

CKM-Enabled Joint Spatial-Doppler Domain Clutter Suppression for Low-Altitude UAV ISAC

Dec 10, 2025
via arXiv

LEMUR: Large scale End-to-end MUltimodal Recommendation

Nov 17, 2025
via arXiv

Rethinking Multimodal Point Cloud Completion: A Completion-by-Correction Perspective

Nov 15, 2025
via arXiv

Rethinking Crystal Symmetry Prediction: A Decoupled Perspective

Nov 10, 2025
via arXiv

Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models

Nov 10, 2025
via arXiv