Xinhan Di

Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks

May 26, 2025

MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing

May 22, 2025

Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks

Apr 30, 2025

OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance

Apr 07, 2025

DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance

Mar 31, 2025

DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation

Mar 28, 2025

Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization

Mar 28, 2025

DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos

Mar 28, 2025

Attentional Triple-Encoder Network in Spatiospectral Domains for Medical Image Segmentation

Mar 20, 2025

Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search

Jan 02, 2025