Yu Wu

Wuhan University

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Jun 15, 2024
Via arXiv

From Space-Time to Space-Order: Directly Planning a Temporal Planning Graph by Redefining CBS

Apr 23, 2024
Via arXiv

D$^3$: Scaling Up Deepfake Detection by Learning from Discrepancy

Apr 06, 2024
Via arXiv

Improving Bird's Eye View Semantic Segmentation by Task Decomposition

Apr 02, 2024
Via arXiv

Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning

Apr 01, 2024
Via arXiv

Advanced Long-Content Speech Recognition With Factorized Neural Transducer

Mar 20, 2024
Via arXiv

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

Feb 18, 2024
Via arXiv

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

Dec 20, 2023
Via arXiv

DETER: Detecting Edited Regions for Deterring Generative Manipulations

Dec 16, 2023
Via arXiv

Intelligent-Reflecting-Surface-Assisted UAV Communications for 6G Networks

Oct 31, 2023
Via arXiv