Dandan Zheng

Perceptual Flow Network for Visually Grounded Reasoning

May 04, 2026
Via arXiv

TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

Apr 08, 2026
Via arXiv

TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation

Feb 09, 2026
Via arXiv

GO-MLVTON: Garment Occlusion-Aware Multi-Layer Virtual Try-On with Diffusion Models

Jan 20, 2026
Via arXiv

3SGen: Unified Subject, Style, and Structure-Driven Image Generation with Adaptive Task-specific Memory

Dec 22, 2025
Via arXiv

ARGenSeg: Image Segmentation with Autoregressive Image Generation Model

Oct 23, 2025
Via arXiv

PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning

Oct 22, 2025
Via arXiv

Ming-Omni: A Unified Multimodal Model for Perception and Generation

Jun 11, 2025
Via arXiv

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

May 05, 2025
Via arXiv

A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT

Mar 16, 2025
Via arXiv