Yongdong Zhang

Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking

May 26, 2025
Via arXiv

Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models

May 26, 2025
Via arXiv

Leveraging Robust Optimization for LLM Alignment under Distribution Shifts

Apr 08, 2025
Via arXiv

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Mar 31, 2025
Via arXiv

Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation

Mar 25, 2025
Via arXiv

SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability

Mar 18, 2025
Via arXiv

OmniPrism: Learning Disentangled Visual Concept for Image Generation

Dec 16, 2024
Via arXiv

LMAgent: A Large-scale Multimodal Agents Society for Multi-user Simulation

Dec 13, 2024
Via arXiv

A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions

Dec 12, 2024
Via arXiv

T-SVG: Text-Driven Stereoscopic Video Generation

Dec 12, 2024
Via arXiv