Ji Xie

Unified Video Editing with Temporal Reasoner

Dec 08, 2025
Via arXiv

Reconstruction Alignment Improves Unified Multimodal Models

Sep 08, 2025
Via arXiv

Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning

Aug 13, 2025
Via arXiv

Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI models in Sound Localization

May 16, 2025
Via arXiv

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

Apr 29, 2025
Via arXiv

3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering

Jan 09, 2025
Via arXiv

3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation

Oct 16, 2024
Via arXiv