Jacob Zhiyuan Fang

StoryMem: Multi-shot Long Video Storytelling with Memory

Dec 22, 2025

MAGREF: Masked Guidance for Any-Reference Video Generation

May 29, 2025

ATI: Any Trajectory Instruction for Controllable Video Generation

May 28, 2025

CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance

Mar 13, 2025

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

May 08, 2024

E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer

Nov 28, 2023

Text-to-image Editing by Image Information Removal

May 27, 2023