Yuwei Fang

VIMI: Grounding Video Generation through Multi-modal Instruction

Jul 08, 2024

VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

Jun 18, 2024

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

Apr 17, 2024

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Feb 29, 2024

Evaluating Very Long-Term Conversational Memory of LLM Agents

Feb 27, 2024

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Feb 22, 2024

AToM: Amortized Text-to-Mesh using 2D Diffusion

Feb 01, 2024

PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning

Nov 15, 2023

i-Code Studio: A Configurable and Composable Framework for Integrative AI

May 23, 2023

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

May 21, 2023