Tsu-Jui Fu

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Apr 11, 2024
Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

Via arXiv

Guiding Instruction-based Image Editing via Multimodal Large Language Models

Sep 29, 2023
Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan

Via arXiv

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

Jul 12, 2023
Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang

Via arXiv

Photoswap: Personalized Subject Swapping in Images

May 29, 2023
Jing Gu, Yilin Wang, Nanxuan Zhao, Tsu-Jui Fu, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang

Via arXiv

Text-guided 3D Human Generation from 2D Collections

May 23, 2023
Tsu-Jui Fu, Wenhan Xiong, Yixin Nie, Jingyu Liu, Barlas Oğuz, William Yang Wang

Via arXiv

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

May 18, 2023
Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Via arXiv

Discriminative Diffusion Models as Few-shot Vision and Language Learners

May 18, 2023
Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Via arXiv

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Dec 09, 2022
Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

Via arXiv

Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation

Nov 23, 2022
Tsu-Jui Fu, Licheng Yu, Ning Zhang, Cheng-Yang Fu, Jong-Chyi Su, William Yang Wang, Sean Bell

Via arXiv

CPL: Counterfactual Prompt Learning for Vision and Language Models

Oct 19, 2022
Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Via arXiv