Tsu-Jui Fu

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

May 29, 2024
Via arXiv

From Text to Pixel: Advancing Long-Context Understanding in MLLMs

May 23, 2024
Via arXiv

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Apr 11, 2024
Via arXiv

Guiding Instruction-based Image Editing via Multimodal Large Language Models

Sep 29, 2023
Via arXiv

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

Jul 12, 2023
Via arXiv

Photoswap: Personalized Subject Swapping in Images

May 29, 2023
Via arXiv

Text-guided 3D Human Generation from 2D Collections

May 23, 2023
Via arXiv

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

May 18, 2023
Via arXiv

Discriminative Diffusion Models as Few-shot Vision and Language Learners

May 18, 2023
Via arXiv

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Dec 09, 2022
Via arXiv