Yuanze Lin

Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge

Jul 05, 2024

DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion

Mar 25, 2024

Text-Driven Image Editing via Learnable Regions

Nov 28, 2023

SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training

Nov 30, 2022

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

Jun 02, 2022

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

Mar 22, 2022

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Dec 28, 2021

Self-Supervised Video Representation Learning with Meta-Contrastive Network

Aug 23, 2021