
Sugato Basu

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

May 29, 2024

KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

May 28, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

May 24, 2023

Discriminative Diffusion Models as Few-shot Vision and Language Learners

May 18, 2023

MetaCLUE: Towards Comprehensive Visual Metaphors Research

Dec 19, 2022

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Dec 09, 2022

CPL: Counterfactual Prompt Learning for Vision and Language Models

Oct 19, 2022

Diagnosing Vision-and-Language Navigation: What Really Matters

Mar 30, 2021

A Framework for Deep Constrained Clustering

Jan 07, 2021

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

Oct 07, 2020