Deepanway Ghosal

Improving Text-To-Audio Models with Synthetic Captions

Jun 18, 2024
Via arXiv

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Apr 16, 2024
Via arXiv

PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns

Mar 20, 2024
Via arXiv

Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning

Mar 13, 2024
Via arXiv

Stuck in the Quicksand of Numeracy, Far from AGI Summit: Evaluating LLMs' Mathematical Competency through Ontology-guided Perturbations

Jan 17, 2024
Via arXiv

Mustango: Toward Controllable Text-to-Music Generation

Nov 14, 2023
Via arXiv

Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts

Oct 31, 2023
Via arXiv

Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning

Jul 05, 2023
Via arXiv

STOAT: Structured Data to Analytical Text With Controls

May 19, 2023
Via arXiv

Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

Apr 24, 2023
Via arXiv