"Text": models, code, and papers

The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

Nov 22, 2023
Ilaria Manco, Benno Weck, SeungHeon Doh, Minz Won, Yixiao Zhang, Dmitry Bogdanov, Yusong Wu, Ke Chen, Philip Tovstogan, Emmanouil Benetos, Elio Quinton, György Fazekas, Juhan Nam

Understanding the Vulnerability of CLIP to Image Compression

Nov 23, 2023
Cangxiong Chen, Vinay P. Namboodiri, Julian Padget

Learning Mutually Informed Representations for Characters and Subwords

Nov 14, 2023
Yilin Wang, Xinyi Hu, Matthew R. Gormley

Empathy Detection Using Machine Learning on Text, Audiovisual, Audio or Physiological Signals

Oct 30, 2023
Md Rakibul Hasan, Md Zakir Hossain, Shreya Ghosh, Susannah Soon, Tom Gedeon

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

Oct 16, 2023
Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

Radiology Report Generation Using Transformers Conditioned with Non-imaging Data

Nov 18, 2023
Nurbanu Aksoy, Nishant Ravikumar, Alejandro F Frangi

GELDA: A generative language annotation framework to reveal visual biases in datasets

Nov 29, 2023
Krish Kabra, Kathleen M. Lewis, Guha Balakrishnan

VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model

Nov 29, 2023
Haoyu Zhao, Tianyi Lu, Jiaxi Gu, Xing Zhang, Zuxuan Wu, Hang Xu, Yu-Gang Jiang

Improving Compositional Text-to-image Generation with Large Vision-Language Models

Oct 10, 2023
Song Wen, Guian Fang, Renrui Zhang, Peng Gao, Hao Dong, Dimitris Metaxas

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

Oct 09, 2023
Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He