"Text": models, code, and papers

MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning

Oct 17, 2023
Noé Tits

Enhancing Object Coherence in Layout-to-Image Synthesis

Nov 25, 2023
Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin

In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval

Sep 16, 2023
Nina Shvetsova, Anna Kukleva, Bernt Schiele, Hilde Kuehne

Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions

Nov 20, 2023
Ziyue Wang, Chi Chen, Peng Li, Yang Liu

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Oct 05, 2023
Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, Denis Dimitrov

Automated Annotation of Scientific Texts for ML-based Keyphrase Extraction and Validation

Nov 08, 2023
Oluwamayowa O. Amusat, Harshad Hegde, Christopher J. Mungall, Anna Giannakou, Neil P. Byers, Dan Gunter, Kjiersten Fagnan, Lavanya Ramakrishnan

Investigating the Emergent Audio Classification Ability of ASR Foundation Models

Nov 15, 2023
Rao Ma, Adian Liusie, Mark J. F. Gales, Kate M. Knill

Domain Aligned CLIP for Few-shot Classification

Nov 15, 2023
Muhammad Waleed Gondal, Jochen Gast, Inigo Alonso Ruiz, Richard Droste, Tommaso Macri, Suren Kumar, Luitpold Staudigl

Pinpoint, Not Criticize: Refining Large Language Models via Fine-Grained Actionable Feedback

Nov 15, 2023
Wenda Xu, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Biao Zhang, Zhongtao Liu, William Yang Wang, Lei Li, Markus Freitag

Speech language models lack important brain-relevant semantics

Nov 08, 2023
Subba Reddy Oota, Emin Çelik, Fatma Deniz, Mariya Toneva
