"Text": models, code, and papers

Learning Audio Concepts from Counterfactual Natural Language

Jan 10, 2024
Ali Vosoughi, Luca Bondi, Ho-Hsiang Wu, Chenliang Xu

Via arXiv

VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

Dec 18, 2023
Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo

Via arXiv

Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models

Jan 18, 2024
Li Sun, Liuan Wang, Jun Sun, Takayuki Okatani

Via arXiv

WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge

Jan 12, 2024
Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, Dacheng Tao

Via arXiv

Application Of Vision-Language Models For Assessing Osteoarthritis Disease Severity

Jan 12, 2024
Banafshe Felfeliyan, Yuyue Zhou, Shrimanti Ghosh, Jessica Kupper, Shaobo Liu, Abhilash Hareendranathan, Jacob L. Jaremko

Via arXiv

Aligned with LLM: a new multi-modal training paradigm for encoding fMRI activity in visual cortex

Jan 08, 2024
Shuxiao Ma, Linyuan Wang, Senbao Hou, Bin Yan

Via arXiv

LoMA: Lossless Compressed Memory Attention

Jan 16, 2024
Yumeng Wang, Zhenyang Xiao

Via arXiv

Semantic Guidance Tuning for Text-To-Image Diffusion Models

Dec 26, 2023
Hyun Kang, Dohae Lee, Myungjin Shin, In-Kwon Lee

Via arXiv

PIXAR: Auto-Regressive Language Modeling in Pixel Space

Jan 06, 2024
Yintao Tai, Xiyang Liao, Alessandro Suglia, Antonio Vergari

Via arXiv

Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View

Jan 20, 2024
Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach

Via arXiv