"Text": models, code, and papers

Semantic Draw Engineering for Text-to-Image Creation

Dec 23, 2023
Yang Li, Huaqiang Jiang, Yangkai Wu

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Dec 21, 2023
Yiming Zhang, Zhening Xing, Yanhong Zeng, Youqing Fang, Kai Chen

Learning Audio Concepts from Counterfactual Natural Language

Jan 10, 2024
Ali Vosoughi, Luca Bondi, Ho-Hsiang Wu, Chenliang Xu

VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

Dec 18, 2023
Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo

Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models

Jan 18, 2024
Li Sun, Liuan Wang, Jun Sun, Takayuki Okatani

WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge

Jan 12, 2024
Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, Dacheng Tao

Application Of Vision-Language Models For Assessing Osteoarthritis Disease Severity

Jan 12, 2024
Banafshe Felfeliyan, Yuyue Zhou, Shrimanti Ghosh, Jessica Kupper, Shaobo Liu, Abhilash Hareendranathan, Jacob L. Jaremko

Aligned with LLM: a new multi-modal training paradigm for encoding fMRI activity in visual cortex

Jan 08, 2024
Shuxiao Ma, Linyuan Wang, Senbao Hou, Bin Yan

LoMA: Lossless Compressed Memory Attention

Jan 16, 2024
Yumeng Wang, Zhenyang Xiao

Semantic Guidance Tuning for Text-To-Image Diffusion Models

Dec 26, 2023
Hyun Kang, Dohae Lee, Myungjin Shin, In-Kwon Lee
