"Text": models, code, and papers

LLM-driven Multimodal Target Volume Contouring in Radiation Oncology

Nov 03, 2023
Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Jin Sung Kim, Jong Chul Ye

Via arXiv

Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder

Nov 15, 2023
Abdelrahman Mohamed, Fakhraddin Alwajih, El Moatez Billah Nagoudi, Alcides Alcoba Inciarte, Muhammad Abdul-Mageed

Via arXiv

To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer

Oct 12, 2023
Md Mushfiqur Rahman, Fardin Ahsan Sakib, Fahim Faisal, Antonios Anastasopoulos

Via arXiv

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

Sep 29, 2023
Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Shuangrui Ding, Songyang Zhang, Haodong Duan, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

Via arXiv

Semantic-aware Video Representation for Few-shot Action Recognition

Nov 10, 2023
Yutao Tang, Benjamin Bejar, Rene Vidal

Via arXiv

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

Sep 27, 2023
David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, Yuchao Gu, Difei Gao, Mike Zheng Shou

Via arXiv

Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs

Nov 02, 2023
Peng Jin, Yang Wu, Yanbo Fan, Zhongqian Sun, Yang Wei, Li Yuan

Via arXiv

Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective

Oct 16, 2023
Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier, Taro Watanabe, Yixuan Su

Via arXiv

Personalizing Keyword Spotting with Speaker Information

Nov 06, 2023
Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno

Via arXiv

DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines

Nov 17, 2023
Chenyu Jiang, Zhen Jia, Shuai Zheng, Yida Wang, Chuan Wu

Via arXiv