Alert button

"Text": models, code, and papers
Alert button

On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation

Nov 18, 2023
Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert

Viaarxiv icon

Best uses of ChatGPT and Generative AI for computer science research

Nov 18, 2023
Eduardo C. Garrido-Merchan

Viaarxiv icon

MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation

Nov 03, 2023
Yipeng Gao, Zeyu Wang, Wei-Shi Zheng, Cihang Xie, Yuyin Zhou

Viaarxiv icon

LLM-driven Multimodal Target Volume Contouring in Radiation Oncology

Nov 03, 2023
Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Jin Sung Kim, Jong Chul Ye

Figure 1 for LLM-driven Multimodal Target Volume Contouring in Radiation Oncology
Figure 2 for LLM-driven Multimodal Target Volume Contouring in Radiation Oncology
Figure 3 for LLM-driven Multimodal Target Volume Contouring in Radiation Oncology
Figure 4 for LLM-driven Multimodal Target Volume Contouring in Radiation Oncology
Viaarxiv icon

Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder

Nov 15, 2023
Abdelrahman Mohamed, Fakhraddin Alwajih, El Moatez Billah Nagoudi, Alcides Alcoba Inciarte, Muhammad Abdul-Mageed

Viaarxiv icon

To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer

Oct 12, 2023
Md Mushfiqur Rahman, Fardin Ahsan Sakib, Fahim Faisal, Antonios Anastasopoulos

Viaarxiv icon

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

Sep 29, 2023
Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Shuangrui Ding, Songyang Zhang, Haodong Duan, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

Figure 1 for InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Figure 2 for InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Figure 3 for InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Figure 4 for InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Viaarxiv icon

Semantic-aware Video Representation for Few-shot Action Recognition

Nov 10, 2023
Yutao Tang, Benjamin Bejar, Rene Vidal

Viaarxiv icon

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

Sep 27, 2023
David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, Yuchao Gu, Difei Gao, Mike Zheng Shou

Figure 1 for Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Figure 2 for Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Figure 3 for Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Figure 4 for Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Viaarxiv icon

Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs

Nov 02, 2023
Peng Jin, Yang Wu, Yanbo Fan, Zhongqian Sun, Yang Wei, Li Yuan

Viaarxiv icon