"Text": models, code, and papers

Towards Large-scale Building Attribute Mapping using Crowdsourced Images: Scene Text Recognition on Flickr and Problems to be Solved

Sep 14, 2023
Yao Sun, Anna Kruspe, Liqiu Meng, Yifan Tian, Eike J Hoffmann, Stefan Auer, Xiao Xiang Zhu

JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation

Oct 29, 2023
Yao Yao, Peike Li, Boyu Chen, Alex Wang

VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation

Sep 01, 2023
Xin Li, Wenqing Chu, Ye Wu, Weihang Yuan, Fanglong Liu, Qi Zhang, Fu Li, Haocheng Feng, Errui Ding, Jingdong Wang

FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning

Oct 25, 2023
Jaemin Shin, Hyungjun Yoon, Seungjoo Lee, Sungjoon Park, Yunxin Liu, Jinho D. Choi, Sung-Ju Lee

Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading

Oct 08, 2023
Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz

Overview of ImageArg-2023: The First Shared Task in Multimodal Argument Mining

Oct 15, 2023
Zhexiong Liu, Mohamed Elaraby, Yang Zhong, Diane Litman

Language as the Medium: Multimodal Video Classification through text only

Sep 19, 2023
Laura Hanu, Anita L. Verő, James Thewlis

JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues

Oct 14, 2023
Jiayi Ji, Haowei Wang, Changli Wu, Yiwei Ma, Xiaoshuai Sun, Rongrong Ji

Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning

Oct 12, 2023
Junyu Lu, Dixiang Zhang, Xiaojun Wu, Xinyu Gao, Ruyi Gan, Jiaxing Zhang, Yan Song, Pingjian Zhang

RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering

Oct 19, 2023
Yuduo Wang, Pedram Ghamisi
