
"Text": models, code, and papers

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix

Jun 17, 2022
Teng Wang, Wenhao Jiang, Zhichao Lu, Feng Zheng, Ran Cheng, Chengguo Yin, Ping Luo

Via arXiv

Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework

Aug 08, 2022
Jian Guan, Zhenyu Yang, Rongsheng Zhang, Zhipeng Hu, Minlie Huang

Via arXiv

Text Editing by Command

Oct 24, 2020
Felix Faltings, Michel Galley, Gerold Hintz, Chris Brockett, Chris Quirk, Jianfeng Gao, Bill Dolan

Via arXiv

GRETEL: Graph Contrastive Topic Enhanced Language Model for Long Document Extractive Summarization

Aug 21, 2022
Qianqian Xie, Jimin Huang, Tulika Saha, Sophia Ananiadou

Via arXiv

Language with Vision: a Study on Grounded Word and Sentence Embeddings

Jun 17, 2022
Hassan Shahmohammadi, Maria Heitmeier, Elnaz Shafaei-Bajestan, Hendrik P. A. Lensch, Harald Baayen

Via arXiv

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

May 24, 2022
Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji

Via arXiv

Self-paced learning to improve text row detection in historical documents with missing labels

Feb 02, 2022
Mihaela Gaman, Lida Ghadamiyan, Radu Tudor Ionescu, Marius Popescu

Via arXiv

Visual Spatio-temporal Relation-enhanced Network for Cross-modal Text-Video Retrieval

Oct 29, 2021
Ning Han, Jingjing Chen, Guangyi Xiao, Yawen Zeng, Chuhao Shi, Hao Chen

Via arXiv

Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training

Jan 20, 2022
J. Yang, Lei He

Via arXiv

Multimodal Masked Autoencoders Learn Transferable Representations

May 31, 2022
Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel

Via arXiv