"Text": models, code, and papers

YORO -- Lightweight End to End Visual Grounding

Nov 15, 2022
Chih-Hui Ho, Srikar Appalaraju, Bhavan Jasani, R. Manmatha, Nuno Vasconcelos

GroupViT: Semantic Segmentation Emerges from Text Supervision

Feb 22, 2022
Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang

Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

Jul 26, 2022
Robin Rombach, Andreas Blattmann, Björn Ommer

OhMG: Zero-shot Open-vocabulary Human Motion Generation

Oct 28, 2022
Junfan Lin, Jianlong Chang, Lingbo Liu, Guanbin Li, Liang Lin, Qi Tian, Chang-wen Chen

Character-Centric Story Visualization via Visual Planning and Token Alignment

Oct 20, 2022
Hong Chen, Rujun Han, Te-Lin Wu, Hideki Nakayama, Nanyun Peng

MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts

Nov 25, 2022
Xiangyu Xi, Jianwei Lv, Shuaipeng Liu, Wei Ye, Fan Yang, Guanglu Wan

Understanding Text Classification Data and Models Using Aggregated Input Salience

Nov 10, 2022
Sebastian Ebert, Alice Shoshana Jakobovits, Katja Filippova

Image Search with Text Feedback by Additive Attention Compositional Learning

Mar 08, 2022
Yuxin Tian, Shawn Newsam, Kofi Boakye

Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding

Nov 28, 2022
Xirong Li, Aozhu Chen, Ziyue Wang, Fan Hu, Kaibin Tian, Xinru Chen, Chengbo Dong

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

May 10, 2022
Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, Yuanhao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu
