"Text": models, code, and papers

Multimodal Frame-Scoring Transformer for Video Summarization

Jul 05, 2022
Jeiyoon Park, Kiho Kwoun, Chanhee Lee, Heuiseok Lim

Wide Attention Is The Way Forward For Transformers

Oct 02, 2022
Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins

AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks

Jan 23, 2022
Dmitrijs Kass, Ekta Vats

Probabilistic Generative Transformer Language models for Generative Design of Molecules

Sep 20, 2022
Lai Wei, Nihang Fu, Yuqi Song, Qian Wang, Jianjun Hu

Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems

Jun 30, 2022
Hyun-Wook Yoon, Ohsung Kwon, Hoyeon Lee, Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim, Min-Jae Hwang

ScaleFace: Uncertainty-aware Deep Metric Learning

Sep 12, 2022
Roman Kail, Kirill Fedyanin, Nikita Muravev, Alexey Zaytsev, Maxim Panov

PDO-e$\text{S}^\text{2}$CNNs: Partial Differential Operator Based Equivariant Spherical CNNs

Apr 08, 2021
Zhengyang Shen, Tiancheng Shen, Zhouchen Lin, Jinwen Ma

Injecting Text in Self-Supervised Speech Pretraining

Aug 27, 2021
Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Gary Wang, Pedro Moreno

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

Nov 03, 2021
Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, Aran Komatsuzaki

Multi-Tailed, Multi-Headed, Spatial Dynamic Memory refined Text-to-Image Synthesis

Oct 15, 2021
Amrit Diggavi Seshadri, Balaraman Ravindran
