"Text": models, code, and papers

DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data

Jun 26, 2023
Stephanie Fu, Netanel Tamir, Shobhita Sundaram, Lucy Chai, Richard Zhang, Tali Dekel, Phillip Isola

PTVD: A Large-Scale Plot-Oriented Multimodal Dataset Based on Television Dramas

Jun 26, 2023
Chen Li, Xutan Peng, Teng Wang, Yixiao Ge, Mengyang Liu, Xuyuan Xu, Yexin Wang, Ying Shan

eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer

Jun 20, 2023
Ammar Abbas, Sri Karlapati, Bastian Schnell, Penny Karanasou, Marcel Granero Moya, Amith Nagaraj, Ayman Boustati, Nicole Peinelt, Alexis Moinet, Thomas Drugman

Test-Time Training on Nearest Neighbors for Large Language Models

Jun 07, 2023
Moritz Hardt, Yu Sun

Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination

May 25, 2023
Hao Fei, Qian Liu, Meishan Zhang, Min Zhang, Tat-Seng Chua

LOWA: Localize Objects in the Wild with Attributes

May 31, 2023
Xiaoyuan Guo, Kezhen Chen, Jinmeng Rao, Yawen Zhang, Baochen Sun, Jie Yang

MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

May 24, 2023
Marco Bellagente, Manuel Brack, Hannah Teufel, Felix Friedrich, Björn Deiseroth, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Koen Oostermeijer, Andres Felipe Cruz-Salinas, Patrick Schramowski, Kristian Kersting, Samuel Weinbach

Toward Fairness in Text Generation via Mutual Information Minimization based on Importance Sampling

Feb 25, 2023
Rui Wang, Pengyu Cheng, Ricardo Henao

Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications

Jun 05, 2023
Han Xie, Da Zheng, Jun Ma, Houyu Zhang, Vassilis N. Ioannidis, Xiang Song, Qing Ping, Sheng Wang, Carl Yang, Yi Xu, Belinda Zeng, Trishul Chilimbi

Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

Jun 05, 2023
Dengfeng Ke, Yayue Deng, Yukang Jia, Jinlong Xue, Qi Luo, Ya Li, Jianqing Sun, Jiaen Liang, Binghuai Lin