Changyou Chen

LAFITE: Towards Language-Free Training for Text-to-Image Generation

Dec 13, 2021
Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun

One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs. While image samples are often easily accessible, the associated text descriptions typically require careful human captioning, which is particularly time-consuming and costly. In this paper, we propose the first work to train text-to-image generation models without any text data. Our method leverages the well-aligned multi-modal semantic space of the powerful pre-trained CLIP model: the requirement of text-conditioning is seamlessly alleviated by generating text features from image features. Extensive experiments are conducted to illustrate the effectiveness of the proposed method. We obtain state-of-the-art results in the standard text-to-image generation tasks. Importantly, the proposed language-free model outperforms most existing models trained with full image-text pairs. Furthermore, our method can be applied in fine-tuning pre-trained models, which saves both training time and cost in training text-to-image generation models. Our pre-trained model obtains competitive results in zero-shot text-to-image generation on the MS-COCO dataset, with only around 1% of the model size and training data size of the recently proposed large DALL-E model.

* The code and pre-trained models will be publicly available soon

A Generic Approach for Enhancing GANs by Regularized Latent Optimization

Dec 07, 2021
Yufan Zhou, Chunyuan Li, Changyou Chen, Jinhui Xu

With rapidly growing model complexity and data volume, training deep generative models (DGMs) for better performance has become an increasingly important challenge. Previous research on this problem has mainly focused on improving DGMs by either introducing new objective functions or designing more expressive model architectures. However, such approaches often introduce significantly more computational and/or design overhead. To resolve such issues, we introduce in this paper a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly in a variety of application scenarios. Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques, instead of re-training or fine-tuning pre-trained model parameters. Extensive experimental results on applications like image generation, image translation, text-to-image generation, image inpainting, and text-guided image editing suggest the effectiveness and superiority of our proposed framework.

Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees

Nov 17, 2021
Yaman Kumar Singla, Sriram Krishna, Rajiv Ratn Shah, Changyou Chen

Automated Scoring (AS), the natural language processing task of scoring essays and speeches in an educational testing setting, is growing in popularity and being deployed across contexts from government examinations to companies providing language proficiency services. However, existing systems either forgo human raters entirely, thus harming the reliability of the test, or score every response by both human and machine, thereby increasing costs. We target the spectrum of possible solutions in between, making use of both humans and machines to provide a higher quality test while keeping costs reasonable to democratize access to AS. In this work, we propose a combination of the existing paradigms, intelligently sampling responses to be scored by humans. We propose reward sampling and observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget (30% of samples) using our proposed sampling. The accuracy increases observed with the standard random and importance sampling baselines are 8.6% and 12.2%, respectively. Furthermore, we demonstrate the system's model-agnostic nature by measuring its performance on a variety of models currently deployed in an AS setting as well as pseudo models. Finally, we propose an algorithm to estimate the accuracy/QWK with statistical guarantees (our code is available at https://git.io/J1IOy).
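
For readers unfamiliar with the quadratic weighted kappa (QWK) named in this abstract, the following is a minimal sketch of the standard textbook definition of the metric, not the authors' code. The function name `quadratic_weighted_kappa`, its parameters, and the assumption that scores are integers in `0..num_ratings-1` are illustrative choices that do not come from the paper.

```python
import numpy as np


def quadratic_weighted_kappa(rater_a, rater_b, num_ratings):
    """Standard QWK between two lists of integer scores in [0, num_ratings - 1].

    Hypothetical helper for illustration; not taken from the paper's repository.
    Undefined (division by zero) when both raters give one identical constant score.
    """
    a = np.asarray(rater_a, dtype=int)
    b = np.asarray(rater_b, dtype=int)

    # Observed agreement matrix: counts of (score from a, score from b) pairs.
    observed = np.zeros((num_ratings, num_ratings))
    for i, j in zip(a, b):
        observed[i, j] += 1

    # Expected matrix under independence, scaled to the same total count.
    hist_a = observed.sum(axis=1)
    hist_b = observed.sum(axis=0)
    expected = np.outer(hist_a, hist_b) / len(a)

    # Quadratic disagreement weights: 0 on the diagonal, 1 at maximum distance.
    idx = np.arange(num_ratings)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_ratings - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


human = [0, 1, 2, 3]
qwk_same = quadratic_weighted_kappa(human, [0, 1, 2, 3], num_ratings=4)
qwk_reversed = quadratic_weighted_kappa(human, [3, 2, 1, 0], num_ratings=4)
```

Identical score lists give a QWK of 1, and disagreements are penalized by the squared distance between the two scores, which is why the metric suits ordinal scales such as essay grades.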

AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses

Oct 14, 2021
Yaman Kumar Singla, Swapnil Parekh, Somesh Singh, Junyi Jessy Li, Rajiv Ratn Shah, Changyou Chen

Deep-learning based Automatic Essay Scoring (AES) systems are being actively used by states and language testing agencies alike to evaluate millions of candidates for life-changing decisions ranging from college applications to visa approvals. However, little research has been done to understand and interpret the black-box nature of deep-learning based scoring algorithms. Previous studies indicate that scoring models can be easily fooled. In this paper, we explore the reason behind their surprising adversarial brittleness. We utilize recent advances in interpretability to find the extent to which features such as coherence, content, vocabulary, and relevance are important for automated scoring mechanisms. We use this to investigate the oversensitivity (i.e., a large change in output score with a small change in input essay content) and overstability (i.e., little change in output scores with large changes in input essay content) of AES. Our results indicate that autoscoring models, despite being trained as "end-to-end" models with rich contextual embeddings such as BERT, behave like bag-of-words models. A few words determine the essay score without requiring any context, making the model largely overstable. This is in stark contrast to recent probing studies on pre-trained representation learning models, which show that rich linguistic features such as parts-of-speech and morphology are encoded by them. Further, we also find that the models have learnt dataset biases, making them oversensitive. To deal with these issues, we propose detection-based protection models that can detect samples causing oversensitivity and overstability with high accuracy. We find that our proposed models are able to detect unusual attribution patterns and flag adversarial samples successfully.

* arXiv admin note: text overlap with arXiv:2012.13872

Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks

Oct 13, 2021
Anuj Saraswat, Mehar Bhatia, Yaman Kumar Singla, Changyou Chen, Rajiv Ratn Shah

Recent studies in speech perception have been closely linked to the fields of cognitive psychology, phonology, and phonetics in linguistics. During perceptual attunement, a critical and sensitive developmental trajectory has been examined in bilingual and monolingual infants where they can best discriminate common phonemes. In this paper, we compare and identify these cognitive aspects on deep neural-based visual lip-reading models. We conduct experiments on the two most extensive public visual speech recognition datasets for English and Mandarin. Through our experimental results, we observe a strong correlation between these theories in cognitive psychology and our unique modeling. We inspect how these computational models develop similar phases in speech perception and acquisition.

* 9 pages, 6 figures, 2 tables

MINIMAL: Mining Models for Data Free Universal Adversarial Triggers

Sep 25, 2021
Swapnil Parekh, Yaman Singla Kumar, Somesh Singh, Changyou Chen, Balaji Krishnamurthy, Rajiv Ratn Shah

It is well known that natural language models are vulnerable to adversarial attacks, which are mostly input-specific in nature. Recently, it has been shown that there also exist input-agnostic attacks in NLP models, called universal adversarial triggers. However, existing methods to craft universal triggers are data intensive. They require large amounts of data samples to generate adversarial triggers, which are typically inaccessible to attackers. For instance, previous works take 3000 data samples per class for the SNLI dataset to generate adversarial triggers. In this paper, we present a novel data-free approach, MINIMAL, to mine input-agnostic adversarial triggers from models. Using the triggers produced with our data-free algorithm, we reduce the accuracy of Stanford Sentiment Treebank's positive class from 93.6% to 9.6%. Similarly, for the Stanford Natural Language Inference (SNLI) dataset, our single-word trigger reduces the accuracy of the entailment class from 90.95% to less than 0.6%. Despite being completely data-free, we obtain accuracy drops equivalent to those of data-dependent methods.

AES Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses

Sep 24, 2021
Yaman Kumar Singla, Swapnil Parekh, Somesh Singh, Junyi Jessy Li, Rajiv Ratn Shah, Changyou Chen

* arXiv admin note: text overlap with arXiv:2012.13872

Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring

Aug 30, 2021
Yaman Kumar Singla, Avykat Gupta, Shaurya Bagga, Changyou Chen, Balaji Krishnamurthy, Rajiv Ratn Shah

Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a candidate's speaking proficiency in a language. ASS systems face many challenges like open grammar, variable pronunciations, and unstructured or semi-structured content. Recent deep learning approaches have shown some promise in this domain. However, most of these approaches focus on extracting features from a single audio response, so they lack the speaker-specific context required to model such a complex task. We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling. In our technique, we take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context vectors from these responses and feed them as additional speaker-specific context to our network to score a particular response. We compare our technique with strong baselines and find that such modeling improves the model's average performance by 6.92% (maximum = 12.86%, minimum = 4.51%). We further provide both quantitative and qualitative insights into the importance of this additional context in solving the problem of ASS.

* Published in CIKM 2021
