Lun-Wei Ku

Academia Sinica, Taiwan

Plot and Rework: Modeling Storylines for Visual Storytelling

May 23, 2021
Chi-Yang Hsu, Yun-Wei Chu, Ting-Hao 'Kenneth' Huang, Lun-Wei Ku

Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST, a framework that represents the input image sequence as a story graph in which it finds the best path to form a storyline. PR-VIST then takes this path and learns to generate the final story via an iterative training process. This framework produces stories that are superior in terms of diversity, coherence, and humanness, per both automatic and human evaluations. An ablation study shows that both plotting and reworking contribute to the model's superiority.
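As a rough illustration of "finding the best path in a story graph", the sketch below picks the highest-scoring path through a small layered graph with dynamic programming. It is not the PR-VIST implementation: the layered structure (one layer of candidate terms per image), the additive edge scores, and all names and values are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def best_path(layers: List[List[str]],
              edge_score: Dict[Edge, float]) -> Tuple[List[str], float]:
    """Return the highest-scoring path that visits one node per layer.

    Hypothetical setup: each layer holds candidate terms for one image,
    and edge_score maps (previous_term, next_term) to a link score.
    """
    # best[node] = (score of the best path ending at node, that path)
    best = {node: (0.0, [node]) for node in layers[0]}
    for layer in layers[1:]:
        nxt = {}
        for node in layer:
            candidates = [
                (score + edge_score[(prev, node)], path + [node])
                for prev, (score, path) in best.items()
                if (prev, node) in edge_score
            ]
            if candidates:
                nxt[node] = max(candidates, key=lambda c: c[0])
        best = nxt
    if not best:
        raise ValueError("no complete path through the graph")
    score, path = max(best.values(), key=lambda c: c[0])
    return path, score


# Toy graph with made-up terms and scores.
layers = [["beach", "dog"], ["run", "swim"], ["sunset"]]
edge_score = {
    ("beach", "swim"): 0.75, ("beach", "run"): 0.25,
    ("dog", "run"): 0.5, ("dog", "swim"): 0.25,
    ("run", "sunset"): 0.5, ("swim", "sunset"): 0.5,
}
path, score = best_path(layers, edge_score)
print(path, score)
```

In a real system the edge scores would come from a learned model rather than a hand-written table.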

* Accepted by ACL'21 Findings; this is not the camera-ready version

Happy Dance, Slow Clap: Using Reaction GIFs to Predict Induced Affect on Twitter

May 20, 2021
Boaz Shmueli, Soumya Ray, Lun-Wei Ku

Datasets with induced emotion labels are scarce but of utmost importance for many NLP tasks. We present a new, automated method for collecting texts along with their induced reaction labels. The method exploits the online use of reaction GIFs, which capture complex affective states. We show how to augment the data with induced emotion and induced sentiment labels. We use our method to create and publish ReactionGIF, a first-of-its-kind affective dataset of 30K tweets. We provide baselines for three new tasks, including induced sentiment prediction and multilabel classification of induced emotions. Our method and dataset open new research opportunities in emotion detection and affective computing.

* To be published in ACL 2021. 7 pages, 4 figures, 2 tables

Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing

Apr 20, 2021
Boaz Shmueli, Jan Fell, Soumya Ray, Lun-Wei Ku

The use of crowdworkers in NLP research is growing rapidly, in tandem with the exponential increase in research production in machine learning and AI. Ethical discussion regarding the use of crowdworkers within the NLP research community is typically confined in scope to issues related to labor conditions such as fair pay. We draw attention to the lack of ethical considerations related to the various tasks performed by workers, including labeling, evaluation, and production. We find that the Final Rule, the common ethical framework used by researchers, did not anticipate the use of online crowdsourcing platforms for data collection, resulting in gaps between the spirit and practice of human-subjects ethics in NLP research. We enumerate common scenarios where crowdworkers performing NLP tasks are at risk of harm. We thus recommend that researchers evaluate these risks by considering the three ethical principles set up by the Belmont Report. We also clarify some common misconceptions regarding the Institutional Review Board (IRB) application. We hope this paper will serve to reopen the discussion within our community regarding the ethical use of crowdworkers.

* To be published in NAACL-HLT 2021. 12 pages, 1 figure, 3 tables

SocialNLP EmotionGIF 2020 Challenge Overview: Predicting Reaction GIF Categories on Social Media

Feb 24, 2021
Boaz Shmueli, Lun-Wei Ku, Soumya Ray

We present an overview of the EmotionGIF2020 Challenge, held at the 8th International Workshop on Natural Language Processing for Social Media (SocialNLP), in conjunction with ACL 2020. The challenge required predicting affective reactions to online texts, and included the EmotionGIF dataset, with tweets labeled for the reaction categories. The novel dataset included 40K tweets with their reaction GIFs. Due to the special circumstances of year 2020, two rounds of the competition were conducted. A total of 84 teams registered for the task. Of these, 25 teams successfully submitted entries to the evaluation phase in the first round, while 13 teams participated successfully in the second round. Of the top participants, five teams presented a technical report and shared their code. The top score of the winning team using the Recall@K metric was 62.47%.
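For readers unfamiliar with Recall@K, here is a minimal sketch of one common definition: for each text, the fraction of its gold labels that appear among the top K ranked predictions, averaged over all texts. The value of K and the exact averaging used by the challenge are not stated in this abstract, so the function names, labels, and K values below are illustrative assumptions.

```python
from typing import List, Sequence, Set


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of gold labels found among the top-k ranked predictions."""
    if not gold:
        raise ValueError("gold label set must be non-empty")
    top_k = set(ranked[:k])
    return len(top_k & gold) / len(gold)


def mean_recall_at_k(all_ranked: List[Sequence[str]],
                     all_gold: List[Set[str]], k: int) -> float:
    """Average recall@k over all samples."""
    scores = [recall_at_k(r, g, k) for r, g in zip(all_ranked, all_gold)]
    return sum(scores) / len(scores)


# Toy data with made-up reaction categories.
predictions = [["applause", "hug", "shrug"], ["eye_roll", "yes", "no"]]
gold = [{"hug"}, {"no", "facepalm"}]

print(mean_recall_at_k(predictions, gold, k=2))
print(mean_recall_at_k(predictions, gold, k=3))
```

Raising K can only keep the score the same or increase it, since more predictions are counted as candidates for each text.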

* The 8th International Workshop on Natural Language Processing for Social Media co-located with ACL-2020. 7 pages, 5 figures, 3 tables

Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent

Oct 05, 2020
Yun-Hsuan Jen, Chieh-Yang Huang, Mei-Hua Chen, Ting-Hao 'Kenneth' Huang, Lun-Wei Ku

Many English-as-a-second-language learners have trouble using near-synonym words (e.g., small vs. little; briefly vs. shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty in adapting such scores to all the near-synonyms, as near-synonyms differ in various ways. We notice that the helpfulness of the learning material would reflect on the learners' performance. Thus, we propose the inference-based learner-like agent to mimic learner behavior and identify good learning materials by examining the agent's performance. To enable the agent to behave like a learner, we leverage entailment modeling's capability of inferring answers from the provided materials. Experimental results show that the proposed agent is equipped with good learner-like behavior to achieve the best performance in both fill-in-the-blank (FITB) and good example sentence selection tasks. We further conduct a classroom user study with college ESL learners. The results of the user study show that the proposed agent can find example sentences that help students learn more easily and efficiently. Compared to other models, the proposed agent improves the score of more than 17% of students after learning.

* 9 pages, to appear in EMNLP 2020 as a long paper

Reactive Supervision: A New Method for Collecting Sarcasm Data

Sep 28, 2020
Boaz Shmueli, Lun-Wei Ku, Soumya Ray

Sarcasm detection is an important task in affective computing, requiring large amounts of labeled data. We introduce reactive supervision, a novel data collection method that utilizes the dynamics of online conversations to overcome the limitations of existing data collection techniques. We use the new method to create and release a first-of-its-kind large dataset of tweets with sarcasm perspective labels and new contextual features. The dataset is expected to advance sarcasm detection research. Our method can be adapted to other affective computing domains, thus opening up new research opportunities.

* 7 pages, 2 figures, 8 tables. To be published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

MVIN: Learning Multiview Items for Recommendation

May 26, 2020
Chang-You Tai, Meng-Ru Wu, Yun-Wei Chu, Shao-Yu Chu, Lun-Wei Ku

Researchers have begun to utilize heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems to mitigate the cold start and sparsity issues. However, utilizing a graph neural network (GNN) to capture information in the KG and apply it in recommendation systems (RS) is still problematic, as it is unable to see each item's properties from multiple perspectives. To address these issues, we propose the multi-view item network (MVIN), a GNN-based recommendation model which provides superior recommendations by describing items from a unique mixed view from user and entity angles. MVIN learns item representations from both the user view and the entity view. From the user view, user-oriented modules score and aggregate features to make recommendations from a personalized perspective constructed according to KG entities which incorporates user click information. From the entity view, the mixing layer contrasts layer-wise GCN information to further obtain comprehensive features from internal entity-entity interactions in the KG. We evaluate MVIN on three real-world datasets: MovieLens-1M (ML-1M), LFM-1b 2015 (LFM-1b), and Amazon-Book (AZ-book). Results show that MVIN significantly outperforms state-of-the-art methods on these three datasets. In addition, from user-view cases, we find that MVIN indeed captures entities that attract users. Figures further illustrate that mixing layers in a heterogeneous KG plays a vital role in neighborhood information aggregation.

Attractive or Faithful? Popularity-Reinforced Learning for Inspired Headline Generation

Feb 06, 2020
Yun-Zhu Song, Hong-Han Shuai, Sung-Lin Yeh, Yi-Lun Wu, Lun-Wei Ku, Wen-Chih Peng

With the rapid proliferation of online media sources and published news, headlines have become increasingly important for attracting readers to news articles, since users may be overwhelmed by the massive amount of information. In this paper, we generate inspired headlines that preserve the nature of news articles and catch the eye of the reader simultaneously. The task of inspired headline generation can be viewed as a specific form of the Headline Generation (HG) task, with the emphasis on creating an attractive headline from a given news article. To generate inspired headlines, we propose a novel framework called POpularity-Reinforced Learning for inspired Headline Generation (PORL-HG). PORL-HG exploits the extractive-abstractive architecture with 1) Popular Topic Attention (PTA) for guiding the extractor to select the attractive sentence from the article and 2) a popularity predictor for guiding the abstractor to rewrite the attractive sentence. Moreover, since the sentence selection of the extractor is not differentiable, techniques of reinforcement learning (RL) are utilized to bridge the gap with rewards obtained from a popularity score predictor. Through quantitative and qualitative experiments, we show that the proposed PORL-HG significantly outperforms the state-of-the-art headline generation models in terms of attractiveness evaluated by both humans (71.03%) and the predictor (at least 27.60%), while the faithfulness of PORL-HG is also comparable to the state-of-the-art generation model.

* AAAI 2020

Multi-step Joint-Modality Attention Network for Scene-Aware Dialogue System

Jan 17, 2020
Yun-Wei Chu, Kuan-Yen Lin, Chao-Chun Hsu, Lun-Wei Ku

Understanding dynamic scenes and dialogue contexts in order to converse with users has been challenging for multimodal dialogue systems. The 8th Dialog System Technology Challenge (DSTC8) proposed an Audio Visual Scene-Aware Dialog (AVSD) task, which contains multiple modalities including audio, vision, and language, to evaluate how dialogue systems understand different modalities and respond to users. In this paper, we propose a multi-step joint-modality attention network (JMAN) based on a recurrent neural network (RNN) to reason on videos. Our model performs a multi-step attention mechanism and jointly considers both visual and textual representations in each reasoning process to better integrate information from the two different modalities. Compared to the baseline released by the AVSD organizers, our model achieves relative improvements of 12.1% and 22.4% on ROUGE-L score and CIDEr score, respectively.
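A "relative improvement" is the gain over the baseline divided by the baseline score, not the difference in points. The sketch below shows the arithmetic; the scores in it are made-up placeholders, since the abstract reports only the relative figures and not the underlying ROUGE-L or CIDEr values.

```python
def relative_improvement(baseline: float, model: float) -> float:
    """Relative gain of `model` over `baseline`, e.g. 0.12 means +12%."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (model - baseline) / baseline


# Hypothetical scores chosen only to illustrate the formula.
baseline_score = 0.50
model_score = 0.56
gain = relative_improvement(baseline_score, model_score)
print(f"{gain:.1%}")
```

With these placeholder numbers, an absolute gain of 0.06 over a baseline of 0.50 corresponds to a relative improvement of about 12%.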

* DSTC8, co-located with AAAI 2020
