Hideki Nakayama

Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations

May 28, 2024

Yi-Pei Chen, Noriki Nishida, Hideki Nakayama, Yuji Matsumoto

Abstract:Enhancing user engagement through personalization in conversational agents has gained significance, especially with the advent of large language models that generate fluent responses. Personalized dialogue generation, however, is multifaceted and varies in its definition -- ranging from instilling a persona in the agent to capturing users' explicit and implicit cues. This paper seeks to systematically survey the recent landscape of personalized dialogue generation, including the datasets employed, methodologies developed, and evaluation metrics applied. Covering 22 datasets, we highlight benchmark datasets and newer ones enriched with additional features. We further analyze 17 seminal works from top conferences from 2021 to 2023 and identify five distinct types of problems. We also shed light on recent progress by LLMs in personalized dialogue generation. Our evaluation section offers a comprehensive summary of assessment facets and metrics utilized in these works. In conclusion, we discuss prevailing challenges and envision prospective directions for future research in personalized dialogue generation.

* Presented at LREC-COLING 2024

LayoutFlow: Flow Matching for Layout Generation

Mar 27, 2024

Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui, Mayu Otani, Hideki Nakayama

Abstract:Finding a suitable layout represents a crucial task for diverse applications in graphic design. Motivated by simpler and smoother sampling trajectories, we explore the use of Flow Matching as an alternative to current diffusion-based layout generation models. Specifically, we propose LayoutFlow, an efficient flow-based model capable of generating high-quality layouts. Instead of progressively denoising the elements of a noisy layout, our method learns to gradually move, or flow, the elements of an initial sample until it reaches its final prediction. In addition, we employ a conditioning scheme that allows us to handle various generation tasks with varying degrees of conditioning with a single model. Empirically, LayoutFlow performs on par with state-of-the-art models while being significantly faster.
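The idea of flowing an initial sample toward a final layout can be illustrated with a minimal sketch. This is not the LayoutFlow implementation: the names `euler_flow` and `toy_velocity` are hypothetical, the layout is a flat list of numbers, and the velocity field is a hand-written stand-in for what would be a trained network. It only shows how a sample is moved by integrating a velocity field from t = 0 to t = 1 with Euler steps.

```python
from typing import Callable, List

# A layout is flattened to a list of floats, e.g. [x, y, w, h, ...] (assumption).
Layout = List[float]
Velocity = Callable[[Layout, float], Layout]


def euler_flow(x0: Layout, velocity: Velocity, steps: int = 10) -> Layout:
    """Move an initial sample along a velocity field from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays strictly below 1
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target: Layout) -> Velocity:
    """Stand-in for a learned model: the velocity of a straight-line path
    that ends at a known target. A real model would predict this from data."""
    def v(x: Layout, t: float) -> Layout:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return v


if __name__ == "__main__":
    start = [0.9, 0.1, 0.5, 0.5]    # noisy initial element (x, y, w, h)
    target = [0.2, 0.3, 0.4, 0.1]   # hypothetical final element
    print(euler_flow(start, toy_velocity(target), steps=8))
```

With the straight-line toy field, a handful of Euler steps is enough to land on the target, which is the kind of simple sampling trajectory the abstract refers to.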

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Nov 27, 2023

Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama

Abstract:Large language model (LLM)-based image captioning has the capability of describing objects not explicitly observed in training data; yet novel objects occur frequently, necessitating up-to-date object knowledge for open-world comprehension. Instead of relying on large amounts of data and scaling up network parameters, we introduce a highly effective retrieval-augmented image captioning method that prompts LLMs with object names retrieved from External Visual-name memory (EVCap). We build ever-changing object knowledge memory using objects' visuals and names, enabling us to (i) update the memory at a minimal cost and (ii) effortlessly augment LLMs with retrieved object names utilizing a lightweight and fast-to-train model. Our model, which was trained only on the COCO dataset, can be adapted to out-domain data without additional fine-tuning or retraining. Our comprehensive experiments conducted on various benchmarks and synthetic commonsense-violating data demonstrate that EVCap, comprising solely 3.97M trainable parameters, exhibits superior performance compared to other methods of equivalent model size scale. Notably, it achieves competitive performance against specialist SOTAs with an enormous number of parameters. Our code is available at https://jiaxuan-li.github.io/EVCap.

* Project page: https://jiaxuan-li.github.io/EVCap

Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts

Aug 19, 2023

Jiaxuan Li, Duc Minh Vo, Hideki Nakayama

Abstract:Bias mitigation in image classification has been widely researched, and existing methods have yielded notable results. However, most of these methods implicitly assume that a given image contains only one type of known or unknown bias, failing to consider the complexities of real-world biases. We introduce a more challenging scenario, agnostic biases mitigation, aiming at bias removal regardless of whether the type of bias or the number of types is unknown in the datasets. To address this difficult task, we present the Partition-and-Debias (PnD) method that uses a mixture of biases-specific experts to implicitly divide the bias space into multiple subspaces and a gating module to find a consensus among experts to achieve debiased classification. Experiments on both public and constructed benchmarks demonstrated the efficacy of the PnD. Code is available at: https://github.com/Jiaxuan-Li/PnD.

* ICCV 2023
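As a rough illustration of how a gating module can combine several experts into one prediction, the sketch below uses a standard softmax-weighted mixture. It is an assumption-laden toy, not the PnD architecture: the function names, the use of softmax gating, and the example numbers are all hypothetical, and the actual experts and gate in the paper may be built differently.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def gated_consensus(expert_logits: List[List[float]],
                    gate_scores: List[float]) -> List[float]:
    """Combine per-expert class logits using gate weights.

    expert_logits[k][c] is expert k's logit for class c (assumed layout).
    gate_scores[k] is the gate's raw score for expert k.
    """
    weights = softmax(gate_scores)
    n_classes = len(expert_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, expert_logits))
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    # Two hypothetical experts that disagree, weighted equally by the gate.
    print(gated_consensus([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]))
```

When the gate scores are equal, each expert contributes the same amount; raising one score shifts the combined prediction toward that expert.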

Revisiting Latent Space of GAN Inversion for Real Image Editing

Jul 18, 2023

Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama

Abstract:The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem. In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and combine it with highly capable latent spaces to build combined spaces that faithfully invert real images while maintaining the quality of edited images. More specifically, we propose $\mathcal{F}/\mathcal{Z}^{+}$ space consisting of two subspaces: $\mathcal{F}$ space of an intermediate feature map of StyleGANs enabling faithful reconstruction and $\mathcal{Z}^{+}$ space of an extended StyleGAN prior supporting high editing quality. We project the real images into the proposed space to obtain the inverted codes, by which we then move along $\mathcal{Z}^{+}$, enabling semantic editing without sacrificing image quality. Comprehensive experiments show that $\mathcal{Z}^{+}$ can replace the most commonly-used $\mathcal{W}$, $\mathcal{W}^{+}$, and $\mathcal{S}$ spaces while preserving reconstruction quality, resulting in reduced distortion of edited images.

* 10 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2306.00241

Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data

Jul 17, 2023

Kai Katsumata, Duc Minh Vo, Tatsuya Harada, Hideki Nakayama

Abstract:Label-noise or curated unlabeled data is used to compensate for the assumption of clean labeled data in training the conditional generative adversarial network; however, satisfying such an extended assumption is occasionally laborious or impractical. As a step towards generative modeling accessible to everyone, we introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated unlabeled data during training: (i) closed-set and open-set label noise in labeled data and (ii) closed-set and open-set unlabeled data. To combat it, we propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data and correcting wrong labels for labeled data. Unlike popular curriculum learning, which uses a threshold to pick the training samples, our soft curriculum controls the effect of each training instance by using the weights predicted by the auxiliary classifier, resulting in the preservation of useful samples while ignoring harmful ones. Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance. In particular, the proposed approach is able to match the performance of (semi-) supervised GANs even with less than half the labeled data.

* 10 pages, 13 figures
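The contrast between threshold-based sample selection and instance-wise weighting can be shown with a small sketch. This is only an illustration under assumptions: the function names are hypothetical, the per-sample losses and weights are made-up numbers, and how the paper's auxiliary classifier actually produces the weights is not reproduced here.

```python
from typing import List


def hard_curriculum_loss(losses: List[float],
                         confidences: List[float],
                         threshold: float = 0.5) -> float:
    """Threshold-based selection: samples below the threshold are dropped."""
    kept = [l for l, c in zip(losses, confidences) if c >= threshold]
    return sum(kept) / len(kept) if kept else 0.0


def soft_curriculum_loss(losses: List[float], weights: List[float]) -> float:
    """Instance-wise weighting: every sample contributes in proportion
    to its weight, so low-confidence samples are damped, not discarded."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return sum(l * w for l, w in zip(losses, weights)) / total


if __name__ == "__main__":
    losses = [1.0, 2.0, 4.0]          # hypothetical per-sample losses
    confidences = [0.9, 0.4, 0.1]     # hypothetical classifier outputs
    print(hard_curriculum_loss(losses, confidences))
    print(soft_curriculum_loss(losses, confidences))
```

In the hard variant only the first sample survives the threshold, while the soft variant still uses the other two samples with reduced influence.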

Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space

May 31, 2023

Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama

Abstract:The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem. In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and $\mathcal{Z}^+$ and integrate them into seminal GAN inversion methods to improve editing quality. Besides faithful reconstruction, our extensions achieve sophisticated editing quality with the aid of the StyleGAN prior. We project the real images into the proposed space to obtain the inverted codes, by which we then move along $\mathcal{Z}^{+}$, enabling semantic editing without sacrificing image quality. Comprehensive experiments show that $\mathcal{Z}^{+}$ can replace the most commonly-used $\mathcal{W}$, $\mathcal{W}^{+}$, and $\mathcal{S}$ spaces while preserving reconstruction quality, resulting in reduced distortion of edited images.

* 5 pages, 9 figures, AI4CC Workshop at CVPR 2023

LED: A Dataset for Life Event Extraction from Dialogs

Apr 17, 2023

Yi-Pei Chen, An-Zi Yen, Hen-Hsen Huang, Hideki Nakayama, Hsin-Hsi Chen

Abstract:Lifelogging has gained more attention due to its wide applications, such as personalized recommendations or memory assistance. The issues of collecting and extracting personal life events have emerged. People often share their life experiences with others through conversations. However, extracting life events from conversations is rarely explored. In this paper, we present Life Event Dialog, a dataset containing fine-grained life event annotations on conversational data. In addition, we initiate a novel conversational life event extraction task and differentiate the task from the public event extraction or the life event extraction from other sources like microblogs. We explore three information extraction (IE) frameworks to address the conversational life event extraction task: OpenIE, relation extraction, and event extraction. A comprehensive empirical analysis of the three baselines is established. The results suggest that the current event extraction model still struggles with extracting life events from human daily conversations. Our proposed life event dialog dataset and in-depth analysis of IE frameworks will facilitate future research on life event extraction from conversations.

* Accepted to EACL 2023 Findings

A-CAP: Anticipation Captioning with Commonsense Knowledge

Apr 13, 2023

Duc Minh Vo, Quoc-An Luong, Akihiro Sugimoto, Hideki Nakayama

Abstract:Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time. In order to emulate this ability, we introduce a novel task called Anticipation Captioning, which generates a caption for an unseen oracle image using a sparsely temporally-ordered set of images. To tackle this new task, we propose a model called A-CAP, which incorporates commonsense knowledge into a pre-trained vision-language model, allowing it to anticipate the caption. Through both qualitative and quantitative evaluations on a customized visual storytelling dataset, A-CAP outperforms other image captioning methods and establishes a strong baseline for anticipation captioning. We also address the challenges inherent in this task.

* Accepted to CVPR 2023

Character-Centric Story Visualization via Visual Planning and Token Alignment

Oct 20, 2022

Hong Chen, Rujun Han, Te-Lin Wu, Hideki Nakayama, Nanyun Peng

Abstract:Story visualization advances the traditional text-to-image generation by enabling multiple image generation based on a complete story. This task requires machines to 1) understand long text inputs and 2) produce a globally consistent image sequence that illustrates the contents of the story. A key challenge of consistent story visualization is to preserve characters that are essential in stories. To tackle the challenge, we propose to adapt a recent work that augments Vector-Quantized Variational Autoencoders (VQ-VAE) with a text-to-visual-token (transformer) architecture. Specifically, we modify the text-to-visual-token module with a two-stage framework: 1) character token planning model that predicts the visual tokens for characters only; 2) visual token completion model that generates the remaining visual token sequence, which is sent to VQ-VAE for finalizing image generations. To encourage characters to appear in the images, we further train the two-stage framework with a character-token alignment objective. Extensive experiments and evaluations demonstrate that the proposed method excels at preserving characters and can produce higher quality image sequences compared with the strong baselines. Code can be found at https://github.com/sairin1202/VP-CSV

* Accepted to EMNLP 2022
