Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Franck Dernoncourt

Adobe Research

Large Generative Graph Models

Jun 07, 2024

Yu Wang, Ryan A. Rossi, Namyong Park, Huiyuan Chen, Nesreen K. Ahmed, Puja Trivedi, Franck Dernoncourt, Danai Koutra, Tyler Derr

Figure 1 for Large Generative Graph Models

Figure 2 for Large Generative Graph Models

Figure 3 for Large Generative Graph Models

Figure 4 for Large Generative Graph Models

Abstract:Large Generative Models (LGMs) such as GPT, Stable Diffusion, Sora, and Suno are trained on a huge amount of language corpus, images, videos, and audio that are extremely diverse from numerous domains. This training paradigm over diverse well-curated data lies at the heart of generating creative and sensible content. However, all previous graph generative models (e.g., GraphRNN, MDVAE, MoFlow, GDSS, and DiGress) have been trained only on one dataset each time, which cannot replicate the revolutionary success achieved by LGMs in other fields. To remedy this crucial gap, we propose a new class of graph generative model called Large Graph Generative Model (LGGM) that is trained on a large corpus of graphs (over 5000 graphs) from 13 different domains. We empirically demonstrate that the pre-trained LGGM has superior zero-shot generative capability to existing graph generative models. Furthermore, our pre-trained LGGM can be easily fine-tuned with graphs from target domains and demonstrate even better performance than those directly trained from scratch, behaving as a solid starting point for real-world customization. Inspired by Stable Diffusion, we further equip LGGM with the capability to generate graphs given text prompts (Text-to-Graph), such as the description of the network name and domain (i.e., "The power-1138-bus graph represents a network of buses in a power distribution system."), and network statistics (i.e., "The graph has a low average degree, suitable for modeling social media interactions."). This Text-to-Graph capability integrates the extensive world knowledge in the underlying language model, offering users fine-grained control of the generated graphs. We release the code, the model checkpoint, and the datasets at https://lggm-lg.github.io/.

Via

Access Paper or Ask Questions

ROAST: Review-level Opinion Aspect Sentiment Target Joint Detection

May 30, 2024

Siva Uday Sampreeth Chebolu, Franck Dernoncourt, Nedim Lipka, Thamar Solorio

Figure 1 for ROAST: Review-level Opinion Aspect Sentiment Target Joint Detection

Figure 2 for ROAST: Review-level Opinion Aspect Sentiment Target Joint Detection

Figure 3 for ROAST: Review-level Opinion Aspect Sentiment Target Joint Detection

Figure 4 for ROAST: Review-level Opinion Aspect Sentiment Target Joint Detection

Abstract:Aspect-Based Sentiment Analysis (ABSA) has experienced tremendous expansion and diversity due to various shared tasks spanning several languages and fields and organized via SemEval workshops and Germeval. Nonetheless, a few shortcomings still need to be addressed, such as the lack of low-resource language evaluations and the emphasis on sentence-level analysis. To thoroughly assess ABSA techniques in the context of complete reviews, this research presents a novel task, Review-Level Opinion Aspect Sentiment Target (ROAST). ROAST seeks to close the gap between sentence-level and text-level ABSA by identifying every ABSA constituent at the review level. We extend the available datasets to enable ROAST, addressing the drawbacks noted in previous research by incorporating low-resource languages, numerous languages, and a variety of topics. Through this effort, ABSA research will be able to cover more ground and get a deeper comprehension of the task and its practical application in a variety of languages and domains (https://github.com/RiTUAL-UH/ROAST-ABSA).

* arXiv admin note: text overlap with arXiv:2309.13297

Via

Access Paper or Ask Questions

Retrieval Augmented Generation for Domain-specific Question Answering

Apr 23, 2024

Sanat Sharma, David Seunghyun Yoon, Franck Dernoncourt, Dewang Sultania, Karishma Bagga, Mengjiao Zhang, Trung Bui, Varun Kotte

Figure 1 for Retrieval Augmented Generation for Domain-specific Question Answering

Figure 2 for Retrieval Augmented Generation for Domain-specific Question Answering

Figure 3 for Retrieval Augmented Generation for Domain-specific Question Answering

Figure 4 for Retrieval Augmented Generation for Domain-specific Question Answering

Abstract:Question answering (QA) has become an important application in the advanced development of large language models. General pre-trained large language models for question-answering are not trained to properly understand the knowledge or terminology for a specific domain, such as finance, healthcare, education, and customer service for a product. To better cater to domain-specific understanding, we build an in-house question-answering system for Adobe products. We propose a novel framework to compile a large question-answer database and develop the approach for retrieval-aware finetuning of a Large Language model. We showcase that fine-tuning the retriever leads to major improvements in the final generation. Our overall approach reduces hallucinations during generation while keeping in context the latest retrieval information for contextual grounding.

* AAAI 2024 (Association for the Advancement of Artificial Intelligence) Scientific Document Understanding Workshop

Via

Access Paper or Ask Questions

Scaling Up Video Summarization Pretraining with Large Language Models

Apr 04, 2024

Dawit Mureja Argaw, Seunghyun Yoon, Fabian Caba Heilbron, Hanieh Deilamsalehy, Trung Bui, Zhaowen Wang, Franck Dernoncourt, Joon Son Chung

Abstract:Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem. However, existing video summarization datasets are notably limited in their size, constraining the effectiveness of state-of-the-art methods for generalization. Our work aims to overcome this limitation by capitalizing on the abundance of long-form videos with dense speech-to-video alignment and the remarkable capabilities of recent large language models (LLMs) in summarizing long text. We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset using LLMs as Oracle summarizers. By leveraging the generated dataset, we analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them. To facilitate further research in the field, our work also presents a new benchmark dataset that contains 1200 long videos each with high-quality summaries annotated by professionals. Extensive experiments clearly indicate that our proposed approach sets a new state-of-the-art in video summarization across several benchmarks.

* Accepted to CVPR 2024

Via

Access Paper or Ask Questions

Fine-tuning CLIP Text Encoders with Two-step Paraphrasing

Feb 23, 2024

Hyunjae Kim, Seunghyun Yoon, Trung Bui, Handong Zhao, Quan Tran, Franck Dernoncourt, Jaewoo Kang

Figure 1 for Fine-tuning CLIP Text Encoders with Two-step Paraphrasing

Figure 2 for Fine-tuning CLIP Text Encoders with Two-step Paraphrasing

Figure 3 for Fine-tuning CLIP Text Encoders with Two-step Paraphrasing

Figure 4 for Fine-tuning CLIP Text Encoders with Two-step Paraphrasing

Abstract:Contrastive language-image pre-training (CLIP) models have demonstrated considerable success across various vision-language tasks, such as text-to-image retrieval, where the model is required to effectively process natural language input to produce an accurate visual output. However, current models still face limitations in dealing with linguistic variations in input queries, such as paraphrases, making it challenging to handle a broad range of user queries in real-world applications. In this study, we introduce a straightforward fine-tuning approach to enhance the representations of CLIP models for paraphrases. Our approach involves a two-step paraphrase generation process, where we automatically create two categories of paraphrases from web-scale image captions by leveraging large language models. Subsequently, we fine-tune the CLIP text encoder using these generated paraphrases while freezing the image encoder. Our resulting model, which we call ParaCLIP, exhibits significant improvements over baseline CLIP models across various tasks, including paraphrased retrieval (with rank similarity scores improved by up to 2.0% and 5.6%), Visual Genome Relation and Attribution, as well as seven semantic textual similarity tasks.

* EACL 2024 (Findings of the ACL)

Via

Access Paper or Ask Questions

Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes

Feb 03, 2024

Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Tong Yu, Hanieh Deilamsalehy, Ruiyi Zhang, Sungchul Kim, Franck Dernoncourt

Abstract:Large language models (LLMs) have shown remarkable advances in language generation and understanding but are also prone to exhibiting harmful social biases. While recognition of these behaviors has generated an abundance of bias mitigation techniques, most require modifications to the training data, model parameters, or decoding strategy, which may be infeasible without access to a trainable model. In this work, we leverage the zero-shot capabilities of LLMs to reduce stereotyping in a technique we introduce as zero-shot self-debiasing. With two approaches, self-debiasing via explanation and self-debiasing via reprompting, we show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups while relying only on the LLM itself and a simple prompt, with explanations correctly identifying invalid assumptions and reprompting delivering the greatest reductions in bias. We hope this work opens inquiry into other zero-shot techniques for bias mitigation.

Via

Access Paper or Ask Questions

Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

Nov 30, 2023

Linzi Xing, Quan Tran, Fabian Caba, Franck Dernoncourt, Seunghyun Yoon, Zhaowen Wang, Trung Bui, Giuseppe Carenini

Figure 1 for Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

Figure 2 for Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

Figure 3 for Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

Figure 4 for Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

Abstract:Video topic segmentation unveils the coarse-grained semantic structure underlying videos and is essential for other video understanding tasks. Given the recent surge in multi-modal, relying solely on a single modality is arguably insufficient. On the other hand, prior solutions for similar tasks like video scene/shot segmentation cater to short videos with clear visual shifts but falter for long videos with subtle changes, such as livestreams. In this paper, we introduce a multi-modal video topic segmenter that utilizes both video transcripts and frames, bolstered by a cross-modal attention mechanism. Furthermore, we propose a dual-contrastive learning framework adhering to the unsupervised domain adaptation paradigm, enhancing our model's adaptability to longer, more semantically complex videos. Experiments on short and long video corpora demonstrate that our proposed solution, significantly surpasses baseline methods in terms of both accuracy and transferability, in both intra- and cross-domain settings.

* Accepted at the 30th International Conference on Multimedia Modeling (MMM 2024)

Via

Access Paper or Ask Questions

Leveraging Graph Diffusion Models for Network Refinement Tasks

Nov 29, 2023

Puja Trivedi, Ryan Rossi, David Arbour, Tong Yu, Franck Dernoncourt, Sungchul Kim, Nedim Lipka, Namyong Park, Nesreen K. Ahmed, Danai Koutra

Figure 1 for Leveraging Graph Diffusion Models for Network Refinement Tasks

Figure 2 for Leveraging Graph Diffusion Models for Network Refinement Tasks

Figure 3 for Leveraging Graph Diffusion Models for Network Refinement Tasks

Figure 4 for Leveraging Graph Diffusion Models for Network Refinement Tasks

Abstract:Most real-world networks are noisy and incomplete samples from an unknown target distribution. Refining them by correcting corruptions or inferring unobserved regions typically improves downstream performance. Inspired by the impressive generative capabilities that have been used to correct corruptions in images, and the similarities between "in-painting" and filling in missing nodes and edges conditioned on the observed graph, we propose a novel graph generative framework, SGDM, which is based on subgraph diffusion. Our framework not only improves the scalability and fidelity of graph diffusion models, but also leverages the reverse process to perform novel, conditional generation tasks. In particular, through extensive empirical analysis and a set of novel metrics, we demonstrate that our proposed model effectively supports the following refinement tasks for partially observable networks: T1: denoising extraneous subgraphs, T2: expanding existing subgraphs and T3: performing "style" transfer by regenerating a particular subgraph to match the characteristics of a different node or subgraph.

* Work in Progress. 21 pages, 7 figures

Via

Access Paper or Ask Questions

Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with Weak Supervision on Sentence Classification

Nov 07, 2023

Zhongfen Deng, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Quan Hung Tran, Shuaiqi Liu, Wenting Zhao, Tao Zhang, Yibo Wang, Philip S. Yu

Figure 1 for Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with Weak Supervision on Sentence Classification

Figure 2 for Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with Weak Supervision on Sentence Classification

Figure 3 for Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with Weak Supervision on Sentence Classification

Figure 4 for Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with Weak Supervision on Sentence Classification

Abstract:Aspect-based meeting transcript summarization aims to produce multiple summaries, each focusing on one aspect of content in a meeting transcript. It is challenging as sentences related to different aspects can mingle together, and those relevant to a specific aspect can be scattered throughout the long transcript of a meeting. The traditional summarization methods produce one summary mixing information of all aspects, which cannot deal with the above challenges of aspect-based meeting transcript summarization. In this paper, we propose a two-stage method for aspect-based meeting transcript summarization. To select the input content related to specific aspects, we train a sentence classifier on a dataset constructed from the AMI corpus with pseudo-labeling. Then we merge the sentences selected for a specific aspect as the input for the summarizer to produce the aspect-based summary. Experimental results on the AMI corpus outperform many strong baselines, which verifies the effectiveness of our proposed method.

* Accepted by 2023 IEEE International Conference on Big Data

Via

Access Paper or Ask Questions

OATS: Opinion Aspect Target Sentiment Quadruple Extraction Dataset for Aspect-Based Sentiment Analysis

Sep 23, 2023

Siva Uday Sampreeth Chebolu, Franck Dernoncourt, Nedim Lipka, Thamar Solorio

Figure 1 for OATS: Opinion Aspect Target Sentiment Quadruple Extraction Dataset for Aspect-Based Sentiment Analysis

Figure 2 for OATS: Opinion Aspect Target Sentiment Quadruple Extraction Dataset for Aspect-Based Sentiment Analysis

Figure 3 for OATS: Opinion Aspect Target Sentiment Quadruple Extraction Dataset for Aspect-Based Sentiment Analysis

Figure 4 for OATS: Opinion Aspect Target Sentiment Quadruple Extraction Dataset for Aspect-Based Sentiment Analysis

Abstract:Aspect-based sentiment Analysis (ABSA) delves into understanding sentiments specific to distinct elements within textual content. It aims to analyze user-generated reviews to determine a) the target entity being reviewed, b) the high-level aspect to which it belongs, c) the sentiment words used to express the opinion, and d) the sentiment expressed toward the targets and the aspects. While various benchmark datasets have fostered advancements in ABSA, they often come with domain limitations and data granularity challenges. Addressing these, we introduce the OATS dataset, which encompasses three fresh domains and consists of 20,000 sentence-level quadruples and 13,000 review-level tuples. Our initiative seeks to bridge specific observed gaps: the recurrent focus on familiar domains like restaurants and laptops, limited data for intricate quadruple extraction tasks, and an occasional oversight of the synergy between sentence and review-level sentiments. Moreover, to elucidate OATS's potential and shed light on various ABSA subtasks that OATS can solve, we conducted in-domain and cross-domain experiments, establishing initial baselines. We hope the OATS dataset augments current resources, paving the way for an encompassing exploration of ABSA.

* Initial submission

Via

Access Paper or Ask Questions