Hayate Iso

Distilling Large Language Models using Skill-Occupation Graph Context for HR-Related Tasks

Nov 10, 2023
Pouya Pezeshkpour, Hayate Iso, Thom Lake, Nikita Bhutani, Estevam Hruschka

Numerous HR applications are centered around resumes and job descriptions. While they can benefit from advancements in NLP, particularly large language models, their real-world adoption faces challenges due to the absence of comprehensive benchmarks for various HR tasks and the lack of smaller models with competitive capabilities. In this paper, we aim to bridge this gap by introducing the Resume-Job Description Benchmark (RJDB). We meticulously craft this benchmark to cater to a wide array of HR tasks, including matching and explaining resumes to job descriptions, extracting skills and experiences from resumes, and editing resumes. To create this benchmark, we propose to distill domain-specific knowledge from a large language model (LLM). We rely on a curated skill-occupation graph to ensure diversity and provide context for LLM generation. Our benchmark includes over 50 thousand triples of job descriptions, matched resumes, and unmatched resumes. Using RJDB, we train multiple smaller student models. Our experiments reveal that the student models achieve performance close to or better than that of the teacher model (GPT-4), affirming the effectiveness of the benchmark. Additionally, we explore the utility of RJDB on out-of-distribution data for skill extraction and resume-job description matching, in zero-shot and weakly supervised settings. We release our datasets and code to foster further research and industry applications.

XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates

Sep 20, 2023
Haopeng Zhang, Hayate Iso, Sairam Gurajada, Nikita Bhutani

Text editing is a crucial task that involves modifying text to better align with user intents. However, existing text editing benchmark datasets are limited in that they provide only coarse-grained instructions. Consequently, although the edited output may seem reasonable, it often deviates from the intended changes outlined in the gold reference, resulting in low evaluation scores. To comprehensively investigate the text editing capabilities of large language models, this paper introduces XATU, the first benchmark specifically designed for fine-grained instruction-based explainable text editing. XATU covers a wide range of topics and text types, incorporating lexical, syntactic, semantic, and knowledge-intensive edits. To enhance interpretability, we leverage high-quality data sources and human annotation, resulting in a benchmark that includes fine-grained instructions and gold-standard edit explanations. By evaluating existing open and closed large language models against our benchmark, we demonstrate the effectiveness of instruction tuning and the impact of underlying architecture across various editing tasks. Furthermore, extensive experimentation reveals the significant role of explanations in fine-tuning language models for text editing tasks. The benchmark will be open-sourced to support reproduction and facilitate future research.

* Work in progress 
Less is More for Long Document Summary Evaluation by LLMs

Sep 14, 2023
Yunshu Wu, Hayate Iso, Pouya Pezeshkpour, Nikita Bhutani, Estevam Hruschka

Large Language Models (LLMs) have shown promising performance in summary evaluation tasks, yet they face challenges such as high computational costs and the Lost-in-the-Middle problem where important information in the middle of long documents is often overlooked. To address these issues, this paper introduces a novel approach, Extract-then-Evaluate, which involves extracting key sentences from a long source document and then evaluating the summary by prompting LLMs. The results reveal that the proposed method not only significantly reduces evaluation costs but also exhibits a higher correlation with human evaluations. Furthermore, we provide practical recommendations for optimal document length and sentence extraction methods, contributing to the development of cost-effective yet more accurate methods for LLM-based text generation evaluation.
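The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the lead-sentence heuristic, the naive sentence splitter, the prompt wording, and the names `extract_lead` and `build_eval_prompt` are all assumptions made for the example, and the actual LLM call is left out.

```python
import re


def split_sentences(text):
    # Naive splitter on ., !, ? followed by whitespace; for illustration only.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_lead(document, max_words):
    """Keep leading sentences until the word budget would be exceeded."""
    kept, used = [], 0
    for sentence in split_sentences(document):
        n_words = len(sentence.split())
        if used + n_words > max_words:
            break
        kept.append(sentence)
        used += n_words
    return " ".join(kept)


def build_eval_prompt(document, summary, max_words=50):
    """Build an evaluation prompt over the shortened source (hypothetical wording)."""
    source = extract_lead(document, max_words)
    return (
        "Rate the summary for consistency with the source on a 1-5 scale.\n"
        f"Source: {source}\n"
        f"Summary: {summary}\n"
        "Score:"
    )
```

The resulting prompt would then be sent to whichever LLM serves as the evaluator; because only the extracted sentences are included, the prompt length is bounded by `max_words` rather than by the full document length.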

* Work in progress 
Zero-shot Triplet Extraction by Template Infilling

Dec 21, 2022
Bosung Kim, Hayate Iso, Nikita Bhutani, Estevam Hruschka, Ndapa Nakashole

Triplet extraction aims to extract entities and their corresponding relations in unstructured text. Most existing methods train an extraction model on high-quality training data, and hence are incapable of extracting relations that were not observed during training. Generalizing the model to unseen relations typically requires fine-tuning on synthetic training data, which is often noisy and unreliable. In this paper, we argue that reducing triplet extraction to a template filling task over a pre-trained language model can equip the model with zero-shot learning capabilities and enable it to leverage the implicit knowledge in the language model. Embodying these ideas, we propose a novel framework, ZETT (ZEro-shot Triplet extraction by Template infilling), that is based on end-to-end generative transformers. Our experiments show that without any data augmentation or pipeline systems, ZETT can outperform previous state-of-the-art models with 25% fewer parameters. We further show that ZETT is more robust in detecting entities and can be incorporated with automatically generated templates for relations.

* 12 pages, 2 figures 
Noisy Pairing and Partial Supervision for Opinion Summarization

Nov 16, 2022
Hayate Iso, Xiaolan Wang, Yoshi Suhara

Current opinion summarization systems simply generate summaries reflecting important opinions from customer reviews, but the generated summaries may not attract the reader's attention. Although it is helpful to automatically generate professional reviewer-like summaries from customer reviews, collecting many training pairs of customer and professional reviews is generally tricky. We propose a weakly supervised opinion summarization framework, Noisy Pairing and Partial Supervision (NAPA) that can build a stylized opinion summarization system with no customer-professional review pairs. Experimental results show consistent improvements in automatic evaluation metrics, and qualitative analysis shows that our weakly supervised opinion summarization system can generate summaries that look more like those written by professional reviewers.

AutoTemplate: A Simple Recipe for Lexically Constrained Text Generation

Nov 15, 2022
Hayate Iso

Lexically constrained text generation is a constrained text generation task that aims to generate text covering all the given constraint lexicons. While existing approaches tackle this problem using a lexically constrained beam search algorithm or a dedicated model with non-autoregressive decoding, there is a trade-off between generated text quality and hard constraint satisfaction. We introduce AutoTemplate, a simple yet effective lexically constrained text generation framework divided into template generation and lexicalization tasks. Template generation produces text with placeholders, and lexicalization replaces the placeholders with the constraint lexicons to perform lexically constrained text generation. We conducted experiments on two tasks: keywords-to-sentence generation and entity-guided summarization. Experimental results show that AutoTemplate outperforms competitive baselines on both tasks while satisfying the hard lexical constraints.
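The lexicalization step can be illustrated with a small sketch. The placeholder syntax `<X0>`, `<X1>`, ... and the function names are assumptions chosen for this example, and the template here is hand-written rather than produced by a generation model.

```python
import re


def lexicalize(template, constraints):
    """Replace <X0>, <X1>, ... with constraints[0], constraints[1], ...

    The <Xn> placeholder format is hypothetical, used only for illustration.
    """
    def fill(match):
        return constraints[int(match.group(1))]

    return re.sub(r"<X(\d+)>", fill, template)


def satisfies_constraints(text, constraints):
    """Check the hard constraint: every lexicon must appear in the text."""
    return all(c in text for c in constraints)


template = "The <X0> chased the <X1> across the park."
keywords = ["dog", "ball"]
sentence = lexicalize(template, keywords)
print(sentence)
print(satisfies_constraints(sentence, keywords))
```

Because every placeholder is filled directly from the constraint list, a template that contains all placeholders yields output that satisfies the constraints by construction, which is the property the abstract refers to as hard constraint satisfaction.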

Comparative Opinion Summarization via Collaborative Decoding

Oct 14, 2021
Hayate Iso, Xiaolan Wang, Yoshihiko Suhara

Opinion summarization focuses on generating summaries that reflect popular opinions of multiple reviews for a single entity (e.g., a hotel or a product). While generated summaries offer general and concise information about a particular entity, the information may be insufficient to help the user compare multiple entities. Thus, the user may still struggle with the question "Which one should I pick?" In this paper, we propose a comparative opinion summarization task, which is to generate two contrastive summaries and one common summary from two given sets of reviews from different entities. We develop a comparative summarization framework CoCoSum, which consists of two few-shot summarization models that are jointly used to generate contrastive and common summaries. Experimental results on a newly created benchmark CoCoTrip show that CoCoSum can produce higher-quality contrastive and common summaries than state-of-the-art opinion summarization models.

Biomedical Entity Linking with Contrastive Context Matching

Jun 15, 2021
Shogo Ujiie, Hayate Iso, Eiji Aramaki

We introduce BioCoM, a contrastive learning framework for biomedical entity linking that uses only two resources: a small-sized dictionary and a large number of raw biomedical articles. Specifically, we build the training instances from raw PubMed articles by dictionary matching and use them to train a context-aware entity linking model with contrastive learning. We predict the normalized biomedical entity at inference time through a nearest-neighbor search. Results show that BioCoM substantially outperforms state-of-the-art models, especially in low-resource settings, by effectively using the context of the entities.
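The inference-time nearest-neighbor step can be sketched as below. This is only an illustration under stated assumptions: the embeddings are toy hand-written vectors rather than outputs of a trained encoder, cosine similarity is assumed as the similarity measure, and `nearest_entity` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_entity(mention_vec, entity_vecs):
    """Return the entity whose embedding is most similar to the mention."""
    return max(entity_vecs, key=lambda name: cosine(mention_vec, entity_vecs[name]))


# Toy embeddings; a real system would obtain these from the trained encoder.
entity_vecs = {
    "diabetes mellitus": [0.9, 0.1, 0.0],
    "hypertension": [0.1, 0.9, 0.2],
}
mention_vec = [0.8, 0.2, 0.1]
print(nearest_entity(mention_vec, entity_vecs))
```

An exhaustive scan like this is adequate for a small dictionary; a larger entity inventory would usually call for an approximate nearest-neighbor index instead.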

End-to-end Biomedical Entity Linking with Span-based Dictionary Matching

Apr 21, 2021
Shogo Ujiie, Hayate Iso, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki

Disease name recognition and normalization, which is generally called biomedical entity linking, is a fundamental process in biomedical text mining. Recently, neural joint learning of both tasks has been proposed to utilize the mutual benefits. While this approach achieves high performance, disease concepts that do not appear in the training dataset cannot be accurately predicted. This study introduces a novel end-to-end approach that combines span representations with dictionary-matching features to address this problem. Our model handles unseen concepts by referring to a dictionary while maintaining the performance of neural network-based models, in an end-to-end fashion. Experiments using two major datasets demonstrate that our model achieves results competitive with strong baselines, especially for concepts unseen during training.

Convex Aggregation for Opinion Summarization

Apr 03, 2021
Hayate Iso, Xiaolan Wang, Yoshihiko Suhara, Stefanos Angelidis, Wang-Chiew Tan

Recent approaches for unsupervised opinion summarization have predominantly used the review reconstruction training paradigm. An encoder-decoder model is trained to reconstruct single reviews and learns a latent review encoding space. At summarization time, the unweighted average of latent review vectors is decoded into a summary. In this paper, we challenge the convention of simply averaging the latent vector set, and claim that this simplistic approach fails to consider variations in the quality of input reviews or the idiosyncrasies of the decoder. We propose Coop, a convex vector aggregation framework for opinion summarization, that searches for better combinations of input reviews. Coop requires no further supervision and uses a simple word overlap objective to help the model generate summaries that are more consistent with input reviews. Experimental results show that extending opinion summarizers with Coop results in state-of-the-art performance, with ROUGE-1 improvements of 3.7% and 2.9% on the Yelp and Amazon benchmark datasets, respectively.
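The idea of searching over combinations of input reviews, scored by word overlap, might be sketched as follows. This is a simplified illustration, not the released Coop code: restricting the search to uniformly weighted subsets, the exact overlap formula, the `decode` callable, and all function names are assumptions made for the example.

```python
from itertools import combinations


def average(vectors):
    """Uniform convex combination (mean) of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def word_overlap(summary, reviews):
    """Fraction of distinct summary words that also occur in the input reviews."""
    summary_words = set(summary.lower().split())
    review_words = {w for review in reviews for w in review.lower().split()}
    return len(summary_words & review_words) / len(summary_words) if summary_words else 0.0


def search_aggregation(latents, reviews, decode):
    """Try every non-empty subset of latent vectors and keep the best-scoring summary.

    `decode` is a hypothetical callable mapping a latent vector to summary text.
    Returns (score, subset_indices, summary).
    """
    best = None
    for k in range(1, len(latents) + 1):
        for idx in combinations(range(len(latents)), k):
            summary = decode(average([latents[i] for i in idx]))
            score = word_overlap(summary, reviews)
            if best is None or score > best[0]:
                best = (score, idx, summary)
    return best
```

Note that exhaustive enumeration grows exponentially with the number of reviews, so this form is practical only for the small review sets typical of a single summarization instance.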
