Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bill Byrne

Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata

Apr 10, 2024

Jinghong Chen, Weizhe Lin, Jingbiao Mei, Bill Byrne

Figure 1 for Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata

Figure 2 for Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata

Figure 3 for Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata

Figure 4 for Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata

Abstract:The Directed Acyclic Transformer is a fast non-autoregressive (NAR) model that performs well in Neural Machine Translation. Two issues prevent its application to general Natural Language Generation (NLG) tasks: frequent Out-Of-Vocabulary (OOV) errors and the inability to faithfully generate entity names. We introduce Control-DAG, a constrained decoding algorithm for our Directed Acyclic T5 (DA-T5) model which offers lexical, vocabulary and length control. We show that Control-DAG significantly enhances DA-T5 on the Schema Guided Dialogue and the DART datasets, establishing strong NAR results for Task-Oriented Dialogue and Data-to-Text NLG.

* 11 pages. NAACL 2024

Via

Access Paper or Ask Questions

Few-Shot VQA with Frozen LLMs: A Tale of Two Approaches

Mar 17, 2024

Igor Sterner, Weizhe Lin, Jinghong Chen, Bill Byrne

Abstract:Two approaches have emerged to input images into large language models (LLMs). The first is to caption images into natural language. The second is to map image feature embeddings into the domain of the LLM and pass the mapped embeddings directly to the LLM. The majority of recent few-shot multimodal work reports performance using architectures that employ variations of one of these two approaches. But they overlook an important comparison between them. We design a controlled and focused experiment to compare these two approaches to few-shot visual question answering (VQA) with LLMs. Our findings indicate that for Flan-T5 XL, a 3B parameter LLM, connecting visual embeddings directly to the LLM embedding space does not guarantee improved performance over using image captions. In the zero-shot regime, we find using textual image captions is better. In the few-shot regimes, how the in-context examples are selected determines which is better.

Via

Access Paper or Ask Questions

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Mar 15, 2024

Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu(+9 more)

Figure 1 for PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Figure 2 for PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Figure 3 for PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Figure 4 for PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Abstract:Reinforcement Learning from Human Feedback (RLHF) has proven to be a strong method to align Pretrained Large Language Models (LLMs) with human preferences. But training models with RLHF is computationally expensive, and an overall complex process. In this work, we study RLHF where the underlying models are trained using the parameter efficient method of Low-Rank Adaptation (LoRA) introduced by Hu et al. [2021]. We investigate the setup of "Parameter Efficient Reinforcement Learning" (PERL), in which we perform reward model training and reinforcement learning using LoRA. We compare PERL to conventional fine-tuning (full-tuning) across various configurations for 7 benchmarks, including 2 novel datasets, of reward modeling and reinforcement learning. We find that PERL performs on par with the conventional RLHF setting, while training faster, and with less memory. This enables the high performance of RLHF, while reducing the computational burden that limits its adoption as an alignment technique for Large Language Models. We also release 2 novel thumbs up/down preference datasets: "Taskmaster Coffee", and "Taskmaster Ticketing" to promote research around RLHF.

Via

Access Paper or Ask Questions

PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

Feb 13, 2024

Weizhe Lin, Jingbiao Mei, Jinghong Chen, Bill Byrne

Figure 1 for PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

Figure 2 for PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

Figure 3 for PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

Figure 4 for PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

Abstract:Large Multimodal Models (LMMs) excel in natural language and visual understanding but are challenged by exacting tasks such as Knowledge-based Visual Question Answering (KB-VQA) which involve the retrieval of relevant information from document collections to use in shaping answers to questions. We present an extensive training and evaluation framework, M2KR, for KB-VQA. M2KR contains a collection of vision and language tasks which we have incorporated into a single suite of benchmark tasks for training and evaluating general-purpose multi-modal retrievers. We use M2KR to develop PreFLMR, a pre-trained version of the recently developed Fine-grained Late-interaction Multi-modal Retriever (FLMR) approach to KB-VQA, and we report new state-of-the-art results across a range of tasks. We also present investigations into the scaling behaviors of PreFLMR intended to be useful in future developments in general-purpose multi-modal retrievers.

* 8 pages

Via

Access Paper or Ask Questions

Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding

Nov 14, 2023

Guangyu Yang, Jinghong Chen, Weizhe Lin, Bill Byrne

Figure 1 for Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding

Figure 2 for Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding

Figure 3 for Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding

Figure 4 for Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding

Abstract:Minimum Bayes Risk (MBR) decoding can significantly improve translation performance of Multilingual Large Language Models (MLLMs). However, MBR decoding is computationally expensive and in this paper, we show how recently developed Reinforcement Learning (RL) technique, Direct Preference Optimization (DPO) can be used to fine-tune MLLMs so that we get the gains from MBR without the additional computation in inference. Our fine-tuned models have significantly improved performance on multiple NMT test sets compared to base MLLMs without preference optimization. Our method boosts the translation performance of MLLMs using relatively small monolingual fine-tuning sets.

Via

Access Paper or Ask Questions

Improving hateful memes detection via learning hatefulness-aware embedding space through retrieval-guided contrastive learning

Nov 14, 2023

Jingbiao Mei, Jinghong Chen, Weizhe Lin, Bill Byrne, Marcus Tomalin

Figure 1 for Improving hateful memes detection via learning hatefulness-aware embedding space through retrieval-guided contrastive learning

Figure 2 for Improving hateful memes detection via learning hatefulness-aware embedding space through retrieval-guided contrastive learning

Figure 3 for Improving hateful memes detection via learning hatefulness-aware embedding space through retrieval-guided contrastive learning

Figure 4 for Improving hateful memes detection via learning hatefulness-aware embedding space through retrieval-guided contrastive learning

Abstract:Hateful memes have emerged as a significant concern on the Internet. These memes, which are a combination of image and text, often convey messages vastly different from their individual meanings. Thus, detecting hateful memes requires the system to jointly understand the visual and textual modalities. However, our investigation reveals that the embedding space of existing CLIP-based systems lacks sensitivity to subtle differences in memes that are vital for correct hatefulness classification. To address this issue, we propose constructing a hatefulness-aware embedding space through retrieval-guided contrastive training. Specifically, we add an auxiliary loss that utilizes hard negative and pseudo-gold samples to train the embedding space. Our approach achieves state-of-the-art performance on the HatefulMemes dataset with an AUROC of 86.7. Notably, our approach outperforms much larger fine-tuned Large Multimodal Models like Flamingo and LLaVA. Finally, we demonstrate a retrieval-based hateful memes detection system, which is capable of making hatefulness classification based on data unseen in training from a database. This allows developers to update the hateful memes detection system by simply adding new data without retraining, a desirable feature for real services in the constantly-evolving landscape of hateful memes on the Internet.

Via

Access Paper or Ask Questions

Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques

Oct 15, 2023

Junxiao Shen, John J. Dudley, Jingyao Zheng, Bill Byrne, Per Ola Kristensson

Abstract:Text entry is an essential task in our day-to-day digital interactions. Numerous intelligent features have been developed to streamline this process, making text entry more effective, efficient, and fluid. These improvements include sentence prediction and user personalization. However, as deep learning-based language models become the norm for these advanced features, the necessity for data collection and model fine-tuning increases. These challenges can be mitigated by harnessing the in-context learning capability of large language models such as GPT-3.5. This unique feature allows the language model to acquire new skills through prompts, eliminating the need for data collection and fine-tuning. Consequently, large language models can learn various text prediction techniques. We initially showed that, for a sentence prediction task, merely prompting GPT-3.5 surpassed a GPT-2 backed system and is comparable with a fine-tuned GPT-3.5 model, with the latter two methods requiring costly data collection, fine-tuning and post-processing. However, the task of prompting large language models to specialize in specific text prediction tasks can be challenging, particularly for designers without expertise in prompt engineering. To address this, we introduce Promptor, a conversational prompt generation agent designed to engage proactively with designers. Promptor can automatically generate complex prompts tailored to meet specific needs, thus offering a solution to this challenge. We conducted a user study involving 24 participants creating prompts for three intelligent text entry tasks, half of the participants used Promptor while the other half designed prompts themselves. The results show that Promptor-designed prompts result in a 35% increase in similarity and 22% in coherence over those by designers.

Via

Access Paper or Ask Questions

Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

Sep 29, 2023

Weizhe Lin, Jinghong Chen, Jingbiao Mei, Alexandru Coca, Bill Byrne

Figure 1 for Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

Figure 2 for Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

Figure 3 for Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

Figure 4 for Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering

Abstract:Knowledge-based Visual Question Answering (KB-VQA) requires VQA systems to utilize knowledge from existing knowledge bases to answer visually-grounded questions. Retrieval-Augmented Visual Question Answering (RA-VQA), a strong framework to tackle KB-VQA, first retrieves related documents with Dense Passage Retrieval (DPR) and then uses them to answer questions. This paper proposes Fine-grained Late-interaction Multi-modal Retrieval (FLMR) which significantly improves knowledge retrieval in RA-VQA. FLMR addresses two major limitations in RA-VQA's retriever: (1) the image representations obtained via image-to-text transforms can be incomplete and inaccurate and (2) relevance scores between queries and documents are computed with one-dimensional embeddings, which can be insensitive to finer-grained relevance. FLMR overcomes these limitations by obtaining image representations that complement those from the image-to-text transforms using a vision model aligned with an existing text-based retriever through a simple alignment network. FLMR also encodes images and questions using multi-dimensional embeddings to capture finer-grained relevance between queries and documents. FLMR significantly improves the original RA-VQA retriever's PRRecall@5 by approximately 8\%. Finally, we equipped RA-VQA with two state-of-the-art large multi-modal/language models to achieve $\sim61\%$ VQA score in the OK-VQA dataset.

* To appear at NeurIPS 2023. This is a submission version, and the camera-ready version will be updated soon

Via

Access Paper or Ask Questions

Grounding Description-Driven Dialogue State Trackers with Knowledge-Seeking Turns

Sep 23, 2023

Alexandru Coca, Bo-Hsiang Tseng, Jinghong Chen, Weizhe Lin, Weixuan Zhang, Tisha Anders, Bill Byrne

Abstract:Schema-guided dialogue state trackers can generalise to new domains without further training, yet they are sensitive to the writing style of the schemata. Augmenting the training set with human or synthetic schema paraphrases improves the model robustness to these variations but can be either costly or difficult to control. We propose to circumvent these issues by grounding the state tracking model in knowledge-seeking turns collected from the dialogue corpus as well as the schema. Including these turns in prompts during finetuning and inference leads to marked improvements in model robustness, as demonstrated by large average joint goal accuracy and schema sensitivity improvements on SGD and SGD-X.

* Best Long Paper of SIGDIAL 2023

Via

Access Paper or Ask Questions

xPQA: Cross-Lingual Product Question Answering across 12 Languages

May 16, 2023

Xiaoyu Shen, Akari Asai, Bill Byrne, Adrià de Gispert

Abstract:Product Question Answering (PQA) systems are key in e-commerce applications to provide responses to customers' questions as they shop for products. While existing work on PQA focuses mainly on English, in practice there is need to support multiple customer languages while leveraging product information available in English. To study this practical industrial task, we present xPQA, a large-scale annotated cross-lingual PQA dataset in 12 languages across 9 branches, and report results in (1) candidate ranking, to select the best English candidate containing the information to answer a non-English question; and (2) answer generation, to generate a natural-sounding non-English answer based on the selected English candidate. We evaluate various approaches involving machine translation at runtime or offline, leveraging multilingual pre-trained LMs, and including or excluding xPQA training data. We find that (1) In-domain data is essential as cross-lingual rankers trained on other domains perform poorly on the PQA task; (2) Candidate ranking often prefers runtime-translation approaches while answer generation prefers multilingual approaches; (3) Translating offline to augment multilingual models helps candidate ranking mainly on languages with non-Latin scripts; and helps answer generation mainly on languages with Latin scripts. Still, there remains a significant performance gap between the English and the cross-lingual test sets.

* ACL 2023 industry track. Dataset available in https://github.com/amazon-science/contextual-product-qa

Via

Access Paper or Ask Questions