Josh Magnus Ludan

Interpretable-by-Design Text Classification with Iteratively Generated Concept Bottleneck

Oct 30, 2023
Josh Magnus Ludan, Qing Lyu, Yue Yang, Liam Dugan, Mark Yatskar, Chris Callison-Burch

Deep neural networks excel in text classification tasks, yet their application in high-stakes domains is hindered by their lack of interpretability. To address this, we propose Text Bottleneck Models (TBMs), an intrinsically interpretable text classification framework that offers both global and local explanations. Rather than directly predicting the output label, TBMs predict categorical values for a sparse set of salient concepts and use a linear layer over those concept values to produce the final prediction. These concepts can be automatically discovered and measured by a Large Language Model (LLM), without the need for human curation. On 12 diverse datasets, using GPT-4 for both concept generation and measurement, we show that TBMs can rival the performance of established black-box baselines such as few-shot GPT-4 and finetuned DeBERTa, while falling short against finetuned GPT-3.5. Overall, our findings suggest that TBMs are a promising new framework that enhances interpretability, with minimal performance tradeoffs, particularly for general-domain text.
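
The bottleneck architecture described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: `measure_concept` is a stand-in for the GPT-4 call that scores each concept, and the concept list, scoring scale, and toy data are all hypothetical.

```python
# Minimal sketch of a Text Bottleneck Model (TBM)-style classifier.
# Assumption: an external LLM call (stubbed here as `measure_concept`)
# returns a categorical value per concept; the final predictor is a
# plain linear model over those concept values.

from typing import Callable, List
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical concept set for a sentiment-style task; in the paper the
# concepts are discovered automatically by the LLM, not hand-written.
CONCEPTS = ["praises the product", "mentions a defect", "would recommend"]

def measure_concept(text: str, concept: str) -> int:
    """Stand-in for an LLM call that rates whether `concept` applies to `text`."""
    # A real system would prompt GPT-4 here; this stub just keyword-matches.
    return 1 if concept.split()[0] in text.lower() else 0

def to_bottleneck(texts: List[str],
                  measure: Callable[[str, str], int] = measure_concept) -> np.ndarray:
    """Map each text to its vector of concept values (the 'bottleneck')."""
    return np.array([[measure(t, c) for c in CONCEPTS] for t in texts])

# Toy data purely for illustration.
train_texts = ["praises the camera, would recommend", "mentions a defect in the screen"]
train_labels = [1, 0]

X = to_bottleneck(train_texts)
clf = LogisticRegression().fit(X, train_labels)  # the interpretable linear layer

print(dict(zip(CONCEPTS, clf.coef_[0])))                      # global explanation: one weight per concept
print(clf.predict(to_bottleneck(["mentions a defect"])))      # local explanation: the measured concept values
```

The interpretability claim follows directly from this structure: the learned weights over concepts serve as the global explanation, while the concept values measured for a single input serve as its local explanation.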

Explanation-based Finetuning Makes Models More Robust to Spurious Cues

May 08, 2023
Josh Magnus Ludan, Yixuan Meng, Tai Nguyen, Saurabh Shah, Qing Lyu, Marianna Apidianaki, Chris Callison-Burch

Large Language Models (LLMs) are so powerful that they sometimes learn correlations between labels and features that are irrelevant to the task, leading to poor generalization on out-of-distribution data. We propose explanation-based finetuning as a novel and general approach to mitigate LLMs' reliance on spurious correlations. Unlike standard finetuning where the model only predicts the answer given the input, we finetune the model to additionally generate a free-text explanation supporting its answer. To evaluate our method, we finetune the model on artificially constructed training sets containing different types of spurious cues, and test it on a test set without these cues. Compared to standard finetuning, our method makes models remarkably more robust against spurious cues, as measured by accuracy drop, across four classification tasks: ComVE (+1.2), CREAK (+9.1), e-SNLI (+15.4), and SBIC (+6.5). Moreover, our method works equally well with explanations generated by the model, implying its applicability to more datasets without human-written explanations.
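
The key difference from standard finetuning is the training target. The sketch below assembles finetuning records both ways; the JSONL field names, the "label because explanation" completion format, and the toy example are assumptions for illustration, not the paper's exact data format.

```python
# Sketch of building finetuning records with and without explanations.
# The "prompt"/"completion" fields follow a common finetuning JSONL
# convention and are an assumption, not the paper's exact schema.
import json

examples = [
    {
        "input": "Claim: The Eiffel Tower is in Berlin.",
        "label": "false",
        "explanation": "The Eiffel Tower is located in Paris, France.",
    },
]

def standard_record(ex):
    # Standard finetuning: the model is trained to output only the label.
    return {"prompt": ex["input"], "completion": ex["label"]}

def explanation_record(ex):
    # Explanation-based finetuning: the model is trained to output the
    # label plus a free-text explanation supporting it. Whether the
    # explanation precedes or follows the label is a design choice;
    # this sketch simply appends it after the answer.
    return {"prompt": ex["input"],
            "completion": f'{ex["label"]} because {ex["explanation"]}'}

with open("train_standard.jsonl", "w") as f_std, \
     open("train_explained.jsonl", "w") as f_exp:
    for ex in examples:
        f_std.write(json.dumps(standard_record(ex)) + "\n")
        f_exp.write(json.dumps(explanation_record(ex)) + "\n")
```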
