Raphael Tang

Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models

Oct 11, 2023

Raphael Tang, Xinyu Zhang, Xueguang Ma, Jimmy Lin, Ferhan Ture

Abstract: Large language models (LLMs) exhibit positional bias in how they use context, which especially complicates listwise ranking. To address this, we propose permutation self-consistency, a form of self-consistency over ranking list outputs of black-box LLMs. Our key idea is to marginalize out different list orders in the prompt to produce an order-independent ranking with less positional bias. First, given some input prompt, we repeatedly shuffle the list in the prompt and pass it through the LLM while holding the instructions the same. Next, we aggregate the resulting sample of rankings by computing the central ranking closest in distance to all of them, marginalizing out prompt order biases in the process. Theoretically, we prove the robustness of our method, showing convergence to the true ranking in the presence of random perturbations. Empirically, on five list-ranking datasets in sorting and passage reranking, our approach improves scores from conventional inference by up to 7-18% for GPT-3.5 and 8-16% for LLaMA v2 (70B), surpassing the previous state of the art in passage reranking. Our code is at https://github.com/castorini/perm-sc.

* First two authors contributed equally; 10 pages, 6 figures
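
The shuffle-then-aggregate procedure in the abstract can be sketched as follows. This is an illustrative sketch, not the code from the linked repository: it assumes Kendall tau distance as the distance between rankings, finds the central ranking by brute force (so it only suits short lists), and uses a hypothetical `first_slot_biased_ranker` as a toy stand-in for an LLM call.

```python
import itertools
import random


def kendall_tau_distance(a, b):
    """Count the item pairs that rankings a and b order differently."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for i, j in itertools.combinations(range(len(a)), 2)
        if pos_b[a[i]] > pos_b[a[j]]
    )


def central_ranking(rankings):
    """Brute-force the ranking with the smallest total distance to all samples."""
    items = sorted(rankings[0])
    return min(
        itertools.permutations(items),
        key=lambda cand: sum(kendall_tau_distance(cand, r) for r in rankings),
    )


def permutation_self_consistency(items, rank_fn, num_samples=20, seed=0):
    """Shuffle the input list, rank each shuffle, then aggregate the rankings."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        shuffled = list(items)
        rng.shuffle(shuffled)
        samples.append(tuple(rank_fn(shuffled)))
    return list(central_ranking(samples))


def first_slot_biased_ranker(shuffled):
    """Hypothetical toy ranker: sorts correctly except that it always keeps
    the item in the first prompt slot on top (a crude positional bias)."""
    return [shuffled[0]] + sorted(shuffled[1:])


if __name__ == "__main__":
    print(permutation_self_consistency([4, 2, 5, 1, 3], first_slot_biased_ranker))
```

In a real setting `rank_fn` would wrap a prompt to the model; the aggregation step is independent of how the individual rankings are produced.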

Less is More: Parameter-Free Text Classification with Gzip

Dec 19, 2022

Zhiying Jiang, Matthew Y. R. Yang, Mikhail Tsirlin, Raphael Tang, Jimmy Lin

Abstract: Deep neural networks (DNNs) are often used for text classification tasks as they usually achieve high levels of accuracy. However, DNNs can be computationally intensive, requiring billions of parameters and large amounts of labeled data, which makes them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative to DNNs that is easy, lightweight, and universal in text classification: a combination of a simple compressor like gzip with a $k$-nearest-neighbor classifier. Without any training, pre-training, or fine-tuning, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperforms BERT on all five OOD datasets, including four low-resource languages. Our method also performs particularly well in few-shot settings where labeled data are too scarce for DNNs to achieve a satisfying accuracy.
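
A minimal sketch of pairing a compressor with a $k$-nearest-neighbor classifier is shown below. The abstract does not name the distance measure; this sketch assumes the normalized compression distance, and the function names `clen`, `ncd`, and `knn_classify` are illustrative rather than taken from the authors' code.

```python
import gzip
from collections import Counter


def clen(text):
    """Length in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x, y):
    """Normalized compression distance (an assumed choice of distance)."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(query, train, k=3):
    """Majority vote over the k training texts closest to the query.

    `train` is a list of (text, label) pairs; no model is trained.
    """
    neighbors = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Every prediction compresses the query against each training text, so the cost grows linearly with the size of the training set.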

SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale

Nov 21, 2022

Raphael Tang, Karun Kumar, Gefei Yang, Akshat Pandey, Yajie Mao, Vladislav Belyaev, Madhuri Emmadi, Craig Murray, Ferhan Ture, Jimmy Lin

Abstract: End-to-end automatic speech recognition systems represent the state of the art, but they rely on thousands of hours of manually annotated speech for training, as well as heavyweight computation for inference. Of course, this impedes commercialization since most companies lack vast human and computational resources. In this paper, we explore training and deploying an ASR system in the label-scarce, compute-limited setting. To reduce human labor, we use a third-party ASR system as a weak supervision source, supplemented with labeling functions derived from implicit user feedback. To accelerate inference, we propose to route production-time queries across a pool of CUDA graphs of varying input lengths, the distribution of which best matches the traffic's. Compared to our third-party ASR, we achieve a relative improvement in word-error rate of 8% and a speedup of 600%. Our system, called SpeechNet, currently serves 12 million queries per day on our voice-enabled smart television. To our knowledge, this is the first time a large-scale, Wav2vec-based deployment has been described in the academic literature.

* Accepted to EMNLP 2022 Industry Track; 9 pages, 7 figures
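
The idea of routing queries across a pool of fixed-length graphs can be illustrated with a small sketch. It assumes that each query goes to the smallest graph whose input length fits it and is padded up to that length; the abstract does not spell out the routing rule, and choosing the pool's lengths to match traffic is outside this sketch. The names `route_to_graph` and `pad_to` are hypothetical.

```python
import bisect


def route_to_graph(num_samples, graph_lengths):
    """Return the input length of the smallest graph that fits the query.

    `graph_lengths` must be sorted in ascending order. Returns None when the
    query is longer than every graph, so the caller can fall back to a slower path.
    """
    idx = bisect.bisect_left(graph_lengths, num_samples)
    return graph_lengths[idx] if idx < len(graph_lengths) else None


def pad_to(samples, length, pad_value=0.0):
    """Right-pad the audio samples so they match the chosen graph's fixed length."""
    return list(samples) + [pad_value] * (length - len(samples))


if __name__ == "__main__":
    pool = [16000, 32000, 64000]  # hypothetical graph input lengths, in samples
    audio = [0.1] * 20000
    target = route_to_graph(len(audio), pool)
    print(target, len(pad_to(audio, target)))
```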

What the DAAM: Interpreting Stable Diffusion Using Cross Attention

Oct 11, 2022

Raphael Tang, Akshat Pandey, Zhiying Jiang, Gefei Yang, Karun Kumar, Jimmy Lin, Ferhan Ture

Abstract: Large-scale diffusion neural networks represent a substantial milestone in text-to-image generation, with some performing similarly to real photographs in human evaluation. However, they remain poorly understood, lacking explainability and interpretability analyses, largely due to their proprietary, closed-source nature. In this paper, to shine some much-needed light on text-to-image diffusion models, we perform a text-image attribution analysis on Stable Diffusion, a recently open-sourced large diffusion model. To produce pixel-level attribution maps, we propose DAAM, a novel method based on upscaling and aggregating cross-attention activations in the latent denoising subnetwork. We support its correctness by evaluating its unsupervised semantic segmentation quality on its own generated imagery, compared to supervised segmentation models. We show that DAAM performs strongly on COCO caption-generated images, achieving an mIoU of 61.0, and it outperforms supervised models on open-vocabulary segmentation, for an mIoU of 51.5. We further find that certain parts of speech, like punctuation and conjunctions, influence the generated imagery most, which agrees with the prior literature, while determiners and numerals influence it the least, suggesting poor numeracy. To our knowledge, we are the first to propose and study word-pixel attribution for large-scale text-to-image diffusion models. Our code and data are at https://github.com/castorini/daam.

* 5 pages, 5 figures

Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers

Jul 31, 2022

Ji Xin, Raphael Tang, Zhiying Jiang, Yaoliang Yu, Jimmy Lin

Abstract: There exists a wide variety of efficiency methods for natural language processing (NLP) tasks, such as pruning, distillation, dynamic inference, and quantization. We can consider an efficiency method as an operator applied on a model. Naturally, we may construct a pipeline of multiple efficiency methods, i.e., apply multiple operators on the model sequentially. In this paper, we study the plausibility of this idea, and more importantly, the commutativity and cumulativeness of efficiency operators. We make two interesting observations: (1) Efficiency operators are commutative -- the order of efficiency methods within the pipeline has little impact on the final results; (2) Efficiency operators are also cumulative -- the final results of combining several efficiency methods can be estimated by combining the results of individual methods. These observations deepen our understanding of efficiency operators and provide useful guidelines for their real-world applications.

Inserting Information Bottlenecks for Attribution in Transformers

Dec 27, 2020

Zhiying Jiang, Raphael Tang, Ji Xin, Jimmy Lin

Abstract: Pretrained transformers achieve the state of the art across tasks in natural language processing, motivating researchers to investigate their inner mechanisms. One common direction is to understand what features are important for prediction. In this paper, we apply information bottlenecks to analyze the attribution of each feature for prediction on a black-box model. We use BERT as the example and evaluate our approach both quantitatively and qualitatively. We show the effectiveness of our method in terms of attribution and the ability to provide insight into how information flows through layers. We demonstrate that our technique outperforms two competitive methods in degradation tests on four datasets. Code is available at https://github.com/bazingagin/IBA.

* Accepted by EMNLP 2020 Findings

Howl: A Deployed, Open-Source Wake Word Detection System

Aug 21, 2020

Raphael Tang, Jaejun Lee, Afsaneh Razi, Julia Cambre, Ian Bicking, Jofish Kaye, Jimmy Lin

Abstract: We describe Howl, an open-source wake word detection toolkit with native support for open speech datasets, like Mozilla Common Voice and Google Speech Commands. We report benchmark results on Speech Commands and our own freely available wake word detection dataset, built from MCV. We operationalize our system for Firefox Voice, a plugin enabling speech interactivity for the Firefox web browser. Howl represents, to the best of our knowledge, the first fully productionized yet open-source wake word detection toolkit with a web browser deployment target. Our codebase is at https://github.com/castorini/howl.

* The first two authors contributed equally

Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset

Jul 14, 2020

Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang (+1 more)

Abstract: We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI. Our system has been online and serving users since late March 2020. Covidex is the user application component of our three-pronged strategy to develop technologies for helping domain experts tackle the ongoing global pandemic. In addition, we provide robust and easy-to-use keyword search infrastructure that exploits mature fusion-based methods as well as standalone neural ranking models that can be incorporated into other applications. These techniques have been evaluated in the ongoing TREC-COVID challenge: our infrastructure and baselines have been adopted by many participants, including some of the highest-scoring runs in rounds 1, 2, and 3. In round 3, we report the highest-scoring run that takes advantage of previous training data and the second-highest fully automatic run.

* arXiv admin note: text overlap with arXiv:2004.05125

Showing Your Work Doesn't Always Work

Apr 28, 2020

Raphael Tang, Jaejun Lee, Ji Xin, Xinyu Liu, Yaoliang Yu, Jimmy Lin

Abstract: In natural language processing, a recently popular line of work explores how to best report the experimental results of neural networks. One exemplar publication, titled "Show Your Work: Improved Reporting of Experimental Results," advocates for reporting the expected validation effectiveness of the best-tuned model, with respect to the computational budget. In the present work, we critically examine this paper. As far as statistical generalizability is concerned, we find unspoken pitfalls and caveats with this approach. We analytically show that their estimator is biased and uses error-prone assumptions. We find that the estimator favors negative errors and yields poor bootstrapped confidence intervals. We derive an unbiased alternative and bolster our claims with empirical evidence from statistical simulation. Our codebase is at http://github.com/castorini/meanmax.

* Accepted to ACL 2020
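
To make the notion of "expected validation effectiveness of the best-tuned model" concrete, the sketch below compares two ways of estimating the expected maximum of n runs from N observed scores. The abstract does not give either formula, so both are assumptions here: a plug-in estimator built on the empirical CDF (equivalent to drawing with replacement) and an alternative that averages the maximum over all size-n subsets drawn without replacement. Function names are illustrative.

```python
import math


def plug_in_expected_max(values, n):
    """Expected max of n draws under the empirical CDF (with replacement)."""
    v = sorted(values)
    N = len(v)
    return sum(
        x * ((i / N) ** n - ((i - 1) / N) ** n)
        for i, x in enumerate(v, start=1)
    )


def subset_expected_max(values, n):
    """Average of the max over all size-n subsets (without replacement).

    Requires n <= len(values). The i-th smallest value is the maximum of
    comb(i - 1, n - 1) of the comb(N, n) subsets.
    """
    v = sorted(values)
    N = len(v)
    total = math.comb(N, n)
    return sum(
        x * math.comb(i - 1, n - 1) / total
        for i, x in enumerate(v, start=1)
    )


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0]
    print(plug_in_expected_max(scores, 2), subset_expected_max(scores, 2))
```

On the scores 1, 2, 3 with n = 2, the plug-in value is 22/9 and the subset average is 8/3, so the plug-in estimate is the lower of the two in this example.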

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

Apr 27, 2020

Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin

Abstract: Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications. However, they are also notorious for being slow in inference, which makes them difficult to deploy in real-time applications. We propose a simple but effective method, DeeBERT, to accelerate BERT inference. Our approach allows samples to exit earlier without passing through the entire model. Experiments show that DeeBERT is able to save up to ~40% inference time with minimal degradation in model quality. Further analyses show different behaviors in the BERT transformer layers and also reveal their redundancy. Our work provides new ideas to efficiently apply deep transformer-based models to downstream tasks. Code is available at https://github.com/castorini/DeeBERT.

* Accepted at ACL 2020
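
Early exiting can be sketched in a few lines of plain Python. The abstract does not say how the exit decision is made; this sketch assumes a classifier attached after every layer and exits once the entropy of its output falls below a threshold. The toy layers and classifier are hypothetical stand-ins, not BERT components.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def early_exit_forward(x, layers, classifiers, threshold):
    """Run layers in order and stop at the first confident classifier.

    Returns the class probabilities and the number of layers actually used.
    Assumes at least one layer, with one classifier per layer.
    """
    probs, depth = None, 0
    for depth, (layer, classifier) in enumerate(zip(layers, classifiers), start=1):
        x = layer(x)
        probs = classifier(x)
        if entropy(probs) < threshold:
            break  # confident enough: skip the remaining layers
    return probs, depth


def toy_layer(x):
    """Hypothetical layer: each pass adds a fixed amount of evidence."""
    return x + 1.0


def toy_classifier(x):
    """Hypothetical binary classifier: a sigmoid over the scalar evidence."""
    p = 1.0 / (1.0 + math.exp(-x))
    return [p, 1.0 - p]


if __name__ == "__main__":
    layers = [toy_layer] * 4
    classifiers = [toy_classifier] * 4
    print(early_exit_forward(0.0, layers, classifiers, threshold=0.3))
```

Raising the threshold lets more inputs exit at shallow layers, trading some accuracy for speed; a threshold of zero disables early exit entirely.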
