Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rebeckah K. Fussell

Scalable and consistent few-shot classification of survey responses using text embeddings

Aug 27, 2025

Jonas Timmann Mjaaland, Markus Fleten Kreutzer, Halvor Tyseng, Rebeckah K. Fussell, Gina Passante, N. G. Holmes, Anders Malthe-Sørenssen, Tor Ole B. Odden

Figure 1 for Scalable and consistent few-shot classification of survey responses using text embeddings

Figure 2 for Scalable and consistent few-shot classification of survey responses using text embeddings

Figure 3 for Scalable and consistent few-shot classification of survey responses using text embeddings

Figure 4 for Scalable and consistent few-shot classification of survey responses using text embeddings

Abstract:Qualitative analysis of open-ended survey responses is a commonly-used research method in the social sciences, but traditional coding approaches are often time-consuming and prone to inconsistency. Existing solutions from Natural Language Processing such as supervised classifiers, topic modeling techniques, and generative large language models have limited applicability in qualitative analysis, since they demand extensive labeled data, disrupt established qualitative workflows, and/or yield variable results. In this paper, we introduce a text embedding-based classification framework that requires only a handful of examples per category and fits well with standard qualitative workflows. When benchmarked against human analysis of a conceptual physics survey consisting of 2899 open-ended responses, our framework achieves a Cohen's Kappa ranging from 0.74 to 0.83 as compared to expert human coders in an exhaustive coding scheme. We further show how performance of this framework improves with fine-tuning of the text embedding model, and how the method can be used to audit previously-analyzed datasets. These findings demonstrate that text embedding-assisted coding can flexibly scale to thousands of responses without sacrificing interpretability, opening avenues for deductive qualitative analysis at scale.

Via

Access Paper or Ask Questions