Abstract: Currently, there are thousands of pretrained large language models (LLMs) available to social scientists. How do we select among them? Using validity, reliability, reproducibility, and replicability as guides, we explore the significance of (1) model openness, (2) model footprint, (3) training data, and (4) model architectures and fine-tuning. While ex-ante tests of validity (i.e., benchmarks) are often privileged in these discussions, we argue that social scientists cannot altogether avoid validating computational measures ex-post. Replicability, in particular, is the more pressing guide for selecting among language models. Reliably replicating a finding that depends on a language model requires being able to reliably reproduce the underlying task. To this end, we propose starting with smaller, open models and constructing delimited benchmarks that demonstrate the validity of the entire computational pipeline.
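A minimal sketch of what a "delimited benchmark" could look like in practice, under stated assumptions: the documents, labels, and the placeholder `classify_with_llm` step are hypothetical and stand in for whatever task the pipeline performs. The point is only to show validity being checked against a small hand-coded gold set drawn from the study's own corpus, rather than inferred from generic benchmarks; this is an illustration, not the authors' procedure.

```python
# Hypothetical delimited benchmark: compare pipeline output against hand-coded labels.
from sklearn.metrics import accuracy_score, cohen_kappa_score


def classify_with_llm(text: str) -> str:
    """Placeholder for the pipeline step that queries a (smaller, open) language model."""
    return "anti" if "burden" in text else "pro"


# (document, hand-coded label) pairs drawn from the researcher's own sample.
gold = [
    ("immigrants strengthen the local economy", "pro"),
    ("immigration is a burden on public services", "anti"),
    ("new arrivals enrich our neighborhoods", "pro"),
    ("the state cannot bear this burden", "anti"),
]

y_true = [label for _, label in gold]
y_pred = [classify_with_llm(text) for text, _ in gold]

# Agreement between the model-based measure and the hand-coded gold standard.
print("accuracy:", accuracy_score(y_true, y_pred))
print("cohen's kappa:", cohen_kappa_score(y_true, y_pred))
```

Reporting chance-corrected agreement (e.g., Cohen's kappa) alongside raw accuracy is one way to make the ex-post validation of the full pipeline reproducible by others.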




Abstract: Using the presence or frequency of keywords is a classic approach in the formal analysis of text, but it has the drawback of glossing over the relationality of word meanings. Word embedding models overcome this problem by constructing a standardized meaning space in which each word is assigned a location based on its relations of similarity to, and difference from, other words, as revealed by how words are used in natural language samples. We show how word embeddings can be put to the task of interpretation via two kinds of navigation. First, one can hold terms constant and measure how the embedding space moves around them--much as astronomers measured the changing positions of celestial bodies across the seasons. Second, one can hold the embedding space constant and see how documents or authors move relative to it--just as ships use the stars on a given night to determine their location. Using the empirical case of immigration discourse in the United States, we demonstrate the merits of these two broad strategies for advancing formal approaches to cultural analysis.
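A minimal sketch of the two navigation strategies, under stated assumptions: the tiny corpora, anchor words, and the "labor vs. crime" axis are hypothetical placeholders (real analyses would use large natural-language samples), and gensim's Word2Vec is used here only for illustration, not as the authors' exact pipeline.

```python
# Illustrative sketch: two ways of "navigating" with word embeddings.
import numpy as np
from gensim.models import Word2Vec


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def fit(corpus):
    # Train a small embedding space on a (toy) corpus; vectors are illustrative only.
    return Word2Vec(sentences=corpus, vector_size=50, window=5,
                    min_count=1, seed=42, epochs=200).wv


# Toy stand-ins for immigration discourse from two periods.
corpus_t1 = [
    "immigrant labor built the railroads and staffed the factories".split(),
    "editors praised immigrant workers and rarely mentioned crime".split(),
]
corpus_t2 = [
    "debates link immigrant arrivals to crime and border enforcement".split(),
    "employers still rely on immigrant labor in fields and kitchens".split(),
]

# Strategy 1: hold the term constant, let the space move around it.
# Compare the focal word's similarity to fixed anchor terms across period-specific spaces.
wv_t1, wv_t2 = fit(corpus_t1), fit(corpus_t2)
for anchor in ["labor", "crime"]:
    for label, wv in [("t1", wv_t1), ("t2", wv_t2)]:
        print(label, anchor, round(wv.similarity("immigrant", anchor), 3))

# Strategy 2: hold one space constant, let documents move relative to it.
# Build a semantic axis from anchor-word differences and project document vectors onto it.
wv = wv_t2
axis = wv["crime"] - wv["labor"]  # hypothetical "threat vs. work" dimension
docs = {
    "editorial_a": "immigrant labor sustains the harvest".split(),
    "editorial_b": "immigrant crime dominates the campaign".split(),
}
for name, tokens in docs.items():
    vecs = [wv[t] for t in tokens if t in wv]
    doc_vec = np.mean(vecs, axis=0)
    print(name, round(cosine(doc_vec, axis), 3))
```

The first loop corresponds to the "astronomer" strategy (the term is fixed, the space shifts across periods); the second corresponds to the "navigator" strategy (the space is fixed, and documents are located relative to it).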