Abstract: Research on conspiracy theories has largely focused on belief formation, exposure, and diffusion, while paying less attention to how their meanings change over time. This gap persists partly because conspiracy-related terms are often treated as stable lexical markers, making it difficult to separate genuine semantic changes from surface-level vocabulary changes. In this paper, we measure the semantic structure and evolution of conspiracy theories in online political discourse. Using 169.9M comments from Reddit's r/politics subreddit spanning 2012--2022, we first demonstrate that conspiracy-related language forms coherent and semantically distinguishable regions of language space, allowing conspiracy theories to be treated as semantic objects. We then track how these objects evolve over time using aligned word embeddings, enabling comparisons of semantic neighborhoods across periods. Our analysis reveals that conspiracy theories evolve non-uniformly, exhibiting patterns of semantic stability, expansion, contraction, and replacement that are not captured by keyword-based approaches alone.
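A minimal sketch of the kind of diachronic comparison this implies, assuming per-period word2vec models (gensim) aligned with orthogonal Procrustes; the period choice, the probe term "benghazi", and variable names such as `corpus_2013` are illustrative assumptions, not the paper's exact pipeline:

```python
# Sketch: align two per-period embedding spaces and compare a term's
# semantic neighborhood across periods (assumed setup, not the authors' code).
import numpy as np
from gensim.models import Word2Vec
from scipy.linalg import orthogonal_procrustes

def align(base: Word2Vec, other: Word2Vec) -> np.ndarray:
    """Rotate `other`'s vectors into `base`'s space using the shared vocabulary."""
    shared = [w for w in base.wv.index_to_key if w in other.wv.key_to_index]
    A = np.stack([other.wv[w] for w in shared])
    B = np.stack([base.wv[w] for w in shared])
    R, _ = orthogonal_procrustes(A, B)      # rotation so that A @ R ~ B
    return other.wv.vectors @ R

def neighborhood(vectors: np.ndarray, model: Word2Vec, term: str, k: int = 10):
    """Top-k cosine neighbors of `term` in the (possibly rotated) space."""
    idx = model.wv.key_to_index[term]
    v = vectors[idx] / np.linalg.norm(vectors[idx])
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ v
    top = np.argsort(-sims)[1:k + 1]        # skip the term itself
    return [model.wv.index_to_key[i] for i in top]

# Usage: train one model per period on tokenized comments (corpus_* assumed),
# align a later period onto an earlier one, then inspect the shifting neighbors.
m_2013 = Word2Vec(corpus_2013, vector_size=100, min_count=50)
m_2021 = Word2Vec(corpus_2021, vector_size=100, min_count=50)
rotated = align(m_2013, m_2021)
print(neighborhood(m_2013.wv.vectors, m_2013, "benghazi"))
print(neighborhood(rotated, m_2021, "benghazi"))
```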
Abstract: The relationship between content production and consumption on algorithm-driven platforms like YouTube plays a critical role in shaping ideological behaviors. While prior work has largely focused on user behavior and algorithmic recommendations, the interplay between what is produced and what is consumed, and its role in ideological shifts, remains understudied. In this paper, we present a longitudinal, mixed-methods analysis combining one year of YouTube watch history with two waves of ideological surveys from 1,100 U.S. participants. We identify users who exhibited significant shifts toward more extreme ideologies and compare their content consumption, and the production patterns of the YouTube channels they engaged with, to those of ideologically stable users. Our findings show that users who became more extreme have consumption habits that differ from those who did not. This effect is amplified by the fact that channels favored by users with extreme ideologies are also more likely to produce content with higher levels of anger, grievance, and similar markers. Lastly, using time series analysis, we examine whether content producers are the primary drivers of consumption behavior or are merely responding to user demand.
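A minimal sketch of one way such a production-vs-consumption question can be framed, assuming weekly aggregated series and a Granger-causality test; the file name, column names, lag choice, and differencing step are hypothetical assumptions rather than the paper's stated method:

```python
# Sketch: does weekly production of high-anger content lead its consumption,
# or the reverse? (Assumed analysis; columns and file are hypothetical.)
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

df = pd.read_csv("weekly_channel_metrics.csv")   # hypothetical weekly aggregates

# Column order is [effect, cause]: test whether production Granger-causes
# consumption, after first-differencing for stationarity.
prod_to_cons = grangercausalitytests(
    df[["anger_consumed", "anger_produced"]].diff().dropna(), maxlag=4)

# The reverse direction: does consumption Granger-cause production?
cons_to_prod = grangercausalitytests(
    df[["anger_produced", "anger_consumed"]].diff().dropna(), maxlag=4)
```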




Abstract: Data generated by audits of social media websites have formed the basis of our understanding of the biases present in algorithmic content recommendation systems. As legislators around the world begin to consider regulating the algorithmic systems that drive online platforms, it is critical to ensure the correctness of these inferred biases. However, as we show in this paper, doing so is challenging for a variety of reasons related to the complexity of the configuration parameters associated with the audits that gather data from a specific platform. Focusing specifically on YouTube, we show that conducting audits to make inferences about YouTube's recommendation systems is more methodologically challenging than one might expect. Many methodological decisions must be considered in order to obtain scientifically valid results, and each of these decisions incurs costs. For example, should an auditor use (expensive to obtain) logged-in YouTube accounts while gathering recommendations from the algorithm to obtain more accurate inferences? We explore the impact of this and many other decisions and make some startling discoveries about the methodological choices that impact YouTube's recommendations. Taken together, our research suggests audit configuration compromises that YouTube auditors and researchers can use to reduce audit overhead, both economically and computationally, without sacrificing the accuracy of their inferences. We also identify several configuration parameters that have a significant impact on the accuracy of measured inferences and should be considered carefully.
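A purely illustrative sketch of the kind of configuration space such an audit must navigate; every field and value below is a hypothetical example of the decisions the abstract alludes to (logged-in accounts, personalization, number of accounts, crawl depth), not the paper's actual harness:

```python
# Sketch: enumerating hypothetical audit configurations to show how quickly
# the choice space, and therefore the audit cost, grows.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class AuditConfig:
    logged_in: bool       # use (costly) authenticated accounts or not
    personalized: bool    # let watch history accumulate across seed videos
    n_accounts: int       # how many sock-puppet accounts to run
    walk_depth: int       # how many recommendation hops to follow per seed

# Even a small grid of choices multiplies quickly, and each cell has a cost.
grid = [AuditConfig(li, p, n, d)
        for li, p, n, d in product([True, False], [True, False],
                                   [5, 20], [3, 10])]
print(f"{len(grid)} distinct audit configurations to compare")
```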