Abstract: Correcting misinformation in public online spaces often exposes users to hostility and ad hominem attacks, discouraging participation in corrective discourse. This study presents empirical evidence that invoking Grok, the large language model native to X, rather than directly confronting other users, is associated with different social responses during misinformation correction. Using an observational design, 100 correction replies across five high-conflict misinformation topics were analyzed, with corrections split evenly between Grok-mediated and direct human-issued responses. The primary outcome was whether a correction received at least one ad hominem attack within a 24-hour window. Ad hominem attacks occurred in 72 percent of human-issued corrections and in none of the Grok-mediated corrections. A chi-square test confirmed a statistically significant association with a large effect size. These findings suggest that AI-mediated correction may alter the social dynamics of public disagreement by reducing interpersonal hostility during misinformation responses.
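A minimal sketch of the reported chi-square test, assuming the 100 corrections were split 50/50 as the balanced design implies; the cell counts (36 of 50 human-issued corrections attacked, 0 of 50 Grok-mediated) are inferred from the stated percentages, and the scipy call is an illustration rather than the authors' analysis code.

```python
# Hypothetical reconstruction of the 2x2 contingency table from the abstract:
# rows = correction type, columns = (received ad hominem attack, did not).
from scipy.stats import chi2_contingency

table = [[36, 14],   # human-issued corrections (72% attacked)
         [0,  50]]   # Grok-mediated corrections (0% attacked)

chi2, p, dof, expected = chi2_contingency(table)  # Yates-corrected by default for 2x2
n = sum(sum(row) for row in table)
phi = (chi2 / n) ** 0.5  # phi coefficient as a simple effect-size measure for a 2x2 table

print(f"chi2 = {chi2:.2f}, p = {p:.2g}, dof = {dof}, phi = {phi:.2f}")
```

Under these assumed counts the association is highly significant and the phi coefficient is well above conventional thresholds for a large effect, consistent with the abstract's summary.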
Abstract: This study explored user preferences between search engines and Large Language Models (LLMs) across a range of information retrieval scenarios. Conducted with a sample of 100 internet users (N=100) from across the United States, the research covered 20 distinct use cases, ranging from factual searches, such as looking up COVID-19 guidelines, to more subjective tasks, such as seeking interpretations of complex concepts in layman's terms. Participants stated their preference between a traditional search engine and an LLM for each scenario, and the use cases were selected to span a broad spectrum of typical online queries, allowing a nuanced picture of how users perceive and employ these two predominant digital tools in differing contexts. The findings show a clear tendency for participants to favor search engines for direct, fact-based queries, while LLMs were more often preferred for tasks requiring nuanced understanding and language processing. These results indicate the specific contexts in which each tool is favored and point to the potential for hybrid models that leverage the strengths of both search engines and LLMs. The insights are relevant to developers, researchers, and policymakers seeking to understand the evolving landscape of digital information retrieval and user interaction with these technologies.
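An illustrative sketch (not the authors' analysis code) of how per-scenario preferences from 100 respondents over the 20 use cases might be tallied; the response structure, scenario names, and counts below are hypothetical placeholders, not the study's data.

```python
# Tally the share of respondents preferring an LLM over a search engine per scenario.
from collections import Counter

# responses[scenario] is a list of 100 choices, each "search_engine" or "llm".
# Counts here are made-up placeholders for illustration only.
responses = {
    "look up COVID-19 guidelines": ["search_engine"] * 78 + ["llm"] * 22,
    "explain a complex concept in layman's terms": ["search_engine"] * 31 + ["llm"] * 69,
}

for scenario, choices in responses.items():
    counts = Counter(choices)
    share_llm = counts["llm"] / len(choices)
    print(f"{scenario}: LLM preferred by {share_llm:.0%} of respondents")
```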
Abstract: This study evaluated the proficiency of prominent Large Language Models (LLMs), namely OpenAI's ChatGPT 3.5 and 4.0, Google's Bard (LaMDA), and Microsoft's Bing AI, in discerning the truthfulness of news items using black-box testing. A total of 100 fact-checked news items, all sourced from independent fact-checking agencies, were presented to each of these LLMs under controlled conditions. Their responses were classified into one of three categories: True, False, or Partially True/False. Each LLM's effectiveness was gauged by the accuracy of its classifications against the verified facts provided by the independent agencies. The results showed moderate proficiency across all models, with an average score of 65.25 out of 100. Among the models, OpenAI's ChatGPT 4.0 stood out with a score of 71, suggesting an edge for newer LLMs in differentiating fact from deception. However, compared with human fact-checkers, the AI models, despite showing promise, lag in comprehending the subtleties and contexts inherent in news information. The findings highlight the potential of AI in fact-checking while underscoring the continued importance of human cognitive skills and the need for continued advances in AI capabilities. Finally, the experimental data produced in this work is openly available on Kaggle.
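A minimal sketch of the scoring scheme, under the assumption that each model's score is simply the count of the 100 news items whose classification matches the fact-checkers' verdict; the label strings and toy data below are illustrative stand-ins, not the released Kaggle data.

```python
# Score a model by counting exact label matches against the verified fact-check labels.
def score(model_outputs, verified_labels):
    """Return the number of items where the model's label equals the verified label."""
    return sum(m == v for m, v in zip(model_outputs, verified_labels))

# Toy usage with three items rather than the full 100 (hypothetical values).
verified = ["True", "False", "Partially True/False"]
model_out = ["True", "False", "True"]
print(score(model_out, verified))  # -> 2 matches out of 3
```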