James P. Bagrow

What we write about when we write about causality: Features of causal statements across large-scale social discourse

Apr 21, 2016

Thomas C. McAndrew, Joshua C. Bongard, Christopher M. Danforth, Peter S. Dodds, Paul D. H. Hines, James P. Bagrow

Abstract: Identifying and communicating relationships between causes and effects is important for understanding our world, but is affected by language structure, cognitive and emotional biases, and the properties of the communication medium. Despite the increasing importance of social media, much remains unknown about causal statements made online. To study real-world causal attribution, we extract a large-scale corpus of causal statements made on the Twitter social network platform as well as a comparable random control corpus. We compare causal and control statements using statistical language and sentiment analysis tools. We find that causal statements have a number of significant lexical and grammatical differences compared with controls and tend to be more negative in sentiment than controls. Causal statements made online tend to focus on news and current events, medicine and health, or interpersonal relationships, as shown by topic models. By quantifying the features and potential biases of causality communication, this study improves our understanding of the accuracy of information and opinions found online.

* 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, CA, 2016, pp. 519-524

Identifying missing dictionary entries with frequency-conserving context models

Jul 29, 2015

Jake Ryland Williams, Eric M. Clark, James P. Bagrow, Christopher M. Danforth, Peter Sheridan Dodds

Abstract: In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in text and have framed our treatment appropriately, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility patterns) where one has ordered categorical data (e.g., sounds, genes, and locations). Our approach focuses on the phrase (whether word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with the Wiktionary (an extensive, online, collaborative, and open-source dictionary that contains over 100,000 phrasal-definitions), we develop highly effective filters for the identification of meaningful, missing phrase-entries. With our predictions we then engage the editorial community of the Wiktionary and propose short lists of potential missing entries for definition, developing a breakthrough lexical extraction technique and expanding our knowledge of the defined English lexicon of phrases.

* 16 pages, 6 figures, and 7 tables

Zipf's law holds for phrases, not words

Mar 04, 2015

Jake Ryland Williams, Paul R. Lessard, Suma Desu, Eric Clark, James P. Bagrow, Christopher M. Danforth, Peter Sheridan Dodds

Abstract: With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed length phrases.

* Manuscript: 6 pages, 3 figures; Supplementary Information: 8 pages, 18 tables
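
As a rough illustration of what partitioning a text into mixed-length phrases can look like, the sketch below cuts each gap between adjacent words independently with probability `q` and averages phrase counts over repeated trials. This is an assumption made for illustration only, not the paper's method: the function names, the default `q = 0.5`, and the Monte Carlo averaging are all hypothetical choices.

```python
import random
from collections import Counter


def random_partition(words, q=0.5, rng=None):
    """Split a word list into phrases by cutting each gap with probability q.

    Illustrative assumption only: every gap between adjacent words is cut
    independently of the others.
    """
    rng = rng or random.Random()
    if not words:
        return []
    phrases = []
    current = [words[0]]
    for word in words[1:]:
        if rng.random() < q:
            # Cut here: close the current phrase and start a new one.
            phrases.append(" ".join(current))
            current = [word]
        else:
            # No cut: extend the current phrase.
            current.append(word)
    phrases.append(" ".join(current))
    return phrases


def phrase_rank_frequency(words, q=0.5, trials=100, seed=0):
    """Average phrase counts over several random partitions.

    Returns (phrase, mean count) pairs sorted from most to least frequent.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(random_partition(words, q, rng))
    return [(phrase, count / trials) for phrase, count in counts.most_common()]
```

With `q = 1.0` every gap is cut and the result is an ordinary word count; with `q = 0.0` the whole text is a single phrase. Intermediate values produce a ranking in which single words and longer phrases compete in the same list.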

Text mixing shapes the anatomy of rank-frequency distributions: A modern Zipfian mechanics for natural language

Jan 30, 2015

Jake Ryland Williams, James P. Bagrow, Christopher M. Danforth, Peter Sheridan Dodds

Abstract: Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this 'law' of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora over the last 15 years have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and non-core lexica. Here, we present and defend an alternative hypothesis, that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages in the Project Gutenberg eBooks collection (eBooks), we find emphatic empirical support for the universality of our claim.

* Phys. Rev. E 91, 052811 (2015)
* 9 pages, 6 figures, and 1 table
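
The statement of Zipf's law in the abstract, frequency roughly inversely proportional to rank, can be checked on any token list with a few lines of code. The sketch below builds a rank-frequency list and estimates the scaling exponent with an ordinary least-squares fit in log-log space. The function names are hypothetical, and the single-slope fit is a deliberately crude estimator chosen for brevity; it is not the analysis used in the paper and would not by itself detect the two scaling regimes the abstract describes.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token frequencies sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs):
    """Estimate theta in f(r) ~ r**(-theta) by least squares on log-log data.

    Assumes freqs is sorted in descending order, holds at least two
    positive values, and that a single power law is a reasonable fit.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying law; report its magnitude.
    return -sxy / sxx


# Synthetic frequencies that follow f(r) = 1000 / r exactly.
ideal = [1000 / rank for rank in range(1, 51)]
theta = zipf_exponent(ideal)
```

For the synthetic input the estimate is 1 up to floating-point error. On real corpora, fitting separate slopes below and above a candidate break point would be one simple way to look for more than one scaling regime.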

Human language reveals a universal positivity bias

Jun 15, 2014

Peter Sheridan Dodds, Eric M. Clark, Suma Desu, Morgan R. Frank, Andrew J. Reagan, Jake Ryland Williams, Lewis Mitchell, Kameron Decker Harris, Isabel M. Kloumann, James P. Bagrow(+4 more)

Abstract: Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias is strongly independent of frequency of word usage. Alongside these general regularities, we describe inter-language variations in the emotional spectrum of languages which allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.

* Manuscript: 7 pages, 4 figures; Supplementary Material: 49 pages, 43 figures, 6 tables. Online appendices available at http://www.uvm.edu/storylab/share/papers/dodds2014a/
