Mansi Phute

Semantic Shift in Online Communities: Unmasking Information Flow through Linguistic Fingerprints

Under Review
*Authors contributed equally

Julia Kruk*
Amit Bhatacharjeee
Sanchita Porwal


Understanding novel, community-specific language is pivotal in computational sociolinguistics: it enables better slang detection, helps resolve dog whistles, and enriches communication studies. We present a simple but powerful approach to the unsupervised resolution of community-specific neologisms by determining how a word’s meaning in a community diverges from its general use. We gauge this semantic shift by how a word’s embedded representation changes when a pre-trained Large Language Model (LLM) is fine-tuned on a community’s lexicon. We demonstrate that identified neologisms, such as politically charged ‘dog whistles’, can be characterized by a high degree of semantic shift. Additionally, we investigate an application of this approach to the study of information flow, in which neologisms shared between communities may indicate shared user bases. We observe a statistically significant relationship between active users and shared neologisms, suggesting that information flow between communities could be estimated from linguistic features alone, without complex user network representations.
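To make the notion of semantic shift concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a word's shift is scored as the cosine distance between its embedding in the pre-trained model and its embedding after fine-tuning; the abstract does not name the distance measure, so that choice, the function names, and the toy vectors are all illustrative assumptions.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def semantic_shift(base_emb, tuned_emb):
    """Score every word present in both embedding tables.

    base_emb:  word -> vector from the pre-trained model (general use)
    tuned_emb: word -> vector after fine-tuning on one community's text
    Assumption: shift is measured as cosine distance between the two.
    """
    shared_vocab = base_emb.keys() & tuned_emb.keys()
    return {w: cosine_distance(base_emb[w], tuned_emb[w]) for w in shared_vocab}


def top_shifted(shifts, k):
    """Return the k words with the largest shift, highest first."""
    return sorted(shifts, key=shifts.get, reverse=True)[:k]


# Toy 2-d vectors, invented purely for illustration.
base = {"apple": [1.0, 0.0], "bank": [1.0, 0.0], "tree": [0.0, 1.0]}
tuned = {"apple": [1.0, 0.1], "bank": [0.0, 1.0], "tree": [0.0, 1.0]}

shifts = semantic_shift(base, tuned)
candidates = top_shifted(shifts, 1)
```

In this toy example only "bank" moves far from its general-use vector, so it is the sole candidate neologism. In practice the vectors would come from the model's embedding layer before and after fine-tuning.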

Overlap of words with high semantic shift across communities. Each chord represents a word that has a high semantic shift in both of the communities it joins. Thicker chords indicate more overlap between the two communities. Our results show high overlap between community pairs such as {relationships, relationship advice} and {news, worldnews}.
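The overlap behind such a chord diagram can be sketched as a pairwise set intersection. This is a rough illustration under stated assumptions: each community is reduced to a set of its high-shift words, and chord thickness is taken to be the number of words two communities share. The function name `chord_weights` and the placeholder words are hypothetical.

```python
from itertools import combinations


def chord_weights(high_shift_words):
    """Count shared high-shift words for every pair of communities.

    high_shift_words: community name -> set of its high-shift words
    Returns a dict mapping (community_a, community_b) -> shared word count,
    omitting pairs with no words in common.
    """
    weights = {}
    for a, b in combinations(sorted(high_shift_words), 2):
        shared = high_shift_words[a] & high_shift_words[b]
        if shared:
            weights[(a, b)] = len(shared)
    return weights


# Placeholder word sets, invented purely for illustration.
toy = {
    "news": {"w1", "w2", "w3"},
    "worldnews": {"w2", "w3", "w4"},
    "relationships": {"w5"},
}

weights = chord_weights(toy)
```

Here the only chord joins "news" and "worldnews", with a weight of 2 for the two shared placeholder words.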