Exploring trends and themes of science communication using probabilistic topic modelling: Four decades of Science Communication

Author: Rikki Lee Mendiola – University of the Philippines, Philippines


Co-author: Garry Jay Montemayor – University of the Philippines, Philippines

Several past studies have reflected on how the theory and practice of science communication (scicom) as a field have progressed over the years. Scicom’s intellectual progression is usually discussed vis-à-vis the conceptual advancements of the communication discipline (Trench, 2008) and in parallel with the dominant technology of the times (Kurath & Gisler, 2009). The advancement of scicom is commonly investigated through content analysis (Bauer & Howard, 2013), case studies (Brossard & Lewenstein, 2010), and bibliometric studies (Suerdem et al., 2013).

We propose exploring a novel method, probabilistic topic modelling, to uncover the latent thematic structure of a given corpus (Blei, 2012). We use data science toolkits implemented in Python to explore the collection of abstracts of peer-reviewed articles published in the journal Science Communication from 1979 to 2019. The workflow covers data collection and pre-processing; exploratory data analysis (i.e., term frequency and inverse document frequency); and an unsupervised machine learning (ML) model, Latent Dirichlet Allocation (LDA), for topic modelling. The topics that emerge from the ML model will then be explored further using word embeddings to examine semantic similarity.
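As a rough illustration of the exploratory and topic-modelling steps named above, the following minimal sketch computes a simple TF-IDF weighting and fits a small LDA model by collapsed Gibbs sampling, using only the Python standard library. It is not the study's actual pipeline: the toy abstracts, the function names (tf_idf, lda_gibbs), the TF-IDF variant (relative term frequency times log(N/df)), and all hyperparameter values (n_topics, alpha, beta, n_iter, seed) are assumptions made for illustration. A real analysis would more likely rely on an established library and include stop-word removal and other pre-processing.

```python
import math
import random
from collections import Counter

# Hypothetical toy abstracts standing in for the real corpus (not the study's data).
docs = [
    "public understanding of science and media coverage",
    "risk communication and public trust in science",
    "media framing of climate change risk",
    "museum visitors and informal science learning",
]
tokenized = [d.split() for d in docs]


def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document.

    Assumed variant: relative term frequency times log(N / document frequency).
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


def lda_gibbs(tokenized_docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return (doc_topic, topic_term) counts."""
    rng = random.Random(seed)
    vocab_size = len({t for doc in tokenized_docs for t in doc})
    doc_topic = [[0] * n_topics for _ in tokenized_docs]  # tokens per topic, per document
    topic_term = [Counter() for _ in range(n_topics)]     # term counts per topic
    topic_total = [0] * n_topics                          # total tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(tokenized_docs):
        z_doc = []
        for term in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_term[z][term] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(tokenized_docs):
            for i, term in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_term[z][term] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                probs = [
                    (doc_topic[d][k] + alpha)
                    * (topic_term[k][term] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=probs)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_term[z][term] += 1
                topic_total[z] += 1
    return doc_topic, topic_term


weights = tf_idf(tokenized)
doc_topic, topic_term = lda_gibbs(tokenized)

# Show the most frequent terms assigned to each topic.
for k, counts in enumerate(topic_term):
    top_terms = [t for t, c in counts.most_common(5) if c > 0]
    print(f"topic {k}: {top_terms}")
```

On a corpus this small the resulting topics are not meaningful; the sketch only shows how per-document term weights and per-topic term counts are obtained.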

The proposed paper mainly argues that the computational approach to content analysis employed here is useful for exploring a large corpus, establishing a macro view of science communication through the years. First, existing knowledge claims about the progression of scicom as a field can be validated. Second, emerging themes for scholarly debate can be identified. Using an inductive approach with quantitative measurements for studying text (Maier et al., 2018), the proposed paper aims to establish the potential of examining text with machine learning in natural language processing.

A total of 1331 articles were analyzed. Initial exploratory data analysis (EDA) results can be viewed here. The study is currently implementing the unsupervised ML model, the word embeddings, and the evaluation of the model's reliability, which will be reported as part of the results in the final paper.

The author has not yet submitted a copy of the full paper.

Presentation type: Individual paper
Theme: Time