Jupyter Notebooks

The following notebooks are available in the project’s repo under the notebooks directory. A short description of each notebook is listed below:

Notebook	Description
1.0-jf-fetching-tweets-example	Contains a simple demonstration of how to fetch tweets using a Twitter Sandbox Environment. The sample data is saved as a JSON file, which must then be preprocessed.
1.1-jf-data-pull-testing	Applies and tests the data pulling functions in the src directory.
1.2-jf-data-etl-example	Performs basic data extraction and transformation, and explores and highlights the important information in the tweet and user object metadata.
1.3-jr-data-etl-load-ES	Explores and transforms the contents of the raw data, and loads them into the Elasticsearch database for further preprocessing.
1.4-jf-data-etl-testing	Implements the functions developed for Twitter data ETL. See notebooks 1.2 and 1.3, as well as documentation to understand the steps developed/implemented in these functions.
1.5-km-tweet-preprocessing	Runs the preprocessing pipeline on the sample data extracted during the data pull and ETL steps.
1.5-jr-tweet-preprocessing-full-data	Runs the preprocessing pipeline and VADER sentiment model on the entire set of transformed tweets stored in the ES database.
1.6-jr-tweet-preprocessing-extension	Loads the preprocessed data into the ES database using functions in the src directory.
2.0-msc-basic-EDA	Performs an initial EDA based on the sample data extracted during the data pull and ETL steps.
2.1-jf-EDA	Performs a detailed EDA based on the entire set of transformed tweets.
3.0-jf-User2Vec	Generates user vectors based on average doc2vec representations for each user. Implementation based on Hallac et al., 2019.
3.0-jf-network-analysis	Builds a user network based on the number of retweets and/or replies among users.
3.0-jr-tweets2vec	Generates 200-dimensional tweet vectors based on doc2vec implementation for the unique tweets/replies.
3.0-msc-pov-analysis	Explores the results of Point-of-View analysis of unique tweets and replies.
3.0-km-topic-modelling-lda	Implements topic modelling with LDA and PyLDAvis visualization over the unique tweets/replies.
3.1-km-topic-modelling-biterm	Implements Biterm Topic Model over the unique tweets/replies.
3.1-km-topic-modelling-nmf	Implements NMF topic modelling with wordclouds to visualize the topics.
3.2-km-user-topic-analysis	Explores user-topic relationship with topics generated from LDA and NMF methods implemented in notebooks 3.0 and 3.1. Analysing top N topics for top N users based on an aggregated popularity metric.
4.0-km-zstc	Runs a Zero-Shot Text Classification model on the translated version of unique tweets/replies, based on the transformers (pipeline) package. The model uses BART with a classification head trained on MNLI.
4.0-jf-zstc	Implements an alternative Active Learning approach for generating topics.
4.1-km-user-zstc-analysis	Explores user-topic relationship with topics generated from zero-shot text classification model implemented in notebook 4.0. Extracting top N topics and visualizing topic distribution for all users based on an aggregated popularity metric.
5.0-research-question-1	A complete end-to-end analysis addressing the research question: Identifying negative experiences and unmet needs. Includes functions to generate wordclouds and K-length first-person based extractive summaries to highlight the unmet needs for each topic.
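
As an illustration of the averaging step described for notebook 3.0-jf-User2Vec, the following is a minimal sketch of how per-tweet vectors could be combined into one vector per user. It is not the notebook's actual code: the function name average_user_vectors, the input layout of (user_id, vector) pairs, and the sample values are all assumptions made for this example, and the doc2vec vectors are assumed to have been computed beforehand.

```python
from collections import defaultdict


def average_user_vectors(tweet_vectors):
    """Average per-tweet vectors into a single vector per user.

    tweet_vectors: iterable of (user_id, vector) pairs, where every vector is
    a sequence of floats with the same length (hypothetical input layout,
    e.g. precomputed doc2vec output).
    """
    sums = {}                   # user_id -> running element-wise sum
    counts = defaultdict(int)   # user_id -> number of tweets seen
    for user_id, vector in tweet_vectors:
        if user_id not in sums:
            sums[user_id] = [0.0] * len(vector)
        if len(vector) != len(sums[user_id]):
            raise ValueError("all vectors must have the same dimensionality")
        for i, value in enumerate(vector):
            sums[user_id][i] += value
        counts[user_id] += 1
    # Divide each summed component by the user's tweet count.
    return {
        user_id: [component / counts[user_id] for component in total]
        for user_id, total in sums.items()
    }


# Toy 2-dimensional vectors standing in for real doc2vec embeddings.
sample = [
    ("alice", [1.0, 2.0]),
    ("alice", [3.0, 4.0]),
    ("bob", [0.5, -0.5]),
]
user_vectors = average_user_vectors(sample)
print(user_vectors)
```

Each user ends up with a vector of the same dimensionality as the tweet vectors, so users can be compared with the same similarity measures used for individual tweets.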