Jupyter Notebooks
The following notebooks are available in the project’s repo under the notebooks directory. A short description of each notebook is listed below:
| Notebook | Description |
|---|---|
| 1.0-jf-fetching-tweets-example | Contains a simple demonstration of how to fetch tweets using a Twitter Sandbox Environment. The sample data is saved as a JSON file, which must then be preprocessed. |
| 1.1-jf-data-pull-testing | Applies and tests the data pulling functions in the src directory. |
| 1.2-jf-data-etl-example | Performs basic data extraction and transformation, exploring and highlighting the important information in the tweet and user object metadata. |
| 1.3-jr-data-etl-load-ES | Explores and transforms the contents of the raw data, and loads them into the Elasticsearch (ES) database for further preprocessing. |
| 1.4-jf-data-etl-testing | Implements the functions developed for Twitter data ETL. See notebooks 1.2 and 1.3, as well as the documentation, to understand the steps implemented in these functions. |
| 1.5-km-tweet-preprocessing | Runs the preprocessing pipeline on the sample data extracted during the data pull and ETL steps. |
| 1.5-jr-tweet-preprocessing-full-data | Runs the preprocessing pipeline and the VADER sentiment model on the entire set of transformed tweets stored in the ES database. |
| 1.6-jr-tweet-preprocessing-extension | Loads the preprocessed data into the ES database using a function in the src directory. |
| 2.0-msc-basic-EDA | Performs an initial EDA based on the sample data extracted during the data pull and ETL steps. |
| 2.1-jf-EDA | Performs a detailed EDA based on the entire set of transformed tweets. |
| 3.0-jf-User2Vec | Generates user vectors based on average doc2vec representations for each user. Implementation based on Hallac et al., 2019. |
| 3.0-jf-network-analysis | Builds a user network based on the number of retweets and/or replies among users. |
| 3.0-jr-tweets2vec | Generates 200-dimensional tweet vectors based on a doc2vec implementation for the unique tweets/replies. |
| 3.0-msc-pov-analysis | Explores the results of the Point-of-View analysis of unique tweets and replies. |
| 3.0-km-topic-modelling-lda | Implements topic modelling with LDA and pyLDAvis visualization over the unique tweets/replies. |
| 3.1-km-topic-modelling-biterm | Implements a Biterm Topic Model over the unique tweets/replies. |
| 3.1-km-topic-modelling-nmf | Implements NMF topic modelling with wordclouds to visualize the topics. |
| 3.2-km-user-topic-analysis | Explores the user-topic relationship with topics generated by the LDA and NMF methods implemented in notebooks 3.0 and 3.1. Analyses the top N topics for the top N users based on an aggregated popularity metric. |
| 4.0-km-zstc | Runs a zero-shot text classification model on the translated version of the unique tweets/replies, based on the transformers (pipeline) package. The model uses BART with a classification head trained on MNLI. |
| 4.0-jf-zstc | Implements an alternative Active Learning approach for generating topics. |
| 4.1-km-user-zstc-analysis | Explores the user-topic relationship with topics generated by the zero-shot text classification model implemented in notebook 4.0. Extracts the top N topics and visualizes the topic distribution for all users based on an aggregated popularity metric. |
| 5.0-research-question-1 | A complete end-to-end analysis addressing the research question: identifying negative experiences and unmet needs. Includes functions to generate wordclouds and K-length, first-person-based extractive summaries to highlight the unmet needs for each topic. |
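
As a rough illustration of the idea behind the 3.0-jf-User2Vec notebook (user vectors as the average of each user's tweet vectors), the sketch below averages per-tweet vectors by user in plain Python. It is not the notebook's code: the function name `average_user_vectors`, the user ids, and the 3-dimensional toy vectors are all assumptions made for this example, and real tweet vectors would come from a trained doc2vec model rather than being typed in by hand.

```python
from collections import defaultdict


def average_user_vectors(tweet_vectors):
    """Average tweet vectors per user.

    tweet_vectors: iterable of (user_id, vector) pairs, where every vector
    is a list of floats of the same length (hypothetical input format).
    Returns a dict mapping user_id to the element-wise mean vector.
    """
    sums = {}                  # user_id -> running element-wise sum
    counts = defaultdict(int)  # user_id -> number of tweets seen

    for user_id, vec in tweet_vectors:
        if user_id not in sums:
            sums[user_id] = [0.0] * len(vec)
        if len(vec) != len(sums[user_id]):
            raise ValueError("all vectors for a user must have the same length")
        for i, value in enumerate(vec):
            sums[user_id][i] += value
        counts[user_id] += 1

    # Divide each running sum by the number of tweets for that user.
    return {
        user_id: [total / counts[user_id] for total in vec_sum]
        for user_id, vec_sum in sums.items()
    }


# Toy data standing in for doc2vec output (made-up values, 3 dimensions).
toy_tweet_vectors = [
    ("alice", [1.0, 2.0, 3.0]),
    ("alice", [3.0, 4.0, 5.0]),
    ("bob", [0.5, 0.5, 0.5]),
]

user_vectors = average_user_vectors(toy_tweet_vectors)
print(user_vectors)
```

In the project the tweet vectors are 200-dimensional (see 3.0-jr-tweets2vec); the averaging step is the same regardless of dimension.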