API
This is the full API documentation of the src package.
src.data
Functions to perform ETL tasks, including the code needed to pull data from Twitter and preprocess it.
Pulls data (i.e., tweets and user info) from Twitter using its API.
Returns the number of existing tweets for a given query and time frame.
Converts a raw .json file containing tweets' data into a clean(er) dataset.
Loads a dataframe into the Elasticsearch database.
Queries an Elasticsearch database to get all the results of a query.
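As an illustration of the raw-to-clean conversion step described above, the following is a minimal sketch using only the standard library. It is not the package's implementation: the function name `clean_tweets`, the field list in `KEEP_FIELDS`, and the assumption that the raw file holds a JSON array of tweet objects are all hypothetical and chosen for the example.

```python
import json
from typing import Dict, List

# Hypothetical field names, loosely modelled on Twitter API tweet objects.
KEEP_FIELDS = ("id", "author_id", "created_at", "text")


def clean_tweets(raw_json: str) -> List[Dict]:
    """Parse a JSON array of tweet objects and keep a flat subset of fields."""
    cleaned = []
    seen_ids = set()
    for tweet in json.loads(raw_json):
        tweet_id = tweet.get("id")
        # Skip records without an id or text, and drop duplicate ids.
        if tweet_id is None or not tweet.get("text") or tweet_id in seen_ids:
            continue
        seen_ids.add(tweet_id)
        row = {field: tweet.get(field) for field in KEEP_FIELDS}
        # Collapse newlines and repeated whitespace in the tweet body.
        row["text"] = " ".join(row["text"].split())
        cleaned.append(row)
    return cleaned


# Made-up sample input: one valid tweet, one duplicate, one with empty text.
raw = json.dumps([
    {"id": "1", "author_id": "a", "created_at": "2021-01-01T00:00:00Z",
     "text": "hello\n world", "lang": "en"},
    {"id": "1", "author_id": "a", "created_at": "2021-01-01T00:00:00Z",
     "text": "hello\n world"},
    {"id": "2", "author_id": "b", "text": ""},
])
rows = clean_tweets(raw)
```

The resulting list of flat dictionaries could then be turned into a dataframe before loading it into a database.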
src.features
Functions to run preprocessing tasks and basic feature extraction.
Translates a block of text (this function can be time-consuming).
Applies the translation function to every row of a dataframe via the .apply method.
Runs the preprocessing pipeline on all tweets to generate the feature “full_text_processed”: translating tweets to English, removing stopwords and lemmatizing, removing URLs and reserved words, lowercasing and removing punctuation, and running VADER sentiment analysis.
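A few of the pipeline steps listed above can be sketched with the standard library alone. This is an assumption-laden illustration rather than the package's code: the function name `preprocess`, the tiny stopword set, and the reserved-word pattern are hypothetical, and translation, lemmatization, and VADER scoring are left out because they require external libraries.

```python
import re
import string

# Tiny illustrative stopword list (an assumption; a real pipeline would use a full one).
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "this"}
URL_RE = re.compile(r"https?://\S+")
# Twitter reserved words such as the retweet marker (assumed set).
RESERVED_RE = re.compile(r"\b(?:RT|FAV)\b")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Remove URLs and reserved words, lowercase, strip punctuation and stopwords."""
    text = URL_RE.sub(" ", text)
    text = RESERVED_RE.sub(" ", text)
    text = text.lower().translate(PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


example = preprocess("RT @user: This is the BEST news! https://t.co/abc #wow")
print(example)  # user best news wow
```

URLs are removed before punctuation is stripped so that a link is dropped as a whole instead of being broken into word fragments.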
src.models
Functions to run the models used for analysis. These include the User2Vec algorithm and an experimental topic extraction method based on active learning and zero-shot classification.
Utility function to tokenize a single tweet.
Generates vectors for each user in the dataset.
Runs active learning with zero-shot classification.
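To make the tweet tokenization utility listed above more concrete, here is a minimal regex-based sketch. The name `tokenize_tweet` and the token pattern are hypothetical; the package's own tokenizer may split text differently. The sketch keeps URLs, @mentions, and #hashtags as single tokens and lowercases everything.

```python
import re
from typing import List

# Alternatives are tried in order: URLs, @mentions and #hashtags, then plain
# words with an optional apostrophe part (e.g. "it's").
TOKEN_RE = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)?")


def tokenize_tweet(text: str) -> List[str]:
    """Split a single tweet into lowercase tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


tokens = tokenize_tweet("Loving #NLP with @friend, it's great! https://t.co/xyz")
print(tokens)
# ['loving', '#nlp', 'with', '@friend', "it's", 'great', 'https://t.co/xyz']
```

Token lists of this kind, grouped per user, are the usual input to embedding methods that produce one vector per user.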