This page contains the full API documentation for the src package.


Functions to perform ETL tasks, including the code needed to pull Twitter data and preprocess it.

data.pull_tweets (query, from_date, to_date, …)

Pulls data (i.e., tweets and user info) from Twitter using its API.

data.count_tweets (query, from_date, to_date, …)

Returns the number of existing tweets for a given query and time frame.

data.transform (json_path[, verbose])

Converts a raw .json file containing tweet data into a cleaner dataset.

data.load_es (df_merged[, ip_address, verbose])

Loads a dataframe into the Elasticsearch database.

data.query_es (client[, body, index_query, …])

Queries an Elasticsearch database and returns all results for a given query.
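
As an illustration of the kind of request body a function like data.query_es might send to Elasticsearch, the standard Query DSL combines a full-text match with a date-range filter. The field names ("full_text", "created_at") below are assumptions for illustration, not part of the documented API:

```python
# Sketch: building an Elasticsearch Query DSL body as a plain dict.
# The index field names ("full_text", "created_at") are assumed here.
def build_query_body(query, from_date, to_date):
    """Return a query body matching `query` within [from_date, to_date]."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"full_text": query}}],
                "filter": [
                    {"range": {"created_at": {"gte": from_date, "lte": to_date}}}
                ],
            }
        }
    }

body = build_query_body("climate", "2020-01-01", "2020-12-31")
```

A body like this would typically be passed to an Elasticsearch client's search call; the actual parameter names accepted by data.query_es may differ.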


Functions to run preprocessing tasks and basic feature extraction.

features.translate_tweet (text, lang)

Translates a block of text (this function can be time-consuming).

features.translate_func (x, text, lang)

Helper intended for use with the .apply method to translate the text in every row of a dataframe.

features.preprocessDataFrame (df)

Runs the preprocessing pipeline on all tweets to generate the feature “full_text_processed”: translating tweets to English, removing stopwords and lemmatizing, removing URLs and reserved words, lowercasing and removing punctuation, and running VADER sentiment analysis.
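
The text-cleaning steps of the pipeline above can be sketched with the standard library alone. This is an illustrative simplification, not the package's implementation: translation, stopword removal, lemmatization, and VADER scoring are omitted, since they rely on external libraries (e.g., NLTK and vaderSentiment):

```python
import re
import string

def preprocess_text(text):
    """Minimal sketch of the cleaning steps: strip URLs and reserved
    words (RT/FAV), lowercase, and remove punctuation."""
    text = re.sub(r"https?://\S+", "", text)    # remove URLs
    text = re.sub(r"\b(RT|FAV)\b", "", text)    # remove Twitter reserved words
    text = text.lower()                         # lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())               # normalize whitespace

preprocess_text("RT Check this out: https://t.co/abc123 AMAZING!!!")
# → "check this out amazing"
```

In the real pipeline these steps run over every row of the dataframe and the result is stored in the "full_text_processed" column.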


Functions to run the models used for analysis, including the User2Vec algorithm and an experimental topic-extraction method based on Active Learning and Zero-shot classification.

models.tokenize (doc[, tag])

Utility function to tokenize a single tweet.

models.User2Vec (vector_size, min_count, …)

Generates vectors for each user in the dataset.

models.ALZeroShotWrapper (classifier[, …])

Active Learning with Zero-Shot classification.
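
To illustrate the tokenization step that feeds the models, a simple tweet tokenizer can be written with a single regular expression. This is a hedged sketch, not the actual models.tokenize implementation; the choice to keep hashtags and mentions as single tokens is an assumption:

```python
import re

def tokenize_tweet(text):
    """Illustrative tokenizer in the spirit of models.tokenize: lowercase
    the tweet and keep hashtags/mentions attached to their word."""
    return re.findall(r"[#@]?\w+", text.lower())

tokenize_tweet("@nasa Launch day! #space")
# → ["@nasa", "launch", "day", "#space"]
```

Token lists like this are the typical input for a Doc2Vec-style model such as User2Vec, where each user's tweets would be tagged with that user's identifier before training.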