API

This is the full API documentation of the src package.

src.data

Functions to perform ETL tasks. These functions include the code necessary for pulling Twitter data and preprocessing it.

data.pull_tweets (query, from_date, to_date, …)

Pulls data (i.e., tweets and user info) from Twitter using its API.
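
A minimal usage sketch; the parameter names follow the signature above, while the query, the ISO date format, and the returned value are assumptions:

    from src import data

    # Hypothetical query and dates; the accepted date format is an assumption.
    raw = data.pull_tweets(
        query="climate",
        from_date="2021-01-01",
        to_date="2021-01-31",
    )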

data.count_tweets (query, from_date, to_date, …)

Returns the number of existing tweets for a given query and time frame.
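
For instance, to gauge the volume of matching tweets before pulling them (that an integer is returned is an assumption):

    n = data.count_tweets(query="climate", from_date="2021-01-01", to_date="2021-01-31")
    print(f"{n} tweets match the query")  # assumes an integer return value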

data.transform (json_path[, verbose])

Converts a raw .json file containing tweet data into a clean(er) dataset.
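
A sketch of the transform step; the file path is hypothetical, and that a pandas DataFrame is returned is an assumption inferred from load_es below:

    # "tweets_raw.json" is a hypothetical path to a dump produced by pull_tweets.
    df = data.transform("tweets_raw.json", verbose=True)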

data.load_es (df_merged[, ip_address, verbose])

Loads a dataframe into the Elasticsearch database.
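
A sketch, assuming a locally reachable Elasticsearch node; the address is illustrative:

    # Assumes an Elasticsearch instance is listening at this address.
    data.load_es(df, ip_address="127.0.0.1")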

data.query_es (client[, body, index_query, …])

Queries an Elasticsearch database and returns all results for a given query.
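
A sketch using the official elasticsearch Python client; the address and the match-all body are hypothetical:

    from elasticsearch import Elasticsearch

    client = Elasticsearch("http://127.0.0.1:9200")  # illustrative address
    body = {"query": {"match_all": {}}}              # hypothetical query body
    results = data.query_es(client, body=body)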

src.features

Functions to run preprocessing tasks and basic feature extraction.

features.translate_tweet (text, lang)

Translates a block of text (this function can be time-consuming).
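
A sketch; that lang names the source language (with English as the target) is an assumption based on the pipeline description below:

    from src import features

    # Hypothetical input; "fr" assumed to be the source-language code.
    text_en = features.translate_tweet("ceci est un tweet", lang="fr")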

features.translate_func (x, text, lang)

Helper to be used with a dataframe's .apply method to translate the text of each row.
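
A sketch of the intended .apply usage; treating x as the row and text/lang as column names is an assumption, and the column names themselves are hypothetical:

    # Assumes `x` is the dataframe row and `text`/`lang` name the columns
    # holding the tweet text and its language code.
    df["full_text_en"] = df.apply(
        lambda row: features.translate_func(row, text="full_text", lang="lang"),
        axis=1,
    )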

features.preprocessDataFrame (df)

Runs the preprocessing pipeline on all tweets to generate the feature “full_text_processed”: translating tweets to English; removing stopwords and lemmatizing; removing URLs and reserved words; lowercasing and removing punctuation; and running VADER sentiment analysis.
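
A sketch of running the full pipeline on a transformed dataframe:

    # Assumes the processed dataframe is returned rather than modified in place.
    df = features.preprocessDataFrame(df)
    df["full_text_processed"].head()  # the generated feature named above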

src.models

Functions to run the models used for analysis. These include the User2Vec algorithm and an experimental topic-extraction method based on active learning and zero-shot classification.

models.tokenize (doc[, tag])

Utility function to tokenize a single tweet.
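
A sketch; the tag argument is elided above and not assumed here:

    from src import models

    # The exact return type (e.g., list of tokens) is an assumption.
    tokens = models.tokenize("Just setting up my twttr")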

models.User2Vec (vector_size, min_count, …)

Generates vectors for each user in the dataset.
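
A construction-only sketch; the hyperparameter values are illustrative, and any training or inference methods are elided above, so none are assumed:

    # Illustrative hyperparameters; the full signature is elided above.
    u2v = models.User2Vec(vector_size=100, min_count=5)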

models.ALZeroShotWrapper (classifier[, …])

Active learning wrapper around a zero-shot classifier.
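
A sketch, assuming the wrapped classifier is a Hugging Face zero-shot pipeline; the wrapper's methods beyond construction are elided above and not assumed:

    from transformers import pipeline

    # Assumes the wrapper accepts a transformers zero-shot pipeline.
    zero_shot = pipeline("zero-shot-classification")
    wrapper = models.ALZeroShotWrapper(zero_shot)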

src.visualization