src.features.preprocessDataFrame

src.features.preprocessDataFrame(df)

Runs the preprocessing pipeline on all tweets to generate the feature “full_text_processed”. The pipeline translates tweets to English, removes stopwords, lemmatizes the remaining words, removes URLs and reserved words, lowercases the text, strips punctuation, and runs VADER sentiment analysis.

Parameters:
df DataFrame

Transformed DataFrame with original tweets

Returns:
df DataFrame

DataFrame with processed tweets
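
The text-cleaning steps named in the description can be illustrated with a minimal, standard-library sketch. It is not the project's implementation: the helper name `clean_tweet`, the tiny stopword set, the reserved words `RT` and `FAV`, and the order of the steps are all assumptions made for illustration. Translation, lemmatization, and VADER scoring are left out because they depend on external libraries.

```python
import re
import string

# Assumption: a tiny illustrative stopword set; a real pipeline would
# presumably load a complete list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "this"}

# Matches http(s) links and bare www. links up to the next whitespace.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Assumption: "reserved words" means Twitter markers such as RT and FAV.
# The match is case-sensitive, so it must run before lowercasing.
RESERVED_RE = re.compile(r"\b(?:RT|FAV)\b")

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Hypothetical helper: apply a subset of the cleaning steps to one tweet."""
    text = URL_RE.sub(" ", text)        # remove URLs
    text = RESERVED_RE.sub(" ", text)   # remove reserved words
    text = text.lower()                 # lowercase
    text = text.translate(PUNCT_TABLE)  # strip punctuation (also @ and #)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)             # rejoin with single spaces


print(clean_tweet("RT @user: This is the BEST day! https://t.co/abc123 #happy"))
# prints: user best day happy
```

With pandas, a helper like this could be applied to the raw text column through `Series.apply` and the result stored in “full_text_processed”. The name of the raw text column is not stated here, so any specific name would be an assumption.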