src.models.tokenize

src.models.tokenize(doc, tag=None)

Utility function to tokenize a single tweet.

Parameters
doc : str

Text to be tokenized.

tag : int, str or None, default=None

Document identifier. If None, a plain list of tokens is returned instead of a tagged document.

Returns
document : list or gensim.models.doc2vec.TaggedDocument

Tokenized document: a list of tokens if tag is None, otherwise a TaggedDocument labeled with tag.
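The contract above (a list of tokens when tag is None, a gensim TaggedDocument otherwise) can be illustrated with a minimal sketch. This is a hypothetical re-implementation, not the project's source: the lowercasing, the regex that keeps URLs, @mentions and #hashtags as single tokens, and wrapping tag in a one-element list for the tags field are all assumptions. If gensim is not installed, the sketch falls back to a namedtuple with the same words/tags fields.

```python
import re
from collections import namedtuple

try:
    # gensim's TaggedDocument is a namedtuple with fields (words, tags)
    from gensim.models.doc2vec import TaggedDocument
except ImportError:
    # Assumption: stand-in with the same field names when gensim is absent
    TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])

# Hypothetical tweet tokenizer: URLs first, then optional @/# prefix + word
# characters, allowing one apostrophe suffix (e.g. "don't").
_TOKEN_RE = re.compile(r"https?://\S+|[@#]?\w+(?:'\w+)?")


def tokenize(doc, tag=None):
    """Sketch of tokenize(doc, tag=None); not the actual src.models code."""
    tokens = _TOKEN_RE.findall(doc.lower())
    if tag is None:
        return tokens
    # gensim expects tags to be a list of labels, so the single tag is wrapped
    return TaggedDocument(words=tokens, tags=[tag])
```

For example, `tokenize("Loving #NLP with @gensim! https://example.com")` yields `['loving', '#nlp', 'with', '@gensim', 'https://example.com']`, while passing `tag=7` returns a TaggedDocument suitable for training a gensim Doc2Vec model.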