Normalization, Tokenization, Sentence Segmentation + Useful Methods
What does normalizing a text do?
We have previously called the .lower() method to turn all of the words lowercase. That way strings like “the” and “The” both become “the”, and we don’t count them as two different words.
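As a small illustration of why lowercasing matters for counting, here is a minimal sketch. The sample sentence and variable names are made up for this example and are not from the original notes.

```python
from collections import Counter

# Hypothetical sample text, chosen only to illustrate the point.
text = "The cat saw the dog and The dog saw the cat"
tokens = text.split()

# Without normalization, "The" and "the" are counted separately.
raw_counts = Counter(tokens)

# With .lower(), both spellings collapse into a single entry.
normalized_counts = Counter(word.lower() for word in tokens)

print(raw_counts["The"], raw_counts["the"])
print(normalized_counts["the"])
```

Here the raw counter keeps separate tallies for the two spellings, while the normalized counter holds one combined tally for “the”.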
What if we want to do even more?
For example, we can strip the affixes from words in a process called stemming. …
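To make the idea of stripping affixes concrete, here is a toy sketch of suffix removal. It is not a real stemming algorithm: the suffix list and the toy_stem function are hypothetical, invented for this example, and libraries such as NLTK ship far more careful stemmers (for instance the Porter stemmer).

```python
# Toy suffix list for illustration only; real stemmers use many more rules.
SUFFIXES = ("ing", "ly", "ed", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical example words; lowercase first, then stem.
words = ["Walking", "walked", "walks", "quickly", "cat"]
stems = [toy_stem(w.lower()) for w in words]
print(stems)
```

With this toy rule set, the three forms of “walk” collapse to the same stem, which is the effect stemming aims for. A naive approach like this will also mangle many real words, which is one reason to prefer an established stemmer in practice.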