TextBlob is an open source text processing library. It is built on top of the Natural Language Toolkit (NLTK) and Pattern libraries and provides easy-to-use functions that simplify text processing tasks. It is generally quicker to work with than using NLTK directly. In the last post we looked at text processing with the NLTK library; in this post we will look at text processing with TextBlob. TextBlob also supports other Natural Language Processing (NLP) tasks such as part-of-speech tagging, sentiment analysis, text classification, and language translation, among others.

Text Processing With TextBlob

If you are new to NLP and text processing, TextBlob is a good place to start. It has an easy-to-use API that provides NLP and text analytics functions. Before we can use TextBlob we need to install it. Being a cross-platform tool, TextBlob runs on all major operating systems.
To install TextBlob, run the pip command:

pip install -U textblob

If you are using the Anaconda distribution, you can instead run the conda command in the Anaconda Prompt:

conda install -c conda-forge textblob

Either command will install TextBlob. Some features also need NLTK corpora, which can be downloaded with:

python -m textblob.download_corpora

Now that you have installed TextBlob, let’s start processing text data.

1. Creating Text in TextBlob

2. Tokenization

3. Lemmatization

The lemmatize method accepts an optional part-of-speech argument that determines how the word is reduced to its lemma. For example, passing "v" tells the method to treat the word as a verb.

4. Word Inflection

Word inflection is the process of modifying a word to express different grammatical categories such as number, tense, or case. With TextBlob we can pluralize and singularize words.

5. Spelling Correction

TextBlob lets us check the spelling of a word with the spellcheck() method, which returns a list of suggestions as (word, confidence) tuples.

For spelling correction, the correct() method replaces each word with the suggestion that has the highest confidence.

6. Part-Of-Speech (POS) Tagging

7. Extracting Noun Phrases

A noun phrase is a word or group of words containing a noun and functioning in a sentence as subject, object, or prepositional object.

8. Language Detection and Translation
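TextBlob's language detection and translation rely on APIs provided by Google Translate, so they need an internet connection. The corresponding methods, detect_language() and translate(), were deprecated and later removed in newer TextBlob releases, so they are only available in older versions.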

Language Detection

Language Translation

Conclusion

Text processing is a critical phase in machine learning and data analytics development. TextBlob provides simple functions that make text processing straightforward. However, there are many ways of doing text preprocessing, such as using the NLTK library, which we covered in the previous post. TextBlob is a lightweight framework built on top of NLTK and Pattern. It is often quicker to work with than NLTK, but the choice of framework should be evaluated carefully for each project.

What’s Next

In this post we looked at text processing with TextBlob. In the next post we are going to look at text analysis using the NLP frameworks covered in this and the previous post.
