Text makes up a large share of the unstructured data generated every day. Unlike numeric data, which can often be analysed with simple statistical and machine learning models, text data usually calls for more advanced statistical and machine learning approaches. One of the major applications of Natural Language Processing (NLP) is text analysis, in tasks such as sentiment analysis. In the previous posts we looked at text processing with the Natural Language Toolkit (NLTK) and TextBlob. In this post we will look at text analysis using both the NLTK and TextBlob libraries.

Text Analysis

Text analysis is the process of extracting actionable insights from text data using statistical and machine learning approaches. In text analysis we transform text into a quantitative format that can be easily analysed and visualized. Text analysis has many use cases, such as sentiment analysis and market analysis. Let’s look at different ways we can analyse text data.

1. Word Frequency with TextBlob

Word frequency using the word_counts property

Word frequency using count function

Case-Insensitive word frequency using count function

Noun Phrase frequency

2. Frequency Distribution with NLTK

Word count
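A minimal sketch of counting words with NLTK follows. It assumes a regex-based `RegexpTokenizer` that keeps only runs of word characters, which drops punctuation and needs no downloaded models. The original post may have used a different tokenizer, and the sample text is made up.

```python
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text, not taken from the original post.
text = "The cat sat on the mat. The dog chased the cat."

# Keep only runs of word characters, so punctuation is not counted.
tokenizer = RegexpTokenizer(r"\w+")
words = tokenizer.tokenize(text)

word_total = len(words)
print(word_total)  # 11
```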

Sentence count

Word distribution
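NLTK's `FreqDist` builds the distribution from a list of tokens. The sketch below assumes a simple regex tokenizer and lowercased tokens, both choices made here for illustration rather than taken from the original post.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text, not taken from the original post.
text = "The cat sat on the mat. The dog chased the cat."

# Lowercase the tokens so "The" and "the" are counted together.
words = [w.lower() for w in RegexpTokenizer(r"\w+").tokenize(text)]
fdist = FreqDist(words)

print(fdist.N())  # total number of tokens: 11
print(fdist.B())  # number of distinct tokens: 7

# FreqDist behaves like a dictionary of word -> count.
for word, count in fdist.items():
    print(word, count)
```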

Most common words
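The most frequent words can then be read off with `most_common`. As before, the sample text and the regex tokenizer are assumptions made for this sketch.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text, not taken from the original post.
text = "The cat sat on the mat. The dog chased the cat."

words = [w.lower() for w in RegexpTokenizer(r"\w+").tokenize(text)]
fdist = FreqDist(words)

# most_common(n) returns the n highest-count (word, count) pairs.
top_two = fdist.most_common(2)
print(top_two)  # [('the', 4), ('cat', 2)]
```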

Most common words from a sentence
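The same idea applies to a single sentence. In this sketch the sentence is written out directly (a hypothetical example) rather than extracted from a longer text, which keeps the code free of downloaded models.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# Hypothetical sentence, not taken from the original post.
sentence = "The quick brown fox jumps over the lazy dog while the cat sleeps."

tokens = [w.lower() for w in RegexpTokenizer(r"\w+").tokenize(sentence)]
sentence_fdist = FreqDist(tokens)

# The single most frequent word in the sentence.
top_word = sentence_fdist.most_common(1)
print(top_word)  # [('the', 3)]
```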

Specific word appearance
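To check how often one particular word appears, the distribution can be indexed like a dictionary, and `freq` gives the word's share of all tokens. This is a sketch on made-up sample text with an assumed regex tokenizer.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text, not taken from the original post.
text = "The cat sat on the mat. The dog chased the cat."

words = [w.lower() for w in RegexpTokenizer(r"\w+").tokenize(text)]
fdist = FreqDist(words)

cat_count = fdist["cat"]       # absolute count
cat_share = fdist.freq("cat")  # count divided by total number of tokens
bird_count = fdist["bird"]     # words that never occur count as 0

print(cat_count)   # 2
print(bird_count)  # 0
```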

3. Lexical Diversity

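Lexical diversity, as used here, is the average number of times each distinct word is repeated across the text. It is computed as (total number of words in text / total number of unique words in text). A value close to 1 means almost every word is used only once, while larger values mean more repetition.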
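The formula can be sketched in plain Python. The helper name `lexical_diversity`, the regex tokenization and the sample text are all assumptions made for this illustration.

```python
import re


def lexical_diversity(text: str) -> float:
    """Return total words divided by unique words (hypothetical helper)."""
    # Lowercase first so "The" and "the" count as the same word.
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return len(words) / len(set(words))


# Hypothetical sample text, not taken from the original post.
text = "The cat sat on the mat. The dog chased the cat."

diversity = lexical_diversity(text)
print(round(diversity, 3))  # 11 words / 7 unique words = 1.571
```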

Sentiment Analysis

Sentiment analysis is the process of determining the attitude or emotion expressed in a text. The sentiment can be positive, negative or neutral. In TextBlob the sentiment property returns two values: polarity and subjectivity. Polarity is a float in the range [-1.0, 1.0]. A polarity close to 1 implies that the text is a positive statement, and a polarity close to -1 implies that it is negative. Subjectivity is a float in the range [0.0, 1.0]. A value close to 1 indicates that the text expresses an opinion or emotion, while a value close to 0 indicates that the text is objective, that is, factual.

Calculating Sentiment Analysis with TextBlob

Extracting the polarity of a text

Extracting the subjectivity of a text

Conclusion

In this post we have briefly looked at text analysis using the NLTK and TextBlob libraries. There are different tools and approaches for text analysis. Some complex text analysis tasks require the use of advanced machine learning, artificial intelligence, NLP and statistical methods.

What’s Next

In this post we have looked at text analysis approaches. In the next post we will look at document classification using the Term Frequency-Inverse Document Frequency (TF-IDF) technique.
