Tweet Sentiment Analysis

What is Sentiment analysis?

Sentiment analysis, also called opinion mining, is a computational technique used to determine whether a piece of writing is positive, negative or neutral.
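TextBlob, which the code below relies on, scores text with a polarity between -1.0 (most negative) and 1.0 (most positive). The following is a minimal sketch of turning such a score into one of the three labels. The `neutral_band` parameter is a hypothetical addition for illustration. With its default of 0.0, only a score of exactly zero counts as neutral, which matches the rule used in `get_tweet_sentiment`.

```python
def label_from_polarity(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to 'positive', 'negative' or 'neutral'.

    neutral_band is a hypothetical tuning knob: scores within
    [-neutral_band, neutral_band] are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


print(label_from_polarity(0.5), label_from_polarity(-0.2), label_from_polarity(0.0))
```

A wider band, such as `neutral_band=0.1`, would count weakly scored text as neutral rather than forcing it into a positive or negative bucket.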

What are its applications?

  1. Business: sentiment analysis can reveal how customers feel about a brand or product, and those insights can inform strategies that improve the business and customer satisfaction.

  2. Politics: politicians use it to track political views and to spot consistency or inconsistency between statements and actions. It can also be used to help predict election results.

  3. Public opinion: sentiment analysis can be used to detect potentially dangerous situations and to gauge the general mood of blogs, social media, etc.

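Required Installations.

  1. Tweepy: Tweepy is a Python client library for the Twitter API. Install it using the following pip command: 'pip install tweepy'.

  2. TextBlob: TextBlob is a Python library for processing textual data. Install it using the following pip command: 'pip install textblob'.

  3. pandas and matplotlib: used to tabulate and plot the results. Install them using the following pip command: 'pip install pandas matplotlib'.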
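Before running the notebook, you can quickly confirm that the required packages are importable using the standard library's `importlib.util.find_spec`, which returns `None` for a top-level module it cannot find. This is a sketch. The helper name `missing_packages` is hypothetical. For the four packages the code imports (tweepy, textblob, pandas, matplotlib), the import name is the same as the pip name.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages this notebook uses.
    needed = ["tweepy", "textblob", "pandas", "matplotlib"]
    missing = missing_packages(needed)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```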

Authentication:

To fetch tweets using the Twitter API, you need to register an app through your Twitter account. Follow the steps below:

  1. Click on this link, then click the ‘Create New App’ button.

  2. Fill in the application details. The callback URL is not needed at this stage, so you can leave that field empty.

  3. You will be redirected to the app page as soon as the app is created.

  4. Navigate to the ‘Keys and Access Tokens’ tab.

  5. Copy the ‘Consumer Key’, ‘Consumer Secret’, ‘Access Token’ and ‘Access Token Secret’.

Note: in the Jupyter notebook you will find variables for the keys above. Replace 'xxxxxxxx' with the correct values.
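Instead of pasting the keys into the notebook, you could read them from environment variables so they stay out of shared files. The following is a minimal sketch. The environment variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_twitter_keys` are hypothetical choices and are not required by Tweepy or Twitter.

```python
import os
from typing import Mapping

# Hypothetical environment-variable names; choose whatever suits your setup.
KEY_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_keys(env: Mapping[str, str] = os.environ) -> dict:
    """Return the four credentials, raising KeyError that names any missing variable."""
    missing = [var for var in KEY_NAMES.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {field: env[var] for field, var in KEY_NAMES.items()}
```

The returned values would then be passed on as before, for example `OAuthHandler(keys["consumer_key"], keys["consumer_secret"])`.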

import re
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob
import pandas as pd
import matplotlib.pyplot as plt

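class TwitterClient(object):
    '''
    Twitter client that authenticates with the Twitter API, fetches
    tweets and labels their sentiment.

    Keys:
    ----
    The following keys are needed for authentication and are set in __init__:

    consumer_key: str
    consumer_secret: str
    access_token: str
    access_token_secret: str


    Methods:
    -------
    clean_tweet(tweet)
        removes mentions, links and special characters from a tweet

    get_tweets(query, count)
        fetches tweets matching a search query and labels each one

    get_tweet_sentiment(tweet)
        returns the sentiment of a tweet: positive, negative or neutral
    '''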

    def __init__(self):
        # keys and tokens from the Twitter Dev Console; replace the placeholders
        consumer_key = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
        consumer_secret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
        access_token = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
        access_token_secret = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

        # create OAuthHandler object and set access token and secret
        self.auth = OAuthHandler(consumer_key, consumer_secret)
        self.auth.set_access_token(access_token, access_token_secret)

        # create tweepy API object to fetch tweets
        self.api = tweepy.API(self.auth)

        # attempt authentication: verify_credentials() raises if the keys are rejected
        try:
            self.api.verify_credentials()
            print("Authentication passed")
        except tweepy.TweepyException as e:
            print("Error: Authentication failed: " + str(e))


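    def clean_tweet(self, tweet):
        '''
        cleans a tweet by removing @mentions, URLs and special characters

        Parameters:
        ----------
        tweet: str
            The tweet to be cleaned
        '''
        # raw string avoids invalid-escape warnings; there is no trailing space after
        # the URL group, so a URL at the very end of a tweet is removed as well
        return ' '.join(re.sub(r"(@[A-Za-z0-9_]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())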


    def get_tweets(self, query, count=10):
        '''
        fetches tweets matching a query and labels each one as positive, negative or neutral

        Parameters:
        ----------
        query: str
            The search keyword

        count: int
            number of tweets to fetch. default value is 10

        Returns:
        -------
        list of dicts with 'text' and 'sentiment' keys (empty if the request fails)
        '''

        # empty list to store parsed tweets
        tweets = []

        try:
            print("Query in progress...")

            # fetch tweets
            fetched_tweets = self.api.search_tweets(q=query, count=count)

            print("Query complete")

            # parsing tweets
            for tweet in fetched_tweets:

                # empty dictionary for tweet data
                parsed_tweet = {}

                # cleaned tweet text
                parsed_tweet['text'] = self.clean_tweet(tweet.text)
                # tweet sentiment
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)

                if tweet.retweet_count > 0:
                    # append retweeted text only once
                    if parsed_tweet not in tweets:
                        tweets.append(parsed_tweet)
                else:
                    tweets.append(parsed_tweet)

        except tweepy.TweepyException as e:
            # print error if any exists
            print("Error : " + str(e))

        return tweets


    def get_tweet_sentiment(self, tweet):
        '''
        check if a tweet is positive, negative or neutral

        Parameters:
        ----------
        tweet: str
            The tweet to be checked
        '''

        # create TextBlob object of passed tweet text
        analysis = TextBlob(tweet)

        # set sentiment
        if analysis.sentiment.polarity > 0:
            return 'positive'
        elif analysis.sentiment.polarity == 0:
            return 'neutral'
        else:
            return 'negative'


def visualize_sentiment(positive_tweets_perc, negative_tweets_perc, neutral_tweets_perc):
    '''
    visualise the share of each sentiment as a bar chart

    Parameters:
    ----------
    positive_tweets_perc: float
        percentage of positive tweets

    negative_tweets_perc: float
        percentage of negative tweets

    neutral_tweets_perc: float
        percentage of neutral tweets
    '''

    data = {'Percentage': [positive_tweets_perc,
                           negative_tweets_perc,
                           neutral_tweets_perc]}
    sentiment = pd.DataFrame(data, index=['Positive', 'Negative', 'Neutral'])

    # draw the bars first, then label the axes object that pandas returns
    ax = sentiment['Percentage'].plot(kind='bar', figsize=(10, 8), grid=True, rot=0)
    ax.set_xlabel('Polarity', fontsize=16, labelpad=20)
    ax.set_ylabel('Percentage of tweets (%)', fontsize=16, labelpad=20)
    ax.set_title('Percentage of Tweets by Polarity', fontsize=16, pad=30)
    plt.show()


def main():
    '''
    Program main driver
    '''
    # TwitterClient class object
    client = TwitterClient()

    # search keyword and number of tweets to fetch
    query = input("Enter search keyword: ")
    count = int(input("Enter tweet query count: "))

    # get tweets (treat a failed request as no tweets)
    tweets = client.get_tweets(query=query, count=count) or []

    # split tweets by sentiment label
    positive_tweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
    negative_tweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
    neutral_tweets = [tweet for tweet in tweets if tweet['sentiment'] == 'neutral']

    # percentage of each label; avoid dividing by zero when nothing was fetched
    total = len(tweets)
    positive_tweets_perc = 100 * len(positive_tweets) / total if total else 0.0
    negative_tweets_perc = 100 * len(negative_tweets) / total if total else 0.0
    neutral_tweets_perc = 100 * len(neutral_tweets) / total if total else 0.0

    print("Positive tweets percentage: {r:2.2f} %".format(r=positive_tweets_perc))
    print("Negative tweets percentage: {r:2.2f} %".format(r=negative_tweets_perc))
    print("Neutral tweets percentage: {r:2.2f} %".format(r=neutral_tweets_perc))

    # one DataFrame per label, each with 'text' and 'sentiment' columns
    df_positive_tweets = pd.DataFrame(positive_tweets, columns=['text', 'sentiment'])
    df_negative_tweets = pd.DataFrame(negative_tweets, columns=['text', 'sentiment'])
    df_neutral_tweets = pd.DataFrame(neutral_tweets, columns=['text', 'sentiment'])

    return (df_positive_tweets, df_negative_tweets, df_neutral_tweets,
            positive_tweets_perc, negative_tweets_perc, neutral_tweets_perc)

To call the main driver:

if __name__ == "__main__":
    # run the main function and keep its results for the following cells
    (df_positive_tweets, df_negative_tweets, df_neutral_tweets,
     positive_tweets_perc, negative_tweets_perc, neutral_tweets_perc) = main()

Create a simple visualisation of the sentiments

visualize_sentiment(positive_tweets_perc, negative_tweets_perc, neutral_tweets_perc)
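You can check the cleaning and counting steps without Twitter credentials. The sketch below reimplements them as standalone functions. The names `TWEET_PATTERN` and `sentiment_percentages` are hypothetical. The sketch assumes two changes to the cleaning pattern. First, it uses a raw string with no trailing space after the URL group, so a URL at the very end of a tweet is removed instead of only losing its punctuation. Second, it allows `_` in the mention group, because Twitter handles may contain underscores. The percentages are computed with `collections.Counter`.

```python
import re
from collections import Counter

# Mentions, non-alphanumeric characters, and URLs (assumed pattern; see the lead-in).
TWEET_PATTERN = re.compile(r"(@[A-Za-z0-9_]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")


def clean_tweet(tweet: str) -> str:
    """Replace mentions, URLs and punctuation with spaces, then collapse whitespace."""
    return " ".join(TWEET_PATTERN.sub(" ", tweet).split())


def sentiment_percentages(labels):
    """Return the percentage of each label, using 0.0 when there are no labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: (100 * counts[label] / total if total else 0.0)
        for label in ("positive", "negative", "neutral")
    }


print(clean_tweet("@user_1 Loving this! https://t.co/abc"))
print(sentiment_percentages(["positive", "negative", "positive", "neutral"]))
```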