A Geolocation and Sentiment Analysis of Tweets

by Morgan de Ferrante, Nadiya Pavlishyn, Kaitlin Maciejewski, Kathryn Addabbo, and Peter Batten

About the Project

Ever wanted to know what everyone’s been tweeting about? Well, thanks to Twitter’s use of the hashtag system, that’s already possible. But how about the most popular places everyone’s been tweeting from? Or how about a simplified way to see how all those Twitter users are feeling? Thanks to some in-depth exploratory analyses from Dr. Jeff Goldsmith’s Team Awesome™, and courtesy of Followthehashtag’s publicly available Twitter APIs, even this is possible.

In a rapidly changing and increasingly tech-based world, people can now react in real time to global events happening thousands of miles away. Social media as a whole, and Twitter especially, is one of the biggest venues for capturing these reactions. Our team’s motivation for this analysis comes from a desire to aggregate these reactions in as compact and sensible a format as possible.

The dataset we used from Followthehashtag is a large but not exhaustive sample of 200,000 tweets from users across the United States (and outside the U.S., but we focused on domestic tweets) posted from April 14, 2016 to April 16, 2016. It comes as an easy-to-access CSV file within a zipped folder. For each tweet, the file gives user information such as name, location (latitude/longitude), and number of followers, along with the entire content of the tweet itself. We used the syuzhet package from GitHub (thank you Matthew Jockers!) to extract sentiments from tweet content. Our primary analyses consisted of mapping these sentiments across the United States using tweet location, which gives a nice aggregate picture of how the U.S. Twitterverse was feeling during the dates mentioned above.
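
To make the "domestic tweets" step concrete, here is a minimal Python sketch of one way to keep only tweets with U.S. coordinates. It is an illustration, not our actual code (the project itself was built in R). The column names `Latitude`, `Longitude`, and `Tweet content`, the sample rows, and the rough bounding box are all assumptions made for this example.

```python
import csv
import io

# Rough bounding box for the contiguous United States. This is an
# approximation chosen for the sketch, not the filter used in the project.
US_LAT = (24.5, 49.5)
US_LON = (-125.0, -66.9)


def is_domestic(lat, lon):
    """Return True if the point falls inside the rough U.S. bounding box."""
    return US_LAT[0] <= lat <= US_LAT[1] and US_LON[0] <= lon <= US_LON[1]


def load_domestic_tweets(handle):
    """Read tweets from a CSV file handle, keeping rows with usable U.S. coordinates."""
    kept = []
    for row in csv.DictReader(handle):
        try:
            lat = float(row["Latitude"])   # hypothetical column name
            lon = float(row["Longitude"])  # hypothetical column name
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed coordinates
        if is_domestic(lat, lon):
            kept.append(row)
    return kept


# Tiny made-up sample standing in for the real file.
SAMPLE_CSV = """Tweet content,Latitude,Longitude
Morning run in Boston,42.36,-71.06
Greetings from London,51.51,-0.13
No location here,,
"""

domestic = load_domestic_tweets(io.StringIO(SAMPLE_CSV))
```

A bounding box is crude: it leaves out Alaska and Hawaii and lets in slices of Canada and Mexico, so a real analysis would more likely match points against state boundaries.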

About the Sentiment Function

Matthew Jockers’s sentiment function is essentially a dictionary lookup: it maps individual words to sentiments. The general sentiments he uses in this function (and subsequently the ones we use in our analyses) are trust, joy, anger, sadness, fear, disgust, anticipation, and surprise, plus overall negativity and positivity. While some of these sentiments may not seem intuitive to use, together they form a relatively broad spectrum of moods and emotions that makes for interesting analyses.
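
The dictionary idea can be sketched in a few lines of Python. This is a toy illustration of word-to-sentiment lookup, not the syuzhet implementation or its API: the five-word lexicon and the `score_tweet` helper are invented for this example, and a real lexicon covers thousands of words.

```python
import re

# The ten sentiment categories named above.
CATEGORIES = [
    "trust", "joy", "anger", "sadness", "fear",
    "disgust", "anticipation", "surprise", "negative", "positive",
]

# Toy lexicon invented for this sketch: each word maps to the
# sentiments it is assumed to signal.
TOY_LEXICON = {
    "happy": {"joy", "positive"},
    "love": {"joy", "trust", "positive"},
    "angry": {"anger", "negative"},
    "afraid": {"fear", "negative"},
    "lost": {"sadness", "negative"},
}


def score_tweet(text):
    """Count how many words in the text fall under each sentiment category."""
    counts = {category: 0 for category in CATEGORIES}
    # Lowercase the text and split it into simple word tokens.
    for word in re.findall(r"[a-z']+", text.lower()):
        for category in TOY_LEXICON.get(word, ()):
            counts[category] += 1
    return counts


scores = score_tweet("So happy today, I love this city!")
```

In this example, "happy" and "love" each count toward joy and positivity, and "love" also counts toward trust. Every other category stays at zero. A lookup like this ignores context, so negation ("not happy") and sarcasm are scored as if they were sincere.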

Here’s an example tweet from our dataset:

Ran the B.A.A. 5k this morning with Chris Sanford and finished with the exact same time as last… https://t.co/m8ClCPqjiR
— Chris S Jones (@foggypaws) April 16, 2016

The rest of the dataset was made up of 199,999 other tweets! That’s a lot of data!

Here’s how to navigate some of our analyses (found via the tabs in the upper right):

U.S. Tweets: A Shiny-enabled app that can be used to explore the general sentiments and feelings of Twitter users across the U.S. Included are a heat map that plots the aggregated positivity/negativity of tweets overall, a map that plots the feelings of individual users on both a positivity/negativity scale and a more complex emotional range, and a map that displays the distribution of tweets across the U.S. averaged across hours throughout each day.
State Tweets: A Shiny-enabled app that can be used to explore sentiment distribution at the state level for up to two states at the same time, allowing for useful sentiment comparisons between states.
Tweet Analysis: A Shiny-enabled app that allows for exploration of the contents of the tweets themselves, including total sentiment contents, most popular words used, and when Twitter users are tweeting most frequently.

Discussion: A brief discussion of the conclusions drawn from our exploration of this Twitter dataset, including an analysis of some of the most notable events from the time of the tweets in the dataset and how they compare to the sentiments (and ultimately, reactions) of the tweets themselves.

If you’d like to access the dataset yourself, the link to Followthehashtag’s website can be found here.

The GitHub repo for our final project is here.