Unstructured Data and Sentiment Analysis
What follows is an examination of Aesop Rock lyrics. It looks for insights into the words used in the music and analyzes their sentiment.
DATA - The data were collected from two sites: the Genius.com and Spotify APIs were used to access lyrics for every song as well as associated metadata. A Python script was developed using the LyricsGenius API wrapper, along with other methods to access Spotify data. The result was a dataset of 131 unstructured song lyrics, each with its metadata. These unstructured text objects received the bulk of the analysis.
METHOD - The dataset obtained from the Python script was imported into R for text mining and analysis. The data needed some cleaning; in particular, the text-heavy lyrics required significant work, including tokenization, to prepare them for sentiment analysis. The sentiment analysis used the lexicons "afinn", "bing", and "nrc". Of these, nrc was developed by Saif M. Mohammad and Peter Turney in their 2013 work Crowdsourcing a Word-Emotion Association Lexicon; afinn was compiled by Finn Årup Nielsen, and bing derives from the opinion lexicon of Minqing Hu and Bing Liu.
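The author's cleaning and tokenization were done in R, and that code is not shown. The Python sketch below illustrates the same general idea under stated assumptions. It lowercases one song's lyrics, strips bracketed section tags (this assumes Genius-style markers such as `[Verse 1]`), and splits the text into word tokens. The small `STOP_WORDS` set is a hypothetical placeholder, not a standard list.

```python
import re

# Hypothetical placeholder stop-word list; a real analysis would use a full list.
STOP_WORDS = {"a", "an", "and", "the", "i", "to", "of", "in", "it", "is"}


def tokenize(lyrics: str) -> list[str]:
    """Turn one song's raw lyrics into lowercase word tokens."""
    # Assumption: section markers appear in square brackets, e.g. "[Verse 1]".
    text = re.sub(r"\[[^\]]*\]", " ", lyrics)
    # Normalize curly apostrophes so "don’t" and "don't" become the same token.
    text = text.replace("\u2019", "'").lower()
    # Keep runs of letters, allowing one internal apostrophe (e.g. "don't").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


raw = "[Verse 1]\nI don't sleep, and the night light hums."
tokens = tokenize(raw)
print(tokens)                     # ['i', "don't", 'sleep', 'and', 'the', 'night', 'light', 'hums']
print(remove_stop_words(tokens))  # ["don't", 'sleep', 'night', 'light', 'hums']
```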
CONCLUSION - Across several decades of released music, Aesop Rock lyrics have been proportionately more negative than positive when analyzed with the lexicons stated previously. The sentiment categories of the words used appear to be evenly distributed across the different periods of his career. We also find that unique word counts per song have declined over time, which could be interpreted as a simplification of songwriting as the artist’s career evolves. The bigram sentiment analysis did not reveal sufficient evidence to dispute these findings.
Here is a closer look.
After all necessary cleaning and processing of the data was complete, some exploration was conducted. We first examined the distribution of songs over time. A prolific songwriter, Aesop Rock has released music over the course of decades.
We can see above that the bulk of songs were released in the early 2000s, with output declining in subsequent years. We can look more closely at the song count for each year.
Another way to illustrate the artist’s productivity over time is to look at which songs were released on each album, by year, as in the plot below.
The plot shows that the most song releases in a single year occurred in 2016. The most prolific period was still the early 2000s, since multiple years in that period had major song releases. Now let’s take a look at word counts across the years.
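The per-year and per-album counts behind these plots are simple grouping counts. Below is a Python sketch on made-up records (the author worked in R). The field names `title`, `album`, and `year` are assumptions. So are the period boundaries in the hypothetical `career_period` helper, because the document does not state how its time periods were defined.

```python
from collections import Counter

# Hypothetical toy records; the real dataset holds 131 songs with Spotify metadata.
songs = [
    {"title": "song_1", "album": "album_a", "year": 2001},
    {"title": "song_2", "album": "album_a", "year": 2001},
    {"title": "song_3", "album": "album_b", "year": 2003},
    {"title": "song_4", "album": "album_c", "year": 2016},
    {"title": "song_5", "album": "album_d", "year": 2016},
    {"title": "song_6", "album": "album_d", "year": 2016},
]


def career_period(year: int) -> str:
    """Bin a release year into a coarse period (boundaries are assumptions)."""
    if year < 2000:
        return "1990s"
    if year < 2005:
        return "early 2000s"
    if year < 2010:
        return "late 2000s"
    return "2010s and later"


# Songs released per year, per (year, album) pair, and per career period.
songs_per_year = Counter(s["year"] for s in songs)
songs_per_album_year = Counter((s["year"], s["album"]) for s in songs)
songs_per_period = Counter(career_period(s["year"]) for s in songs)

print(songs_per_year.most_common(1))  # [(2016, 3)]
```

Counting by `(year, album)` pairs gives one bar segment per album within each year, which is the breakdown the album-by-year plot displays.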
This plot shows unique word counts per time period. Some of the distribution is expected, given when songs were released. Surprisingly, Aesop Rock actually used more unique words in his songs early in his career than later. Despite its lower song count, the 1990s has higher than expected word counts. There appears to be a trend over time in which songs contain fewer unique words as his career evolves. This could be attributed to the number of songs decreasing as well, but as we saw previously, 2016 was a year with a high number of song releases.
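Totals per period grow with the number of songs released, which is the confound raised above. Averaging unique word counts per song separates vocabulary size from output volume. The Python sketch below does that on hypothetical toy data. The `period` and `tokens` fields are assumptions, not the author's schema.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical toy input: each song has a career period and its cleaned tokens.
songs = [
    {"period": "1990s", "tokens": ["night", "light", "hums", "static", "night"]},
    {"period": "early 2000s", "tokens": ["night", "light", "night"]},
    {"period": "early 2000s", "tokens": ["static", "hums", "dark", "dark"]},
]


def unique_word_count(tokens: list[str]) -> int:
    """Number of distinct tokens in one song."""
    return len(set(tokens))


def mean_unique_words_by_period(songs: list[dict]) -> dict[str, float]:
    """Average per-song unique word count for each period."""
    per_period: dict[str, list[int]] = defaultdict(list)
    for song in songs:
        per_period[song["period"]].append(unique_word_count(song["tokens"]))
    return {period: fmean(counts) for period, counts in per_period.items()}


print(mean_unique_words_by_period(songs))  # {'1990s': 4.0, 'early 2000s': 2.5}
```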
Next we analyze the text of the lyrics. As stated previously, the lyrics are compared with established lexicons in order to gain insight into their sentiment. Those lexicons and their relation to Aesop Rock lyrics are summarized here:
Each lexicon is illustrated here with word counts for its sentiment categories. Let’s now focus on the nrc lexicon and take a closer look.
The plot above reveals that the counts of words classified as positive and negative differ. The nrc lexicon considers words from Aesop Rock lyrics to be slightly more negative than positive. The largest number of words falls into the category “fear”, followed closely by “trust”.
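Matching tokens against an emotion lexicon such as nrc amounts to an inner join followed by a count, and one word can contribute to several categories. The Python sketch below shows this. Its three-word `TOY_NRC` dictionary is invented for illustration and is not taken from the real NRC lexicon.

```python
from collections import Counter

# Invented stand-in for an NRC-style lexicon: word -> set of categories.
# These entries are illustrative only and are NOT copied from the real NRC data.
TOY_NRC = {
    "dread": {"fear", "negative"},
    "dark": {"fear", "sadness", "negative"},
    "friend": {"trust", "joy", "positive"},
}


def category_counts(tokens: list[str], lexicon: dict[str, set[str]]) -> Counter:
    """Count lexicon categories over tokens; words absent from the lexicon are skipped."""
    counts: Counter = Counter()
    for token in tokens:
        # A word may belong to several categories, so each one is counted.
        counts.update(lexicon.get(token, ()))
    return counts


counts = category_counts(["dread", "dark", "friend", "night"], TOY_NRC)
print(counts["fear"], counts["trust"])  # 2 1
```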
When the lyrics are matched against the bing lexicon, we see a similar, if more exaggerated, result: negative words outnumber positive words by more than two to one. Overall, then, the sentiment of Aesop Rock songs is negative. Next, let’s examine how this sentiment varies across time periods.
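The bing lexicon labels each word either positive or negative, so "more than twice as negative" means a negative-to-positive ratio above 2. Here is a minimal Python sketch of that calculation. The `TOY_BING` entries are invented, and the helper names are hypothetical.

```python
from collections import Counter

# Invented bing-style entries: word -> "positive" or "negative" (illustrative only).
TOY_BING = {"good": "positive", "bad": "negative", "broken": "negative", "dead": "negative"}


def polarity_counts(tokens: list[str], lexicon: dict[str, str]) -> tuple[int, int]:
    """Return (positive, negative) counts for tokens found in the lexicon."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return counts["positive"], counts["negative"]


def negative_to_positive_ratio(tokens: list[str], lexicon: dict[str, str]) -> float:
    """Negative/positive ratio; a value above 2 means 'more than twice as negative'."""
    pos, neg = polarity_counts(tokens, lexicon)
    return neg / pos if pos else float("inf")


tokens = ["good", "bad", "broken", "dead", "night"]
print(polarity_counts(tokens, TOY_BING))             # (1, 3)
print(negative_to_positive_ratio(tokens, TOY_BING))  # 3.0
```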
We find that sentiment is distributed relatively evenly across the time periods. The early 2000s, the period with the most songs released, has a distribution across the sentiment categories similar to that of the other periods.
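Comparing periods with very different song counts is easier with within-period shares than with raw counts. The Python sketch below assumes each lexicon match is available as a hypothetical `(period, category)` pair. It converts counts to proportions, so a busy period and a quiet one can show the same mix.

```python
from collections import Counter, defaultdict


def category_shares_by_period(rows):
    """Share of each sentiment category within each period.

    rows: iterable of (period, category) pairs, one per lexicon match.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for period, category in rows:
        counts[period][category] += 1
    shares = {}
    for period, c in counts.items():
        total = sum(c.values())
        shares[period] = {cat: n / total for cat, n in c.items()}
    return shares


# Hypothetical matches: the early 2000s has twice as many, but the same mix.
rows = [
    ("1990s", "negative"), ("1990s", "positive"),
    ("early 2000s", "negative"), ("early 2000s", "negative"),
    ("early 2000s", "positive"), ("early 2000s", "positive"),
]
print(category_shares_by_period(rows))
# {'1990s': {'negative': 0.5, 'positive': 0.5}, 'early 2000s': {'negative': 0.5, 'positive': 0.5}}
```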
So far we have mostly examined single words, and this has indicated that Aesop Rock lyrics are associated more with negative emotions. Next we look more closely at relationships between pairs of consecutive words (bigrams) to help determine the sentiment associated with songs.
The plots above illustrate bigram counts by time period. During the artist’s most prolific period, the most frequently used bigram was “night light”. To confirm the assessment above that the overall emotional association of the lyrics is negative, let’s look at single words associated with positive and negative emotions and check whether they were preceded by a word that might negate that association.
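Bigram counts like these come from pairing each token with the next one. The Python sketch below forms pairs within each song and only then drops pairs containing a stop word. Removing stop words first would join words that were never adjacent. The stop-word set is a hypothetical placeholder. To get counts by time period, run the function separately on each period's songs.

```python
from collections import Counter

# Hypothetical placeholder stop-word list.
STOP_WORDS = {"the", "a", "and", "i", "in", "of", "to"}


def count_bigrams(songs_tokens, stop_words=STOP_WORDS):
    """Count consecutive word pairs, skipping pairs that contain a stop word."""
    counts = Counter()
    for tokens in songs_tokens:  # pair words within each song only
        for w1, w2 in zip(tokens, tokens[1:]):
            if w1 not in stop_words and w2 not in stop_words:
                counts[(w1, w2)] += 1
    return counts


songs_tokens = [
    ["the", "night", "light", "hums"],
    ["night", "light", "and", "the", "dark"],
]
print(count_bigrams(songs_tokens).most_common(1))  # [(('night', 'light'), 2)]
```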
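We see here a larger number of positive words preceded by the negating word “not”. A negated positive word (for example, “not good”) expresses negative sentiment, yet single-word analysis counts it as positive. These cases therefore support the earlier finding of negativity. This deserves a closer look, so next we plot a network of these words and the negation words that precede them.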
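The negation check can be sketched as a scan over consecutive word pairs. The Python sketch below counts sentiment words that directly follow "not" and shows how flipping their polarity changes the positive/negative tally. The lexicon entries are invented, and the simple flip rule is an assumption. It is not necessarily how the original R analysis treated negations.

```python
from collections import Counter

# Invented bing-style entries (illustrative only).
TOY_BING = {"good": "positive", "happy": "positive", "bad": "negative"}


def negated_sentiment_words(tokens, lexicon, negator="not"):
    """Count (word, polarity) pairs for sentiment words directly after `negator`."""
    found = Counter()
    for w1, w2 in zip(tokens, tokens[1:]):
        if w1 == negator and w2 in lexicon:
            found[(w2, lexicon[w2])] += 1
    return found


def adjusted_polarity(tokens, lexicon, negator="not"):
    """(positive, negative) counts with the polarity of negated words flipped."""
    pos = neg = 0
    prev = None
    for token in tokens:
        label = lexicon.get(token)
        if label is not None:
            if prev == negator:  # assumed rule: negation simply flips polarity
                label = "negative" if label == "positive" else "positive"
            if label == "positive":
                pos += 1
            else:
                neg += 1
        prev = token
    return pos, neg


tokens = ["not", "good", "not", "happy", "bad"]
print(negated_sentiment_words(tokens, TOY_BING))
print(adjusted_polarity(tokens, TOY_BING))  # (0, 3); naive counting gives (2, 1)
```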
A good number of negative words here are also preceded by negating words, which would weaken their negative association. However, this does not provide sufficient evidence to rule out the earlier findings.
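A network like this one is built from an edge list of (negation word, sentiment word) pairs weighted by frequency. The Python sketch below builds that edge list and totals the negated words by polarity. The `NEGATION_WORDS` set and the lexicon entries are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical set of negating words and invented lexicon entries.
NEGATION_WORDS = {"not", "no", "never", "without"}
TOY_BING = {"hope": "positive", "good": "positive", "fear": "negative"}


def negation_edges(songs_tokens, lexicon, negators=NEGATION_WORDS):
    """Edge list: (negator, sentiment word, polarity) -> number of occurrences."""
    edges = Counter()
    for tokens in songs_tokens:
        for w1, w2 in zip(tokens, tokens[1:]):
            if w1 in negators and w2 in lexicon:
                edges[(w1, w2, lexicon[w2])] += 1
    return edges


def negated_by_polarity(edges):
    """Total negated occurrences per polarity."""
    totals = Counter()
    for (_, _, polarity), n in edges.items():
        totals[polarity] += n
    return totals


songs_tokens = [["no", "hope", "never", "fear"], ["not", "good", "not", "hope"]]
edges = negation_edges(songs_tokens, TOY_BING)
print(negated_by_polarity(edges))  # Counter({'positive': 3, 'negative': 1})
```

Each `(negator, word)` key, with its count as a weight, can then be handed to a graph library for plotting, for example networkx's `DiGraph.add_edge`.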