R Programming For Text Analysis On Song Lyrics by dkmathstats

programming · @dkmathstats · Mar 5 '18

Hi there. This post features experimental R programming work for text analysis and text mining on a few song lyrics. The full version of this post can be found on my website [here](http://dkmathstats.com/using-r-for-text-analysis-on-a-few-song-lyrics/).

<center><img src="http://freedesignfile.com/upload/2014/10/Hand-drawn-colored-musical-instruments-vector-01.jpg" /></center>
<center><a href="http://freedesignfile.com/upload/2014/10/Hand-drawn-colored-musical-instruments-vector-01.jpg">Featured Image: Source</a></center>



### Sections
---


* Text Mining And Text Analysis With R
* Example One: Armin Van Buuren Feat. Fiora - Waiting For The Night
* Example Two: Linkin Park - New Divide (No Code, Output Only)
* Notes



### Text Mining And Text Analysis With R
---

The R programming language is capable of all kinds of statistical work and data analysis, including text mining and text analysis. Text analysis can be done on reviews, YouTube comments, text from articles and song lyrics.

For this project, the R packages that are needed are `dplyr` for data wrangling, `ggplot2` for plotting and `tidytext` for tokenizing and cleaning the text. Text analysis will be done on two songs in this post. The lyrics from these songs were copied and pasted from lyrics websites into separate .txt files.

To load a package into R, use the `library()` or `require()` command. To install a package into R, use the command `install.packages("pkg_name")`.

```{r}
library(dplyr)
library(ggplot2)
library(tidytext)
```
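Before loading the packages, it can help to check whether they are installed. The following is a minimal sketch, not part of the original workflow; the variable names are made up for illustration, and the install line is left commented out so nothing is downloaded by accident.

```r
# Packages this post relies on.
needed_pkgs <- c("dplyr", "ggplot2", "tidytext")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
is_installed <- vapply(needed_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All packages are installed.")
}
```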

### Example One: Armin Van Buuren Feat. Fiora - Waiting For The Night
---

For this first example, I have chosen the track Waiting For The Night from DJ/Producer Armin Van Buuren featuring the vocals of Fiora. (This song falls under the Dance category.)

<center><img src="https://i.ytimg.com/vi/7YpaAR077xA/hqdefault.jpg" /></center>
<center><a href="https://i.ytimg.com/vi/7YpaAR077xA/hqdefault.jpg">Armin Van Buuren - Waiting For The Night Album Image Cover: Source</a></center>

I have named the lyrics text file `armin_waitingForTheNight.txt`. When a file is read by its bare name, R looks for it in the working directory, so the working directory needs to be set first. In my case, the file is placed inside a folder called `songLyrics_project` on my PC, and the working directory is set to this folder (with RStudio).
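As a small sketch of how the working directory interacts with `readLines()`, the snippet below uses a throwaway file in a temporary folder. The file name `demo_lyrics.txt` and its two lines are made up for illustration; for the real project you would point `setwd()` at your own lyrics folder instead.

```r
# Create a throwaway text file in R's temporary folder (hypothetical file name).
demo_dir <- tempdir()
writeLines(c("First made-up line", "Second made-up line"),
           file.path(demo_dir, "demo_lyrics.txt"))

# setwd() returns the previous working directory, so it can be restored later.
old_wd <- setwd(demo_dir)

# With the working directory set, the bare file name is enough.
demo_lyrics <- readLines("demo_lyrics.txt")

# Restore the original working directory.
setwd(old_wd)

print(demo_lyrics)
```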

```{r}
armin_waiting_lyrics <- readLines("armin_waitingForTheNight.txt")

head(armin_waiting_lyrics) # Preview lyrics.
#> [1] "Shoot me down and I'll get up again"
#> [2] "Emotions running high with double meaning"
#> [3] "Just another day to keep it calm within"
#> [4] "But I can't find a way to fight this shadow dreaming"
#> [5] ""
#> [6] "We're always waiting for the night"
```

The lyrics are then put into a data frame in R.

```{r}
# tibble() builds a tibble (a neater data frame); it replaces the deprecated data_frame().
armin_waiting_lyrics_df <- tibble(Text = armin_waiting_lyrics)

head(armin_waiting_lyrics_df, n = 20)
#> # A tibble: 20 x 1
#>    Text
#>    <chr>
#>  1 Shoot me down and I'll get up again
#>  2 Emotions running high with double meaning
#>  3 Just another day to keep it calm within
#>  4 But I can't find a way to fight this shadow dreaming
#>  5
#>  6 We're always waiting for the night
#>  7 Never lost cause we can go where the light shines brightest
#>  8 We're always waiting for the night
#>  9 So come with me and we can go where the light shines brightest
#> 10
#> 11 Stay all night, runaway all night
#> 12 We'll stay all night, run away all night
#> 13 Stay all night, runaway all night
#> 14 We'll stay all night, run away all night...
#> 15
#> 16 Push and shove against the thoughts you left me with
#> 17 Of, every picture of regret my expectation
#> 18 Your emotions can't hide behind those eyes
#> 19 Conversations comes quick to steal me back again
```

Each line is then split into individual words with the `unnest_tokens()` function from `tidytext`, which gives one word per row.

```{r}
armin_words <- armin_waiting_lyrics_df %>%
  unnest_tokens(output = word, input = Text)
```
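To get a feel for what the tokenizing step does to a single lyric line, here is a rough base R imitation. It is only an approximation written for illustration (lowercase, strip punctuation except apostrophes, split on spaces) and not how `unnest_tokens()` is actually implemented.

```r
# One line from the lyrics shown above.
lyric_line <- "Shoot me down and I'll get up again"

# Lowercase the text.
lowered <- tolower(lyric_line)

# Replace everything except letters, apostrophes and spaces with a space.
cleaned <- gsub("[^a-z' ]", " ", lowered)

# Split on runs of whitespace to get one word per element.
words <- strsplit(trimws(cleaned), "\\s+")[[1]]

print(words)
```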

Some words in the English language do not carry much meaning on their own but are needed to make sentences flow and to keep them grammatical. Words such as `the`, `and`, `of`, `me`, `that`, `this`, etc. are referred to as stop words.

The `anti_join()` function from R's `dplyr` package is used to remove from the lyrics every word that also appears in `stop_words`. (The object `stop_words` is a dataset that comes with `tidytext`.)


```{r}
# data(stop_words) # Stop words.

# Remove stop words (matching on the word column):

armin_words <- armin_words %>%
  anti_join(stop_words, by = "word")
```
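To see what `anti_join()` does on a small scale, here is a toy sketch. The word list and the three-word stop list are made up for illustration; the real `stop_words` dataset is much longer.

```r
library(dplyr)

# Made-up tokens and a made-up, three-word stop list.
toy_words <- tibble(word = c("waiting", "for", "the", "night", "and", "light"))
toy_stop  <- tibble(word = c("for", "the", "and"))

# anti_join() keeps the rows of toy_words that have no match in toy_stop.
toy_kept <- toy_words %>%
  anti_join(toy_stop, by = "word")

print(toy_kept$word)
```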

To obtain the word counts, the `count()` function from R's `dplyr` package is used. Adding the `sort = TRUE` argument sorts the counts from largest to smallest.

```{r}
# Word Counts:

armin_wordcounts <- armin_words %>% count(word, sort = TRUE)

head(armin_wordcounts)
#> # A tibble: 6 x 2
#>   word          n
#>   <chr>     <int>
#> 1 night        12
#> 2 brightest     4
#> 3 light         4
#> 4 shines        4
#> 5 stay          4
#> 6 waiting       4
```
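For readers new to `dplyr`, `count(word, sort = TRUE)` is a shortcut for grouping, tallying and sorting. The sketch below shows both spellings on a made-up word list (not the actual lyrics).

```r
library(dplyr)

# Made-up words for illustration.
toy_words <- tibble(word = c("night", "light", "night", "stay", "night", "light"))

# Short form, as used in the post.
counts_short <- toy_words %>% count(word, sort = TRUE)

# Longer spelling of the same idea: group, tally, then sort by descending count.
counts_long <- toy_words %>%
  group_by(word) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

print(counts_short)
```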

We can now make a plot of the word counts with R's `ggplot2` data visualization package.

```{r}
# ggplot2 plot of the word counts
# Bottom axis removed with element_blank()
# Counts in the bar with geom_text.

armin_wordcounts %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(x = "Word \n", y = "\n Count ", title = "Word Counts In \n Armin Van Buuren - Waiting For The Night \n") +
  geom_text(aes(label = n), hjust = 1.2, colour = "white", fontface = "bold") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.y = element_text(face = "bold", colour = "darkblue", size = 12))
```

<center>![01arminWordCounts.png](https://steemitimages.com/DQmaZVY38huP63XUEds7XeZJDAZQW4J6kLqadnsqLf8iT1D/01arminWordCounts.png)</center>

The word `night` is the most frequent word, with a count of 12.

**Sentiment Analysis Of Armin Van Buuren - Waiting For The Night**

For song lyrics, sentiment analysis looks at the words in the text and estimates whether a song is positive or negative. (Note that this sort of analysis does not factor in sound, melodies and such. Listeners judge those in a subjective manner.)

There are three main lexicons that label or score words by sentiment. These three are `AFINN`, `bing` and `nrc`.

The `AFINN` lexicon is used here. It gives each word an integer score from -5 (most negative) to +5 (most positive).

```{r}
get_sentiments("afinn") # AFINN sentiments (recent tidytext releases fetch them through the textdata package)
```

```{r}
# Note: recent tidytext releases name the AFINN score column `value`;
# older releases called it `score`.
armin_words_AFINN <- armin_wordcounts %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  mutate(is_positive = value > 0)

armin_words_AFINN %>%
  ggplot(aes(x = word, y = n, fill = is_positive)) +
  geom_bar(stat = "identity", position = "identity") +
  labs(x = "\n Word \n", y = "Word Count \n", title = "Sentiment Scores Of Words \n") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.x = element_text(face = "bold", colour = "darkblue", size = 12),
        axis.title.y = element_text(face = "bold", colour = "darkblue", size = 12)) +
  scale_fill_manual(values = c("#FF0000", "#01DF3A"), guide = "none")
```

<center>![02sentimentPlot.png](https://steemitimages.com/DQmZMM2KrThep7b7WAqWKYJHeoACf2m4sZnVhHef1VgWmJe/02sentimentPlot.png)
</center>

The next lines of code compute a sentiment score for each word and then plot the words with their scores. The sentiment score is the word count multiplied by the word's `AFINN` lexicon score. (If the word wonderful had a word count of 3 and a lexicon score of +3, the sentiment score would be 3 x 3 = +9.)


```{r}
# Assign AFINN lexicon scores to words in the lyrics
# (the score column is `value` in recent tidytext releases, `score` in older ones):

armin_words_AFINN_scores <- armin_wordcounts %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  mutate(sentiment_score = n * value, is_positive = sentiment_score > 0)
```
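The arithmetic can be checked on a toy example. The counts and the two-word lexicon below are made up for illustration (they are not real `AFINN` entries), and the column name `lexicon_score` is a placeholder.

```r
library(dplyr)

# Made-up word counts and a made-up mini lexicon.
toy_counts  <- tibble(word = c("wonderful", "lost", "night"), n = c(3L, 2L, 12L))
toy_lexicon <- tibble(word = c("wonderful", "lost"), lexicon_score = c(3, -2))

toy_scores <- toy_counts %>%
  inner_join(toy_lexicon, by = "word") %>%  # "night" is dropped: it is not in the lexicon
  mutate(sentiment_score = n * lexicon_score,
         is_positive = sentiment_score > 0)

print(toy_scores)
```

Here wonderful gets 3 x 3 = +9 and lost gets 2 x (-2) = -4, while night disappears because the inner join only keeps words found in the lexicon.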

We can plot the results with a plot from the `ggplot2` package.


```{r}
# Plot - Sentiment Scores Of All Words

armin_words_AFINN_scores %>%
  ggplot(aes(x = word, y = sentiment_score, fill = is_positive)) +
  geom_bar(stat = "identity", position = "identity") +
  labs(x = "\n Word \n", y = "Sentiment Score \n", title = "Sentiment Scores Of Words \n") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.x = element_text(face = "bold", colour = "darkblue", size = 12),
        axis.title.y = element_text(face = "bold", colour = "darkblue", size = 12)) +
  scale_fill_manual(values = c("#FF0000", "#01DF3A"), guide = "none")
```

<center>![03sentimentScoresplot.png](https://steemitimages.com/DQmNgYbxjUc54YDjHS9UdaqftGK1P9EDSuEAuZ5fLMWwK6H/03sentimentScoresplot.png)
</center>

The most positive word is `brightest` while the most negative word is `lost`. This sentiment score plot is different from the word counts plot earlier in the sense that there were more negative words than positive words.


### Example Two: Linkin Park - New Divide
---

<center><img src="http://farm4.staticflickr.com/3620/3545284779_926eda3146.jpg" /></center>
<center><a href="http://farm4.staticflickr.com/3620/3545284779_926eda3146.jpg">Image Source</a></center>

In the second example, I have chosen to look at the song New Divide by Linkin Park as featured in the Transformers 2 movie. The code here is very similar to the code from the first example.

To make this post a bit shorter, I will only show the output plots for this example. (No code is shown for this example.)

<center>![01linkinPark_newdividePlot.png](https://steemitimages.com/DQmR294d2LsMp8N6VYozRSdU5forc9dFMiugbkD8aLruPby/01linkinPark_newdividePlot.png)</center>

<center>![sentimentPlot_newdivide.png](https://steemitimages.com/DQmbqr3eN7HyvY2QWcxDExmnDA92oDDjEasnNXhDe4MuGHG/sentimentPlot_newdivide.png)</center>

<center>![sentimentPlot03.png](https://steemitimages.com/DQmdXapEwfDHNDu6wif6tFqqZiL2F56yroRbi9qWxx8pfts/sentimentPlot03.png)
</center>

From the plots, it appears that Linkin Park - New Divide is a track with negative sentiment. This needs to be further examined by looking at the full lyrics and listening to the song.

### Notes
---


* Song lyrics do not have a lot of words in general relative to articles and books. 
* Many song lyrics repeat certain phrases or words for emphasis.
* Not all songs have vocals or lyrics as some of them are instrumentals. You would have to hear those instrumentals and judge whether a song is positive or not with your own ears.
* I do plan on analyzing a music album with text mining and analysis.

References include Datacamp courses, R Graphics Cookbook by Winston Chang, and Text Mining With R: A Tidy Approach by Julia Silge and David Robinson (Website version: https://www.tidytextmining.com/).
