In a previous post I showed how to use data science tools to find hidden features in unstructured text and analyzed how the complexity of Beatles lyrics changed over time. This post is a short follow-up: I compare the complete works of The Beatles with those of two other artists, using the same methodology and metrics. Setting The Beatles beside other musicians may help put the original numbers into perspective.
I downloaded lyrics from the progressive rock band ‘The Alan Parsons Project’ and Nobel laureate Bob Dylan. Alan Parsons is an audio engineer, musician and songwriter, who was one of the engineering architects of The Beatles’ Abbey Road and Let It Be albums, and Pink Floyd’s Dark Side Of The Moon LP. He later created an unusual formation consisting of him and his composer partner Eric Woolfson as permanent group members, supplemented by session musicians who played on their albums with more or less regularity. Their music was of a more intellectual nature, with a more equal focus on lyrics and melody.
Bob Dylan can now be considered a poet as much as a musician, and comparing him and The Alan Parsons Project to The Beatles makes for an interesting text analysis. As a side note: I was also entertaining the idea of including Iron Maiden in the comparison, but the extensive work of Bruce Dickinson & Co would have made the already long download even longer, so I stayed with my original idea. (It takes quite a bit of time to download lyrics with geniusR.)
First, I did the cosine similarity comparison, but this time on the artists and not on the individual albums. According to the cosine matrix, Bob Dylan is about equally distant from The Beatles and The Alan Parsons Project: the cosines are 0.67 and 0.62, respectively. In a two-dimensional space these would correspond to angles of about 48 and 52 degrees, so roughly 50 degrees in both cases. The cosine between The Beatles and The Alan Parsons Project, however, is only 0.41, which is ‘equivalent’ to a two-dimensional angle of about 66 degrees. It seems that, despite their partially shared history, The Beatles and The Alan Parsons Project have less in common with each other than either has with Bob Dylan.
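The ‘equivalent’ two-dimensional angles are simply the inverse cosines of the similarity values. A quick check in base R, using only the three cosines reported in this post (the vector and its labels are named here purely for illustration):

```r
# Cosine similarities reported in the post; the labels are illustrative
cosines <- c(dylan_beatles = 0.67, dylan_parsons = 0.62, beatles_parsons = 0.41)

# acos() returns radians, so convert to degrees
angles <- acos(cosines) * 180 / pi
round(angles, 1)
```

The first two angles land within a few degrees of each other, while the Beatles and Alan Parsons pair is clearly wider.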
# Packages inferred from the functions used in this post
library(tidyverse)   # tibble, dplyr, purrr, tidyr, ggplot2
library(tidytext)    # unnest_tokens(), stop_words
library(geniusR)     # genius_album()
library(lsa)         # cosine()
library(pander)      # pander()
library(data.table)  # data.table()
library(rlist)       # list.append()
library(syllable)    # readability_word_stats()

# One row per album; the counts in rep() must match the album list below
albums <- tibble(
  artist = c(
    rep("The Beatles", 13),
    rep("Bob Dylan", 36),
    rep("The Alan Parsons Project", 10)
  ),
  album = c(
    "Please Please Me", "With The Beatles", "A Hard Day s Night",
    "Beatles For Sale", "Help", "Rubber Soul",
    "Revolver", "Sgt Pepper s Lonely Hearts Club Band", "Magical Mystery Tour",
    "The Beatles The White Album", "Yellow Submarine", "Abbey Road",
    "Let It Be",
    "Bob dylan", "The freewheelin bob dylan", "Another side of bob dylan",
    "The times they are a changin", "Bringing it all back home", "Highway 61 revisited",
    "Blonde on blonde", "John wesley harding", "Nashville skyline",
    "New morning", "Self portrait", "Pat garrett billy the kid",
    "Triplicate", "Blood on the tracks", "The basement tapes",
    "Desire", "Street legal", "Slow train coming",
    "Saved", "Shot of love", "Infidels",
    "Empire burlesque", "Knocked out loaded", "Down in the groove",
    "Oh mercy", "Under the red sky", "Good as i been to you",
    "World gone wrong", "Time out of mind", "Love and theft",
    "Modern times", "Together through life", "Christmas in the heart",
    "Tempest", "Shadows in the night", "Fallen angels",
    "Tales of mystery and imagination edgar allan poe", "I robot", "Pyramid",
    "Eve", "The turn of a friendly card", "Eye in the sky",
    "Ammonia avenue", "Vulture culture", "Stereotomy",
    "Gaudi"
  )
)
# Download the lyrics of every album (slow: one request per album)
all_lyrics <- albums %>%
  mutate(tracks = map2(artist, album, genius_album))

# One row per lyric line
full_lyrics <- all_lyrics %>%
  unnest(tracks) %>%
  arrange(desc(artist))

# Word counts per artist, without short words and stop words
tidy_lyrics <- full_lyrics %>%
  unnest_tokens(word, text) %>%
  filter(nchar(word) > 2) %>%
  anti_join(stop_words, by = "word") %>%
  group_by(artist) %>%
  count(word, sort = TRUE) %>%
  ungroup()

# Total word count per artist, used to turn counts into relative frequencies
total_full_words <- tidy_lyrics %>%
  group_by(artist) %>%
  summarize(total = sum(n))

tidy_lyrics <- left_join(tidy_lyrics, total_full_words, by = "artist")
tidy_lyrics <- tidy_lyrics %>%
  mutate(freq = n / total)

# Wide word-by-artist table; words an artist never uses get frequency 0
full_cos <- tidy_lyrics %>%
  select(artist, word, freq)
full_cos_w <- spread(full_cos, key = artist, value = freq)
full_cos_w[is.na(full_cos_w)] <- 0

# Drop the word column and compute the cosine between the artist columns
full_cos_w_matrix <- data.matrix(full_cos_w, rownames.force = NA)
full_cos_w_matrix <- full_cos_w_matrix[, -1]
cosine_matrix <- cosine(full_cos_w_matrix)
cm <- data.frame(cosine_matrix)
pander(cm, caption = "Cosine similarity matrix")
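To make the cosine step concrete, here is a minimal base-R sketch on a made-up word-count table. The artists A, B and C, the words and the counts are all invented for illustration, and `cosine_cols()` is a hypothetical helper rather than the function used in this post; it computes the column-wise cosine similarity directly from its definition.

```r
# Invented word counts: rows are words, columns are artists
counts <- cbind(
  A = c(love = 2, road = 1, time = 0),
  B = c(love = 4, road = 2, time = 0),  # same proportions as A, twice the volume
  C = c(love = 0, road = 1, time = 3)   # shares only "road" with A
)

# Hypothetical helper: cosine similarity between every pair of columns
cosine_cols <- function(m) {
  dot <- crossprod(m)          # t(m) %*% m, i.e. all pairwise dot products
  norms <- sqrt(diag(dot))     # Euclidean length of each column
  dot / outer(norms, norms)
}

sim <- cosine_cols(counts)
round(sim, 2)
```

A and B come out as identical even though B has twice as many words, because the cosine only looks at direction, not length. A and C share a single word and score low.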
Next, I turned to the text complexity measures. Here I used the core album lists, which had 10 albums for The Alan Parsons Project and 36 (!) for Bob Dylan. As before, stop words are included in these calculations.
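# One row per artist
artists <- full_lyrics %>%
  distinct(artist)

# Empty table that collects one row of scores per artist
full_text_complexity <- data.table(artist = character(),
                                   mGunningFog = numeric(),
                                   mARI = numeric())

for (i in seq_len(nrow(artists))) {
  artista <- artists[[1]][i]
  l <- list(artista)
  # Distinct lyric lines of this artist; each line stands in for a sentence
  temp_lyr <- full_lyrics %>%
    filter(artist == artista) %>%
    distinct(text)
  rwstat <- readability_word_stats(temp_lyr[, 1])
  # mGunningFog: weighted words per line plus percentage of complex words
  l <- list.append(l, 0.4 * (rwstat$n.words / nrow(temp_lyr)) +
                     100 * (rwstat$n.complexes / rwstat$n.words))
  # mARI: weighted characters per word minus weighted lines per word
  l <- list.append(l, 5.89 * (rwstat$n.chars / rwstat$n.words) -
                     0.3 * (nrow(temp_lyr) / rwstat$n.words) - 15.8)
  full_text_complexity <- rbind(full_text_complexity, l)
}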
ggplot(data = full_text_complexity, aes(mARI, mGunningFog)) +
  geom_point(color = "darkblue") +
  geom_text(aes(x = mARI, y = mGunningFog, label = artist), hjust = 1, vjust = -0.5) +
  theme_bw() + labs(title = "Text complexity comparison") +
  theme(plot.title = element_text(size = rel(1.25))) +
  xlim(5.5, 7.5) + ylim(5, 7)
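The two scores computed in the loop can be tried out on plain numbers. The sketch below copies the two expressions exactly as they appear in the code, with the number of distinct lyric lines playing the role of the sentence count. The helper `line_complexity()` and the input counts are hypothetical. The expressions are this post's own variants (the m prefix presumably stands for "modified") and do not match the textbook Gunning fog and ARI formulas.

```r
# Hypothetical helper mirroring the two expressions used in the loop.
# n_lines is the number of distinct lyric lines (the stand-in for sentences).
line_complexity <- function(n_words, n_chars, n_complex, n_lines) {
  c(
    mGunningFog = 0.4 * (n_words / n_lines) + 100 * (n_complex / n_words),
    mARI        = 5.89 * (n_chars / n_words) - 0.3 * (n_lines / n_words) - 15.8
  )
}

# Invented counts: 20 lines, 100 words, 400 characters, 5 complex words
scores <- line_complexity(n_words = 100, n_chars = 400, n_complex = 5, n_lines = 20)
round(scores, 2)
```

With these inputs the first score is 0.4 * 5 + 5 = 7 and the second is 23.56 - 0.06 - 15.8 = 7.7. Longer lines and a higher share of complex words push mGunningFog up, while mARI is driven almost entirely by the average word length.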
The Beatles fare quite poorly compared to both Bob Dylan and The Alan Parsons Project: on average, the group’s lyrics are less complex than those of the other two. But if we go back to the previous post, we’ll see that on an individual album basis The Beatles have nothing to feel bad about. Their late albums (Abbey Road, Let It Be and Sgt. Pepper) meet Bob Dylan or Alan Parsons standards.
This does not imply that Paul McCartney should have been given the Nobel Prize, as these complexity metrics are only one of many measures of text quality. Nevertheless, it is interesting to see the numbers, and it is a lot of fun to put together an analysis of this kind.
The code is in the usual place.