
AtsukoI_Japan
Community Manager
Text Mining in JMP of the lyrics of QUEEN's songs.

Japanese title (translated): "What happens if you text-mine the songs of the band Queen?"

 

Written by: Naohiro Masukawa, System Engineer at JMP Japan

 

Did you see the movie "Bohemian Rhapsody", released at the end of last year? Thanks to its quality it became a big hit right after release, and it seems to be drawing a wide audience, from the generation that listened to Queen while the band was active to younger people who know little about Queen.

 

I was moved too. Beyond the story, I was struck by songs that have not faded even when heard today. Afterwards, perhaps out of professional habit (or perhaps from a wish to know Queen's lyrics better), I suddenly decided to try text mining them.

 

Before the analysis, I knew only Queen's best-known songs. Still, I expected some interesting results from songs such as the one that repeats "Bicycle", from the line "killed a man" in "Bohemian Rhapsody", and from the distinctive lyrics of its operatic section.

 

From the text mining, I will introduce the following three results.

  • What words are used a lot?
  • What happens when you classify songs by the words they use?
  • Can you identify the lyricist by the words contained in the lyrics?

 

The analysis covers the songs on albums released from 1973 to 1991, excluding instrumentals with no lyrics. I stopped at 1991 because I wanted to target the songs released during the lifetime of vocalist Freddie Mercury.

 

■ What kind of words are used often?

What words are used often in the songs analyzed? The answer is shown with a word cloud. A word cloud is a visualization method often used in text mining: the more frequently a word occurs, the larger it is displayed.

 

Word cloud

https://public.jmp.com/packages/What-kind-of-words-are-used-a-lot/js-p/5c3d904633868300a4314b06-1

 

Please note the following three points about the word cloud results.

 

  1. Stemming is applied, and only words with a frequency of 10 or more are displayed.

        Stemming removes variations in word endings and groups words that share a common stem.
        A stemmed word is marked with a symbol at its end.
        For example, the word cloud shows "Know", which covers the three words "Know", "Knowing", and "Knows".

 

  2. Words that appear to be articles, auxiliary verbs, pronouns, forms of "be", or prepositions are excluded from the analysis.

 

  3. Words are colored on a scale from blue (early average release year) to red (late average release year). In other words, blue words are used more often in earlier songs, and red words are used more often in later songs (1991 being the last year).
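
The first two notes above can be sketched in a few lines of Python. This is only an illustration, not what JMP does internally: the stop-word list and the suffix rules in crude_stem are hypothetical stand-ins for a real stop-word list and a real stemmer, and the sample lines are made up rather than taken from any lyrics.

```python
from collections import Counter
import re

# Hypothetical stop-word list; the article does not list the excluded words.
STOP_WORDS = {"a", "an", "the", "i", "you", "he", "she", "it", "we", "they",
              "is", "am", "are", "was", "were", "be", "been",
              "can", "will", "would", "to", "of", "in", "on", "at"}

def crude_stem(word):
    """Strip a few common English endings (a toy rule set, not a real stemmer)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_counts(lyrics, min_count=1):
    """Count stems over all lyrics, skipping stop words and rare stems."""
    counts = Counter()
    for text in lyrics:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[crude_stem(word)] += 1
    return {stem: n for stem, n in counts.items() if n >= min_count}

# Made-up lines, not actual lyrics.
sample = ["I know you know", "she knows and keeps knowing"]
print(stem_counts(sample, min_count=2))  # {'know': 4}
```

With min_count=10 the same function would mimic the display threshold used for the word cloud.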

 

■ What happens when you classify songs by the words they use?

Let's classify (group) the target songs into groups of songs that use similar words. First, I will introduce the concept of a document-word matrix.

 

A document-word matrix is a matrix that represents whether each word appears in each document. Each row is a song and each column is a word. In the simplest version, a cell is 1 if the word appears in that song and 0 if it does not. Here, instead of this simple 0/1 matrix, I create a document-word matrix weighted by "TF IDF", which takes the importance of each word into account.

 

I omit the detailed definition, but the weighting lowers the importance of words that appear in many songs and raises the importance of words that appear in only a few songs.

As examples, consider two stems: "Life" (life, life's) and "Bicycl" (bicycle, bicycling). The table below shows part of the weighted document-word matrix.

 

[Table: part of the weighted document-word matrix (image not available)]

A value other than 0 means that the word (the column name) appears in that song, and the value grows the more often the word is used.

For example, "Life" has small values because it is used in many songs. "Bicycl", on the other hand, has large values because it is used in only two songs, "Bicycle Race" and "More Of That Jazz". Furthermore, because "Bicycle Race" repeats "Bicycle" many times, it takes a larger value (46.557) than "More Of That Jazz", in which the word appears only once.
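
To make the weighting concrete, here is a small Python sketch that builds both the simple 0/1 matrix and a TF IDF weighted matrix. The formula used (raw count times the natural log of N / document frequency) is one common definition and is an assumption on my part; the article omits the exact definition JMP uses, so these numbers will not reproduce values such as 46.557. The toy "songs" are invented for illustration.

```python
import math

def document_term_matrices(docs):
    """Return (vocabulary, binary matrix, TF-IDF matrix) for tokenized docs."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    binary, tfidf = [], []
    for doc in docs:
        binary.append([1 if t in doc else 0 for t in vocab])
        # Assumed weighting: raw count times log(N / df), natural log.
        tfidf.append([doc.count(t) * math.log(n_docs / df[t]) for t in vocab])
    return vocab, binary, tfidf

# Toy stemmed "songs" (invented, not real lyrics).
songs = [
    ["bicycl", "bicycl", "bicycl", "life"],
    ["bicycl", "life"],
    ["life", "love"],
    ["life", "love", "love"],
]
vocab, binary, tfidf = document_term_matrices(songs)
print(vocab)
for row in tfidf:
    print([round(x, 3) for x in row])
```

In this toy data "life" appears in every song, so its weight is 0 everywhere, while "bicycl" appears in only two songs and gets a larger weight in the song that repeats it. That is the same pattern described above for "Life" and "Bicycl".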

 

Applying singular value decomposition (SVD) to this document-word matrix reduces the information in the matrix to a lower dimension. Mapping the songs, or the words themselves, in that lower-dimensional space lets you visually group songs that use similar words.
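
As a rough illustration of the idea, the sketch below finds the first two singular components of a small matrix and uses them as two-dimensional coordinates for each row (song). It is written in plain Python with power iteration and deflation so that it runs without extra packages; that choice is mine, not JMP's method, and in practice a library routine such as numpy.linalg.svd would be the usual tool. The function names and the toy matrix are hypothetical.

```python
import math

def top_singular_triplet(a, iterations=500):
    """Leading singular value and vectors of matrix a via power iteration."""
    rows, cols = len(a), len(a[0])
    # Arbitrary non-uniform start vector; it is normalized inside the loop.
    v = [1.0 / (j + 1) for j in range(cols)]
    for _ in range(iterations):
        # u = A v, then w = A^T u: one power-iteration step on A^T A.
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v

def truncated_svd(a, k=2):
    """First k singular triplets, found by repeated deflation."""
    residual = [row[:] for row in a]
    triplets = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        triplets.append((sigma, u, v))
        # Remove the component just found so the next pass finds the next one.
        residual = [[residual[i][j] - sigma * u[i] * v[j]
                     for j in range(len(v))] for i in range(len(u))]
    return triplets

def song_coordinates(a, k=2):
    """Map each row (song) to k coordinates: left singular vector times sigma."""
    triplets = truncated_svd(a, k)
    return [[sigma * u[i] for sigma, u, _ in triplets] for i in range(len(a))]

# Toy weighted matrix: rows are songs, columns are words (invented values).
toy = [
    [2.1, 0.0, 0.0],
    [0.7, 0.0, 0.0],
    [0.0, 0.0, 0.7],
    [0.0, 0.0, 1.4],
]
for coords in song_coordinates(toy):
    print([round(c, 3) for c in coords])
```

Rows that use the same words end up along the same direction in the two-dimensional map, which is what makes the visual grouping possible.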

 

The left figure below visualizes the music similarity, and the right figure visualizes the word similarity.

 

[Figure: left, map of song similarity; right, map of word similarity (image not available)]

 

In the left figure, songs positioned close together are similar and songs far apart are not. Many points lie near the origin (0, 0), but I have labeled the points (songs) that lie away from that cluster. The labeled songs use words that are distinctive within Queen's songs as a whole. Here I focus on two of them, "Bohemian Rhapsody" and "Great King Rat".

 

Both lie far from the origin. Looking at word similarity in the right figure, the stems "Mama", "Tell", and "Die" lie at about the same distance as the two songs. (*1)

Because these words lie somewhat away from the origin, they are fairly distinctive in the sense that other songs rarely use them, while "Bohemian Rhapsody" and "Great King Rat" use them to some extent. That is why the two songs are located where they are. (*2)

Incidentally, Freddie Mercury wrote the lyrics for both songs.

 

*1: The minimum and maximum of each axis differ between the left and right figures, but you can ignore this.

*2: In practice, the use of other words is also taken into account.

 

As a related analysis, I also try (hierarchical) cluster analysis using the information obtained from the singular value decomposition (the singular vectors). Cluster analysis groups songs that use similar words (strictly speaking, songs that are closest in distance) step by step, producing a tree diagram.

 

From this result you can choose how many groups (clusters) to divide the songs into. Here I show the result with 7 groups, each in its own color.

 

Mapping by singular value decomposition of document word matrix, cluster analysis

https://public.jmp.com/packages/Classifying-songs-with-SVD-and-clusterin/js-p/5c3d904633868300a4314b06-2
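
The step-by-step grouping behind the tree diagram can be sketched as follows. This is a naive Python illustration that assumes Euclidean distance and average linkage; the article does not say which distance or linkage method was used in JMP, so treat both as placeholders. The points stand in for songs mapped to SVD coordinates and are invented.

```python
import math
from itertools import combinations

def average_linkage(points, a, b):
    """Mean Euclidean distance between all pairs drawn from clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns (clusters, merges). clusters is a list of lists of point indices;
    merges records each merge in order, which is what a tree diagram draws.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        def gap(pair):
            return average_linkage(points, clusters[pair[0]], clusters[pair[1]])
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Invented 2-D coordinates standing in for songs.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
clusters, merges = agglomerate(points, n_clusters=3)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3], [4]]
```

Choosing n_clusters=7 would correspond to the seven colored groups shown in the report.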

 

■ Can you determine the lyricist by the words contained in the lyrics?

During the period analyzed, 1973 to 1991, Queen had four members: Freddie Mercury, Brian May, Roger Taylor, and John Deacon. All four wrote lyrics and music, and each of them produced hit songs. As with prose, there may be words that a lyricist uses often, and words peculiar to one lyricist that the others rarely use. What about Queen?

 

Here I focus on songs credited to a single lyricist and use discriminant analysis on the document-word matrix described above to see whether the lyricist can be identified.

 

Using the document-word matrix of the 100 most frequently used words (weighted by TF IDF), I check whether the predicted lyricist matches the actual lyricist.

 

Discriminant analysis

https://public.jmp.com/packages/Can-we-distinguish-the-author-by-the-lyr/js-p/5c3d904633868300a4314b06-3

 

The results displayed here read as follows:

 

Below the results is the following table, a cross-tabulation of the true lyricist (Actual) against the lyricist predicted by discriminant analysis (Predicted).

 

Actual author    Predicted count
                 Mercury    May    Taylor    Deacon
Mercury               42      1         0         0
May                    0     36         0         0
Taylor                 0      0        17         0
Deacon                 1      0         0        11

 

The first row is Freddie Mercury. When the lyricist is predicted from the top 100 words, 42 of his songs are predicted correctly, and only 1 song that Freddie Mercury actually wrote is predicted to be Brian May's; that is, the prediction missed.

 

"Percent Misclassified", displayed above the table, is the misclassification rate, that is, the proportion of songs predicted incorrectly. It is calculated as (number of incorrectly predicted songs ÷ number of songs analyzed).
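
As a worked example, the rate can be computed directly from the cross-tabulation table. The short Python sketch below uses only the counts shown in that table; the variable and function names are my own.

```python
AUTHORS = ["Mercury", "May", "Taylor", "Deacon"]
# Rows are actual lyricists, columns are predicted lyricists,
# in the order given by AUTHORS (counts from the cross-tabulation table).
COUNTS = [
    [42, 1, 0, 0],
    [0, 36, 0, 0],
    [0, 0, 17, 0],
    [1, 0, 0, 11],
]

def misclassification_rate(matrix):
    """Share of songs whose predicted lyricist differs from the actual one."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total

rate = misclassification_rate(COUNTS)
print(f"{rate:.4f}")  # 0.0185, i.e. 2 of 108 songs
```

The two misclassified songs are the one Mercury song predicted as May and the one Deacon song predicted as Mercury.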

 

The "Canonical Plot" is a two-dimensional visual representation of how the lyricists are discriminated. All four members are positioned on the graph, and Roger Taylor lies far from the other three. The words used in Roger Taylor's lyrics can therefore be said to differ greatly from those of the other three. Indeed, the cross-tabulation table above shows that all 17 songs written by Roger Taylor are predicted correctly, and that no song written by the other three is mistaken for one of his. The labeled points on the graph are the misclassified songs.

 

My analysis reports, including the analyses introduced here, are available at the following location.

https://public.jmp.com/users/259

 

I will keep posting interesting analysis results when I feel like it, so stay tuned.

 

 

About JMP Public, the report sharing site

JMP has a website called JMP Public for sharing analysis reports on the web.

 

JMP Public

http://public.jmp.com/

 

In JMP 14.2, the latest version of JMP released at the end of last year, you can select [File] > [Publish] in a report window to upload an analysis report in interactive HTML format to the JMP Public site. An uploaded report can be made visible only to the person who uploaded it or to everyone. To upload, however, you need to sign in to the JMP Public site with a SAS profile (a registered email address and password).

 

Many analysis reports are posted on JMP Public, including reports posted by engineers at JMP Japan.

 

This post was originally written in Japanese and has been translated for your convenience. When you reply, your reply will also be translated back to Japanese.