This study explores the evolving landscape of popular music using JMP Pro 17 to analyze trends in lyrical sentiment, complexity, and musical structure from Billboard Hot 100 songs spanning 2000-2022. Our objective was to identify which factors drive chart longevity and popularity, using a combination of audio features, lyrical analysis, and advanced statistical modeling.
We began by integrating and cleaning three large data sets sourced from Kaggle: Billboard chart rankings, audio features from Spotify, and full song lyrics. Using JMP’s data preparation, visualization, and modeling tools, we conducted sentiment analysis, engineered features for lyrical complexity, and examined genre-level differences using ANOVA and correlation analysis techniques.
The core of our analysis involved building two predictive models using bootstrap forests in JMP: one to estimate how long a song stays on the chart, and one to predict whether it will reach the Top 10. These models revealed that features like lyrical complexity, acousticness, valence, tempo, and speechiness are strong predictors of commercial success. Our chart longevity model achieved an R² of 0.814, while our Top 10 classification model reached an AUC of 0.982 with only a 5.1% misclassification rate – highlighting the practical power of JMP in real-world entertainment analytics.
By leveraging JMP interactively throughout – from sentiment scoring to predictive modeling – this project demonstrates how text analysis and music metadata can be combined to forecast success in a creative industry. Our findings offer useful insights for artists, producers, and labels aiming to align content with evolving listener preferences.

Good morning and good afternoon. I'm here to present the research project, Analyzing 21st Century Musical Trends Using Sentiment and Text Analysis, at the JMP Discovery Summit. My name is Srichandrika Gadde, and I'm here with my research partner, Delaney Carroll. We are graduate students at Oklahoma State University in the Business Analytics and Data Science program.
Just an overview of this project. This project was conducted as part of a graduate course at Oklahoma State University. The goal is to analyze sentiment and text features of lyrics to identify shifts in musical trends over time, and to check which features influence the commercial success of songs. We used JMP Pro 18 Student Edition for this project. As for the workflow, we followed a fairly standard one: we pre-processed our data to make it ready for analysis, cleaning up a few columns and performing some imputations, which I will get to shortly. Then we moved on to the lyrics analysis, where we performed sentiment and text analysis, and on to EDA, where we explored those features to identify shifts and trends. Finally, we built a couple of predictive models to predict commercial success measures such as popularity and longevity on the charts, and finished with validation.
Just an overview of the data. We sourced our data sets from Kaggle, and we're exploring Billboard's Hot 100 chart, which is an industry-standard measure of song success in America. We are using three data sets: the Billboard weekly chart data, Spotify audio features, and the lyrics for the Billboard Hot 100 songs from the years 2000 to 2022. We merged these three data sets and explored them to identify shifts and trends. This is the backbone of our look at what musical trends were present among the most popular songs.
In data pre-processing, we cleaned up two text columns, song and artist names, using Python to make them homogeneous, which makes it easier to merge the data sets; a rough sketch of that cleaning step follows below. Then we identified missing records in a few audio features, which we imputed using mean imputation in JMP. Finally, we feature-engineered a few variables such as total weeks on chart, peak position, and top_ten; we will get into what they are shortly. That was data pre-processing.
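As a rough illustration, here is a minimal Python sketch of the kind of name normalization we mean. The file and column names are hypothetical, and the cleaning rules shown are just one reasonable choice:

```python
import re
import pandas as pd

def normalize_name(text: str) -> str:
    """Lowercase, drop parentheticals, strip punctuation, and collapse
    whitespace so the same song/artist matches across data sets."""
    text = text.lower()
    text = re.sub(r"\(.*?\)", "", text)      # e.g., "(feat. ...)"
    text = re.sub(r"[^a-z0-9\s]", "", text)  # remove punctuation
    return re.sub(r"\s+", " ", text).strip()

charts = pd.read_csv("billboard_weekly.csv")  # hypothetical file name
charts["song_key"] = charts["song"].map(normalize_name)
charts["artist_key"] = charts["performer"].map(normalize_name)
```

The same keys are then built on the Spotify and lyrics tables so all three can be joined on song_key and artist_key.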
Now we begin with the lyrics analysis. In this phase, we are trying to answer questions such as: how did sentiment or lyrical complexity change from 2000 to 2022? How did genre trends evolve over the years? For this analysis, we used Python scripts within JMP to conduct the sentiment analysis as well as the text analysis. For sentiment, we used VADER, a lexicon- and rule-based sentiment analysis tool that computes positive, negative, neutral, and compound scores, which helps us identify the overall sentiment of a text. We used it to score the lyrics and created four features: Vader positive, Vader negative, Vader neutral, and Vader compound.
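A minimal sketch of that scoring step, using the vaderSentiment package (the feature names match ours; the sample lyric is made up):

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def vader_features(lyrics: str) -> dict:
    """Return the four VADER scores for one song's lyrics."""
    scores = analyzer.polarity_scores(lyrics)
    return {
        "vader_positive": scores["pos"],
        "vader_negative": scores["neg"],
        "vader_neutral": scores["neu"],
        "vader_compound": scores["compound"],  # overall sentiment in [-1, 1]
    }

print(vader_features("we were golden, now we cry alone in the rain"))
```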
The next part is the text analysis. Here we also created four features: TTR, Hapax Legomena, Lexical Density, and Lyrical Complexity. TTR is the ratio of unique words to total words. Hapax counts how many words appear only once within the text. Lexical density is the ratio of content words, such as nouns, verbs, and adjectives, to total words; this filters out catchy or filler phrases that do not actually add to lyrical complexity. Lyrical complexity is an aggregate of the previous three. Together with Vader compound, which gives us the overall sentiment from negative to positive, these are the eight features we created; a sketch of the text-feature computation follows below. We can then move to JMP to see how these features change over time.
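A minimal NLTK sketch of those four text features. The exact tokenization and the aggregation rule for lyrical complexity (here a simple average) are assumptions, not necessarily the precise formulas we used:

```python
from collections import Counter
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

CONTENT_TAGS = ("NN", "VB", "JJ", "RB")  # nouns, verbs, adjectives, adverbs

def text_features(lyrics: str) -> dict:
    tokens = [t.lower() for t in nltk.word_tokenize(lyrics) if t.isalpha()]
    if not tokens:
        return {}
    counts = Counter(tokens)
    content = sum(1 for _, tag in nltk.pos_tag(tokens)
                  if tag.startswith(CONTENT_TAGS))
    ttr = len(counts) / len(tokens)                      # unique / total words
    hapax = sum(1 for c in counts.values() if c == 1) / len(tokens)
    density = content / len(tokens)                      # content-word share
    return {
        "ttr": ttr,
        "hapax": hapax,
        "lexical_density": density,
        "lyrical_complexity": (ttr + hapax + density) / 3,  # assumed average
    }
```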
First, the Vader features. Vader positive has remained relatively flat. It fluctuates between 2005 and 2015, but by the end of 2022 it is pretty much back where it was in 2000. Vader neutral measures how much neutral content is in the text. We see a slight decrease, then a slight increase in 2021, but overall a slight decline in the neutral component of the lyrics from 2000 to 2020. Vader negative has definitely increased: there is a significant rise from 2000 to 2020, then a drop, but the overall negative component has grown. Vader compound is the overall sentiment, and it shows a slight decrease.
It once again fluctuated between 2006 and 2015, but had slightly decreased by the end of 2022. Turning to the text features over the years: TTR has clearly risen around and after 2020. Hapax follows a very similar trend to TTR, mainly because both measure vocabulary diversity; Hapax is almost directly proportional to TTR. Lexical density, the content-word share, has increased from 2000 to 2022, which means lyrics contained a higher proportion of content words and became more diverse. The lyrical complexity score, the aggregate of the previous three, had increased by the end of 2020.

We are now looking at the mean of Vader compound across genres, the overall sentiment per genre. Adult standards, comedy, K-pop, soul, and worship have quite high sentiment, which aligns with their genre themes. Rap, punk, and Latin have the lowest Vader compound. From this we can see that genre plays a role in the overall sentiment of the lyrics: genres like punk and rock will have lower overall sentiment because they are more negative in nature.
Moving on to the next one. Here we visualize the means of Vader negative and Vader positive across genres. Visually, we can immediately see that Vader positive is higher than Vader negative in all of these genres, which means that even highly negative genres still have some positive words in their songs. We can also see that punk and rap have the highest negative component, whereas children's music and adult standards have the least negative content. This makes sense: punk and rap carry a lot of lyrics about anger, rebellion, and struggle, so they are more negative in nature than children's music, which is naturally more positive. Next is a summary table showing the means of all the features we created, across years and genres, to explore how musical trends shifted.
From this, we created a heat map, where sizes are based on song counts. Immediately we can see that pop is a dominant genre given the number of songs it places on the Billboard Hot 100 charts. It is also quite consistently positive: it has a high Vader compound across all years, and that broad appeal probably helps it reach the Hot 100. Rap shows some fluctuations and had become more negative in nature by the end of 2020, while hip hop and country have fairly consistent positive sentiment. These are some of the insights we can get from this view. This is another heat map, checking TTR across genre and year. Pop, rap, R&B, hip hop, and country all have fairly high TTRs, consistent from 2000 to 2022. Worship, adult standards, and children's music have the lowest TTRs, also without much fluctuation. This shows they have much lower lyrical complexity, with simpler lyrics, because they are not trying to express deeper emotions in the same way.
Another question we are trying to answer is whether, given the cultural shifts around 2010, there was any significant shift in the music before versus after 2010. For this, we created a separate column that groups songs as pre- or post-2010, which we used in our ANOVA analysis. In this one-way analysis, we checked lexical density for pre- and post-2010 to see if there was any major shift. The p-value is less than 0.001, which means the change was significant; consistent with the initial visualizations showing that lexical density increased, lyrics were lexically simpler before 2010 than after. We ran the same analysis with Vader compound to check for a shift in overall sentiment before and after 2010. The p-value was 0.09, which means there was no significant change in overall sentiment, despite the fluctuations.
We did the same analysis with TTR, with a p-value of 0.033; so TTR shows a difference before and after 2010, though not a huge one. We also checked whether the three features together shifted significantly, using a MANOVA analysis; a Python sketch of these tests follows below. In the MANOVA, we see that Vader compound actually decreased after 2010, meaning overall sentiment fell, lexical density increased, and TTR remained about the same. We saw the individual changes visually, but tested together as a combined set, was there a significant shift? When we tested it, the pre/post F-test p-value was less than 0.001, which means that together these three features did shift before and after 2010. That concludes the first phase, the lyrical analysis. The overall findings are that genre and overall sentiment work together, with overall sentiment rising or falling by genre, and that lyrical complexity had increased by the end of 2020.
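For anyone reproducing these tests outside JMP, here is a minimal sketch using scipy and statsmodels. The file and column names are hypothetical, and this assumes the era split at 2010 described above:

```python
import pandas as pd
from scipy import stats
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("merged_songs.csv")  # hypothetical merged data set
df["era"] = df["year"].apply(lambda y: "post2010" if y >= 2010 else "pre2010")

# One-way ANOVA per feature (two groups, so equivalent to a t-test)
for col in ["lexical_density", "vader_compound", "ttr"]:
    groups = [g[col].dropna() for _, g in df.groupby("era")]
    f, p = stats.f_oneway(*groups)
    print(f"{col}: F = {f:.2f}, p = {p:.4f}")

# MANOVA on the three features jointly
fit = MANOVA.from_formula(
    "lexical_density + vader_compound + ttr ~ era", data=df)
print(fit.mv_test())
```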
Now, I will hand it over to Delaney to take us through the next half of the project where we build models.
Thank you, Srichandrika, for that demo of the lyrical analysis. Now we are going to see how we can use the text analysis in predictive models. We went with the bootstrap forest model for this analysis due to its ability to handle complex data: musical features like sentiment, lyrical complexity, genre, and year may interact in nonlinear ways that simple models like linear regression can't capture. JMP also produces a variable importance ranking, which helps us see which factors influence popularity or chart longevity the most. Bootstrap forest models also reduce overfitting by combining results from many different trees. I'm going to get right into it and start with a live demo.
Along with our weekly Billboard data and audio features, we have our variables representing the text and sentiment analysis of the lyric data. The data set I'm working in includes the weekly Billboard chart data, and there are quite a lot of records, roughly 130,000.
First, we want to start with exploratory data analysis to examine the trends and patterns within the data. We'll start by viewing the distributions of our variables to see where the data lies. We simply come up to Analyze, choose Distribution, put the variables in (I already have mine in here), hit OK, and look at it in the stacked format, which is much easier to view. Here are our genres: we have roughly 25, and we can order them by counts. Pop is by far the most popular, with country and rap following. Next, we'll create a time series visualization in Graph Builder; I'd like to view how the counts of genres change over time. We put week_ID on the X-axis, color by genre, and put genre in Overlay so we can see them all at once, then hit OK. It's a lot of genres, but this is over the years: this one is pop, this one is rap. You can see that after 2015, pop and rap are fighting each other for the most popular spot on the chart.
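A rough Python equivalent of that Graph Builder view, with hypothetical file and column names:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("billboard_weekly.csv", parse_dates=["week_id"])  # hypothetical

# Count chart entries per genre per year and plot them as overlaid lines
counts = (df.groupby([pd.Grouper(key="week_id", freq="YS"), "genre"])
            .size().unstack(fill_value=0))
counts.plot(figsize=(10, 5), title="Hot 100 entries per genre over time")
plt.ylabel("Songs on the chart")
plt.show()
```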
In our EDA, I also like to build a correlation matrix with all of my numerical variables to see whether any predictor variables are highly correlated with one another. We can build this with the Multivariate tool under Multivariate Methods (I'll just relaunch my analysis), put all of the numerical variables in, and hit Go. It highlights highly correlated relationships. Here we can see a strong correlation between energy and loudness, so we would want to include only one of those in our model to reduce multicollinearity. A statistical test I like to run to view relationships between variables is a one-way ANOVA.
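The same check in pandas is a one-liner plus a filter; the file name is hypothetical:

```python
import pandas as pd

df = pd.read_csv("merged_songs.csv")  # hypothetical merged data set
corr = df.select_dtypes("number").corr()

# Flag strongly correlated predictor pairs, e.g., energy vs. loudness
mask = (corr.abs() > 0.7) & (corr.abs() < 1.0)
print(corr.where(mask).stack())
```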
Earlier we saw that there are about 25 main genres in the data set. I'd like to see whether there are significant differences in means between the genres. I started with how longevity, the total weeks on chart, may differ between genres. We can do this by opening Fit Y by X, putting genre in for X and total weeks on chart for Y, hitting OK, then choosing Means/Anova, then Compare Means, and Each Pair. Down here, the p-value shows that the differences are significant.
Coming back to the Connecting Letters report, this shows us how the different groups, or genres, differ from one another. Here we can see the top genre has the highest chart-longevity mean and is statistically different from every other genre. The same goes for the genres that follow: indie pop, pop, and post-grunge. They don't share any letters with any other genre and are therefore statistically greater. We can complete the same process for peak position by genre, building it the same way but using peak position for our Y. Coming down to the connecting letters, we'll be looking for the smaller mean, since that indicates genres that peak higher on the chart. Interestingly enough, K-pop has the smallest mean, showing it is more likely to chart high, which is also interesting because K-pop isn't very high on the longevity ANOVA list. This could indicate that K-pop has a strong ability to go viral and peak early, but doesn't have strong lasting power and quickly falls down the chart rankings.
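Outside JMP, a Tukey HSD test plays roughly the role of the Connecting Letters report. A minimal statsmodels sketch, with hypothetical column names:

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("merged_songs.csv")  # hypothetical merged data set
data = df[["total_weeks_on_chart", "genre"]].dropna()

# Pairwise comparisons of mean weeks-on-chart across all genres
tukey = pairwise_tukeyhsd(data["total_weeks_on_chart"], data["genre"])
print(tukey.summary())
```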
We could also run ANOVA tests like Srichandrika did with the text and sentiment variables to see the differences between genres and other groups. Done with our EDA, we're ready to move on to predictive modeling, but first we need to create a validation column for each response variable. JMP Pro makes this easy: go into Predictive Modeling and then down to Make Validation Column. We're going to create one for each, since we're building two models: one for longevity and a classification model for predicting whether a song lands in the top 10. We stratify on total weeks on chart, one of our responses, and do a 60/20/20 split. Since we're making multiple columns, we name them so we know which is which, and use a random seed of 1, 2, 3, 4, 5. That creates it.
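An equivalent stratified 60/20/20 split in Python, assuming we bin the continuous response so we can stratify on it (file and column names are hypothetical):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("merged_songs.csv")  # hypothetical merged data set

# Bin the continuous response into deciles to stratify, as JMP does
bins = pd.qcut(df["total_weeks_on_chart"], q=10, duplicates="drop")

train, temp = train_test_split(df, test_size=0.4, stratify=bins,
                               random_state=12345)
valid, test = train_test_split(temp, test_size=0.5,
                               stratify=bins.loc[temp.index],
                               random_state=12345)
print(len(train), len(valid), len(test))  # roughly 60/20/20
```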
I've already created one for the top 10, so I won't go through that again. Now that we have our validation columns, we can start building our predictive models. JMP has a Bootstrap Forest option that's easy to use; I'll just relaunch my analysis, putting our response variable in Y, then all of our predictor variables, and making sure to add the validation column we created. To save the program from a long run time, I'm choosing to leave out some categorical variables with many levels. For example, I'd like to see how performer might influence chart longevity, but there are far too many performers in the data set for an efficient model. In choosing inputs, we also keep in mind the variables to exclude due to multicollinearity; here I've included energy instead of loudness to avoid that. Make sure the validation column is in, and hit OK.
The Bootstrap Forest specification box will pop up, asking how many trees to grow and how many predictor variables are randomly considered at each split. You can specify how many trees are grown, how deep they can go, how much data each tree sees, and how random the splits are. Playing with these settings affects the balance between accuracy, run time, and overfitting, but to start we'll go with the suggested values and hit OK to see the results. This step usually takes a few seconds, since JMP is building multiple decision trees behind the scenes, and this is a fairly big data set, which adds to the run time. Here we go: our R² is 0.89, which is already pretty good. If we thought the results could be improved, we could rerun the model with different specifications. But here we can click the red triangle and choose Column Contributions, which shows how the variables influence chart longevity. Peak position is our highest-contributing variable, which makes sense because songs that peak higher generally tend to last longer on the chart.
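As a rough open-source analogue, a scikit-learn random forest (related to, but not identical to, JMP's bootstrap forest) can reproduce the same workflow. This continues from the split sketch above, and the feature list is illustrative, not our exact input set:

```python
from sklearn.ensemble import RandomForestRegressor

features = ["vader_compound", "lyrical_complexity", "ttr", "lexical_density",
            "energy", "valence", "tempo", "speechiness", "acousticness",
            "peak_position"]  # loudness left out (correlated with energy)

model = RandomForestRegressor(n_estimators=100, random_state=12345)
model.fit(train[features], train["total_weeks_on_chart"])
print("Validation R^2:", model.score(valid[features],
                                     valid["total_weeks_on_chart"]))

# Rough analogue of JMP's Column Contributions report
for name, imp in sorted(zip(features, model.feature_importances_),
                        key=lambda pair: -pair[1]):
    print(f"{name}: {imp:.3f}")
```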
Now we do the same thing for our second model, except we use our binary variable as Y and put in the validation column for that response. Here I also chose to leave out peak position, because I used that variable to create the top 10 variable. Again, we leave the specifications the same and see what we get. The classification model output gives us a confusion matrix for each of the validation columns, and we can also come up here and view the ROC and lift curves to assess model performance and how well it distinguishes between classes; 0.98 is pretty good in that respect. That concludes our walkthrough of the JMP workflow we used in our project. In practice, this modeling could help artists, producers, or record labels understand which features of songs most influence success and potentially forecast trends in the industry. Our project is just one example of how JMP can be applied to text and sentiment data in predictive modeling and in creative fields.
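A sketch of the classification counterpart, continuing from the blocks above; top_ten is assumed to be the engineered 0/1 flag:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Drop peak_position, since it was used to define the top_ten flag
clf_features = [f for f in features if f != "peak_position"]

clf = RandomForestClassifier(n_estimators=100, random_state=12345)
clf.fit(train[clf_features], train["top_ten"])

probs = clf.predict_proba(valid[clf_features])[:, 1]
print("Validation AUC:", roc_auc_score(valid["top_ten"], probs))
```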
We'd like to give a special thank-you to Dr. Chakraborty and Dr. McGaugh for their mentorship and guidance throughout our Business Analytics and Data Science program. Their encouragement and insights have shaped our experience and helped us grow technically and professionally. We are grateful for the time and support they've given us and truly appreciate the opportunity to learn under their direction. Thank you to everyone listening. That concludes our presentation.