In my previous blog post about iTunes data, I showed a simple JMP Scripting Language (JSL) program to parse the iTunes XML file. I ended the post wondering whether the code would perform well on my full 13.3MB iTunes Library.xml file.
The JSL program ran in less than a second on my 7KB test file. How long did it take for the entire 13.3MB? It took a full 90 seconds. Where was the time being spent? The answer to that is the focus of this post, which is part two of a series about my project collecting iTunes data.
How do I easily improve my script’s performance?
Before JMP 11, this was not an easy question to answer. JMP 11 introduced the JSL Profiler. This is a great tool to assess where a JSL program is spending time on a line-by-line basis.
To use the JSL Profiler, I clicked the “Debug Script” icon at the top of the JSL editor, and then clicked the “timer” icon to put the debugger into profiler mode. Running my script on the full 13.3MB file showed that the code spent all its time in the Parse XML statement, which contains the Add Rows() call. My guess is that most of that time goes to updating the table state after each row is added.
The JSL Profiler Report in JMP
I realized I could add a JSL message to the data table to hold off all update messages. The message is:
dt << Begin Data Update;
At the end of the data table processing, I needed to add the matching message:
dt << End Data Update;
After adding these messages, the program took only about 13 seconds to run, rather than 90 seconds – an 85 percent reduction in processing time!
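To see the effect in isolation, here is a minimal sketch that is separate from my iTunes script. It assumes a hypothetical one-column table and an arbitrary row count, and it simply wraps a row-adding loop in the two messages and times it. Removing the Begin/End pair and rerunning gives a rough comparison; the timings will depend on your machine and on whether the table window is visible.

```jsl
// Sketch only: the table name, column name, and row count are made up for illustration.
dt = New Table( "Update Demo", New Column( "Key", Character ) );
n = 10000;

start = Tick Seconds();
dt << Begin Data Update; // hold off table updates while rows are added
For( i = 1, i <= n, i++,
	dt << Add Rows( 1 );
	dt:Key[N Rows( dt )] = "key " || Char( i );
);
dt << End Data Update; // release the updates once, at the end
Show( Tick Seconds() - start ); // elapsed seconds for the loop
```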
Next, I needed to clean up some of the data that I do not want for my analysis. My data has a lot of records that detail which tracks are part of playlists, so I needed to add some code to my JSL to skip tags that contain playlist data. In my data, these records were about 10 percent of the total.
iTunes Playlist Data
Thanks to JMP developer Michael Hecht for suggesting the following code, which I added to the JSL. Now any “key” tag whose text value is “Playlists” is not processed as data.
txt = XML Text();
txt == "Playlists", process tag = 0,
Is Missing( Num( txt ) ),
raw dt << Add Rows( 1 );
raw dt:Key[Row()] = txt;
txt == "Tracks", process tag = 1
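To show how a flag like this can gate the parsing, here is a small self-contained sketch. It is not my exact script: the inline XML string is a tiny made-up stand-in for the iTunes file, and the order of the checks inside If() is my assumption about one reasonable way to arrange them. The idea is that reaching the “Playlists” key turns processing off, reaching the “Tracks” key turns it back on, and rows are added only while the flag is on.

```jsl
// Sketch only: a tiny stand-in for the iTunes XML, not the real file layout in full.
xml = "<dict><key>Tracks</key><dict><key>Name</key><string>Song A</string></dict><key>Playlists</key><array><dict><key>Name</key><string>Library</string></dict></array></dict>";

raw dt = New Table( "Raw Keys", New Column( "Key", Character ) );
process tag = 1; // 1 = keep key tags, 0 = skip them

raw dt << Begin Data Update;
Parse XML( xml,
	On Element( "key", End Tag(
		txt = XML Text();
		If(
			txt == "Playlists", process tag = 0, // stop at the playlist section
			txt == "Tracks", process tag = 1, // resume at the track section
			process tag & Is Missing( Num( txt ) ), // non-numeric key while the flag is on
				raw dt << Add Rows( 1 );
				raw dt:Key[N Rows( raw dt )] = txt;
		);
	) )
);
raw dt << End Data Update;
```

With this toy input, the “Name” key under “Tracks” should be added as a row, while the “Name” key under “Playlists” should be skipped.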
You can download my JSL program from the JMP File Exchange and investigate your own iTunes data. (Download requires a free JMP User Community account.)
I’ll save the full analysis of the data for my last post, but I’ll give you a preview. Below is a Treemap showing my listening by genre and artist. I do love Bruce Springsteen and have “forced” my kids to listen to him a lot over the years, even taking them to many live concerts. I also love opera, jazz and blues. As you can see, I just love music.
Treemap of Play Count by Genre and Artist using the JMP Treemap platform.