I was looking for good data to demonstrate the new Query Builder, which is coming in JMP 12 in March. The Query Builder is a modern SQL database query tool that helps users discover interesting data and make correct queries that are repeatable and sharable. The data needed to be different from the sample data that ships with JMP. It needed to be a set of tables, which is how many of our customers store their data. And this set had to have a defined relationship among the tables, each describing a different topic of interest.
Where should I look for data that contained multiple topics and was rich enough to show how the software could speed up statistical discovery? It occurred to me that many of my co-workers use iTunes to listen to music and that they may be generating interesting data from their listening.
With a little research on the web, I found that iTunes generates an XML file to make music and playlists available to other applications. This sounded like a good fit for JMP. And because I am a data analysis geek, looking at iTunes data in JMP seemed fun, and I could get the data I needed.
I wrote an XML parser for the iTunes file using JSL. I also wrote a brief survey in JSL to collect demographic data from the people who were willing to share their iTunes data. The project broke neatly into four parts:
Reading XML with JMP
We often get the question, “Does JMP open XML Files?” This is a tricky question; XML is a text file that contains information organized in an arbitrary tree of “tags.”
Let’s take a look at a simple XML example containing data for two variables, X and Y, with three rows of data.
Here's an XML data example:
Here's a simple JSL example to parse the XML data:
example = load text file("xml data example.xml");
Start Tag( New Table( XML Attr( "name" ) ) )
End Tag ( New Column( XML Attr( "name" ),
Set Values( Parse( XML Text() ) ) ) )
Notice the three key elements used in the JSL above. The On Element syntax indicates what JSL expression to execute when the “table” or “col” string is found within the XML “tag” denoted by the symbols. The Start Tag and End Tag indicate what JSL expression should be executed at the start of the “table” or “col” tags or at the end of the tags. In this case, a New Table expression is executed whenever a “table” tag is started, and a New Column expression is executed whenever a “col” tag is ended (by a tag). Running the New Column and Set Values at the end tag ensures that the values for the column have all been read before setting values.
Here’s the resultant JMP data table:
Apple iTunes XML
Now let’s take a look at how Apple stores iTunes data in XML. The XML is in a format based upon Apple’s “plist” data structure. I found a document on the Apple website that gave the basics of the file elements.
Here is a snippet of the iTunes Library.xml file that is generated by my use of iTunes. The key tag is used as a generic trigger for what could be a column name or to indicate that a new track is coming. The other tags (integer, string, etc.) denote observations in the data and give the types of data.
Here is a short JSL program I wrote that seemed to work fine with a reduced set of my iTunes Library.xml file. To create the reduced set of data, I simply opened the .xml file (mine is 14MB) in a text editor and copied the first few tracks worth of data.
iTunespath = "~/Music/iTunes/iTunes Library short.xml";
// Load the xml file into memory in the Global 'iTunespath'
cd_file_contents = Load Text File( iTunespath );
//Create an empty data table to hold the raw data
raw dt = New Table( "iTunes Data raw" );
raw dt << New Column( "names", character );
raw dt << New Column( "Values", character );
//Parse the iTunes xml
Parse XML( cd_file_contents,
OnElement( "key", Start Tag( If( Row() > 0, raw dt << Add Rows( 1 ) ) ), end tag( Column( raw dt, "names" )[Row()] = XML Text() ) ),
onElement( "integer", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
onElement( "real", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
onElement( "date", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
onElement( "data", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
onElement( "string", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
onElement( "true", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
onElement( "false", end tag( Column( raw dt, "values" )[Row()] = XML Text() ) ),
End Tag( row++ )
); //end parse of the iTunes xml
Here is what the program produced:
Notice that there is some data at the beginning that I do not need, but I can clean that up later.
Now, the big question is, will the script scale to my 14MB file? Execution time, by my watch, was 1 minute 45 seconds. In my next blog post, I will show how to use the JSL Debugger’s Performance Profiler to find out how to speed up my code.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.