For most of us, the data we analyze in JMP starts out somewhere else: in a relational database, Excel, a CSV file or perhaps SAS. The need to seamlessly move such data into JMP and prepare it for analysis led us to introduce the Query Builder feature in JMP 12. Query Builder helps you select multiple tables from an external data source and join them. Then, you can interactively filter (creating a prompting filter if desired), sample and set column names, formats and modeling types for the imported data.
The feedback we’ve gotten from users about Query Builder suggests that you are finding it useful. We have also gotten suggestions for fixes and enhancements, both for Query Builder and other aspects of data access. With JMP 13, we are delivering a boatload of such fixes and enhancements. The 13 most important such fixes and enhancements are detailed below.
The first four enhancements all relate to filtering data.
Careful, my data could be huge – When you create a filter for a categorical column, Query Builder retrieves the values to display in a list. With large tables, this can take a long time. In JMP 12, value retrieval was unconditional, plus there was not a way to cancel it. In JMP 13, we have made several changes to prevent long waits:
List filters are now invertible – All of the list-type filters (List Box, Manual List, Check Box List and Match Column Values) now have a Not in List check box. This allows you to select a couple items and retrieve all rows that do not match the selected values. For example, this filter will return all movies rated something other than “G”:
List filters can now be conditional – This one is sort of a big deal. Using the red-triangle menu on a list-type filter, you can set the filter to be conditional. Conditional filters only display values that match other filters that precede them in the list. Below is an example using movie Rating and movie Genre. In this example, I have asked for the Genre filter to be conditional. When I select G in the list for Rating, the Genre filter changes to list only genres that contain at least one G-rated film:
JMP Query Builder
After using Query Builder, some users would ask us, “What if I just have a folder full of JMP data tables. Can I use Query Builder on them?” In JMP 13, the answer is a resounding “Yes!” Or perhaps you use ODBC Query Builder, Text Import or the Excel Import Wizard to import several tables. It would nice to be able to use Query Builder to join the results. With JMP 13, you can!
To use Query Builder on JMP data tables, first open the tables, and then select JMP Query Builder from the Tables menu.
For example, JMP has two sample data tables, SATByYear.jmp and CrimeData.jmp, that both have State and Year columns. Another sample table, US Demographics.jmp, has a State column. I can easily join these three tables with JMP Query Builder:
All of the other features of Query Builder, such as filters and prompting, are also available with JMP Query Builder.
We built a SQL engine into JMP to allow Query Builder to work on JMP data tables. A new JSL function, Query(), gives you direct access to that SQL engine. You can use the Query() function to manipulate JMP data tables using SQL statements. Here is an example using SATByYear and CrimeData sample data tables:
Run on Open
In JMP 13, you can configure a query to immediately run when you open it instead of opening the Query Builder window. Simply check the Run on Open option on the red-triangle menu at the top of the Query Builder window:
This is especially useful for queries that have prompted filters. You can send these queries to others (or incorporate them into a JMP add-in), and when the other user opens them, they will just see the filter prompt. This allows them to make their filter selections without having to wade through the complexities of Query Builder.
When a query has been set to Run on Open, but you need to open it into Query Builder to make changes, you have a few options. If you hold down the Ctrl key while opening the query, it will open into the Query Builder window. Alternatively, you can right-click on the query file in the JMP Home Window and select Edit Query.
One caveat to all these neat new JMP 13 Query Builder features – if you create queries that use these features, you will not be able to open them in JMP 12. At the same time, you may get JMP 13 earlier than your co-workers so that you can try out other new features.
To help with this scenario, we have added a preference in JMP 13 that hides all of the new JMP 13 features of Query Builder so that the queries you build will still be compatible with JMP 12. The preference is on the Query Builder Preferences page:
Any ODBC or SAS queries you build after setting that preference will only allow features that are compatible with JMP 12. If you want to relax that rule for a particular query, there is an option on Query Builder’s red-triangle menu that you can uncheck to allow JMP 13 features for that query:
When querying large tables from databases, sometimes it is helpful to retrieve just the first thousand or so rows of data for a query to experiment with before you spend the time and resources to retrieve all the data.
In JMP 12, First N Rows sampling was supported for the Oracle and SQL Server databases. In JMP 13, support has been added for most other databases, including PostgreSQL, MySQL, Microsoft Access, SQLite, Apache Hive, and Cloudera Impala.
More and more data is being stored in “big data” databases these days. JMP 13 improves date support for sources like Apache Hive, Cloudera Impala and HortonWorks. Also, saving tables with File > Database > Save Table did not work well with some of these data sources. That has been improved in JMP 13, with the caveat that using ODBC to save data to Hadoop-based data sources is not a very efficient way to get data to them.
If you do a lot with CSV files, support for the Microsoft Text Files ODBC driver has been improved in JMP 13.
Keeping data in a database makes it convenient to provide access to whoever needs it. For many releases, JMP has supported saving JMP data tables to databases via the File > Database > Save Table feature. However, with data sizes getting larger and larger, we have had reports that saving JMP tables to a database was taking much longer than people felt that it should. We listened and investigated, and we are happy to report that, in JMP 13, the performance of saving JMP tables to databases has significantly improved, in some cases dramatically. Please try this feature again and let us know what you experience.
With JMP, all of the data you are analyzing has to fit in memory. When you join JMP data tables with either Tables > Join or the new JMP Query Builder, data tends to get duplicated from smaller “look-up” tables into the larger join result. To help prevent this duplication, the Virtual Join feature has been added in JMP 13. For example, a DVD store might have an inventory table that knows where all the DVD’s are and a film table with details about each title. In the film table, I can set the film_id column to be the Link ID for the table:
Then, in the inventory table, I can set film_id to be a link reference to the film table. This action effectively joins the two tables based on the film_id column.
Once I’ve set that up, columns from the film table now appear in the column list for inventory. They are designated "referenced columns" and are initially hidden. I can unhide whichever columns I want to appear in the inventory table, in this case title[film_id]:
This just scratches the surface of Virtual Join, which is worthy of a blog post all on its own.
So, there you have it – a look at the many enhancements for accessing and manipulating data in JMP 13. Which feature is your favorite? What feature were you hoping to see that was not mentioned? Let me know in the comments.
For more information on using Query Builder for JMP data tables, check out my Discovery Summit poster presentation in the JMP User Community. While you're there, you can also see the slides from my Discovery Summit tutorial titled, "Wrangling All Your Data With Query Builder in JMP 13."