Subscribe Bookmark RSS Feed
Eric_Hill

Staff

Joined:

Oct 1, 2013

13 reasons data access is better than ever in JMP 13

For most of us, the data we analyze in JMP starts out somewhere else: in a relational database, Excel, a CSV file or perhaps SAS. The need to seamlessly move such data into JMP and prepare it for analysis led us to introduce the Query Builder feature in JMP 12. Query Builder helps you select multiple tables from an external data source and join them. Then, you can interactively filter (creating a prompting filter if desired), sample and set column names, formats and modeling types for the imported data.

The feedback we’ve gotten from users about Query Builder suggests that you are finding it useful. We have also gotten suggestions for fixes and enhancements, both for Query Builder and other aspects of data access. With JMP 13, we are delivering a boatload of such fixes and enhancements. The 13 most important such fixes and enhancements are detailed below.

Filtering

The first four enhancements all relate to filtering data.

 

Careful, my data could be huge – When you create a filter for a categorical column, Query Builder retrieves the values to display in a list. With large tables, this can take a long time. In JMP 12, value retrieval was unconditional, plus there was not a way to cancel it. In JMP 13, we have made several changes to prevent long waits:

  • Cancelable value retrieval – JMP 13 puts up a progress bar with a Cancel button when retrieving categorical column values. This is supported for all ODBC drivers we have tested when JMP is running on Windows. It is not supported when connecting to SAS or for most ODBC drivers available for the Macintosh.Too big to attempt – If there are more than 1,000,000 rows in a table, JMP will not even attempt to retrieve unique column values. The 1,000,000 value can be changed via a preference.
  • Simpler list – In JMP 12, the Check Box List was the only type of filter available for selecting from a list of values. In JMP 13, we have added a plain List Box filter type. The List Box filter is less resource-intensive than the Check Box List filter. This makes it better-suited for larger lists. The default filter type for categorical columns is the List Box in JMP 13.

New filter types – In addition to the new, simpler List Box filter type, two more filter types have been added for categorical columns in JMP 13:

 

  • Contains filter – Enter some text, and JMP will match all rows that contain that text. You can also ask to match rows that do not contain the text.Manual List filter – Allows you to create a list of selections yourself to avoid the need for values to be looked up.

List filters are now invertible – All of the list-type filters (List Box, Manual List, Check Box List and Match Column Values) now have a Not in List check box. This allows you to select a couple items and retrieve all rows that do not match the selected values. For example, this filter will return all movies rated something other than “G”:

List filters can now be conditional – This one is sort of a big deal.  Using the red-triangle menu on a list-type filter, you can set the filter to be conditional. Conditional filters only display values that match other filters that precede them in the list. Below is an example using movie Rating and movie Genre. In this example, I have asked for the Genre filter to be conditional. When I select G in the list for Rating, the Genre filter changes to list only genres that contain at least one G-rated film:

This symbol indicates that the filter is conditional. Only filters for columns from the same table affect the values displayed in a conditional filter.

JMP Query Builder

 After using Query Builder, some users would ask us, “What if I just have a folder full of JMP data tables. Can I use Query Builder on them?” In JMP 13, the answer is a resounding “Yes!” Or perhaps you use ODBC Query Builder, Text Import or the Excel Import Wizard to import several tables. It would nice to be able to use Query Builder to join the results. With JMP 13, you can!

To use Query Builder on JMP data tables, first open the tables, and then select JMP Query Builder from the Tables menu.

tables_jmp_query_builder

For example, JMP has two sample data tables, SATByYear.jmp and CrimeData.jmp, that both have State and Year columns. Another sample table, US Demographics.jmp, has a State column. I can easily join these three tables with JMP Query Builder:

JMP Query Builder allows up to 64 tables to be joined together. If you ever get that many tables into one query, please send me a screenshot.

All of the other features of Query Builder, such as filters and prompting, are also available with JMP Query Builder.

Query() JSL Function

We built a SQL engine into JMP to allow Query Builder to work on JMP data tables. A new JSL function, Query(), gives you direct access to that SQL engine. You can use the Query() function to manipulate JMP data tables using SQL statements. Here is an example using SATByYear and CrimeData sample data tables:

 Run on Open

In JMP 13, you can configure a query to immediately run when you open it instead of opening the Query Builder window. Simply check the Run on Open option on the red-triangle menu at the top of the Query Builder window:

This is especially useful for queries that have prompted filters. You can send these queries to others (or incorporate them into a JMP add-in), and when the other user opens them, they will just see the filter prompt. This allows them to make their filter selections without having to wade through the complexities of Query Builder.

When a query has been set to Run on Open, but you need to open it into Query Builder to make changes, you have a few options. If you hold down the Ctrl key while opening the query, it will open into the Query Builder window. Alternatively, you can right-click on the query file in the JMP Home Window and select Edit Query.

Creating queries that will work in JMP 12

One caveat to all these neat new JMP 13 Query Builder features – if you create queries that use these features, you will not be able to open them in JMP 12. At the same time, you may get JMP 13 earlier than your co-workers so that you can try out other new features.

To help with this scenario, we have added a preference in JMP 13 that hides all of the new JMP 13 features of Query Builder so that the queries you build will still be compatible with JMP 12. The preference is on the Query Builder Preferences page:

Any ODBC or SAS queries you build after setting that preference will only allow features that are compatible with JMP 12. If you want to relax that rule for a particular query, there is an option on Query Builder’s red-triangle menu that you can uncheck to allow JMP 13 features for that query:

New features on the Tables panel

The Tables panel on the Query Builder window in JMP 12 did not have much functionality other than showing you the list of tables in your query.  In JMP 13, that panel gains a number of features:

  • Selecting one or more tables in the Tables panel restricts the columns listed in the Available Columns panel to just columns from the selected tables, making columns easier to find.
  • The Tables panel now displays the Venn diagram icon corresponding to the join type for each table, and you can edit the join, change the table alias, or remove the table from the query from the context menu.
  • When querying JMP data tables, double-clicking a table in the Tables panel makes the table visible and brings it to the front (or select the View item on the context menu).

“First N Rows” Sampling

When querying large tables from databases, sometimes it is helpful to retrieve just the first thousand or so rows of data for a query to experiment with before you spend the time and resources to retrieve all the data.

In JMP 12, First N Rows sampling was supported for the Oracle and SQL Server databases. In JMP 13, support has been added for most other databases, including PostgreSQL, MySQL, Microsoft Access, SQLite, Apache Hive, and Cloudera Impala.

Improved Hadoop and Text File Support

More and more data is being stored in “big data” databases these days. JMP 13 improves date support for sources like Apache Hive, Cloudera Impala and HortonWorks. Also, saving tables with File > Database > Save Table did not work well with some of these data sources. That has been improved in JMP 13, with the caveat that using ODBC to save data to Hadoop-based data sources is not a very efficient way to get data to them.

If you do a lot with CSV files, support for the Microsoft Text Files ODBC driver has been improved in JMP 13.

Saving JMP data to a database is much faster

Keeping data in a database makes it convenient to provide access to whoever needs it. For many releases, JMP has supported saving JMP data tables to databases via the File > Database > Save Table feature. However, with data sizes getting larger and larger, we have had reports that saving JMP tables to a database was taking much longer than people felt that it should. We listened and investigated, and we are happy to report that, in JMP 13, the performance of saving JMP tables to databases has significantly improved, in some cases dramatically. Please try this feature again and let us know what you experience.

Virtual Join

With JMP, all of the data you are analyzing has to fit in memory. When you join JMP data tables with either Tables > Join or the new JMP Query Builder, data tends to get duplicated from smaller “look-up” tables into the larger join result. To help prevent this duplication, the Virtual Join feature has been added in JMP 13. For example, a DVD store might have an inventory table that knows where all the DVD’s are and a film table with details about each title. In the film table, I can set the film_id column to be the Link ID for the table:

Then, in the inventory table, I can set film_id to be a link reference to the film table. This action effectively joins the two tables based on the film_id column.

Once I’ve set that up, columns from the film table now appear in the column list for inventory. They are designated "referenced columns" and are initially hidden. I can unhide whichever columns I want to appear in the inventory table, in this case title[film_id]:

Virtual Join allows me to see the values from the film table in the inventory table. However, they have not been physically copied. They are looked up as needed, which saves memory.

This just scratches the surface of Virtual Join, which is worthy of a blog post all on its own.

So, there you have it – a look at the many enhancements for accessing and manipulating data in JMP 13. Which feature is your favorite? What feature were you hoping to see that was not mentioned? Let me know in the comments.

For more information on using Query Builder for JMP data tables, check out my Discovery Summit poster presentation in the JMP User Community. While you're there, you can also see the slides from my Discovery Summit tutorial titled, "Wrangling All Your Data With Query Builder in JMP 13."