Our World Statistics Day conversations have been a great reminder of how much statistics can inform our lives. Do you have an example of how statistics has made a difference in your life? Share your story with the Community!
I now present a simple case study utilising many of the functionalities for data access and management presented in that prior blog post.
The data for this scenario comes from a tablet manufacturing process from the pharmaceutical industry. There are two data sets, the first contains Y or response measurements of average tablet dissolution in 90 lots: For each lot, 12 tablets are individually dissolved in water according to a specific measurement protocol, and the average amount of each tablet that is dissolved after 60 minutes is then recorded. This information was stored in text format in a file called Dissolution.txt.
The second data set contains X or predictor measures of process input measurements and supplier characteristics that may be potentially related to the lot-to-lot variation in average dissolution. This second data set is stored in Excel format in the file ProcessInput.xls.
In order to analyse the relationship between dissolution and potential predictors, we need to have a single, integrated and clean JMP table containing all relevant information. This blog post should be quite illustrative of the process, but I encourage you to play around with JMP using the files after reading this post. All you need is to download the two files from the JMP File Exchange and have JMP 9 installed on your computer (if you do not have JMP, you can try it free for 30 days by downloading it from our website).
Firstly, you need to access and import each file into JMP. In both cases, you can click File -> Open from the JMP Home window, navigate to the folder where the two data files are stored and select each file: Depending on the file type, you will get different additional prompts prior to importing into JMP.
Let’s start with Dissolution.txt. Here is a partial view of it in Notepad:
It is a plain .txt file with two columns (Lot ID and Dissolution), delimited by a semicolon, with column names in row 1 and values from row 2 onward.
This is a pretty standard way to deal with such data, and JMP provides a simple interface for importing such data: There are four options for importing text data via File > Open into JMP. I prefer importing text data by selecting the Data with Preview radio button in the File Open dialog:
Via Data with Preview, you get a two-screen wizard in which you select various file and column properties. In the first wizard screen, you select the field delimiters appropriate to importing the file correctly:
After selecting the appropriate field delimiters and clicking Next, you get the second wizard screen, which allows you to change variable names and change column format:
In this simple example, there is no need to change the default options automatically recognized by JMP. But depending upon the situation, you have the option of selecting field delimiters, charset and file length in the first wizard screen, and of changing column format or type (i.e., a numeric grouping variable that automatically is set to Numeric but you want to treat it as Character) in the second wizard screen.
After taking the default selections and clicking on the Import button, we get the data file in JMP table format:
The second file is an Excel file with following structure:
It has the standard format expected by JMP, as I explained in the previous article (Row 1 for column names and rows 2 onward containing data). In this case, I prefer importing into JMP by dragging it from the Explorer window and dropping to the JMP Home window. Alternatively, if you prefer, you can use File -> Open in JMP and then navigate to and open the Excel file and choose the default options.
The result is the equivalent JMP data table with all rows correctly inserted and columns converted to the right JMP type:
Now you have both tables open in JMP. Currently, they are not stored anywhere; rather, they are contained in memory: You do not need to store them in advance, and you can save just the integrated data table after combining the two and cleaning the data.
The two tables – Dissolution and ProcessInputs – are linked by the API Lot No information, which you will use to join the tables via the Merge function.
With ProcessInputs being the active table, select Table -> Join. In the resulting Join dialog, select to join ProcessInputs with Dissolution, select API Lot No from both sources as Match columns, select the option to “Merge same name columns” and give an appropriate Output table name, as below:
After clicking on OK, you get the IntegratedData table:
This table contains all source data plus an additional MatchFlag column, stating the matching info from both columns. For our purposes, this column can be deleted: Select it, right mouse-click and choose Delete Columns.
Dissolution information is now linked with process information, but you are still not ready to analyse that relationship. Why? The reason is that the data needs to be further investigated and eventually cleansed and enriched.
The very first thing that we suggest at JMP as a validation aid before analysing data is to visually scan the data values of every variable using Analyze -> Distribution for all columns. In our example, by visually investigating all histograms, you find this situation for Sugar Supplier column:
Clearly, the Sour sugar supplier has been misspelled as Dour twice in the original Excel file. You can easily cleanse that error by selecting Sugar Supplier columns, clicking from the main menu on Cols -> Recode and typing the correct value in:
This time, you are going to take the default “In Place” action, and you just click OK, but there can be situations in which you may need to recode and derive new information.
In our example, a new column is needed: You have Exhaust Temperature in the Fahrenheit scale, but in order to compare it with standard specifications you need to have it in Celsius format as well. You accomplish that task by creating a new column with Cols -> New Column from the table Main menu. You give it the correct name, and after clicking the Column Properties button, you select Formula and click Edit Formula:
The Formula Editor pops up. For our purposes here, you only use the existing Temperature variable and the work panel in the middle to reproduce the well-known Fahrenheit to Celsius conversion formula:
By selecting variables, operators and a large number of functions, in JMP you can easily create in a point-and-click approach very complex formulas, including conditions, comparisons, statistics and more.
Some analyses in JMP require complete data – i.e., rows with missing data are excluded, while many other methods can deal with sparse tables. In either case, it is helpful to understand the extent of the missing data prior to analysis. You can achieve this by clicking on Tables -> Missing Data Patterns from table menu and selecting all columns: A new JMP table is created, with an indication of how many rows have missing values for any column or combination of columns. In our example:
It turns out that 84 rows have no missing values, while six have some columns missing as indicated in the pattern. You can decide to keep the table then, and use File -> Save to save it as a JMP table; optionally, you can save any of the intermediate tables.
Now that data has been extracted, integrated, cleansed, enriched and validated, you can start to have fun in analysing it!