Maintaining 'Ghost' transformed columns created within platforms

ih

Community Trekker

Joined:

Sep 30, 2016

John Sall references virtual data as one type of ghost data in his 2017 Discovery Summit Plenary Presentation. Simple examples of virtual data are transformed columns created within a platform (open Graph Builder, right-click on a column, choose Transform -> Log). Can these columns be saved for use in other platforms?

 

What does work (in JMP 13):

Create a transformed column in the Fit Model platform and use it as a model effect. Until JMP is closed, that column appears in the list of columns in other platforms, italicized and at the bottom of the list. It is possible, then, to:

  1. Make all column transformations in Fit Model and add them as model effects to predict any column using standard least squares.
  2. Save the Fit Model script; it can close itself after it runs:
    fm = Fit Model( ... );
    fm << Close window();
  3. Update the script whenever a new column transformation is needed.
  4. Run the script every time the data table is opened.
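
The steps above can be sketched as a single JSL script. This is only an illustration under stated assumptions: it uses the Big Class sample table and the columns :height and :weight, none of which appear in the original post, and it assumes Fit Model accepts a Transform Column(...) clause inside Effects(...) in the same form that JMP writes into saved scripts. The exact saved-script form can differ between JMP versions, so compare it against a script saved from your own Fit Model report.

```jsl
// Assumption: the Big Class sample table stands in for the real data table.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Launch Fit Model with an in-platform (ghost) transform as the model effect.
// The Transform Column clause mirrors what JMP writes into a saved script;
// the name "Log[height]" is illustrative.
fm = dt << Fit Model(
	Y( :weight ),
	Effects( Transform Column( "Log[height]", Formula( Log( :height ) ) ) ),
	Personality( "Standard Least Squares" ),
	Emphasis( "Minimal Report" ),
	Run
);

// Close the report again; per the post, the transformed column should
// remain listed in other platforms until JMP is closed.
fm << Close Window;
```

To add another transformation, add one more Transform Column(...) effect and rerun the script after opening the table.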

What does not work:

  • Creating the column in graph builder, distribution, partition, neural network, and other platforms.
  • Closing and re-opening JMP.
  • Saving a prediction formula column containing the transformed variable.

5 REPLIES

ih

Per JMP support this is not possible:

 

Transformed variables (and other 'in platform' variables) were designed for temporary use. To keep them for future use (other JMP sessions), they must be saved to the data table. Otherwise they are not saved anywhere, as they are simply column formulas.

chris_kirchberg

Joined:

May 28, 2014

To clarify, the derived variable is saved and usable within the platform from the saved script, provided the script is saved to the data table.

 

For instance, I open Graph Builder, select two columns, and create a ratio via right-click. Ghost data. I then use this ratio in the graph and save the Graph Builder script to the data table. Then I save the data table. I close JMP, then open JMP and the table. No column with the ratio actually exists in the data table, but when I run the Graph Builder script it contains the derived ratio and plots it, even though it does not exist in the data table.
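
A minimal sketch of that workflow in JSL, for illustration only. It assumes the Big Class sample table, the columns :age, :weight and :height, and a hypothetical script name; it also assumes Graph Builder accepts the Transform Column(...) clause inline in a role, as JMP 13 writes it in saved scripts (later versions may place it differently).

```jsl
// Assumption: the Big Class sample table stands in for the real data table.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Store a Graph Builder script in the table. The ratio exists only inside
// the script as a transform; no ratio column is added to the table.
dt << New Script(
	"Graph Builder with ratio", // hypothetical script name
	Graph Builder(
		Variables(
			X( :age ),
			Y( Transform Column( "weight/height", Formula( :weight / :height ) ) )
		),
		Elements( Points( X, Y ) )
	)
);

// After saving and reopening the table (even in a new JMP session),
// running the stored script rebuilds the transform and plots it.
dt << Run Script( "Graph Builder with ratio" );
```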

 

This derived variable is not available to any other platform until you save it to the data table; that much is true. Is this what you mean by "in platform" variables not being available?

 

It would make sense to me, since we are specifically defining a transformed variable within the context of the analysis and not within the context of the data table itself. One would need to make a derived column in the data table itself for it to be available to all platforms (select two columns and use New Formula Column, or create a new column with a column formula, as you have noted).
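
For comparison, a short sketch of the data table route in JSL. The Big Class sample table and the column name "Ratio" are assumptions made for the example.

```jsl
// Assumption: the Big Class sample table stands in for the real data table.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// A formula column lives in the data table itself, so it is saved with
// the table and is visible to every platform.
dt << New Column( "Ratio", Numeric, "Continuous", Formula( :weight / :height ) );

// Any platform can now use it like an ordinary column.
dt << Distribution( Column( :Ratio ) );
dt << Bivariate( Y( :Ratio ), X( :age ) );
```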

 

However, there seems to be a strange artifact with Fit Model. If you do this for Fit Model and save the results to the data table, save the data table, and close JMP, then open the saved data table and run the saved Fit Model script (at least for standard least squares), that derived (ghost) variable suddenly becomes available to other platforms, even when the standard least squares platform is closed. This happens at least in 13.2.0 on the Mac. It does not work in the Early Adopter version, so I suspect that this may be a "bug".

ih

This was my fear.  The problem is that, with the exception of graphing platforms, almost every time I try a 'temporary' transformation I use it in a couple different platforms before deciding if I want to keep it.

 

Thanks for the reply.

markbailey

Staff

Joined:

Jun 23, 2011

What is the harm in saving the temporary column in the data table the first time if you want to re-use it with other platforms? You can always delete it later if it has no persistent value.
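
One way this could look in JSL, as a sketch rather than a recommended practice. The Big Class sample table, the column name, and the convention of marking temporary columns with a "Notes" column property that starts with "temporary" are all assumptions made for the example.

```jsl
// Assumption: the Big Class sample table stands in for the real data table.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Save the transformation as a real column and mark it as temporary.
// The "temporary" prefix is a hypothetical convention, not a JMP feature.
col = dt << New Column( "Log height", Numeric, "Continuous", Formula( Log( :height ) ) );
col << Set Property( "Notes", "temporary: under evaluation" );

// ... use :Log height in as many platforms as needed ...

// Later, delete every column still marked temporary.
// Loop backwards so deletions do not shift the remaining indices.
For( i = N Cols( dt ), i >= 1, i--,
	c = Column( dt, i );
	note = c << Get Property( "Notes" );
	If( Is String( note ),
		If( Starts With( note, "temporary" ),
			dt << Delete Columns( c << Get Name )
		)
	);
);
```

To keep a column, change or remove its note before running the cleanup loop.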

Learn it once, use it forever!
ih

The biggest motivation for this is keeping track of which transformations I want to keep and which are still being evaluated.  In a manufacturing data mining example, hundreds or thousands of transformations might be generated, and if they are all saved to the data table (which I did up until a month ago and still do for complex transformations) then the user needs to keep track of which columns they created now and which they liked or used in previous analyses.

 

JMP also seems to use more memory when they are all stored in the data table.