cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
viralnerd2013
Level II

Subset scripting

Hello JMP users,

I am able to generate scripts to subset large data files, but then have to manually name these and run them all individually through a second and third subset script. Is there a way to iteratively run the subset command and save the files to include the subset definitions?

For example: Subset file "Colors" by column1 (red, blue, green) to produce and save files: Colors.red.jmp, Colors.blue.jmp, Colors.green.jmp and then subset these by column8 (bright, light, dark) to create and save files

Colors.red.bright.jmp, Colors.red.light.jmp, Colors.red.dark.jmp, Colors.blue.bright.jmp, Colors.blue.light.jmp, Colors.blue.dark.jmp, Colors.green.bright.jmp, Colors.green.light.jmp, Colors.green.dark.jmp.

The columns positions would vary.

Thank you !!!

1 ACCEPTED SOLUTION

Accepted Solutions
ms
Super User (Alumni) ms
Super User (Alumni)

Re: Subset scripting

At least In JMP 10 there is a "Subset by" checkbox. If selected, subsets for every level of selected column(s) are automatically generated. It is easy to implement that function in JSL if the the number of by-columns is known. However, to generalize this was more tricky than I imagined.

This approach, based on inserting the sequentially generated subsets in lists (within lists) seems to work though:

//Example Table

dt = New Table( "Color",

  Add Rows( 12 ),

  New Column( "Color",

  Character,

  Nominal,

  Set Values( {"red", "red", "red", "red", "blue", "blue", "blue", "blue", "blue", "green", "green", "green"} )

  ),

  New Column( "Intensity",

  Character,

  Nominal,

  Set Values(

  {"bright", "bright", "dark", "light", "light", "light", "light", "dark", "dark", "dark", "light", "bright"}

  )

  ),

  New Column( "Finish",

  Character,

  Nominal,

  Set Values(

  {"Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank"}

  )

  ),

  New Column( "Colensity", Character, Nominal )

);

For Each Row( :Colensity = Color || Intensity || Finish ); // Just for checking that it's working...

// End example table

// Script starts here

cols = {Color, Intensity, Finish}; // Any nr of subset-by-columns (in hierarchical order)

nt = N Table(); //number of currently open tables

subList = Eval List( {dt << subset( by( cols[1] ) )} ); //First-level subsets

//Make lower-level subsets recursively

For( i = 2, i <= N Items( cols ), i++,

  Insert Into( subList, Eval List( {Eval( subList[i - 1] ) << subset( by( cols[i] ) )} ) )

);

// name and save the all subsets

myPath = "path"; //change to desired path

For( i = 1, i <= N Table() - nt, i++,

  tv = Data Table( i ) << get table variable names; // The levels are stored as table variables

  subname = {};

  For( j = 1, j <= N Items( tv ), j++, 

  Insert Into( subname, Data Table( i ) << get table variable( tv[j] ) )

  );

  Insert Into( subname, dt << get name );

  Reverse Into( subname );

  Data Table( i ) << save( myPath || Concat Items( subname, "." ) );

);

// End Script      //close all(data tables);

View solution in original post

8 REPLIES 8
pmroz
Super User

Re: Subset scripting

I have an application where I need to subset a large dataset in several different ways, and run the same analysis on each subset.  Rather than create separate datasets, I use a combination of SELECT and EXCLUDE to make the data in the large dataset ready for analysis.  For example:

dt = New Table( "Color Example",

    Add Rows( 12 ),

    New Column( "Color", Character, Nominal,

        Set Values(

            {"red", "red", "red", "red", "blue", "blue", "blue", "blue", "blue",

            "green", "green", "green"} ) ),

    New Column( "Intensity", Character, Nominal,

        Set Values(

            {"bright", "bright", "dark", "light", "light", "light", "light", "dark",

            "dark", "dark", "light", "bright"} ) ) );

nr = nrows(dt);

color_list = {"red", "blue", "green"};

intensity_list = {"bright", "light", "dark"};

for (i = 1, i <= nitems(color_list), i++,

    one_color = color_list[i];

    for (k = 1, k <= nitems(intensity_list), k++,

        one_intensity = intensity_list[i];

// Exclude all rows not matching this color and intensity

        dt << select where(:Color == one_color & :Intensity == one_intensity)

            << invert row selection << Exclude;

        nselected = nr - nrows(dt << Get Selected rows);

        dt << clear select;

        if (nselected > 0,

// Do your analysis on dt here.  Only the non-excluded rows will be used. 

// The only command I've found that ignores excluded rows is Tables >> Summary

            print("# rows: " || char(nselected) || " color: " || one_color ||

                    " intensity: " || one_intensity);

        );

// Clean up for the next selection

        dt << clear row states;

    );

);

viralnerd2013
Level II

Re: Subset scripting

Thank you PMroz,

The problem is that I do need to keep the intermediate files and the example I gave was made up, the column names would change each time.

Thanks,

Eric

ms
Super User (Alumni) ms
Super User (Alumni)

Re: Subset scripting

At least In JMP 10 there is a "Subset by" checkbox. If selected, subsets for every level of selected column(s) are automatically generated. It is easy to implement that function in JSL if the the number of by-columns is known. However, to generalize this was more tricky than I imagined.

This approach, based on inserting the sequentially generated subsets in lists (within lists) seems to work though:

//Example Table

dt = New Table( "Color",

  Add Rows( 12 ),

  New Column( "Color",

  Character,

  Nominal,

  Set Values( {"red", "red", "red", "red", "blue", "blue", "blue", "blue", "blue", "green", "green", "green"} )

  ),

  New Column( "Intensity",

  Character,

  Nominal,

  Set Values(

  {"bright", "bright", "dark", "light", "light", "light", "light", "dark", "dark", "dark", "light", "bright"}

  )

  ),

  New Column( "Finish",

  Character,

  Nominal,

  Set Values(

  {"Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank"}

  )

  ),

  New Column( "Colensity", Character, Nominal )

);

For Each Row( :Colensity = Color || Intensity || Finish ); // Just for checking that it's working...

// End example table

// Script starts here

cols = {Color, Intensity, Finish}; // Any nr of subset-by-columns (in hierarchical order)

nt = N Table(); //number of currently open tables

subList = Eval List( {dt << subset( by( cols[1] ) )} ); //First-level subsets

//Make lower-level subsets recursively

For( i = 2, i <= N Items( cols ), i++,

  Insert Into( subList, Eval List( {Eval( subList[i - 1] ) << subset( by( cols[i] ) )} ) )

);

// name and save the all subsets

myPath = "path"; //change to desired path

For( i = 1, i <= N Table() - nt, i++,

  tv = Data Table( i ) << get table variable names; // The levels are stored as table variables

  subname = {};

  For( j = 1, j <= N Items( tv ), j++, 

  Insert Into( subname, Data Table( i ) << get table variable( tv[j] ) )

  );

  Insert Into( subname, dt << get name );

  Reverse Into( subname );

  Data Table( i ) << save( myPath || Concat Items( subname, "." ) );

);

// End Script      //close all(data tables);

viralnerd2013
Level II

Re: Subset scripting

Hello MS,

That was very close!  It created all of the subsets, but did not save any of the tables. I got this message: Unable to save file. Unknown data target.

I am using JMP 9 so I am wondering if there is a difference in a command for creating the file or am I doing something incorrectly? I put the correct path into the script.

Thanks,

Eric

ms
Super User (Alumni) ms
Super User (Alumni)

Re: Subset scripting

I just tried the script in JMP 9.0.3 and it worked perfectly, as in JMP 10.

Check your path variable. It must be a string and begin and end with /.

For example on my Mac this is working: myPath = "/Users/MS/Desktop/"

viralnerd2013
Level II

Re: Subset scripting

This is strange. The path is correct because I used it to open a dt in the same directory. It creates all of the files, but they have names like "Subset of Subset of MyFile 2 3.jmp", "Subset of Subset of MyFile 2 2.jmp", etc. These files are exactly what I want and it is amazing that this script creates them, unfortunately it is not naming them correctly and saving them. Worst case scenario, I name them manually, which still saves me a ton of time!

Thanks for your efforts.

Eric

ms
Super User (Alumni) ms
Super User (Alumni)

Re: Subset scripting

Might be a performance issue. Try to insert a Wait(0) command within both For-loops to ensure that the tables are complete before JMP tries to save them.

viralnerd2013
Level II

Re: Subset scripting

That still did not work. I added more time (30), (90), etc., too. I was able to get the files to be save protected, meaning that I could not close them without stating whether or not they should be saved, but the naming was still subset of subset of myfile 2 3.jmp, etc. I am thinking that the list of subset variables is off somehow. I will troubleshoot more over the weekend and I can seek some help here from someone used to working with our version of JMP.

Thank you so much for your help and advice and have a great weekend, MS!