- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Subset scripting
Hello JMP users,
I am able to generate scripts to subset large data files, but then have to manually name these and run them all individually through a second and third subset script. Is there a way to iteratively run the subset command and save the files to include the subset definitions?
For example: Subset file "Colors" by column1 (red, blue, green) to produce and save files: Colors.red.jmp, Colors.blue.jmp, Colors.green.jmp and then subset these by column8 (bright, light, dark) to create and save files
Colors.red.bright.jmp, Colors.red.light.jmp, Colors.red.dark.jmp, Colors.blue.bright.jmp, Colors.blue.light.jmp, Colors.blue.dark.jmp, Colors.green.bright.jmp, Colors.green.light.jmp, Colors.green.dark.jmp.
The columns positions would vary.
Thank you !!!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Subset scripting
At least In JMP 10 there is a "Subset by" checkbox. If selected, subsets for every level of selected column(s) are automatically generated. It is easy to implement that function in JSL if the the number of by-columns is known. However, to generalize this was more tricky than I imagined.
This approach, based on inserting the sequentially generated subsets in lists (within lists) seems to work though:
//Example Table
dt = New Table( "Color",
Add Rows( 12 ),
New Column( "Color",
Character,
Nominal,
Set Values( {"red", "red", "red", "red", "blue", "blue", "blue", "blue", "blue", "green", "green", "green"} )
),
New Column( "Intensity",
Character,
Nominal,
Set Values(
{"bright", "bright", "dark", "light", "light", "light", "light", "dark", "dark", "dark", "light", "bright"}
)
),
New Column( "Finish",
Character,
Nominal,
Set Values(
{"Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank"}
)
),
New Column( "Colensity", Character, Nominal )
);
For Each Row( :Colensity = Color || Intensity || Finish ); // Just for checking that it's working...
// End example table
// Script starts here
cols = {Color, Intensity, Finish}; // Any nr of subset-by-columns (in hierarchical order)
nt = N Table(); //number of currently open tables
subList = Eval List( {dt << subset( by( cols[1] ) )} ); //First-level subsets
//Make lower-level subsets recursively
For( i = 2, i <= N Items( cols ), i++,
Insert Into( subList, Eval List( {Eval( subList[i - 1] ) << subset( by( cols[i] ) )} ) )
);
// name and save the all subsets
myPath = "path"; //change to desired path
For( i = 1, i <= N Table() - nt, i++,
tv = Data Table( i ) << get table variable names; // The levels are stored as table variables
subname = {};
For( j = 1, j <= N Items( tv ), j++,
Insert Into( subname, Data Table( i ) << get table variable( tv[j] ) )
);
Insert Into( subname, dt << get name );
Reverse Into( subname );
Data Table( i ) << save( myPath || Concat Items( subname, "." ) );
);
// End Script //close all(data tables);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Subset scripting
I have an application where I need to subset a large dataset in several different ways, and run the same analysis on each subset. Rather than create separate datasets, I use a combination of SELECT and EXCLUDE to make the data in the large dataset ready for analysis. For example:
dt = New Table( "Color Example",
Add Rows( 12 ),
New Column( "Color", Character, Nominal,
Set Values(
{"red", "red", "red", "red", "blue", "blue", "blue", "blue", "blue",
"green", "green", "green"} ) ),
New Column( "Intensity", Character, Nominal,
Set Values(
{"bright", "bright", "dark", "light", "light", "light", "light", "dark",
"dark", "dark", "light", "bright"} ) ) );
nr = nrows(dt);
color_list = {"red", "blue", "green"};
intensity_list = {"bright", "light", "dark"};
for (i = 1, i <= nitems(color_list), i++,
one_color = color_list[i];
for (k = 1, k <= nitems(intensity_list), k++,
one_intensity = intensity_list[i];
// Exclude all rows not matching this color and intensity
dt << select where(:Color == one_color & :Intensity == one_intensity)
<< invert row selection << Exclude;
nselected = nr - nrows(dt << Get Selected rows);
dt << clear select;
if (nselected > 0,
// Do your analysis on dt here. Only the non-excluded rows will be used.
// The only command I've found that ignores excluded rows is Tables >> Summary
print("# rows: " || char(nselected) || " color: " || one_color ||
" intensity: " || one_intensity);
);
// Clean up for the next selection
dt << clear row states;
);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Subset scripting
Thank you PMroz,
The problem is that I do need to keep the intermediate files and the example I gave was made up, the column names would change each time.
Thanks,
Eric
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Subset scripting
At least In JMP 10 there is a "Subset by" checkbox. If selected, subsets for every level of selected column(s) are automatically generated. It is easy to implement that function in JSL if the the number of by-columns is known. However, to generalize this was more tricky than I imagined.
This approach, based on inserting the sequentially generated subsets in lists (within lists) seems to work though:
//Example Table
dt = New Table( "Color",
Add Rows( 12 ),
New Column( "Color",
Character,
Nominal,
Set Values( {"red", "red", "red", "red", "blue", "blue", "blue", "blue", "blue", "green", "green", "green"} )
),
New Column( "Intensity",
Character,
Nominal,
Set Values(
{"bright", "bright", "dark", "light", "light", "light", "light", "dark", "dark", "dark", "light", "bright"}
)
),
New Column( "Finish",
Character,
Nominal,
Set Values(
{"Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank", "Matte", "Blank"}
)
),
New Column( "Colensity", Character, Nominal )
);
For Each Row( :Colensity = Color || Intensity || Finish ); // Just for checking that it's working...
// End example table
// Script starts here
cols = {Color, Intensity, Finish}; // Any nr of subset-by-columns (in hierarchical order)
nt = N Table(); //number of currently open tables
subList = Eval List( {dt << subset( by( cols[1] ) )} ); //First-level subsets
//Make lower-level subsets recursively
For( i = 2, i <= N Items( cols ), i++,
Insert Into( subList, Eval List( {Eval( subList[i - 1] ) << subset( by( cols[i] ) )} ) )
);
// name and save the all subsets
myPath = "path"; //change to desired path
For( i = 1, i <= N Table() - nt, i++,
tv = Data Table( i ) << get table variable names; // The levels are stored as table variables
subname = {};
For( j = 1, j <= N Items( tv ), j++,
Insert Into( subname, Data Table( i ) << get table variable( tv[j] ) )
);
Insert Into( subname, dt << get name );
Reverse Into( subname );
Data Table( i ) << save( myPath || Concat Items( subname, "." ) );
);
// End Script //close all(data tables);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Subset scripting
Hello MS,
That was very close! It created all of the subsets, but did not save any of the tables. I got this message: Unable to save file. Unknown data target.
I am using JMP 9 so I am wondering if there is a difference in a command for creating the file or am I doing something incorrectly? I put the correct path into the script.
Thanks,
Eric
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Subset scripting
I just tried the script in JMP 9.0.3 and it worked perfectly, as in JMP 10.
Check your path variable. It must be a string and begin and end with /.
For example on my Mac this is working: myPath = "/Users/MS/Desktop/"
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Subset scripting
This is strange. The path is correct because I used it to open a dt in the same directory. It creates all of the files, but they have names like "Subset of Subset of MyFile 2 3.jmp", "Subset of Subset of MyFile 2 2.jmp", etc. These files are exactly what I want and it is amazing that this script creates them, unfortunately it is not naming them correctly and saving them. Worst case scenario, I name them manually, which still saves me a ton of time!
Thanks for your efforts.
Eric
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Subset scripting
Might be a performance issue. Try to insert a Wait(0) command within both For-loops to ensure that the tables are complete before JMP tries to save them.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Subset scripting
That still did not work. I added more time (30), (90), etc., too. I was able to get the files to be save protected, meaning that I could not close them without stating whether or not they should be saved, but the naming was still subset of subset of myfile 2 3.jmp, etc. I am thinking that the list of subset variables is off somehow. I will troubleshoot more over the weekend and I can seek some help here from someone used to working with our version of JMP.
Thank you so much for your help and advice and have a great weekend, MS!