cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
rverma
Level III

JSL - Find % of missing data

I have a data table with 3 columns

rverma_0-1614358487254.png

I need to create a table showing % of missing data in the HEIGHT column grouped by ID and SUB_ID. In the case above the result should be:

rverma_3-1614358927484.png

 

But it is possible in some cases that there are no rows with missing data. In that case Percent Missing should be 0.

rverma_2-1614358872417.png

 

I need help in writing a JSL script for this. I will appreciate any guidance. I have three issues in my script below:

 

1. How can I track handle/reference to a data table so that I can recall it any time. I have always used "dt = CurrentDataTable"  in the beginning of the script. But after having created few summary tables/ subset tables, I lose reference to the original table and don't know how to make it current.

2. I need this to work even when there are no missing rows. Right now script just stops in the middle.

3. I need all tables to close except the final table with Percent Missing.

// Read the original table 
dt = CurrentDataTable();
main_table = Data Table( dt ) << get name; // get the table name

// Create a summary table with no of rows grouped by ID and SUB_ID
dt_summary = dt<< Summary(Group(:ID, :SUB_ID), Freq("None"), Weight("None"), output table name( "Main Table Summary"));
:N Rows << Set Name( "Total No of Rows" );

// Create a subset of the original table with only the missing rows
Current Data Table(Data Table(main_table));
dt = CurrentDataTable();
dt_missing = dt << select where( Is Missing( :HEIGHT ) == 1 ) << Subset(Output Table Name("Missing Rows"));

If( !Is Empty( dt_missing ),
	dt_missing_summary = Data Table( dt_missing ) << Summary(
		Group( :ID, :SUB_ID ),Freq("None"), Weight("None"), output table name( "Missing Rows Table Summary"));
	:N Rows << Set Name( "No of Rows with Missing Data" );
		
	dt_missing_summary << Update(
		With( Data Table( dt_summary ) ),
		Match Columns( :ID = ID, :SUB_ID = :SUB_ID ),
		Add Columns from Update table( :Total No of Rows )
		);
		
	New Column( "PercentMissing",
		Numeric,
		"Continuous",
		Format( "Percent", 12, 2 ),
		Formula( :No of Rows with Missing Data / :Total No of Rows )
		);
	
	);
//Close( dt_missing, nosave );
//Close( dt_summary, nosave );

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Re: JSL - Find % of missing data

PLEASE RE-READ if you saw this earlier. There was a mistake.

 

Using JMP's built-in function, Col N Missing ( ), simplifies this task. I will show 2 approaches.

 

SOLUTION #1:

  • Create a column to count missings by group in the first table.
  • Summarize the first table.
  • Change the name of the N Rows column (if you want) and make a formula column for the ratio.

 

Names Default To Here( 1 );

//these 2 lines just create a table with overall missing rate of 35% for the weight varable.
dt = Open( "$Sample_Data\Car Physical Data.jmp" );
dt:weight[Loc( J( 116, 1, Random Binomial( 1, .35 ) ) == 0 )] = .;

//add a formula column to compute n missing rows.
dt << New Column( "Missing Rows", formula( Col N Missing( :weight, :country, :type ) ) );

//create summary table, rename N Rows column, and add a % missing formula column.
dtSum = dt << Summary( Group( :Country, :Type, :missing rows));
dtSum:N Rows << set name("Total Rows");
dtSum << new column ("Pct Missing", formula(:missing rows / :total rows));

 

SOLUTION #2:

  • Create all of the formula columns in the first table, then take a straight summary.
  • Optionally, delete the N Rows column from the Summary table.

** The "gotcha" in this method is in the Total Rows ( ) function. You need to lead off with a "1" for the first argument, to serve as a dummy column, because the first argument cannot be used as a grouping column. Omit the 1, and you'll see that the counts reflect the number of rows in the "type" group only... not the combination of country and type.

 

Cheers, and apologies for the error the first time through.

 

Names Default To Here( 1 );

//don't worry about these; these are just to get a table present with some missing values @ an
//overall missing rate of 35%
dt = Open( "$Sample_Data\Car Physical Data.jmp" );
dt:weight[Loc( J( 116, 1, Random Binomial( 1, .35 ) ) == 0 )] = .;

//create formula columns using built-in functions to get what you're after.
dt << New Column( "Missing Rows", formula( Col N Missing( :weight, :country, :type ) ) );
dt << New Column( "Total Rows", formula( Col Number( 1, :country, :type ) ) );
dt << New Column( "Group Pct Missing", formula( :missing rows / :total rows ) );

//view summary table.
dtSum = dt << Summary( Group( :Country, :Type, :missing rows, :total rows, :group pct missing));

 

View solution in original post

8 REPLIES 8

Re: JSL - Find % of missing data

Hi @rverma,

 

I see you are able to take advantage of Summary within Tables.  I can produce most of what you are doing in that section and then add a column with a formula to calculate the ratio(%). 

I could make the  script like this which will still work if there are no missing data (addesses item 2):

sumdt = dt << Summary( Group( :ID, :SUB_ID ), N Missing( :HEIGHT ), Freq( "None" ), Weight( "None" ) );
sumdt << New Column( "Percent Missing",
	Numeric,
	Continuous,
	Format( Percent ),
	Formula( :"N Missing(HEIGHT)"n / :N Rows )
);

Then you could subset on this. And still close the other windows.

Its a start

Chris Kirchberg, M.S.2
Data Scientist, Life Sciences - Global Technical Enablement
JMP Statistical Discovery, LLC. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
www.jmp.com
jthi
Super User

Re: JSL - Find % of missing data

First I strongly suggest checking out JMP Scripting guide: JMP15 Scripting Guide and Scripting Index found from Jmp / Help / Scripting Index. These are very good sources for JSL scripting.

 

To do this task you won't need any jsl:

1. Let's start with the example table you provided us with two missing values

2. Create a summary of the table with N Missing with Height as statistics and SUB_ID and ID as Group

jthi_7-1614361294291.png

jthi_8-1614361328172.png

3. Create new Formula by selecting N Rows and N Missing(HEIGHT) columns

jthi_9-1614361377007.png

4. Change format to Percent from Column Properties and rename columns as needed

 

Based on these steps you could also fairly easily use JMPs provided functionality to show scripts needed for steps to change this manual process to automated process (small changes should be done to make it more robust).

 

And for your questions 1 & 3:

1. It depends a lot how you create/open/get your datatables. If you always have the same name, you could reference that but I wouldn't trust that in long run as it is so easy to create multiple datatables with same name and JMP will add indexing after the names. But these can be referenced by Datatable(nameofdatatable).

3. You can get list of datatables with Get Data Table List(). Loop over datatables and close unnecessary ones. But in my opinion you should always close the datatable you won't need as soon as it is possible with Close(reference to datatable, No save).

 

Example script to perform the task:

 

Names Default To Here(1);

//create example data and add reference to dt variable
dt = New Table("exampledata",
	Add Rows(8),
	Compress File When Saved(1),
	New Column("ID",
		Numeric,
		"Nominal",
		Format("Best", 12),
		Set Values([1, 1, 1, 1, 2, 2, 2, 2])
	),
	New Column("SUB_ID",
		Character,
		"Nominal",
		Set Values({"A", "A", "B", "B", "A", "A", "B", "B"})
	),
	New Column("HEIGHT",
		Numeric,
		"Continuous",
		Format("Best", 12),
		Set Values([10, 11, ., 9, ., 12, 8, 12])
	)
);


//note Link to original data table(0)
//dt_summary is reference to the new Summary table
dt_summary = dt << Summary(
	Group(:ID, :SUB_ID),
	N Missing(:HEIGHT),
	Freq("None"),
	Weight("None"),
	Link to original data table(0),
	output table name("Main Table Summary")
);

//close unnecessary datatable
Close(dt, no save);

//add new column with formula
dt_summary << New Column("Percent Missing",
	Numeric,
	"Continuous",
	Format("Percent", 12, 0),
	Formula(:Name("N Missing(HEIGHT)") / :"N Rows"),
);
//rename columns
Column(dt_summary, "N Rows") << Set Name("Total No of Rows");
Column(dt_summary, "N Missing(HEIGHT)") << Set Name("No of Rows with Missing Data");

 

 

-Jarmo

Re: JSL - Find % of missing data

@jthi ,

 

We were thinking along the same lines at the same time, I like that your answer is a bit more complete than mine and agree with items 1 and 3 suggestions.

Chris Kirchberg, M.S.2
Data Scientist, Life Sciences - Global Technical Enablement
JMP Statistical Discovery, LLC. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
www.jmp.com
rverma
Level III

Re: JSL - Find % of missing data

Thank you for the response. I did not know until now that N Missing could be used to find the missing data rows. It made the job much easier and reduced few steps in the script.
txnelson
Super User

Re: JSL - Find % of missing data

  1. You need to add
    Names Default To Here( 1 );
    to limit the variables specified in your script to not be changed by variables in other scripts by the same name, changing the values in your current script 
  2. It appears that you have a misunderstanding about the Current Data Table() function.  It can be used in 2 different ways and will help you with not loosing the reference to your original data table.  The specification
    dt = Current Data Table();
    you have been using properly.  And when later in your script, you want to make the original data table the "current data table", all you have to do is
    current data table(dt);
      This will tell JMP to use "dt" as the current data table.  What you need to refrain from doing is reassigning "dt" by placing it on the left side of an "=".
  3. Sometimes it is not a good thing to combine actions in a single statement.  Your
    dt_missing = dt << select where( Is Missing( :HEIGHT ) == 1 ) << Subset(Output Table Name("Missing Rows"));
    is expecting to select rows and subset the table all in one statement.  The issue is, that it will fail in giving you what you want if there are not any missing values of :Height.  In my rework of your script(below) the dt data table has the rows selected or not, and then the check is made to see if missing rows have been found, and if so then and only then continue processing
    dt << select where( Is Missing( :HEIGHT ) == 1 ); 
    
    If( N Rows( dt << get selected rows ) > 0,
  4. The Try() function is very handy in running functions that could possibly fail, and if they do, you don't want the script to stop running.  The Close() functions you are using and great for using the Try() function
    Try( Close( dt_missing, nosave ) );
    Try( Close( dt_summary, nosave ) );

I hope this is helpful.....if you have more questions please respond back

Names Default To Here( 1 );

// Read the original table 
dt = Current Data Table();
//main_table = Data Table( dt ) << get name; // get the table name

// Create a summary table with no of rows grouped by ID and SUB_ID
dt_summary = dt << Summary(
	Group( :ID, :SUB_ID ),
	Freq( "None" ),
	Weight( "None" ),
	output table name( "Main Table Summary" )
);
:N Rows << Set Name( "Total No of Rows" );

// Create a subset of the original table with only the missing rows
//Current Data Table(Data Table(main_table));
//dt = CurrentDataTable();
dt << select where( Is Missing( :HEIGHT ) == 1 ); 

If( N Rows( dt << get selected rows ) > 0,
	dt_missing = dt << Subset( Output Table Name( "Missing Rows" ), selected rows( 1 ) );
	dt_missing_summary = Data Table( dt_missing ) <<
	Summary(
		Group( :ID, :SUB_ID ),
		Freq( "None" ),
		Weight( "None" ),
		output table name( "Missing Rows Table Summary" )
	);
	:N Rows << Set Name( "No of Rows with Missing Data" );
		
	dt_missing_summary << Update(
		With( Data Table( dt_summary ) ),
		Match Columns( :ID = ID, :SUB_ID = :SUB_ID ),
		Add Columns from Update table( :Total No of Rows )
	);
		
	New Column( "PercentMissing",
		Numeric,
		"Continuous",
		Format( "Percent", 12, 2 ),
		Formula( :No of Rows with Missing Data / :Total No of Rows )
	);
	
);
Try( Close( dt_missing, nosave ) );
Try( Close( dt_summary, nosave ) );

Jim
rverma
Level III

Re: JSL - Find % of missing data

Thank you for the step by step explanation. Step 1, 2 and 4 are the new tricks I learnt and are very helpful. No wonder why my script was failing if there was no missing data. I will remember to split the statement when applying multiple conditions.

Re: JSL - Find % of missing data

PLEASE RE-READ if you saw this earlier. There was a mistake.

 

Using JMP's built-in function, Col N Missing ( ), simplifies this task. I will show 2 approaches.

 

SOLUTION #1:

  • Create a column to count missings by group in the first table.
  • Summarize the first table.
  • Change the name of the N Rows column (if you want) and make a formula column for the ratio.

 

Names Default To Here( 1 );

//these 2 lines just create a table with overall missing rate of 35% for the weight varable.
dt = Open( "$Sample_Data\Car Physical Data.jmp" );
dt:weight[Loc( J( 116, 1, Random Binomial( 1, .35 ) ) == 0 )] = .;

//add a formula column to compute n missing rows.
dt << New Column( "Missing Rows", formula( Col N Missing( :weight, :country, :type ) ) );

//create summary table, rename N Rows column, and add a % missing formula column.
dtSum = dt << Summary( Group( :Country, :Type, :missing rows));
dtSum:N Rows << set name("Total Rows");
dtSum << new column ("Pct Missing", formula(:missing rows / :total rows));

 

SOLUTION #2:

  • Create all of the formula columns in the first table, then take a straight summary.
  • Optionally, delete the N Rows column from the Summary table.

** The "gotcha" in this method is in the Total Rows ( ) function. You need to lead off with a "1" for the first argument, to serve as a dummy column, because the first argument cannot be used as a grouping column. Omit the 1, and you'll see that the counts reflect the number of rows in the "type" group only... not the combination of country and type.

 

Cheers, and apologies for the error the first time through.

 

Names Default To Here( 1 );

//don't worry about these; these are just to get a table present with some missing values @ an
//overall missing rate of 35%
dt = Open( "$Sample_Data\Car Physical Data.jmp" );
dt:weight[Loc( J( 116, 1, Random Binomial( 1, .35 ) ) == 0 )] = .;

//create formula columns using built-in functions to get what you're after.
dt << New Column( "Missing Rows", formula( Col N Missing( :weight, :country, :type ) ) );
dt << New Column( "Total Rows", formula( Col Number( 1, :country, :type ) ) );
dt << New Column( "Group Pct Missing", formula( :missing rows / :total rows ) );

//view summary table.
dtSum = dt << Summary( Group( :Country, :Type, :missing rows, :total rows, :group pct missing));

 

rverma
Level III

Re: JSL - Find % of missing data

Thank you for the simple and compact solution. It does add an extra column to the original table but that is not an issue for me.