Our World Statistics Day conversations have been a great reminder of how much statistics can inform our lives. Do you have an example of how statistics has made a difference in your life? Share your story with the Community!
Choose Language Hide Translation Bar
Highlighted
Ducky
Level I

How do I delete rows that are missing data in all columns of my dataset?

Apologies for how basic my question is, I am just starting to use JMP for the first time. 

 

I have a massive dataset (14 columns x 20,000 rows) and I am trying to write a script that will identify and delete all rows that that do not have either a 0 or 1 for a specific range of columns (columns 4-14). This dataset is from an even larger dataset, so there are some rows in my subset that are completely empty. Here's an example of what I was hoping to do:

 

- - - - - 
0 0 1 - 0
- - - - -
1 - - - -

becomes:

 

 

0 0 1 - 0
1 - - - -

 

 

 

I've started writing the code, but I'm stuck–I'm not sure what to do after the For Each Row. I was hoping that this script would look at one row at a time, and if a particular cell did not have a 0 or a 1, it would look for a 0 or a 1 in the next column, and if it got to the end of the row with no 0s or 1s, then it would delete that entire row. Here is my code so far:

 

dt=Open("File Name")

colnames = dt << get column names( numeric,string );
For( i = 4, i <= N Items( colnames ), i++,
	For Each Row(If(:colnames != 0 | :colnames != 1), Continue()
		
	
	)
	
	
);

I was thinking about using

<< Delete Rows

, which I've seen on some other questions on this forum. If this is too complicated with a script, is there a better way to do this?

 

5 REPLIES 5
Highlighted

Re: How do I delete rows that are missing data in all columns of my dataset?

Does this need to be a script? 

If you find just one row that is missing those specific cells, select the missing cells.

Now go to the Rows menu and select Row Selection > Select Matching Cells.

Now every row that is missing data in those columns will be selected. Choose Rows > Delete Rows.

Dan Obermiller
Highlighted
txnelson
Super User

Re: How do I delete rows that are missing data in all columns of my dataset?

I think you need to look at the Missing Data Pattern platform:

     Tables=>Missing Data Pattern

From your descripting, it sounds like it will meet your needs.  It will analyze the columns you specify and then from the data table it produces, you can select the columns will all missing values, and then delete them.

Jim
Highlighted
ThuongLe
Level IV

Re: How do I delete rows that are missing data in all columns of my dataset?

or if you want to script it, this is my suggestion:

dt << New Column("Selected", Character, Nominal, Formula(
   If(Is Missing(:Column4) | Is Missing(:Column5)|...until column14,
        "Yes",
        "No"
   )
));

dt << Select Where(:Selected == "Yes");
dt << Delete Rows;
Highlighted

Re: How do I delete rows that are missing data in all columns of my dataset?

This script will take care of your request. It is not entirely unique of the other replies.

 

Names Default to Here( 1 );

// get reference to target data table
dt = Current Data Table();

// get all the columns for missing value check
cols = dt << Get Column Names;

// obtain missing value patterns
missing = Current Data Table() << Missing Data Pattern(
	Columns( Eval( cols ) ),
	Output Table( "Missing Data Pattern" )
);

// make character string representing missing values in all columns
n missing = N Items( cols );
all missing = Repeat( "1", n missing );

// find the row of patterns where all columns are missing.
missing << Select Where( :Patterns == all missing );

// delete the linked rows in the original table
dt << Delete Rows;

// clean up
Close( missing, No Save );
Learn it once, use it forever!
Highlighted
ms
Super User ms
Super User

Re: How do I delete rows that are missing data in all columns of my dataset?

As shown by earlier replies, there are often many ways to perform a task in JMP.

 

If all columns 4-14 are numeric, this script should also work to delete rows with missing values only.

 

dt = Current Data Table();

dt << Delete Rows(Loc(Is Missing(V Mean(dt[0, 4 :: 14]`))));
Article Labels