Subscribe Bookmark RSS Feed

Selecting cells with empty data

steven_w_daniel

Community Trekker

Joined:

Jun 25, 2015

I have a table that looks like the table below:

     

     

operatorM typedescriptiondescription 2M1M2M3M4M5M6
415NxyzA hotHot clean 1415415415415415415
615NxyzB coldCold clean 1615615615 615615
815NxyzA hotHot clean 2815815815815815815
1015NxyzB coldCold clean 21015 1015101510151015
1215NxyzA hotHot clean 1121512151215121512151215
1415NxyzB coldCold clean 114151415141514151415
1615NxyzA hotHot clean 2161516151615161516151615
1815NxyzB coldCold clean 2181518151815 1815

I want to select only the rows that is empty. Since the columns M1-M6 can vary each time I pull the data I can't use the column names in the JSL script. I'm trying to use a FOR loop below to select the rows of empty data, but this only selects the rows where the last cell is empty.


dt = Current Data Table();

nc = dt << get column names( Numeric );

For( i = 1, i <= N Items( nc ), i++,

dt <<select where( Is Missing( nc[] )));

Eventually I want to get the output table to look like the following


    

operatorM typedescriptiondescription 2Machine
615NxyzB coldCold clean 1M4
1015NxyzB coldCold clean 2M2
1415NxyzB coldCold clean 1M6
1815NxyzB coldCold clean 2M4
1815NxyzB coldCold clean 2M5

The big trick to this is to do this without coding in the column names for (M1,M2 etc...) as they are going to change.

Any Ideas

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

If I'm reading your message correctly, I think you've actually got two questions.

  1. How to find the rows that are all missing for a given set of columns.
  2. How to make your code not dependent on the names of those columns and how many there are.

Not surprisingly there are lots of different ways to do 1. PMroz , MS and Craige@JMP have given you some options there.

I'll throw in one more. You could use the Missing Data Pattern utility to find them.


dt = Open( "$SAMPLE_DATA\Cities.jmp" );



//delete some columns from this table to make it fit your example


dt << delete columns( "PM10", "X", "Y" );



//change name of data table to ensure this modified version doesn't get saved


//over the sample data


dt << set name( "CitiesModified" );



//I want to use the columns starting at column 9,


//so I build a list of references to them



mycols = {};



For( i = 9, i <= N Cols( dt ), i++,


  Insert Into( mycols, Column( dt, i ) )


);




mdp = Dt << Missing Data Pattern( columns( Eval( mycols ) ), Output Table( "Missing Data Pattern" ) );



//The patterns columns shows the pattern of missing values


//the Missing Data Pattern data table is sorted,


//so if there are rows with all missing data


//the last row of the Missing Data Pattern data table will be all 1s.


//The Repeat() function just builds a string of 1s for as many columns as I'm working with


If( Column( mdp, "Patterns" )[N Rows( mdp )] == Repeat( "1", N Items( mycols ) ),



//The Missing Data Pattern data table is linked to the source table so selecting a row


  //in it, selects all the rows that it represents in the source table


  mdp << Select Rows( N Rows( mdp ) );



//now I can delete the rows in the source table


  dt << delete rows;


,


  Show( "No all missing rows" )


);


I've tried to show how to make this code work for any number of columns. In this case, because I needed a list of columns to pass to the Missing Data Pattern utility, I built a list based on the column numbers.

Even having written all of this and having shown how to use the Missing Data Pattern Utility, I think my favorite solution here would be PMroz's to use Stack Columns. You really want to end up with the data stacked anyway, so doing that first makes it easy to simply select the missing rows in the stacked column and delete them.


dt = Open( "$SAMPLE_DATA\Cities.jmp" );



//delete some columns from this table to make it fit your example


dt << delete columns( "PM10", "X", "Y" );



//change name of data table to ensure this modified version doesn't get saved


//over the sample data


dt << set name( "CitiesModified" );



//again I need a list of columns to use in the Stack()



mycols = {};



For( i = 9, i <= N Cols( dt ), i++,


  Insert Into( mycols, Column( dt, i ) )


);



//stack those columns creating a new data table


stacked_dt = dt << Stack(


  columns( eval(mycols) ),


  Source Label Column( "Gas" ),


  Stacked Data Column( "PPM" )


);



missing_rows=stacked_dt<<get rows where (is missing(:PPM));


stacked_dt<<delete rows(missing_rows);


-Jeff

-Jeff
7 REPLIES
pmroz

Super User

Joined:

Jun 23, 2011

You can use the STACK command to stack your data, including the values of M1-Mn .  Then select rows that are empty and delete them.

Or try this code:

dt = Current Data Table();

col_list = dt << get column names( Numeric, string );

where_clause = "";

For( i = 1, i <= N Items( nc ), i++,

     one_col = col_list[i];

     if (i == 1,

     // then

           where_clause =evalinsert("\[dt << select where(is missing(as column(dt, "^one_col^"))]\");

     // else

           ,

           where_clause = evalinsert("\[^where_clause^ | is missing(as column(dt, "^one_col^"))]\");

     );

);

where_clause = where_clause || ")";

eval(parse(where_clause));

Craige_Hales

Staff

Joined:

Mar 21, 2013

Same answer as PMroz using expression manipulation:


dt = Current Data Table();


e = Expr( Or() ); // the OR of the ismissing(columns) will be built here


nc = dt << get column names( Numeric );


For( i = 1, i <= N Items( nc ), i++,


    Insert Into( e, Eval Expr( Is Missing( Expr( nc ) ) ) )


);


dt << selectwhere( e );


the expression "e" looks like

nameexpr(e)

/*:

Is Missing( :m1 ) | Is Missing( :m2 ) | Is Missing( :m3 ) | Is Missing( :m4 ) | Is Missing( :m5 ) | Is Missing( :m6 )

Head Arg Recurse Formula

Introspection

Craige
pmroz

Super User

Joined:

Jun 23, 2011

Very slick.  Didn't know you could insert into an expression (e).  I thought insert into needed a list.

ms

Super User

Joined:

Jun 23, 2011

Yet another way to select rows with empty cells:

dt << select where(!Contains(Loc Nonmissing(dt << get as matrix), Row()));

pmroz

Super User

Joined:

Jun 23, 2011

MS that's even slicker - now let's see you do it in 0 lines of code!  Ha ha just goes to show there are several ways to do the same thing in JSL.

Solution

If I'm reading your message correctly, I think you've actually got two questions.

  1. How to find the rows that are all missing for a given set of columns.
  2. How to make your code not dependent on the names of those columns and how many there are.

Not surprisingly there are lots of different ways to do 1. PMroz , MS and Craige@JMP have given you some options there.

I'll throw in one more. You could use the Missing Data Pattern utility to find them.


dt = Open( "$SAMPLE_DATA\Cities.jmp" );



//delete some columns from this table to make it fit your example


dt << delete columns( "PM10", "X", "Y" );



//change name of data table to ensure this modified version doesn't get saved


//over the sample data


dt << set name( "CitiesModified" );



//I want to use the columns starting at column 9,


//so I build a list of references to them



mycols = {};



For( i = 9, i <= N Cols( dt ), i++,


  Insert Into( mycols, Column( dt, i ) )


);




mdp = Dt << Missing Data Pattern( columns( Eval( mycols ) ), Output Table( "Missing Data Pattern" ) );



//The patterns columns shows the pattern of missing values


//the Missing Data Pattern data table is sorted,


//so if there are rows with all missing data


//the last row of the Missing Data Pattern data table will be all 1s.


//The Repeat() function just builds a string of 1s for as many columns as I'm working with


If( Column( mdp, "Patterns" )[N Rows( mdp )] == Repeat( "1", N Items( mycols ) ),



//The Missing Data Pattern data table is linked to the source table so selecting a row


  //in it, selects all the rows that it represents in the source table


  mdp << Select Rows( N Rows( mdp ) );



//now I can delete the rows in the source table


  dt << delete rows;


,


  Show( "No all missing rows" )


);


I've tried to show how to make this code work for any number of columns. In this case, because I needed a list of columns to pass to the Missing Data Pattern utility, I built a list based on the column numbers.

Even having written all of this and having shown how to use the Missing Data Pattern Utility, I think my favorite solution here would be PMroz's to use Stack Columns. You really want to end up with the data stacked anyway, so doing that first makes it easy to simply select the missing rows in the stacked column and delete them.


dt = Open( "$SAMPLE_DATA\Cities.jmp" );



//delete some columns from this table to make it fit your example


dt << delete columns( "PM10", "X", "Y" );



//change name of data table to ensure this modified version doesn't get saved


//over the sample data


dt << set name( "CitiesModified" );



//again I need a list of columns to use in the Stack()



mycols = {};



For( i = 9, i <= N Cols( dt ), i++,


  Insert Into( mycols, Column( dt, i ) )


);



//stack those columns creating a new data table


stacked_dt = dt << Stack(


  columns( eval(mycols) ),


  Source Label Column( "Gas" ),


  Stacked Data Column( "PPM" )


);



missing_rows=stacked_dt<<get rows where (is missing(:PPM));


stacked_dt<<delete rows(missing_rows);


-Jeff

-Jeff
steven_w_daniel

Community Trekker

Joined:

Jun 25, 2015

Thanks everyone.....Everything worked except the Loc Nonmissing.....I only have JMP9 at the moment (too lazy to upgrade) and that feature doesn't work in JMP9......I'm still a JSL newbie.....Thanks again