cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
sam_t
Level III

JSL delete the entire column based on header and all rows with specific value

Hi,

 

I want to delete all columns that contain "apple" and delete all rows (regardless of columns) with specific value -9999. I had start the code to delete column but I'm not getting it.

 

Thanks.

 

 

 

dt = Open("file.csv"); 
//dt = Open( "$SAMPLE_DATA/Cities.jmp" );
col_list = dt << get column names( string );
For( i = N Items( col_list ), i > 0, i--,
	If(Contains( col_list[i], "apple" ),
		Remove From( col_list, i ),
	)
);





//dt << Select Where( );
//dt << Delete Rows; 



dt << save( "testOUT.txt" );
2 ACCEPTED SOLUTIONS

Accepted Solutions
txnelson
Super User

Re: JSL delete the entire column based on header and all rows with specific value

Here is a script that will do what you want.  A couple of points about the script:

1.  The Open() function has the "|" added as a delimiter in addition to the ",".  This was easily created by using the "Data Preview" when opening the .csv file.  It allowed for the selection of additional characters to be used as delimiters.  Then all that had to be done, is to open the file, and then go to the Source entry in the Tables Panel, and Edit it and copy out the Source Code that JMP created to open the file.

2. The Deletion of the columns containing Apple had to be generalized, since in one column it was spelled Apple and in another column it was spelled apple.

3. I really don't think you should be deleting a complete row, based upon finding an occurrence of -9999.  I really think you should be just setting the value -9999 to a missing value.  I included a piece of code that can handle your data that way, if you want.

names default to Here(1);
dt=Open(
	"file.csv",
	columns(
		New Column( "a", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "b", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "c", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "d", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "e", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "g", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "apple", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "orange", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "candy", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "pizza", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "Apple2_2", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "orange2", Numeric, "Continuous", Format( "Best", 12 ) )
	),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, Other( "|" ), CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 1 ),
		Column Names Start( 1 ),
		Data Starts( 2 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

// Create a list to place columns to be deleted into
delList = {};

For( i = 1, i<= N Cols(dt), i++,
	If( Contains( Uppercase( as Column(  i ) << get name ), "APPLE" ) > 0,
		insert into(delList, column(i)<<get name)
	)
);

dt << delete columns( delList);

// Loop through the data and delete any rows that have
// the value -9999 in any of the columns
For( i = 1, i <= N Cols( dt ), i++,
	If( Column( dt, i ) << get datatype == "Numeric",
		dt << select where( As Column( dt, i ) == -9999 );
		If( N Rows(dt << get selected rows) > 0,
			dt << delete rows
		);
	)
);


/*// Substitute the code below for the last For Loop in the above 
// script, if what you want to do is to just change the -9999 
// values to missing values
For( i = 1, i <= N Cols( dt ), i++,
	If( Column( dt, i ) << get datatype == "Numeric",
		dt << select where( As Column( dt, i ) == -9999 );
		If( N Rows(dt << get selected rows) > 0,
			column(dt,i)[dt << get selected rows] = .;
		);
	)
);*/
Jim

View solution in original post

txnelson
Super User

Re: JSL delete the entire column based on header and all rows with specific value

It is a very simple change to the script.  All of the special options for the inputting of the data, can be eliminated.  The Open() function will just use the defaults.

Secondly, all that has to be done to the row deletions, is to search for a string value of "-9999" when the input column is determined to be character in nature.

names default to Here(1);
dt=Open(
 "file (3).csv");
// Create a list to place columns to be deleted into
delList = {};
For( i = 1, i<= N Cols(dt), i++,
 If( Contains( Uppercase( as Column(  i ) << get name ), "APPLE" ) > 0,
  insert into(delList, column(i)<<get name)
 )
);
dt << delete columns( delList);
// Loop through the data and delete any rows that have
// the value -9999 in any of the columns
For( i = 1, i <= N Cols( dt ), i++,
 If( Column( dt, i ) << get datatype == "Numeric",
  dt << select where( As Column( dt, i ) == -9999 ),
  dt << select where( Contains( As Column( dt, i ), "-9999" ) == 1 |
   Contains( As Column( dt, i ), "99" ) == 1
  )
 );
 If( N Rows( dt << get selected rows ) > 0,
  dt << delete rows
 );
);
Jim

View solution in original post

4 REPLIES 4
sam_t
Level III

Re: JSL delete the entire column based on header and all rows with specific value

My data value could have pipe data such as

-9999|123,1,1,1,1,1,2,30,40,1,10,99
1,1,1,1,1,1,2,30,40,1,10,99
1,1,1,1,1,1,2,30,40,1,10,99
1,1,1,1|87,1,-9999,2,30,40,1,10,99
txnelson
Super User

Re: JSL delete the entire column based on header and all rows with specific value

Here is a script that will do what you want.  A couple of points about the script:

1.  The Open() function has the "|" added as a delimiter in addition to the ",".  This was easily created by using the "Data Preview" when opening the .csv file.  It allowed for the selection of additional characters to be used as delimiters.  Then all that had to be done, is to open the file, and then go to the Source entry in the Tables Panel, and Edit it and copy out the Source Code that JMP created to open the file.

2. The Deletion of the columns containing Apple had to be generalized, since in one column it was spelled Apple and in another column it was spelled apple.

3. I really don't think you should be deleting a complete row, based upon finding an occurrence of -9999.  I really think you should be just setting the value -9999 to a missing value.  I included a piece of code that can handle your data that way, if you want.

names default to Here(1);
dt=Open(
	"file.csv",
	columns(
		New Column( "a", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "b", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "c", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "d", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "e", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "g", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "apple", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "orange", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "candy", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "pizza", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "Apple2_2", Numeric, "Continuous", Format( "Best", 12 ) ),
		New Column( "orange2", Numeric, "Continuous", Format( "Best", 12 ) )
	),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, Other( "|" ), CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 1 ),
		Column Names Start( 1 ),
		Data Starts( 2 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

// Create a list to place columns to be deleted into
delList = {};

For( i = 1, i<= N Cols(dt), i++,
	If( Contains( Uppercase( as Column(  i ) << get name ), "APPLE" ) > 0,
		insert into(delList, column(i)<<get name)
	)
);

dt << delete columns( delList);

// Loop through the data and delete any rows that have
// the value -9999 in any of the columns
For( i = 1, i <= N Cols( dt ), i++,
	If( Column( dt, i ) << get datatype == "Numeric",
		dt << select where( As Column( dt, i ) == -9999 );
		If( N Rows(dt << get selected rows) > 0,
			dt << delete rows
		);
	)
);


/*// Substitute the code below for the last For Loop in the above 
// script, if what you want to do is to just change the -9999 
// values to missing values
For( i = 1, i <= N Cols( dt ), i++,
	If( Column( dt, i ) << get datatype == "Numeric",
		dt << select where( As Column( dt, i ) == -9999 );
		If( N Rows(dt << get selected rows) > 0,
			column(dt,i)[dt << get selected rows] = .;
		);
	)
);*/
Jim
sam_t
Level III

Re: JSL delete the entire column based on header and all rows with specific value

Jim,

The code works.

 

When I use the "|" pipe. It will create extra column(s).

I do not want any extra my column header in this case, it will be "12" columns total.

But in each column, I can have several pipe data. I only care the first value and ignore the rest.

 

Example:

"99|23|abc|x", I only care for "99" and ignore the rest.

"-9999|123", I only care for "-9999" and that row will get delete it later.

 

 

1	1	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
1	-9999|999|231	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
1	1|xyz|abc	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	-9999	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
1	99|23|abc|x	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
-9999|123	1	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
1	1	1	1	1	1	2	30	40	1	10	99
1	1	1	1|87	1	-9999	2	30	40	1	10	99
txnelson
Super User

Re: JSL delete the entire column based on header and all rows with specific value

It is a very simple change to the script.  All of the special options for the inputting of the data, can be eliminated.  The Open() function will just use the defaults.

Secondly, all that has to be done to the row deletions, is to search for a string value of "-9999" when the input column is determined to be character in nature.

names default to Here(1);
dt=Open(
 "file (3).csv");
// Create a list to place columns to be deleted into
delList = {};
For( i = 1, i<= N Cols(dt), i++,
 If( Contains( Uppercase( as Column(  i ) << get name ), "APPLE" ) > 0,
  insert into(delList, column(i)<<get name)
 )
);
dt << delete columns( delList);
// Loop through the data and delete any rows that have
// the value -9999 in any of the columns
For( i = 1, i <= N Cols( dt ), i++,
 If( Column( dt, i ) << get datatype == "Numeric",
  dt << select where( As Column( dt, i ) == -9999 ),
  dt << select where( Contains( As Column( dt, i ), "-9999" ) == 1 |
   Contains( As Column( dt, i ), "99" ) == 1
  )
 );
 If( N Rows( dt << get selected rows ) > 0,
  dt << delete rows
 );
);
Jim