cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
  • Learn how to build custom Python data connectors and further customize JMP’s Data Connector Framework with the Python Data Connector Demo, available now in the JMP Marketplace!
  • See how to create experiments to support product design and ID useful product features. Register for June 12 webinar, 2pm US Eastern Time.

JMP Wish List

We want to hear your ideas for improving JMP. Share them here.
Choose Language Hide Translation Bar
0 Kudos

efficient data cleaning- for simple replacements

Actually I came on the community to see if there was already something new about this, still not certain whether I'm missing something now built-in...

I want to, using JSL, replace certain values (like -999 or >998) with missing across many columns (possibly all numeric columns). It's not hard to script this up. nice example of what I mean and what I'm currently doing here:

https://community.jmp.com/t5/Discussions/Deleting-cell-values/td-p/46957

 

But (1) this seems like something that could be fairly common so possibly worth it's own built-in function (or ability to script the recode ability)  and (2) it's not lightning fast on large data so if it's possible maybe a more resource efficient underlying method could be part of such a solution.(?)

 

3 Comments
gzmorgan0
Super User (Alumni)

A wish list item requested a while ago was to enable column properties be assigned to a defined table group column( a ist of columns).

 

An alternative that is pretty simple is to select the columns needing "cleaning". Copy to the clipboard a missing value code property, then send the the JSL message Paste Multi Columns Properties to the data table. The attached script shows a matrix method for replacing values without a for-loop, however, I find the Missing Values Code more useful: the database (original) data values are not modified, more than one value can be set, and original values can be recovered.

Names Default to Here(1);

//create example table
dt = New Table( "example",
	Add Rows( 3 ),
	New Column( "A",
		Character,
		"Nominal",
		Set Values( {"", "", ""} )
	),
	New Column( "B",
		Character,
		"Nominal",
		Set Values( {"", "", ""} )
	),
	New Column( "C",
		Character,
		"Nominal",
		Set Values( {"", "", ""} )
	),
	New Column( "x_1",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [1, 2, 8] )
	),
	New Column( "x_2",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [8, 8, 8] )
	),
	New Column( "x_3",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [2, 8, 2] )
	),
	New Column( "x_4",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [7, 8, 5] )
	),
	New Column( "x_5",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [2, 8, 3] )
	),
	New Column( "x_6",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [2, 8, 8] )
	),
	New Column( "E",
		Character,
		"Nominal",
		Set Values( {"", "", ""} )
	)
);

// this replaces values 
cnme = dt << get column names(numeric);
dt << select Columns(cnme);
mat = dt << get as matrix;
mat[loc(mat==8)]=empty();
dt[1::nrow(dt),cnme]=mat;

//Close dt and rerun the code to create dt
set clipboard("{Add Column Properties(Set Property(\!"Missing Value Codes\!", {8}))}");
dt << Paste Multi Columns Properties;
//The value of 8 appears, but is treated as missing.
//There is a request that column property settings can be assigned to a column group 
XanGregg
Staff

In case it's faster, data table subscripting is also an option:

dt = open("$SAMPLE_DATA/Big Class.jmp");
r = loc(dt[0, 4]==61)
dt[r, 4] = .;
Ryan_Gilmore
Community Manager
Status changed to: Archived
We are archiving this request. If this is still important please comment with additional details and we will reopen. Thank you!