cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
0 Kudos

efficient data cleaning- for simple replacements

Actually I came on the community to see if there was already something new about this, still not certain whether I'm missing something now built-in...

I want to, using JSL, replace certain values (like -999 or >998) with missing across many columns (possibly all numeric columns). It's not hard to script this up. nice example of what I mean and what I'm currently doing here:

https://community.jmp.com/t5/Discussions/Deleting-cell-values/td-p/46957

 

But (1) this seems like something that could be fairly common so possibly worth it's own built-in function (or ability to script the recode ability)  and (2) it's not lightning fast on large data so if it's possible maybe a more resource efficient underlying method could be part of such a solution.(?)

 

3 Comments
gzmorgan0
Super User (Alumni)

A wish list item requested a while ago was to enable column properties be assigned to a defined table group column( a ist of columns).

 

An alternative that is pretty simple is to select the columns needing "cleaning". Copy to the clipboard a missing value code property, then send the the JSL message Paste Multi Columns Properties to the data table. The attached script shows a matrix method for replacing values without a for-loop, however, I find the Missing Values Code more useful: the database (original) data values are not modified, more than one value can be set, and original values can be recovered.

Names Default to Here(1);

//create example table
dt = New Table( "example",
	Add Rows( 3 ),
	New Column( "A",
		Character,
		"Nominal",
		Set Values( {"", "", ""} )
	),
	New Column( "B",
		Character,
		"Nominal",
		Set Values( {"", "", ""} )
	),
	New Column( "C",
		Character,
		"Nominal",
		Set Values( {"", "", ""} )
	),
	New Column( "x_1",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [1, 2, 8] )
	),
	New Column( "x_2",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [8, 8, 8] )
	),
	New Column( "x_3",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [2, 8, 2] )
	),
	New Column( "x_4",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [7, 8, 5] )
	),
	New Column( "x_5",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [2, 8, 3] )
	),
	New Column( "x_6",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [2, 8, 8] )
	),
	New Column( "E",
		Character,
		"Nominal",
		Set Values( {"", "", ""} )
	)
);

// this replaces values 
cnme = dt << get column names(numeric);
dt << select Columns(cnme);
mat = dt << get as matrix;
mat[loc(mat==8)]=empty();
dt[1::nrow(dt),cnme]=mat;

//Close dt and rerun the code to create dt
set clipboard("{Add Column Properties(Set Property(\!"Missing Value Codes\!", {8}))}");
dt << Paste Multi Columns Properties;
//The value of 8 appears, but is treated as missing.
//There is a request that column property settings can be assigned to a column group 
XanGregg
Staff

In case it's faster, data table subscripting is also an option:

dt = open("$SAMPLE_DATA/Big Class.jmp");
r = loc(dt[0, 4]==61)
dt[r, 4] = .;
Ryan_Gilmore
Community Manager
Status changed to: Archived
We are archiving this request. If this is still important please comment with additional details and we will reopen. Thank you!