cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
pcarroll1
Level IV

Convert data to numeric, Only If Appropriate!

   I work with data stored in a table as Parameter and value.  Since the value may be a character for some Parameters the data type in the database is Character.  Pulling the data into JMP with an SQL script is easy and Splitting the data by Parameter is also easy.  But now I have a table of columns where I'd like to convert some of the columns to numeric, but others I do not. I tried this but to my surprise JMP happily converted all columns to numeric, throwing out the information that was truly character.

 

Tcols = dt << Get Column Names(String);
For(i=2, i<=n items(Tcols), i++,
try(column(dt,i) << Data Type(Character);
    column(dt,i) << Modeling Type("Nominal"));
);

 

How can I do a test to convert only columns where the data is appropriately numeric.

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Convert data to numeric, Only If Appropriate!

Here is a simple example that tests the first 10 rows of each column, and if no non numeric values are found, it converts the column to numeric

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// set all columns to character
For( i = 1, i <= N Cols( dt ), i++,
	Column( i ) << set data type( character )
);

// Convert only numerics
For( i = 1, i <= N Cols( dt ), i++,
	isNumeric = "NO";
	For( theRow = 1, theRow <= 10, theRow++,
		If( Column( i )[theRow] != "" & Is Missing( Num( Column( i )[theRow] ) ) == 1,
			isNumeric = "NO";
			Break();
		,
			isNumeric = "YES"
		)
	);
	If( isNumeric == "YES",
		Column( i ) << set data type( numeric )
	);
);
Jim

View solution in original post

5 REPLIES 5
txnelson
Super User

Re: Convert data to numeric, Only If Appropriate!

Here is a simple example that tests the first 10 rows of each column, and if no non numeric values are found, it converts the column to numeric

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// set all columns to character
For( i = 1, i <= N Cols( dt ), i++,
	Column( i ) << set data type( character )
);

// Convert only numerics
For( i = 1, i <= N Cols( dt ), i++,
	isNumeric = "NO";
	For( theRow = 1, theRow <= 10, theRow++,
		If( Column( i )[theRow] != "" & Is Missing( Num( Column( i )[theRow] ) ) == 1,
			isNumeric = "NO";
			Break();
		,
			isNumeric = "YES"
		)
	);
	If( isNumeric == "YES",
		Column( i ) << set data type( numeric )
	);
);
Jim
pcarroll1
Level IV

Re: Convert data to numeric, Only If Appropriate!

Thanks. This worked.
And I copied in the wrong code above. It should have been.

Tcols = dt << Get Column Names(String);

For(i=2, i<=n items(Tcols), i++,

try(column(dt,i) << Data Type(Numeric);

column(dt,i) << Modeling Type("Continuous"));

);
pcarroll1
Level IV

Re: Convert data to numeric, Only If Appropriate!

For the most part it worked, but upon further inspection, there is a case where strange things happen.
I have some codes that are numbers with dashes in them.  I would expect them to remain as character.

But this is what JMP does. 

num("100-0001-01") results in -56928614400

I do not know how it comes up with that, but because it's not missing the code accepts this as a number.

vince_faller
Super User (Alumni)

Re: Convert data to numeric, Only If Appropriate!

It's treating it as a date

MDYHMS(num("100-0001-01"))
//returns
//"01/01/0100 00:00:00"

you could maybe try to parse the characters into a matrix with quotes replaced with empty strings

 

names default to here(1);
dt = New Table( "Test",
	Add Rows( 4 ),
	New Column( "Column 1",
		Character,
		"Nominal",
		Set Values( {"A", "B", "C", "D"} )
	),
	New Column( "Column 2",
		Character,
		"Nominal",
		Set Values( {"100-0001-01", "2", "4", "8"} )
	),
	New Column( "Column 3",
		Character,
		"Nominal",
		Set Selected,
		Set Values( {"1", "2", "3", "4"} )
	)
);

for(i=1, i<=ncols(dt), i++, 
	col = Column(dt, i);
	values = col << Get Values;
	Try(
		Parse(Substitute(char(values), "\!"", "", "{", "[", "}", "]"));
		col << Data Type("Numeric") << Modeling Type("Continuous");
	);
)
Vince Faller - Predictum
pcarroll1
Level IV

Re: Convert data to numeric, Only If Appropriate!

Vince,

     Thanks for the insight.  I guess it's a tricky thing because depending upon the situation some characters can have different meanings (e.g. "-" or "e") .  Your code worked as you wrote it, but it failed on my set because of missing values.  Those could be handled by more arguments in the Substitute command to find and eliminate the string "", .

   Parse(Substitute(char(values), "\!"\!"\!, ", "", "\!"", "", "{", "[", "}", "]"));

     However all of my Column 2 are values that could be interpretted as dates so this code also interpretted them as such.

 

names default to here(1);
dt = New Table( "Test",
	Add Rows( 4 ),
	New Column( "Column 1",
		Character,
		"Nominal",
		Set Values( {"A", "B", "C", "D"} )
	),
	New Column( "Column 2",
		Character,
		"Nominal",
		Set Values( {"100-0001-01", "100-0001-01", "100-0001-01", "100-0001-01"} )
	),
	New Column( "Column 3",
		Character,
		"Nominal",
		Set Selected,
		Set Values( {"1", "2", "", "4"} )
	)
);

dt = current data table();
for(i=1, i<=ncols(dt), i++, 
	col = Column(dt, i);
	values = col << Get Values;
	Try(
		Parse(Substitute(char(values), "\!"\!"\!, ", "", "\!"", "", "{", "[", "}", "]"));
		col << Data Type("Numeric") << Modeling Type("Continuous");
	);
);

 

 

     Although not perfect, I can go with this or with txnelson's code because the column that it converts as a date is not important to me.  If I knew that it was being interpretted as a date, I could make that an exception.  I don't have dates in this set of data.

Pat