Choose Language Hide Translation Bar
Highlighted

## Convert data to numeric, Only If Appropriate!

I work with data stored in a table as Parameter and value.  Since the value may be a character for some Parameters the data type in the database is Character.  Pulling the data into JMP with an SQL script is easy and Splitting the data by Parameter is also easy.  But now I have a table of columns where I'd like to convert some of the columns to numeric, but others I do not. I tried this but to my surprise JMP happily converted all columns to numeric, throwing out the information that was truly character.

``````Tcols = dt << Get Column Names(String);
For(i=2, i<=n items(Tcols), i++,
try(column(dt,i) << Data Type(Character);
column(dt,i) << Modeling Type("Nominal"));
);``````

How can I do a test to convert only columns where the data is appropriately numeric.

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

## Re: Convert data to numeric, Only If Appropriate!

Here is a simple example that tests the first 10 rows of each column, and if no non numeric values are found, it converts the column to numeric

``````Names Default To Here( 1 );
dt = Open( "\$SAMPLE_DATA/Big Class.jmp" );

// set all columns to character
For( i = 1, i <= N Cols( dt ), i++,
Column( i ) << set data type( character )
);

// Convert only numerics
For( i = 1, i <= N Cols( dt ), i++,
isNumeric = "NO";
For( theRow = 1, theRow <= 10, theRow++,
If( Column( i )[theRow] != "" & Is Missing( Num( Column( i )[theRow] ) ) == 1,
isNumeric = "NO";
Break();
,
isNumeric = "YES"
)
);
If( isNumeric == "YES",
Column( i ) << set data type( numeric )
);
);``````
Jim
5 REPLIES 5
Highlighted

## Re: Convert data to numeric, Only If Appropriate!

Here is a simple example that tests the first 10 rows of each column, and if no non numeric values are found, it converts the column to numeric

``````Names Default To Here( 1 );
dt = Open( "\$SAMPLE_DATA/Big Class.jmp" );

// set all columns to character
For( i = 1, i <= N Cols( dt ), i++,
Column( i ) << set data type( character )
);

// Convert only numerics
For( i = 1, i <= N Cols( dt ), i++,
isNumeric = "NO";
For( theRow = 1, theRow <= 10, theRow++,
If( Column( i )[theRow] != "" & Is Missing( Num( Column( i )[theRow] ) ) == 1,
isNumeric = "NO";
Break();
,
isNumeric = "YES"
)
);
If( isNumeric == "YES",
Column( i ) << set data type( numeric )
);
);``````
Jim
Highlighted

## Re: Convert data to numeric, Only If Appropriate!

Thanks. This worked.
And I copied in the wrong code above. It should have been.

``````Tcols = dt << Get Column Names(String);

For(i=2, i<=n items(Tcols), i++,

try(column(dt,i) << Data Type(Numeric);

column(dt,i) << Modeling Type("Continuous"));

);``````
Highlighted

## Re: Convert data to numeric, Only If Appropriate!

For the most part it worked, but upon further inspection, there is a case where strange things happen.
I have some codes that are numbers with dashes in them.  I would expect them to remain as character.

But this is what JMP does.

num("100-0001-01") results in -56928614400

I do not know how it comes up with that, but because it's not missing the code accepts this as a number.

Highlighted

## Re: Convert data to numeric, Only If Appropriate!

It's treating it as a date

``````MDYHMS(num("100-0001-01"))
//returns
//"01/01/0100 00:00:00"
``````

you could maybe try to parse the characters into a matrix with quotes replaced with empty strings

``````names default to here(1);
dt = New Table( "Test",
New Column( "Column 1",
Character,
"Nominal",
Set Values( {"A", "B", "C", "D"} )
),
New Column( "Column 2",
Character,
"Nominal",
Set Values( {"100-0001-01", "2", "4", "8"} )
),
New Column( "Column 3",
Character,
"Nominal",
Set Selected,
Set Values( {"1", "2", "3", "4"} )
)
);

for(i=1, i<=ncols(dt), i++,
col = Column(dt, i);
values = col << Get Values;
Try(
Parse(Substitute(char(values), "\!"", "", "{", "[", "}", "]"));
col << Data Type("Numeric") << Modeling Type("Continuous");
);
)``````
Vince Faller - Predictum
Highlighted

## Re: Convert data to numeric, Only If Appropriate!

Vince,

Thanks for the insight.  I guess it's a tricky thing because depending upon the situation some characters can have different meanings (e.g. "-" or "e") .  Your code worked as you wrote it, but it failed on my set because of missing values.  Those could be handled by more arguments in the Substitute command to find and eliminate the string "", .

Parse(Substitute(char(values), "\!"\!"\!, ", "", "\!"", "", "{", "[", "}", "]"));

However all of my Column 2 are values that could be interpretted as dates so this code also interpretted them as such.

``````names default to here(1);
dt = New Table( "Test",
New Column( "Column 1",
Character,
"Nominal",
Set Values( {"A", "B", "C", "D"} )
),
New Column( "Column 2",
Character,
"Nominal",
Set Values( {"100-0001-01", "100-0001-01", "100-0001-01", "100-0001-01"} )
),
New Column( "Column 3",
Character,
"Nominal",
Set Selected,
Set Values( {"1", "2", "", "4"} )
)
);

dt = current data table();
for(i=1, i<=ncols(dt), i++,
col = Column(dt, i);
values = col << Get Values;
Try(
Parse(Substitute(char(values), "\!"\!"\!, ", "", "\!"", "", "{", "[", "}", "]"));
col << Data Type("Numeric") << Modeling Type("Continuous");
);
);``````

Although not perfect, I can go with this or with txnelson's code because the column that it converts as a date is not important to me.  If I knew that it was being interpretted as a date, I could make that an exception.  I don't have dates in this set of data.

Pat

Article Labels

There are no labels assigned to this post.