cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
The Discovery Summit 2025 Call for Content is open! Submit an abstract today to present at our premier analytics conference.
Choose Language Hide Translation Bar
View Original Published Thread

Set Column Data Type for JSON Panda Split

robot
Level VI

I am using JSL to import a JSON file using a Panda Split format.  When I do this, all columns are imported as Character, when in practice they should be a combination of Character and Numeric.  I can write a JSL script convert each to Numeric, but this will also convert the Character columns (and delete their data).

 

Using JSL, can I detect if a Data Type should be Character or Numeric and convert it accordingly?  Or is there a way to do this during the import step?  I do not know the columns for a specific file beforehand.

 

JMP 18.1.1

 

dt_cols = dt << Get Column Names();
For Each({col, i}, dt_cols,	
	Try(
		// No warning, just converts everything.
		As Column(dt, col) << Data Type("Numeric");
		As Column(dt, col) << Modeling Type( "Continuous" );
	);
);
2 ACCEPTED SOLUTIONS

Accepted Solutions
jthi
Super User


Re: Set Column Data Type for JSON Panda Split

I don't have proper example to test this out (JMP just crashes with my JSON example I quickly got) so I cannot test if there is a way to do it in JSON Wizard.

 

You can convert just numeric values by checking values from the column first, and then convert them accordingly.

One way of performing the conversion is to first check some values of the column and then convert if they are OK. Few options you could check

  • Just check first value
  • Check first and last value
  • Check all values
  • Check random subset of values

Nowadays I would do the check using Num(<<Restrict) (earlier wasn't possible as it converted too much). If the value isn't missing (unless it was missing already before conversion) it should be valid to be changed to numeric -> change data type of column. I think you can also use InFormat() for checking the "best" match (might be even better than Num()).

-Jarmo

View solution in original post

robot
Level VI


Re: Set Column Data Type for JSON Panda Split

Here is a JSL implementation for reference:

 

rows_to_check_for_conversion = 100;

// Get column names.
dt_cols = dt << Get Column Names();

// Change data types.
For Each( {c, i}, dt_cols, 
	
	non_missing_rows = dt << Get Rows Where( !Is Missing( As Column( dt, c ) ) );
	
	row_check = 0;
	convert_to_number = 1;
	While( row_check < rows_to_check_for_conversion & row_check < N Items( non_missing_rows ) & convert_to_number == 1,
		row_check = row_check + 1;
		r = non_missing_rows[row_check];
		val = Num( dt:c[r] );
		If( Is Missing( val ),
			convert_to_number = 0
		);
	);
	
	If( convert_to_number == 1,
		As Column( dt, c ) << Data Type( "Numeric" );
		As Column( dt, c ) << Modeling Type( "Continuous" );
	);
);

View solution in original post

4 REPLIES 4
jthi
Super User


Re: Set Column Data Type for JSON Panda Split

I don't have proper example to test this out (JMP just crashes with my JSON example I quickly got) so I cannot test if there is a way to do it in JSON Wizard.

 

You can convert just numeric values by checking values from the column first, and then convert them accordingly.

One way of performing the conversion is to first check some values of the column and then convert if they are OK. Few options you could check

  • Just check first value
  • Check first and last value
  • Check all values
  • Check random subset of values

Nowadays I would do the check using Num(<<Restrict) (earlier wasn't possible as it converted too much). If the value isn't missing (unless it was missing already before conversion) it should be valid to be changed to numeric -> change data type of column. I think you can also use InFormat() for checking the "best" match (might be even better than Num()).

-Jarmo
robot
Level VI


Re: Set Column Data Type for JSON Panda Split

Yes, that works.  Thanks Jarmo.

robot
Level VI


Re: Set Column Data Type for JSON Panda Split

Here is a JSL implementation for reference:

 

rows_to_check_for_conversion = 100;

// Get column names.
dt_cols = dt << Get Column Names();

// Change data types.
For Each( {c, i}, dt_cols, 
	
	non_missing_rows = dt << Get Rows Where( !Is Missing( As Column( dt, c ) ) );
	
	row_check = 0;
	convert_to_number = 1;
	While( row_check < rows_to_check_for_conversion & row_check < N Items( non_missing_rows ) & convert_to_number == 1,
		row_check = row_check + 1;
		r = non_missing_rows[row_check];
		val = Num( dt:c[r] );
		If( Is Missing( val ),
			convert_to_number = 0
		);
	);
	
	If( convert_to_number == 1,
		As Column( dt, c ) << Data Type( "Numeric" );
		As Column( dt, c ) << Modeling Type( "Continuous" );
	);
);
jthi
Super User


Re: Set Column Data Type for JSON Panda Split

Few changes which might be good to do: Use Column() instead of As Column() and check if adding << Restrict into your Num() call is something you want to do (I have done it for my version of something like this).

-Jarmo