- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Set Column Data Type for JSON Panda Split
I am using JSL to import a JSON file using a Panda Split format. When I do this, all columns are imported as Character, when in practice they should be a combination of Character and Numeric. I can write a JSL script convert each to Numeric, but this will also convert the Character columns (and delete their data).
Using JSL, can I detect if a Data Type should be Character or Numeric and convert it accordingly? Or is there a way to do this during the import step? I do not know the columns for a specific file beforehand.
JMP 18.1.1
dt_cols = dt << Get Column Names();
For Each({col, i}, dt_cols,
Try(
// No warning, just converts everything.
As Column(dt, col) << Data Type("Numeric");
As Column(dt, col) << Modeling Type( "Continuous" );
);
);
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Set Column Data Type for JSON Panda Split
I don't have proper example to test this out (JMP just crashes with my JSON example I quickly got) so I cannot test if there is a way to do it in JSON Wizard.
You can convert just numeric values by checking values from the column first, and then convert them accordingly.
One way of performing the conversion is to first check some values of the column and then convert if they are OK. Few options you could check
- Just check first value
- Check first and last value
- Check all values
- Check random subset of values
Nowadays I would do the check using Num(<<Restrict) (earlier wasn't possible as it converted too much). If the value isn't missing (unless it was missing already before conversion) it should be valid to be changed to numeric -> change data type of column. I think you can also use InFormat() for checking the "best" match (might be even better than Num()).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Set Column Data Type for JSON Panda Split
Here is a JSL implementation for reference:
rows_to_check_for_conversion = 100;
// Get column names.
dt_cols = dt << Get Column Names();
// Change data types.
For Each( {c, i}, dt_cols,
non_missing_rows = dt << Get Rows Where( !Is Missing( As Column( dt, c ) ) );
row_check = 0;
convert_to_number = 1;
While( row_check < rows_to_check_for_conversion & row_check < N Items( non_missing_rows ) & convert_to_number == 1,
row_check = row_check + 1;
r = non_missing_rows[row_check];
val = Num( dt:c[r] );
If( Is Missing( val ),
convert_to_number = 0
);
);
If( convert_to_number == 1,
As Column( dt, c ) << Data Type( "Numeric" );
As Column( dt, c ) << Modeling Type( "Continuous" );
);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Set Column Data Type for JSON Panda Split
I don't have proper example to test this out (JMP just crashes with my JSON example I quickly got) so I cannot test if there is a way to do it in JSON Wizard.
You can convert just numeric values by checking values from the column first, and then convert them accordingly.
One way of performing the conversion is to first check some values of the column and then convert if they are OK. Few options you could check
- Just check first value
- Check first and last value
- Check all values
- Check random subset of values
Nowadays I would do the check using Num(<<Restrict) (earlier wasn't possible as it converted too much). If the value isn't missing (unless it was missing already before conversion) it should be valid to be changed to numeric -> change data type of column. I think you can also use InFormat() for checking the "best" match (might be even better than Num()).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Set Column Data Type for JSON Panda Split
Yes, that works. Thanks Jarmo.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Set Column Data Type for JSON Panda Split
Here is a JSL implementation for reference:
rows_to_check_for_conversion = 100;
// Get column names.
dt_cols = dt << Get Column Names();
// Change data types.
For Each( {c, i}, dt_cols,
non_missing_rows = dt << Get Rows Where( !Is Missing( As Column( dt, c ) ) );
row_check = 0;
convert_to_number = 1;
While( row_check < rows_to_check_for_conversion & row_check < N Items( non_missing_rows ) & convert_to_number == 1,
row_check = row_check + 1;
r = non_missing_rows[row_check];
val = Num( dt:c[r] );
If( Is Missing( val ),
convert_to_number = 0
);
);
If( convert_to_number == 1,
As Column( dt, c ) << Data Type( "Numeric" );
As Column( dt, c ) << Modeling Type( "Continuous" );
);
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Set Column Data Type for JSON Panda Split
Few changes which might be good to do: Use Column() instead of As Column() and check if adding << Restrict into your Num() call is something you want to do (I have done it for my version of something like this).