Hi lwx228,
Below you will find a script which will import the data, recode appropriately, put texct to columns, stack, split to get the final table.
Some remarks as this is quite a weird data format:
- there is inconsistency with the delimiters and compared to usual txt files:
- comma is used in the fields and as seperator,
- quotation marks are used as well, but not for each element,
- brackets are used in several ways
- Sometimes there is a 'null' statement for a missing value, but sometimes not!
- There is no first row with column names, they are within the data fields. That makes it necessary to extract those and do the stack/split operation to get the columns as you need them for analysis (may someone has an easier approach - would be welcome to hear about it ;) )
- the default settings won't work therefore as you will get wrong number of fields per row!
- Best is to read in the whole string per row and do the rest in JMP using recode and column utilitites. Hopefully these can be used with a similar file as well.
- To script these things you have to test each step and if the step seems to be ok, save/copy the script into a script window and do the next step.
- That's how the script below was created.
Names Default to Here(1);
//Open your txt file using special settings, like } as end of line, uncheck column names, data starts row 1, ...
dt = Open(
"YOURPATH\U.txt", // exchange YOURPATH with your path to the file
columns( New Column( "c000001", Character, "Nominal" ) ),
Import Settings(
End Of Line( Semicolon, Other( "}" ) ),
End Of Field( CSV( 0 ) ),
Strip Quotes( 1 ),
Use Apostrophe as Quotation Mark( 0 ),
Use Regional Settings( 0 ),
Scan Whole File( 1 ),
Treat empty columns as numeric( 0 ),
CompressNumericColumns( 0 ),
CompressCharacterColumns( 0 ),
CompressAllowListCheck( 0 ),
Labels( 0 ),
Column Names Start( 1 ),
Data Starts( 1 ),
Lines To Read( "All" ),
Year Rule( "20xx" )
)
);
// Now you have one column with in each row all information after "name"
// Remove last two lines as they are just filled with brackets
dt << Clear Selected Row States;
m = matrix({N Rows(dt)-1, N Rows(dt)});
dt << Select Rows( m);
wait(2);
dt << Delete Rows();
// Best to use Recode and Column utility functions to seperate all information
dt << New Column("recoded","Character",Formula(Recode(
:c000001,
{Substitute( _rcNow, ",{", "" ), Substitute( _rcNow, "({[{", "" ),
Substitute( _rcNow, ":", "," ), Substitute( _rcNow, "\!",\!"", ";" ),
Substitute( _rcNow, ";;", ";null;" ),
Substitute( _rcNow, "\!"", "" ), Substitute( _rcNow, ",null,", ";null;" ),
Substitute( _rcNow, ",null", ";null" )
}
)));
// Get all text seperated by ; in seperated columns
dt << Text To Columns( delimiter( ";" ), columns( :recoded ) );
dt << Delete Columns(Column(1),Column(2));
// Now you have to stack the data appropriately
colNames = dt << Get Column Names();
dt_stack = dt << Stack(
Columns(colNames),
Source Label Column( "ColName" ),
Stacked Data Column( "Data" ),
Number of Series( 2 )
);
dt_stack << Delete Columns(1,3);
// Just to Split it again to get the columns with the correct names as well the data respectively
dt_final = dt_stack << Split(
Split By( :Data ),
Split( :Data 2 ),
Sort by Column Property
);
// Close data tables you do not need anymore
close(dt_stack, NoSave);
close(dt, NoSave);
// Finally you get an alphabetic order and all data is in character. So you now need to change this appropriately.
It was fun to go through this, but took some time to overcome the weird data challenges described above. However due to the built-in scripts and formulas not too much :)
One more tip: To get the recode formulas, use recode and save it as formula column, look into the formula editor and there you go.
Best,
Martin
/****NeverStopLearning****/