I have multiple csv files in the attached format in a particular folder , how do I generate a new file which contains the summary of the average of all the columns A, B, C, D of each csv file in the below format:
filename | A | B | C | D |
avg_colA_file1 | avg_colB_file1 | avg_colC_file1 | avg_colC_file1 | |
avg_colA_file2 | avg_colB_file2 | avg_colC_file2 | avg_colC_file2 | |
I know to generate the average of each csv file using table summary but how can we do the same for multiple csv files using a single script?
Use "Multiple File Import()".
Documentation is provided in the Scripting Guide, available in the JMP Documentation Library under the Help pull down menu.
The Multiple File Import has the ability to add a new column which contains the file name of file the data came from(Add File Name Column). This means that you can easily add a By() clause to your summary, which will provide separate averages for each file imported.
I am trying the below script but this does not run, could you please advise what is the issue here?
dt = Multiple File Import(
<<Set Folder( "C:\Users\my_files\" ),
<<Set Name Filter( "*.*;" ),
<<Set Name Enable( 0 ),
<<Set Size Filter( {37111019, 38896395} ),
<<Set Size Enable( 0 ),
<<Set Date Filter( {3689368844.046, 3689369157.819} ),
<<Set Date Enable( 0 ),
<<Set Add File Name Column( 1 ),
<<Set Add File Size Column( 0 ),
<<Set Add File Date Column( 0 ),
<<Set Import Mode( "CSVData" ),
<<Set Charset( "Best Guess" ),
<<Set Stack Mode( "Stack Similar" ),
<<Set CSV Has Headers( 1 ),
<<Set CSV Allow Numeric( 1 ),
<<Set CSV First Header Line( 1 ),
<<Set CSV Number Of Header Lines( 1 ),
<<Set CSV First Data Line( 2 ),
<<Set CSV EOF Comma( 1 ),
<<Set CSV EOF Tab( 0 ),
<<Set CSV EOF Space( 0 ),
<<Set CSV EOF Spaces( 0 ),
<<Set CSV EOF Other( "" ),
<<Set CSV EOL CRLF( 1 ),
<<Set CSV EOL CR( 1 ),
<<Set CSV EOL LF( 1 ),
<<Set CSV EOL Semicolon( 0 ),
<<Set CSV EOL Other( "" ),
<<Set CSV Quote( "\!"" ),
<<Set CSV Escape( "" )
) << Import Data;
dt << Get Rows Where( :A <= 20 );
dt << delete rows( sel );
dt << Summary( Group( :File Name ), Mean( :A ), Freq( "None" ), Weight( "None" ) );
MFI always returns a list of data tables (not one table, but a { list } holding one or more tables). We'd need to see some messages from the log to give a better answer--not sure about the delete rows syntax--but either of these:
for(i=1,i<=nitems(dt),i+=1,
dtx = dt[i];
dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);
dtx<<
Summary(
Group( dtx:File Name ),
Mean( dtx:A ),
Freq( "None" ),
Weight( "None" )
)
)
or if you know there is only one
dtx = dt[1];
dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);
dtx<<
Summary(
Group( dtx:File Name ),
Mean( dtx:A ),
Freq( "None" ),
Weight( "None" )
)
using dtx to be the current table from the list in dt.
@Craige_Hales has the correct answer, with one error brought forward from the original code. The variable "sel" in:
dtx << delete rows(sel);
is not defined. The code needs to be changed to the below two work correctly
dtx = dt[1];
sel = dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);
Thanks a lot for all the valuable suggestions, please find below the final working script for reference:
dt= Multiple File Import(
<<Set Folder(
"C:\Users\my_files\"
),
<<Set Name Filter( "*.*;" ),
<<Set Name Enable( 0 ),
<<Set Size Filter( {37111019, 38896395} ),
<<Set Size Enable( 0 ),
<<Set Date Filter( {3689368844.046, 3689369157.819} ),
<<Set Date Enable( 0 ),
<<Set Add File Name Column( 1 ),
<<Set Add File Size Column( 0 ),
<<Set Add File Date Column( 0 ),
<<Set Import Mode( "CSVData" ),
<<Set Charset( "Best Guess" ),
<<Set Stack Mode( "Stack Similar" ),
<<Set CSV Has Headers( 1 ),
<<Set CSV Allow Numeric( 1 ),
<<Set CSV First Header Line( 1 ),
<<Set CSV Number Of Header Lines( 1 ),
<<Set CSV First Data Line( 2 ),
<<Set CSV EOF Comma( 1 ),
<<Set CSV EOF Tab( 0 ),
<<Set CSV EOF Space( 0 ),
<<Set CSV EOF Spaces( 0 ),
<<Set CSV EOF Other( "" ),
<<Set CSV EOL CRLF( 1 ),
<<Set CSV EOL CR( 1 ),
<<Set CSV EOL LF( 1 ),
<<Set CSV EOL Semicolon( 0 ),
<<Set CSV EOL Other( "" ),
<<Set CSV Quote( "\!"" ),
<<Set CSV Escape( "" )
) << Import Data;
dtx = dt[1];
sel = dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);
dtx<<
Summary(
Group( dtx:File Name ),
Mean( dtx:A ),
Freq( "None" ),
Weight( "None" )
)
close( dtx, nosave );
I strongly that you take the time to read the Scripting Guide, and to familiarize yourself with the Scripting Index. This will make your journey into JSL more efficient and enjoyable.