cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
jojmp
Level III

How to generate a summary average of multiple csv files?

I have multiple csv files in the attached format in a particular folder , how do I generate a new file which contains the summary of the average of all the columns A, B, C, D of each csv file in the below format:

filenameABCD
 avg_colA_file1avg_colB_file1avg_colC_file1avg_colC_file1
 avg_colA_file2avg_colB_file2avg_colC_file2avg_colC_file2
     

I know to generate the average of each csv file using table summary but how can we do the same for multiple csv files using a single script?

13 REPLIES 13
txnelson
Super User

Re: How to generate a summary average of multiple csv files?

Use "Multiple File Import()".

 

Documentation is provided in the Scripting Guide, available in the JMP Documentation Library under the Help pull down menu.

 

Jim
jojmp
Level III

Re: How to generate a summary average of multiple csv files?

Thanks but this will only import multiple files into a single data table. I wanted to know how can I calculate the average of all columns in each file using a single script
txnelson
Super User

Re: How to generate a summary average of multiple csv files?

The Multiple File Import has the ability to add a new column which contains the file name of file the data came from(Add File Name Column). This means that you can easily add a By() clause to your summary, which will provide separate averages for each file imported.

Jim
jojmp
Level III

Re: How to generate a summary average of multiple csv files?

I am trying the below script but this does not run, could you please advise what is the issue here?

dt = Multiple File Import(
    <<Set Folder( "C:\Users\my_files\" ),
    <<Set Name Filter( "*.*;" ),
    <<Set Name Enable( 0 ),
    <<Set Size Filter( {37111019, 38896395} ),
    <<Set Size Enable( 0 ),
    <<Set Date Filter( {3689368844.046, 3689369157.819} ),
    <<Set Date Enable( 0 ),
    <<Set Add File Name Column( 1 ),
    <<Set Add File Size Column( 0 ),
    <<Set Add File Date Column( 0 ),
    <<Set Import Mode( "CSVData" ),
    <<Set Charset( "Best Guess" ),
    <<Set Stack Mode( "Stack Similar" ),
    <<Set CSV Has Headers( 1 ),
    <<Set CSV Allow Numeric( 1 ),
    <<Set CSV First Header Line( 1 ),
    <<Set CSV Number Of Header Lines( 1 ),
    <<Set CSV First Data Line( 2 ),
    <<Set CSV EOF Comma( 1 ),
    <<Set CSV EOF Tab( 0 ),
    <<Set CSV EOF Space( 0 ),
    <<Set CSV EOF Spaces( 0 ),
    <<Set CSV EOF Other( "" ),
    <<Set CSV EOL CRLF( 1 ),
    <<Set CSV EOL CR( 1 ),
    <<Set CSV EOL LF( 1 ),
    <<Set CSV EOL Semicolon( 0 ),
    <<Set CSV EOL Other( "" ),
    <<Set CSV Quote( "\!"" ),
    <<Set CSV Escape( "" )
) << Import Data;

dt << Get Rows Where( :A <= 20 );
dt << delete rows( sel );

dt << Summary( Group( :File Name ), Mean( :A ), Freq( "None" ), Weight( "None" ) );
Craige_Hales
Super User

Re: How to generate a summary average of multiple csv files?

MFI always returns a list of data tables (not one table, but a { list } holding one or more tables). We'd need to see some messages from the log to give a better answer--not sure about the delete rows syntax--but either of these:

for(i=1,i<=nitems(dt),i+=1,
dtx = dt[i];

dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);

dtx<<
Summary(
Group( dtx:File Name ),
Mean( dtx:A ),
Freq( "None" ),
Weight( "None" )
)

)

or if you know there is only one

dtx = dt[1];
dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);

dtx<<
Summary(
Group( dtx:File Name ),
Mean( dtx:A ),
Freq( "None" ),
Weight( "None" )
)

using dtx to be the current table from the list in dt.

Craige
txnelson
Super User

Re: How to generate a summary average of multiple csv files?

@Craige_Hales has the correct answer, with one error brought forward from the original code.  The variable "sel" in:

dtx << delete rows(sel);

is not defined.  The code needs to be changed to the below two work correctly

dtx = dt[1];
sel = dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);
Jim
jojmp
Level III

Re: How to generate a summary average of multiple csv files?

Thanks a lot for all the valuable suggestions, please find below the final working script for reference:

dt= Multiple File Import(
	<<Set Folder(
		"C:\Users\my_files\"
	),
	<<Set Name Filter( "*.*;" ),
	<<Set Name Enable( 0 ),
	<<Set Size Filter( {37111019, 38896395} ),
	<<Set Size Enable( 0 ),
	<<Set Date Filter( {3689368844.046, 3689369157.819} ),
	<<Set Date Enable( 0 ),
	<<Set Add File Name Column( 1 ),
	<<Set Add File Size Column( 0 ),
	<<Set Add File Date Column( 0 ),
	<<Set Import Mode( "CSVData" ),
	<<Set Charset( "Best Guess" ),
	<<Set Stack Mode( "Stack Similar" ),
	<<Set CSV Has Headers( 1 ),
	<<Set CSV Allow Numeric( 1 ),
	<<Set CSV First Header Line( 1 ),
	<<Set CSV Number Of Header Lines( 1 ),
	<<Set CSV First Data Line( 2 ),
	<<Set CSV EOF Comma( 1 ),
	<<Set CSV EOF Tab( 0 ),
	<<Set CSV EOF Space( 0 ),
	<<Set CSV EOF Spaces( 0 ),
	<<Set CSV EOF Other( "" ),
	<<Set CSV EOL CRLF( 1 ),
	<<Set CSV EOL CR( 1 ),
	<<Set CSV EOL LF( 1 ),
	<<Set CSV EOL Semicolon( 0 ),
	<<Set CSV EOL Other( "" ),
	<<Set CSV Quote( "\!"" ),
	<<Set CSV Escape( "" )
) << Import Data;

dtx = dt[1];
sel = dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);

dtx<<
Summary(
Group( dtx:File Name ),
Mean( dtx:A ),
Freq( "None" ),
Weight( "None" )
)
jojmp
Level III

Re: How to generate a summary average of multiple csv files?

Is there any way where we close all the imported tables and view only the summary table once the script has run?
txnelson
Super User

Re: How to generate a summary average of multiple csv files?

close( dtx, nosave );

I strongly that you take the time to read the Scripting Guide, and to familiarize yourself with the Scripting Index.  This will make your journey into JSL more efficient and enjoyable.

Jim