How to generate a summary average of multiple csv files?

jojmp · Jun 10, 2023 4:22 PM

I have multiple csv files in the attached format in a particular folder , how do I generate a new file which contains the summary of the average of all the columns A, B, C, D of each csv file in the below format:

filename	A	B	C	D
	avg_colA_file1	avg_colB_file1	avg_colC_file1	avg_colC_file1
	avg_colA_file2	avg_colB_file2	avg_colC_file2	avg_colC_file2

I know to generate the average of each csv file using table summary but how can we do the same for multiple csv files using a single script?

txnelson · Nov 28, 2020 10:31 AM

Use "Multiple File Import()".

Documentation is provided in the Scripting Guide, available in the JMP Documentation Library under the Help pull down menu.

Jim

jojmp · Nov 28, 2020 11:04 AM

Thanks but this will only import multiple files into a single data table. I wanted to know how can I calculate the average of all columns in each file using a single script

txnelson · Nov 28, 2020 9:31 AM

The Multiple File Import has the ability to add a new column which contains the file name of file the data came from(Add File Name Column). This means that you can easily add a By() clause to your summary, which will provide separate averages for each file imported.

Jim

jojmp · Nov 30, 2020 6:22 AM

I am trying the below script but this does not run, could you please advise what is the issue here?

dt = Multiple File Import(
    <<Set Folder( "C:\Users\my_files\" ),
    <<Set Name Filter( "*.*;" ),
    <<Set Name Enable( 0 ),
    <<Set Size Filter( {37111019, 38896395} ),
    <<Set Size Enable( 0 ),
    <<Set Date Filter( {3689368844.046, 3689369157.819} ),
    <<Set Date Enable( 0 ),
    <<Set Add File Name Column( 1 ),
    <<Set Add File Size Column( 0 ),
    <<Set Add File Date Column( 0 ),
    <<Set Import Mode( "CSVData" ),
    <<Set Charset( "Best Guess" ),
    <<Set Stack Mode( "Stack Similar" ),
    <<Set CSV Has Headers( 1 ),
    <<Set CSV Allow Numeric( 1 ),
    <<Set CSV First Header Line( 1 ),
    <<Set CSV Number Of Header Lines( 1 ),
    <<Set CSV First Data Line( 2 ),
    <<Set CSV EOF Comma( 1 ),
    <<Set CSV EOF Tab( 0 ),
    <<Set CSV EOF Space( 0 ),
    <<Set CSV EOF Spaces( 0 ),
    <<Set CSV EOF Other( "" ),
    <<Set CSV EOL CRLF( 1 ),
    <<Set CSV EOL CR( 1 ),
    <<Set CSV EOL LF( 1 ),
    <<Set CSV EOL Semicolon( 0 ),
    <<Set CSV EOL Other( "" ),
    <<Set CSV Quote( "\!"" ),
    <<Set CSV Escape( "" )
) << Import Data;

dt << Get Rows Where( :A <= 20 );
dt << delete rows( sel );

dt << Summary( Group( :File Name ), Mean( :A ), Freq( "None" ), Weight( "None" ) );

Craige_Hales · Nov 29, 2020 04:55 AM

MFI always returns a list of data tables (not one table, but a { list } holding one or more tables). We'd need to see some messages from the log to give a better answer--not sure about the delete rows syntax--but either of these:

for(i=1,i<=nitems(dt),i+=1,
dtx = dt[i];

dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);

dtx<<
Summary(
Group( dtx:File Name ),
Mean( dtx:A ),
Freq( "None" ),
Weight( "None" )
)

)

or if you know there is only one

dtx = dt[1];
dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);

dtx<<
Summary(
Group( dtx:File Name ),
Mean( dtx:A ),
Freq( "None" ),
Weight( "None" )
)

using dtx to be the current table from the list in dt.

Craige

txnelson · Nov 29, 2020 05:20 AM

@Craige_Hales has the correct answer, with one error brought forward from the original code. The variable "sel" in:

dtx << delete rows(sel);

is not defined. The code needs to be changed to the below two work correctly

dtx = dt[1];
sel = dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);

Jim

jojmp · Nov 29, 2020 11:27 AM

Thanks a lot for all the valuable suggestions, please find below the final working script for reference:

dt= Multiple File Import(
	<<Set Folder(
		"C:\Users\my_files\"
	),
	<<Set Name Filter( "*.*;" ),
	<<Set Name Enable( 0 ),
	<<Set Size Filter( {37111019, 38896395} ),
	<<Set Size Enable( 0 ),
	<<Set Date Filter( {3689368844.046, 3689369157.819} ),
	<<Set Date Enable( 0 ),
	<<Set Add File Name Column( 1 ),
	<<Set Add File Size Column( 0 ),
	<<Set Add File Date Column( 0 ),
	<<Set Import Mode( "CSVData" ),
	<<Set Charset( "Best Guess" ),
	<<Set Stack Mode( "Stack Similar" ),
	<<Set CSV Has Headers( 1 ),
	<<Set CSV Allow Numeric( 1 ),
	<<Set CSV First Header Line( 1 ),
	<<Set CSV Number Of Header Lines( 1 ),
	<<Set CSV First Data Line( 2 ),
	<<Set CSV EOF Comma( 1 ),
	<<Set CSV EOF Tab( 0 ),
	<<Set CSV EOF Space( 0 ),
	<<Set CSV EOF Spaces( 0 ),
	<<Set CSV EOF Other( "" ),
	<<Set CSV EOL CRLF( 1 ),
	<<Set CSV EOL CR( 1 ),
	<<Set CSV EOL LF( 1 ),
	<<Set CSV EOL Semicolon( 0 ),
	<<Set CSV EOL Other( "" ),
	<<Set CSV Quote( "\!"" ),
	<<Set CSV Escape( "" )
) << Import Data;

dtx = dt[1];
sel = dtx << Get Rows Where ( dtx:A <= 20);
dtx << delete rows(sel);

dtx<<
Summary(
Group( dtx:File Name ),
Mean( dtx:A ),
Freq( "None" ),
Weight( "None" )
)

jojmp · Nov 29, 2020 12:07 PM

Is there any way where we close all the imported tables and view only the summary table once the script has run?

txnelson · Nov 29, 2020 12:20 PM

close( dtx, nosave );

I strongly that you take the time to read the Scripting Guide, and to familiarize yourself with the Scripting Index. This will make your journey into JSL more efficient and enjoyable.

Jim

How to generate a summary average of multiple csv files?

Re: How to generate a summary average of multiple csv files?

Re: How to generate a summary average of multiple csv files?

Re: How to generate a summary average of multiple csv files?

Re: How to generate a summary average of multiple csv files?

Re: How to generate a summary average of multiple csv files?

Re: How to generate a summary average of multiple csv files?

Re: How to generate a summary average of multiple csv files?

Re: How to generate a summary average of multiple csv files?

Re: How to generate a summary average of multiple csv files?