BookmarkSubscribe
Choose Language Hide Translation Bar
Highlighted
jojmp
Occasional Contributor

Post processing of data in a csv file using script

I have a folder with mutiple csv files having similar data structure  whose data I need to process and store the results in a new file.
Each csv file has 2 columns, I need to process the data in the second column as follows:

  1. Name the second column as items
  2. Perform the following operation on all rows creating a new column x with the following formula :36 + -19.5 * :items+ 49.99 *:items* :items* :items
  3. Perform the following operation on all rows of column x creating column y:
    Mean( :x[Index( Row() - 4, Row() )] )
  4. Find the mean and max of column y
  5. Store the value mean and max into the file and repeat the same procedure for other files as wellPfa sample csv file.
    I am tring something as below but since just a beginner so culd yu please help

    // Read all the file names
    files = Files In Directory( "C:\Users\MyUserName\Documents\My Data\" );
    X = N Items( files );
    Y = "C:\Users\MyUserName\Documents\My Data\";
    //stores the mean and max value of each file
    dt = New Table( "My data file");
    For( i = 1, i < X, i++,
    //open jmp files for each csv files.
    dt1 = Open(
    Y || files[i],
    
     //Code to add column name and the find mean and max
    );
    dt1 << Set Name( files[i] );
     
     
    //I concatenated all the files in one jmp file and close it each time.
    dt << Concatenate( Data Table( dt1 ), "Append to first table" );
    Close( dt1 );
     
     
    );
0 Kudos
2 ACCEPTED SOLUTIONS

Accepted Solutions
ian_jmp
Staff

Re: Post processing of data in a csv file using script

If you are just starting out, you should take this in stages, understanding how each part works. Use 'Help > Scripting Index' and 'Help > Books > Scripting Guide' to look things up.

 

The code below should get you started, and you may be able to do the rest. You will need to be more presise about what you mean by 'store the value . . . into the file'.

NamesDefaultToHere(1);

// Use a function to open a .CSV file: Given the path to such a file, returns a reference to the resulting table
openMyCSV = 
function({path2File}, {Default Local},
			dt = Open(
					path2File,
					columns(
						New Column( "column 1", Numeric, "Continuous", Format( "Best", 12 ) ),
						New Column( "items", Numeric, "Continuous", Format( "Best", 12 ) )
						),
					Import Settings(
						End Of Line( CRLF, CR, LF ),
						End Of Field( Comma, CSV( 0 ) ),
						Strip Quotes( 1 ),
						Use Apostrophe as Quotation Mark( 0 ),
						Use Regional Settings( 0 ),
						Scan Whole File( 1 ),
						Treat empty columns as numeric( 0 ),
						CompressNumericColumns( 0 ),
						CompressCharacterColumns( 0 ),
						CompressAllowListCheck( 0 ),
						Labels( 0 ),
						Column Names Start( 1 ),
						Data Starts( 1 ),
						Lines To Read( "All" ),
						Year Rule( "20xx" )
					)
				)
	);

// Put all the .csv files to process into a list
dir = pickDirectory("Directory containing your csv files", "$DESKTOP");
files = filesInDirectory(dir);
for(f = NItems(files), f>=1, f--, if(!endsWith(uppercase(files[f]), ".CSV"), removeFrom(files, f)));

// Process each file - Incomplete!
for(f = 1, f <= NItems(files), f++,
	thisTable = openMyCSV(dir||files[f]);
	thisTable << newColumn("x", Numeric, Continuous, Formula(36 + -19.5 * :items + 49.99 *:items* :items* :items));
	);
txnelson
Super User

Re: Post processing of data in a csv file using script

You are oh so close.......

You appeared to be moving towards putting into a list, the statistics that you want after each csv is read in.  Your syntax for doing that is incorrect, and in my opinion, you do not have to put the values into a list, but rather, put them directly into your summary table.  Also, when calculating statistics on a column, you need to use the column form of the statistic (i.e. Col Max() ).  Finally, if once you have finished processing of the csv file, you do not have a need for it to continue to hang around, you can simple close it.  I added in a line that would do that, but I commented it out.  Just remove the // if you want it to take effect.

Names Default To Here( 1 );

// Use a function to open a .CSV file: Given the path to such a file, returns a reference to the resulting table
openMyCSV = Function( {path2File},
	{Default Local},
	dt = Open(
		path2File,
		columns(
			New Column( "column 1", Numeric, "Continuous", Format( "Best", 12 ) ),
			New Column( "items", Numeric, "Continuous", Format( "Best", 12 ) )
		),
		Import Settings(
			End Of Line( CRLF, CR, LF ),
			End Of Field( Comma, CSV( 0 ) ),
			Strip Quotes( 1 ),
			Use Apostrophe as Quotation Mark( 0 ),
			Use Regional Settings( 0 ),
			Scan Whole File( 1 ),
			Treat empty columns as numeric( 0 ),
			CompressNumericColumns( 0 ),
			CompressCharacterColumns( 0 ),
			CompressAllowListCheck( 0 ),
			Labels( 0 ),
			Column Names Start( 1 ),
			Data Starts( 1 ),
			Lines To Read( "All" ),
			Year Rule( "20xx" )
		)
	)
);
	
// Start your summary table
dt = New Table( "My summary file" );
dt << New Column( "MovAvgMean", Numeric );
dt << New Column( "MovAvgMax", Numeric );

// Put all the .csv files to process into a list
dir = Pick Directory(
	"Directory containing your csv files",
	"C:\Users\shilpajo\Documents\PV\KT\LIMC\iccMax\DIADEM\JMP\files\"
);
files = Files In Directory( dir );
For( f = N Items( files ), f >= 1, f--,
	If( !Ends With( Uppercase( files[f] ), ".CSV" ),
		Remove From( files, f )
	)
);

// Process each file - Incomplete!
For( f = 1, f <= N Items( files ), f++,
	thisTable = openMyCSV( dir || files[f] );
	thisTable << New Column( "x",
		Numeric,
		Continuous,
		Formula( 36 + -19.5 * :items + 49.99 * :items * :items * :items )
	);
	thisTable << New Column( "av", Numeric, Continuous, Formula( Mean( :x[Index( Row() - 9, Row() )] ) ) );
	
	// This will insure that all formula processing is complet before the next statement is run
	thisTable << run formulas; 
	
	// Add the summary statistics to the summary table
	dt << Add Rows( 1 );
	dt:MovAvgMean[N Rows( dt )] = Col Mean( thisTable:av );
	dt:MovAvgMax[N Rows( dt )] = Col Max( thisTable:av );
	
	// Close( thisTable, nosave );
);
	

 

Jim
5 REPLIES 5
ian_jmp
Staff

Re: Post processing of data in a csv file using script

If you are just starting out, you should take this in stages, understanding how each part works. Use 'Help > Scripting Index' and 'Help > Books > Scripting Guide' to look things up.

 

The code below should get you started, and you may be able to do the rest. You will need to be more presise about what you mean by 'store the value . . . into the file'.

NamesDefaultToHere(1);

// Use a function to open a .CSV file: Given the path to such a file, returns a reference to the resulting table
openMyCSV = 
function({path2File}, {Default Local},
			dt = Open(
					path2File,
					columns(
						New Column( "column 1", Numeric, "Continuous", Format( "Best", 12 ) ),
						New Column( "items", Numeric, "Continuous", Format( "Best", 12 ) )
						),
					Import Settings(
						End Of Line( CRLF, CR, LF ),
						End Of Field( Comma, CSV( 0 ) ),
						Strip Quotes( 1 ),
						Use Apostrophe as Quotation Mark( 0 ),
						Use Regional Settings( 0 ),
						Scan Whole File( 1 ),
						Treat empty columns as numeric( 0 ),
						CompressNumericColumns( 0 ),
						CompressCharacterColumns( 0 ),
						CompressAllowListCheck( 0 ),
						Labels( 0 ),
						Column Names Start( 1 ),
						Data Starts( 1 ),
						Lines To Read( "All" ),
						Year Rule( "20xx" )
					)
				)
	);

// Put all the .csv files to process into a list
dir = pickDirectory("Directory containing your csv files", "$DESKTOP");
files = filesInDirectory(dir);
for(f = NItems(files), f>=1, f--, if(!endsWith(uppercase(files[f]), ".CSV"), removeFrom(files, f)));

// Process each file - Incomplete!
for(f = 1, f <= NItems(files), f++,
	thisTable = openMyCSV(dir||files[f]);
	thisTable << newColumn("x", Numeric, Continuous, Formula(36 + -19.5 * :items + 49.99 *:items* :items* :items));
	);
jojmp
Occasional Contributor

Re: Post processing of data in a csv file using script

This code was very helpful for me to get started. The below code perfectly works. By storing I mean I need to create a summary table that contains the mean and max of the column av of each csv file, how can I do that?

I tried the using : dt << Add Rows( {MovAvgMean = Mean(:av), MovAvgMax = Max(:av)} ); // single list

But it does not work as I get error 

Name Unresolved: av{1} in access or evaluation of 'av' , :av/*###*/

NamesDefaultToHere(1);

// Use a function to open a .CSV file: Given the path to such a file, returns a reference to the resulting table
openMyCSV = 
function({path2File}, {Default Local},
			dt = Open(
					path2File,
					columns(
						New Column( "column 1", Numeric, "Continuous", Format( "Best", 12 ) ),
						New Column( "items", Numeric, "Continuous", Format( "Best", 12 ) )
						),
					Import Settings(
						End Of Line( CRLF, CR, LF ),
						End Of Field( Comma, CSV( 0 ) ),
						Strip Quotes( 1 ),
						Use Apostrophe as Quotation Mark( 0 ),
						Use Regional Settings( 0 ),
						Scan Whole File( 1 ),
						Treat empty columns as numeric( 0 ),
						CompressNumericColumns( 0 ),
						CompressCharacterColumns( 0 ),
						CompressAllowListCheck( 0 ),
						Labels( 0 ),
						Column Names Start( 1 ),
						Data Starts( 1 ),
						Lines To Read( "All" ),
						Year Rule( "20xx" )
					)
				)
	);
	
dt = New Table( "My data file");
dt << New Column( "MovAvgMean", Numeric );
dt << New Column( "MovAvgMax", Numeric );

// Put all the .csv files to process into a list
dir = pickDirectory("Directory containing your csv files", "$DESKTOP");
files = filesInDirectory(dir);
for(f = NItems(files), f>=1, f--, if(!endsWith(uppercase(files[f]), ".CSV"), removeFrom(files, f)));

// Process each file - Incomplete!
for(f = 1, f <= NItems(files), f++,
	thisTable = openMyCSV(dir||files[f]);
	thisTable << newColumn("x", Numeric, Continuous, Formula(36 + -19.5 * :items + 49.99 *:items* :items* :items));
	thisTable << newColumn("av", Numeric, Continuous, Formula(Mean( :x[Index( Row() - 9, Row() )] )));
	dt << Add Rows( {MovAvgMean =Mean(:av), MovAvgMax = Max(:av)} ); // single list
	);
	
// Make the summary table



	


 

0 Kudos
txnelson
Super User

Re: Post processing of data in a csv file using script

You are oh so close.......

You appeared to be moving towards putting into a list, the statistics that you want after each csv is read in.  Your syntax for doing that is incorrect, and in my opinion, you do not have to put the values into a list, but rather, put them directly into your summary table.  Also, when calculating statistics on a column, you need to use the column form of the statistic (i.e. Col Max() ).  Finally, if once you have finished processing of the csv file, you do not have a need for it to continue to hang around, you can simple close it.  I added in a line that would do that, but I commented it out.  Just remove the // if you want it to take effect.

Names Default To Here( 1 );

// Use a function to open a .CSV file: Given the path to such a file, returns a reference to the resulting table
openMyCSV = Function( {path2File},
	{Default Local},
	dt = Open(
		path2File,
		columns(
			New Column( "column 1", Numeric, "Continuous", Format( "Best", 12 ) ),
			New Column( "items", Numeric, "Continuous", Format( "Best", 12 ) )
		),
		Import Settings(
			End Of Line( CRLF, CR, LF ),
			End Of Field( Comma, CSV( 0 ) ),
			Strip Quotes( 1 ),
			Use Apostrophe as Quotation Mark( 0 ),
			Use Regional Settings( 0 ),
			Scan Whole File( 1 ),
			Treat empty columns as numeric( 0 ),
			CompressNumericColumns( 0 ),
			CompressCharacterColumns( 0 ),
			CompressAllowListCheck( 0 ),
			Labels( 0 ),
			Column Names Start( 1 ),
			Data Starts( 1 ),
			Lines To Read( "All" ),
			Year Rule( "20xx" )
		)
	)
);
	
// Start your summary table
dt = New Table( "My summary file" );
dt << New Column( "MovAvgMean", Numeric );
dt << New Column( "MovAvgMax", Numeric );

// Put all the .csv files to process into a list
dir = Pick Directory(
	"Directory containing your csv files",
	"C:\Users\shilpajo\Documents\PV\KT\LIMC\iccMax\DIADEM\JMP\files\"
);
files = Files In Directory( dir );
For( f = N Items( files ), f >= 1, f--,
	If( !Ends With( Uppercase( files[f] ), ".CSV" ),
		Remove From( files, f )
	)
);

// Process each file - Incomplete!
For( f = 1, f <= N Items( files ), f++,
	thisTable = openMyCSV( dir || files[f] );
	thisTable << New Column( "x",
		Numeric,
		Continuous,
		Formula( 36 + -19.5 * :items + 49.99 * :items * :items * :items )
	);
	thisTable << New Column( "av", Numeric, Continuous, Formula( Mean( :x[Index( Row() - 9, Row() )] ) ) );
	
	// This will insure that all formula processing is complet before the next statement is run
	thisTable << run formulas; 
	
	// Add the summary statistics to the summary table
	dt << Add Rows( 1 );
	dt:MovAvgMean[N Rows( dt )] = Col Mean( thisTable:av );
	dt:MovAvgMax[N Rows( dt )] = Col Max( thisTable:av );
	
	// Close( thisTable, nosave );
);
	

 

Jim
jojmp
Occasional Contributor

Re: Post processing of data in a csv file using script

Thanks for optimizing and solving the error, very helpful!!
0 Kudos
jojmp
Occasional Contributor

Re: Post processing of data in a csv file using script

This gives a good foundation to solve the problem
0 Kudos