Solved: Re: Cleaning Up CSV File with Different Blocks of Data

Artemio · Jul 11, 2017 08:06 PM

I have several csv files that need to be clean up. The files in question have two basic data block structures. The first block consist of 1 column with 7 rows of information. The second segment contains 6 columns with 5+ rows of data.

For the first block, the ideal transformation is to break the column into two pieces. One piece will provide the header or name of the column. And, the second piece will supply data information about that specific experiment.

The second block needs minimum intervention.

So far, I have managed to get close to the desired state, with the exception of a couple of nagging points. First, breaking the csv file into two pieces –two tables- is not a problem. The first data table –dt1- is the original csv file from which data is extracted and some rows deleted to create a second data table –dt2. The second data table contains the 1 column with 7 rows; which need further treatment.

In order to do this, I start with a new data table -dt3. I have succeeded in creating the column headings with a For Loop and taking the first element of a list. This information comes from dt2. The next portion is the one that is giving me a problem. I need to pass or send the second element of the list to the first row of dt3.

First question:

Why the following line is able to change the first cell of column 1 in the original data table (dt1) but the next piece of code does not work on the new data table?

dt1[1,1]=300; //this works

dt3[1,1]=300; //this does not work. It produces an error.

invalid subscript (must be number or list of numbers) in access or evaluation of 'Assign' , dt3[1, 1] = /*###*/300 /*###*/

Second question

I using : as the character identifier to separate the first seven rows into header information and row data. The downside of this approach is that Date/Time row has 3 of these characters and data is lost after the second found :. How can I prevent this from happening?

Any ideas or suggestions? The attached Excel file is an example of data table.

Thank you so much for any assistance

Example of original csv file

Food Recipe: 1-Thick-sauce
Blender Recipe: 3-speed
Temperature Recipe: 100 Steam
Cooking Hardware: Stainless
Batch ID: Mini-Cake-100
Sample ID: Mini-Cake-100.1
Date/Time: 6/30/2017 4:12:44 PM


Point	Density	Sugar	Ratio	X	Y
1	5020.607	98.6872	0.9976	145	0
2	4976.009	98.1275	0.9955	108.75	0
3	4977.177	98.3677	0.9958	72.5	0
4	4982.821	98.2569	0.9957	36.25	0
5	4989.282	98.2105	0.9977	0	0

Example of data table 2 (dt2). 1 Column with 7 rows. The : separates the column name from the experimental information.

Food Recipe: 1-Thick-sauce

Blender Recipe: 3-speed

Temperature Recipe: 100 Steam

Cooking Hardware: Stainless

Batch ID: Mini-Cake-100

Sample ID: Mini-Cake-100.1

Date/Time: 6/30/2017 4:12:44 PM

Desired transformation of dt2.

Food Recipe	Blender Recipe	Temperature Recipe	Cooking Hardware	Batch ID	Sample ID	Date/Time
1-Thick-sauce	3-speed	100 Steam	Stainless	Mini-Cake-100	Mini-Cake-100.1	#########

JSL Code



Names Default To Here( 1 );
Close All(data tables);
Clear Log();
Clear Symbols();
Close All(Journals, No Save);


//***********************************************************************************************************************************************************
/*Open collected data from a share drive
and assign working data table(s) -->dt1
*/
dt1 = Open(
	"C:\Users\...................filename.csv",
	columns(
		Column( "c000001", Character, "Nominal" ),
		Column( "c000002", Character, "Nominal" ),
		Column( "c000003", Character, "Nominal" ),
		Column( "c000004", Character, "Nominal" ),
		Column( "c000005", Character, "Nominal" ),
		Column( "c000006", Character, "Nominal" )
	),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 1 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);


//The raw data table repeats information every 48 rows
//This table separates each item into individual tables
//The generated table does not have formated column headings
//For(i=1, i <=x, i++,     //this For loop is ignored for now until dt2 --> dt problem is solved
selrows =dt1 <<Select Rows(1::7);
dt2 =dt1<<Subset(selrows);
dt1 <<Delete rows(1::9);

For(i=1, i<=N Col(dt1), i++,
	Column(dt1, i) <<Set Name(Trim Whitespace(Column(dt1,i)[1]));
	Column(dt1, i) << Data Type("Numeric")<<Modeling type (Continuous)	
	);
	
dt1 <<Delete Rows(1);

dt3 =New Table("Recipe Info");

For(ii=1, ii<=N Rows(dt2), ii++,
	rlist =Words(Column(dt2,1)[ii], ":");  //Get the first element in the list and use it for column headings
	dt3 <<New Column(rlist[1]);           //Create/insert new columns into table dt3

	If(Regex(Trim Whitespace(Word(2,Column(dt2,1)[ii],":")), "^[a-zA-Z0-9]+$")==Trim Whitespace(Word(2,Column(dt2,1)[ii],":")),
		Column(dt3,ii) <<Data Type("Numeric")<<Modeling Type(Continous),
		Column(dt3,ii) <<Data Type("Character")
	);
		
	//dt3[ii,1] =rlist[2];   //It does not work.  The section below is for troubleshooting this problem.
);

a=1;
dt1[a,1]=300;   //this works	
dt3[a,1]=300;   //this does not work.  Why????????  What is different between dt1 and dt3?????


For(iii=1, iii<=N Rows(dt2), iii++,
	
	//strnlng =contains(Column(dt2,1)[ii], ":");  //Length of the string until ":" is found
	//show(Word(1,Column(dt2,1)[ii]),":");
	//cntword = Trim Whitespace(Word(2,Column(dt2,1)[iii],":"));
	//Show(cntword);
	dt3[iii,1] = Trim Whitespace(Word(2,Column(dt2,1)[iii],":"));
);

txnelson · Jul 11, 2017 08:40 PM

Your first issue is that there are no rows in your dt3 data table. If you add

dt3 << add rows(N Rows( dt2 ));

right after creating dt3, you will be able to populate that table.

The second issue is that what you want is not the second Word of the string, but rather, everything following the first ":". Therefore, using the Substr() function will work better:

dt3[iii, 1] = Trim Whitespace( substr(Column( dt2, 1 )[iii], contains(column( dt2, 1 ), ":") ) )

Jim

View solution in original post

txnelson · Jul 11, 2017 08:40 PM

Your first issue is that there are no rows in your dt3 data table. If you add

dt3 << add rows(N Rows( dt2 ));

right after creating dt3, you will be able to populate that table.

The second issue is that what you want is not the second Word of the string, but rather, everything following the first ":". Therefore, using the Substr() function will work better:

dt3[iii, 1] = Trim Whitespace( substr(Column( dt2, 1 )[iii], contains(column( dt2, 1 ), ":") ) )

Jim

Artemio · Jul 13, 2017 12:00 AM

Mr. Nelson,

Wow! I was not expecting such a fast response...... I have implemented your first suggestion, with a minor change, and it works. Very nice.

You have teach me a valuable lesson. It never crossed my mind that the new table did not have any rows associated with it.

Tomorrow morning I will start working with the second piece.

Thank you so much

Art