cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
View Original Published Thread

Cleaning Up CSV File with Different Blocks of Data

Artemio
Level II

I have several csv files that need to be clean up.  The files in question have two basic data block structures.  The first block consist of 1 column with 7 rows of information.  The second segment contains 6 columns with 5+ rows of data.

 

For the first block, the ideal transformation is to break the column into two pieces.  One piece will provide the header or name of the column.  And, the second piece will supply data information about that specific experiment.

 

The second block needs minimum intervention.

 

So far, I have managed to get close to the desired state, with the exception of a couple of nagging points.  First, breaking the csv file into two pieces –two tables- is not a problem.  The first data table –dt1- is the original csv file from which data is extracted and some rows deleted to create a second data table –dt2.  The second data table contains the 1 column with 7 rows; which need further treatment. 

 

In order to do this, I start with a new data table -dt3.  I have succeeded in creating the column headings with a For Loop and taking the first element of a list.  This information comes from dt2.  The next portion is the one that is giving me a problem.  I need to pass or send the second element of the list to the first row of dt3.

 

First question:

Why the following line is able to change the first cell of column 1 in the original data table (dt1) but the next piece of code does not work on the new data table?

dt1[1,1]=300;   //this works

dt3[1,1]=300;   //this does not work.  It produces an error.

invalid subscript (must be number or list of numbers) in access or evaluation of 'Assign' , dt3[1, 1] =  /*###*/300 /*###*/   

 

 

Second question

 

I using : as the character identifier to separate the first seven rows into header information and row data.  The downside of this approach is that Date/Time row has 3 of these characters and data is lost after the second found :.  How can I prevent this from happening? 

 

Any ideas or suggestions?  The attached Excel file is an example of data table.

 

 

Thank you so much for any assistance  

 

  

Example of original csv file

 

Food Recipe:       1-Thick-sauce

 

 

 

Blender Recipe:       3-speed

 

 

 

Temperature Recipe: 100 Steam

 

 

 

Cooking Hardware:    Stainless

 

 

 

Batch ID:             Mini-Cake-100

 

 

 

Sample ID:           Mini-Cake-100.1

 

 

Date/Time:          6/30/2017 4:12:44 PM

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Point

Density

Sugar

Ratio

          X

              Y

1

5020.607

98.6872

0.9976

145

0

2

4976.009

98.1275

0.9955

108.75

0

3

4977.177

98.3677

0.9958

72.5

0

4

4982.821

98.2569

0.9957

36.25

0

5

4989.282

98.2105

0.9977

0

0

 

Example of data table 2 (dt2).  1 Column with 7 rows.  The : separates the column name from the experimental information.

Food Recipe:       1-Thick-sauce

Blender Recipe:       3-speed

Temperature Recipe: 100 Steam

Cooking Hardware:    Stainless

Batch ID:             Mini-Cake-100

Sample ID:           Mini-Cake-100.1

Date/Time:          6/30/2017 4:12:44 PM

 

 

Desired transformation of dt2.

Food Recipe

Blender Recipe

Temperature Recipe

Cooking Hardware

Batch ID

Sample ID

Date/Time

1-Thick-sauce

3-speed

100 Steam

Stainless

Mini-Cake-100

Mini-Cake-100.1

#########

 

 

JSL Code

 


Names Default To Here( 1 );
Close All(data tables);
Clear Log();
Clear Symbols();
Close All(Journals, No Save);


//***********************************************************************************************************************************************************
/*Open collected data from a share drive
and assign working data table(s) -->dt1
*/
dt1 = Open(
	"C:\Users\...................filename.csv",
	columns(
		Column( "c000001", Character, "Nominal" ),
		Column( "c000002", Character, "Nominal" ),
		Column( "c000003", Character, "Nominal" ),
		Column( "c000004", Character, "Nominal" ),
		Column( "c000005", Character, "Nominal" ),
		Column( "c000006", Character, "Nominal" )
	),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 1 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);


//The raw data table repeats information every 48 rows
//This table separates each item into individual tables
//The generated table does not have formated column headings
//For(i=1, i <=x, i++,     //this For loop is ignored for now until dt2 --> dt problem is solved
selrows =dt1 <<Select Rows(1::7);
dt2 =dt1<<Subset(selrows);
dt1 <<Delete rows(1::9);

For(i=1, i<=N Col(dt1), i++,
	Column(dt1, i) <<Set Name(Trim Whitespace(Column(dt1,i)[1]));
	Column(dt1, i) << Data Type("Numeric")<<Modeling type (Continuous)	
	);
	
dt1 <<Delete Rows(1);

dt3 =New Table("Recipe Info");

For(ii=1, ii<=N Rows(dt2), ii++,
	rlist =Words(Column(dt2,1)[ii], ":");  //Get the first element in the list and use it for column headings
	dt3 <<New Column(rlist[1]);           //Create/insert new columns into table dt3

	If(Regex(Trim Whitespace(Word(2,Column(dt2,1)[ii],":")), "^[a-zA-Z0-9]+$")==Trim Whitespace(Word(2,Column(dt2,1)[ii],":")),
		Column(dt3,ii) <<Data Type("Numeric")<<Modeling Type(Continous),
		Column(dt3,ii) <<Data Type("Character")
	);
		
	//dt3[ii,1] =rlist[2];   //It does not work.  The section below is for troubleshooting this problem.
);

a=1;
dt1[a,1]=300;   //this works	
dt3[a,1]=300;   //this does not work.  Why????????  What is different between dt1 and dt3?????


For(iii=1, iii<=N Rows(dt2), iii++,
	
	//strnlng =contains(Column(dt2,1)[ii], ":");  //Length of the string until ":" is found
	//show(Word(1,Column(dt2,1)[ii]),":");
	//cntword = Trim Whitespace(Word(2,Column(dt2,1)[iii],":"));
	//Show(cntword);
	dt3[iii,1] = Trim Whitespace(Word(2,Column(dt2,1)[iii],":"));
);





 

 

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User


Re: Cleaning Up CSV File with Different Blocks of Data

Your first issue is that there are no rows in your dt3 data table.  If you add

dt3 << add rows(N Rows( dt2 ));

right after creating dt3, you will be able to populate that table.

The second issue is that what you want is not the second Word of the string, but rather, everything following the first ":".  Therefore, using the Substr() function will work better:

dt3[iii, 1] = Trim Whitespace( substr(Column( dt2, 1 )[iii], contains(column( dt2, 1 ), ":") ) )
Jim

View solution in original post

2 REPLIES 2
txnelson
Super User


Re: Cleaning Up CSV File with Different Blocks of Data

Your first issue is that there are no rows in your dt3 data table.  If you add

dt3 << add rows(N Rows( dt2 ));

right after creating dt3, you will be able to populate that table.

The second issue is that what you want is not the second Word of the string, but rather, everything following the first ":".  Therefore, using the Substr() function will work better:

dt3[iii, 1] = Trim Whitespace( substr(Column( dt2, 1 )[iii], contains(column( dt2, 1 ), ":") ) )
Jim
Artemio
Level II


Re: Cleaning Up CSV File with Different Blocks of Data

Mr. Nelson,

Wow! I was not expecting such a fast response...... I have implemented your first suggestion, with a minor change, and it works. Very nice.

You have teach me a valuable lesson. It never crossed my mind that the new table did not have any rows associated with it.

Tomorrow morning I will start working with the second piece.

Thank you so much


Art