- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Cleaning Up CSV File with Different Blocks of Data
I have several csv files that need to be clean up. The files in question have two basic data block structures. The first block consist of 1 column with 7 rows of information. The second segment contains 6 columns with 5+ rows of data.
For the first block, the ideal transformation is to break the column into two pieces. One piece will provide the header or name of the column. And, the second piece will supply data information about that specific experiment.
The second block needs minimum intervention.
So far, I have managed to get close to the desired state, with the exception of a couple of nagging points. First, breaking the csv file into two pieces –two tables- is not a problem. The first data table –dt1- is the original csv file from which data is extracted and some rows deleted to create a second data table –dt2. The second data table contains the 1 column with 7 rows; which need further treatment.
In order to do this, I start with a new data table -dt3. I have succeeded in creating the column headings with a For Loop and taking the first element of a list. This information comes from dt2. The next portion is the one that is giving me a problem. I need to pass or send the second element of the list to the first row of dt3.
First question:
Why the following line is able to change the first cell of column 1 in the original data table (dt1) but the next piece of code does not work on the new data table?
dt1[1,1]=300; //this works
dt3[1,1]=300; //this does not work. It produces an error.
invalid subscript (must be number or list of numbers) in access or evaluation of 'Assign' , dt3[1, 1] = /*###*/300 /*###*/
Second question
I using : as the character identifier to separate the first seven rows into header information and row data. The downside of this approach is that Date/Time row has 3 of these characters and data is lost after the second found :. How can I prevent this from happening?
Any ideas or suggestions? The attached Excel file is an example of data table.
Thank you so much for any assistance
Example of original csv file
Food Recipe: 1-Thick-sauce |
|
|
| ||
Blender Recipe: 3-speed |
|
|
| ||
Temperature Recipe: 100 Steam |
|
|
| ||
Cooking Hardware: Stainless |
|
|
| ||
Batch ID: Mini-Cake-100 |
|
|
| ||
Sample ID: Mini-Cake-100.1 |
|
| |||
Date/Time: 6/30/2017 4:12:44 PM |
|
| |||
|
|
|
|
|
|
|
|
|
|
|
|
Point | Density | Sugar | Ratio | X | Y |
1 | 5020.607 | 98.6872 | 0.9976 | 145 | 0 |
2 | 4976.009 | 98.1275 | 0.9955 | 108.75 | 0 |
3 | 4977.177 | 98.3677 | 0.9958 | 72.5 | 0 |
4 | 4982.821 | 98.2569 | 0.9957 | 36.25 | 0 |
5 | 4989.282 | 98.2105 | 0.9977 | 0 | 0 |
Example of data table 2 (dt2). 1 Column with 7 rows. The : separates the column name from the experimental information.
Food Recipe: 1-Thick-sauce |
Blender Recipe: 3-speed |
Temperature Recipe: 100 Steam |
Cooking Hardware: Stainless |
Batch ID: Mini-Cake-100 |
Sample ID: Mini-Cake-100.1 |
Date/Time: 6/30/2017 4:12:44 PM |
Desired transformation of dt2.
Food Recipe | Blender Recipe | Temperature Recipe | Cooking Hardware | Batch ID | Sample ID | Date/Time |
1-Thick-sauce | 3-speed | 100 Steam | Stainless | Mini-Cake-100 | Mini-Cake-100.1 | ######### |
JSL Code
Names Default To Here( 1 );
Close All(data tables);
Clear Log();
Clear Symbols();
Close All(Journals, No Save);
//***********************************************************************************************************************************************************
/*Open collected data from a share drive
and assign working data table(s) -->dt1
*/
dt1 = Open(
"C:\Users\...................filename.csv",
columns(
Column( "c000001", Character, "Nominal" ),
Column( "c000002", Character, "Nominal" ),
Column( "c000003", Character, "Nominal" ),
Column( "c000004", Character, "Nominal" ),
Column( "c000005", Character, "Nominal" ),
Column( "c000006", Character, "Nominal" )
),
Import Settings(
End Of Line( CRLF, CR, LF ),
End Of Field( Comma, CSV( 0 ) ),
Strip Quotes( 1 ),
Use Apostrophe as Quotation Mark( 0 ),
Use Regional Settings( 0 ),
Scan Whole File( 1 ),
Treat empty columns as numeric( 0 ),
CompressNumericColumns( 0 ),
CompressCharacterColumns( 0 ),
CompressAllowListCheck( 0 ),
Labels( 0 ),
Column Names Start( 1 ),
Data Starts( 1 ),
Lines To Read( "All" ),
Year Rule( "20xx" )
)
);
//The raw data table repeats information every 48 rows
//This table separates each item into individual tables
//The generated table does not have formated column headings
//For(i=1, i <=x, i++, //this For loop is ignored for now until dt2 --> dt problem is solved
selrows =dt1 <<Select Rows(1::7);
dt2 =dt1<<Subset(selrows);
dt1 <<Delete rows(1::9);
For(i=1, i<=N Col(dt1), i++,
Column(dt1, i) <<Set Name(Trim Whitespace(Column(dt1,i)[1]));
Column(dt1, i) << Data Type("Numeric")<<Modeling type (Continuous)
);
dt1 <<Delete Rows(1);
dt3 =New Table("Recipe Info");
For(ii=1, ii<=N Rows(dt2), ii++,
rlist =Words(Column(dt2,1)[ii], ":"); //Get the first element in the list and use it for column headings
dt3 <<New Column(rlist[1]); //Create/insert new columns into table dt3
If(Regex(Trim Whitespace(Word(2,Column(dt2,1)[ii],":")), "^[a-zA-Z0-9]+$")==Trim Whitespace(Word(2,Column(dt2,1)[ii],":")),
Column(dt3,ii) <<Data Type("Numeric")<<Modeling Type(Continous),
Column(dt3,ii) <<Data Type("Character")
);
//dt3[ii,1] =rlist[2]; //It does not work. The section below is for troubleshooting this problem.
);
a=1;
dt1[a,1]=300; //this works
dt3[a,1]=300; //this does not work. Why???????? What is different between dt1 and dt3?????
For(iii=1, iii<=N Rows(dt2), iii++,
//strnlng =contains(Column(dt2,1)[ii], ":"); //Length of the string until ":" is found
//show(Word(1,Column(dt2,1)[ii]),":");
//cntword = Trim Whitespace(Word(2,Column(dt2,1)[iii],":"));
//Show(cntword);
dt3[iii,1] = Trim Whitespace(Word(2,Column(dt2,1)[iii],":"));
);
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Cleaning Up CSV File with Different Blocks of Data
Your first issue is that there are no rows in your dt3 data table. If you add
dt3 << add rows(N Rows( dt2 ));
right after creating dt3, you will be able to populate that table.
The second issue is that what you want is not the second Word of the string, but rather, everything following the first ":". Therefore, using the Substr() function will work better:
dt3[iii, 1] = Trim Whitespace( substr(Column( dt2, 1 )[iii], contains(column( dt2, 1 ), ":") ) )
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Cleaning Up CSV File with Different Blocks of Data
Your first issue is that there are no rows in your dt3 data table. If you add
dt3 << add rows(N Rows( dt2 ));
right after creating dt3, you will be able to populate that table.
The second issue is that what you want is not the second Word of the string, but rather, everything following the first ":". Therefore, using the Substr() function will work better:
dt3[iii, 1] = Trim Whitespace( substr(Column( dt2, 1 )[iii], contains(column( dt2, 1 ), ":") ) )
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Cleaning Up CSV File with Different Blocks of Data
Wow! I was not expecting such a fast response...... I have implemented your first suggestion, with a minor change, and it works. Very nice.
You have teach me a valuable lesson. It never crossed my mind that the new table did not have any rows associated with it.
Tomorrow morning I will start working with the second piece.
Thank you so much
Art