Dear JMP Community,
I have the task of writing a script that is intended to automate the importing of data from "messy" Excel files. I say "messy" because the actual data that I want to pull out of the Excel files is not organized in neat columns and rows -- it's all over the place -- but it's always in the same place in each Excel file. The files basically contain lot information on our production lots and I'm trying to pull the testing data out of these Excel files and put into a JMP data table.
There are multiple lot Excel files spread across multiple directories. To address that, I generate a list of all the Excel files I need to open using the Files In Directory() command, with the flag "recursive" so I also grab all sub directories when building the list. This creates a list that is several hundred items long. Call the list NLotFiles.
I first have a For loop that runs from i=1 to N Items(NLotFiles) long, so it needs to run a few hundred times through this bigger For loop.
Then, I need to open the file MULTIPLE times and import some information each time. I need to do this multiple times as information is not only spread across multiple cells, but also across multiple tabs in the Excel file.
The first open pulls the date and lot information. I then need to check if there is an entry in the first column, first row (if a date exists). If not, I want to close that data table and go back to the first For loop, going to the next iteration.
****This is where I get stuck.
The next multiple opens pulls all the data out into other tables that I then join together and make into one large data table. As long as there is an entry for the date in each file, this method works. But, not all files have a date -- e.g. no lot was made that day. If that happens, it still opens the file, but all the rest of the code doesn't work properly as it's based on the structure of the Excel file and the presence of the date.
Since there is no GOTO option in JSL, I'm having a hard time testing for the date value and closing the table if none and going back to the next iteration in the NLotFiles list. I'm testing for the existence of a date by using the Is Missing() function, and trying to use Continue(), but it's not working like I thought it would.
The basic structre of this code is something like this:
For( i= 1, i<=N Items( NLotFiles ), i++,
Open(
DataPath || "\"|| DataFiles[i],
Worksheets( "Coversheet" ),
Use for all sheets( 1 ),
Concatenate Worksheets( 0 ),
Create Concatenation Column( 0 ),
Worksheet Settings(
1,
Has Column Headers( 1 ),
Number of Rows in Headers( 1 ),
Headers Start on Row( 2 ),
Data Starts on Row( 3 ),
Data Starts on Column( 3 ),
Data Ends on Row( 3 ),
Data Ends on Column( 5 ),
Replicated Spanned Rows( 1 ),
Replicated Spanned Headers( 0 ),
Suppress Hidden Rows( 1 ),
Suppress Hidden Columns( 1 ),
Suppress Empty Columns( 1 ),
Import Cell Colors( 0 ),
Limit Column Detect( 0 ),
Column Separator String( "-" )
)
);
//This part tests if the date (first column, first row) is empty.
If(Is Missing(Column(Current Data Table(),1)[1])==1,
Close(Current Data Table(), no save),
Continue()
);
/////*****HERE IS WHERE I WANT IT TO GO BACK TO THE TOP IF NO DATE IS PRESENT BUT CONTINUE DOWN BELOW IF A DATE IS PRESENT*****
/////I don't think the Continue() is in the right place, but then if I don't have it present, the code just stops because all the stuff below can't be done.
//This part ensures the date is Numeric and Continuous
datedt = Current Data Table();
datedt:Date << Data Type( Numeric, Format( "m/d/y" ) ) << Modeling Type( "Continuous" );
//Next steps would bring in the data from different tabs as the date information (again, I have to do multiple opens on the same file.)
//Then join all the data, clean things up in the new data tables
); //This last ");" closes the big For loop that runs through a total of NLotFiles times.
//Then I concatenate the multiple tables (each table represents a lot now) into my final data table and close all unwanted tables.
It would be great if I could tell JMP which exact tabs and cells to import, like using a list of tab and cell numbers as these are always the same in the Excel files. This would eliminate the multiple looping needed to bring everything in.
Any help is greatly appreciated, especially if someone has a better idea of how to extract out the important information and make the code easier/simpler than it currently is. The actual code that I have is a total of about 300 lines (including empty lines), so it's not too bad. But simpler code is often easier to troubleshoot or someone else to code into aswell.
Thanks!,
DS