cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
SDF1
Super User

Importing messy Excel files & testing if null

Dear JMP Community,

 

  I have the task of writing a script that is intended to automate the importing of data from "messy" Excel files. I say "messy" because the actual data that I want to pull out of the Excel files is not organized in neat columns and rows -- it's all over the place -- but it's always in the same place in each Excel file. The files basically contain lot information on our production lots and I'm trying to pull the testing data out of these Excel files and put into a JMP data table.

 

  There are multiple lot Excel files spread across multiple directories. To address that, I generate a list of all the Excel files I need to open using the Files In Directory() command, with the flag "recursive" so I also grab all sub directories when building the list. This creates a list that is several hundred items long. Call the list NLotFiles.

 

  I first have a For loop that runs from i=1 to N Items(NLotFiles) long, so it needs to run a few hundred times through this bigger For loop.

 

  Then, I need to open the file MULTIPLE times and import some information each time. I need to do this multiple times as information is not only spread across multiple cells, but also across multiple tabs in the Excel file.

 

  The first open pulls the date and lot information. I then need to check if there is an entry in the first column, first row (if a date exists). If not, I want to close that data table and go back to the first For loop, going to the next iteration.

****This is where I get stuck.

 

  The next multiple opens pulls all the data out into other tables that I then join together and make into one large data table. As long as there is an entry for the date in each file, this method works. But, not all files have a date -- e.g. no lot was made that day. If that happens, it still opens the file, but all the rest of the code doesn't work properly as it's based on the structure of the Excel file and the presence of the date.

 

  Since there is no GOTO option in JSL, I'm having a hard time testing for the date value and closing the table if none and going back to the next iteration in the NLotFiles list. I'm testing for the existence of a date by using the Is Missing() function, and trying to use Continue(), but it's not working like I thought it would.

 

The basic structre of this code is something like this:

 

For( i= 1, i<=N Items( NLotFiles ), i++, 
	Open(
		DataPath || "\"|| DataFiles[i],
		Worksheets( "Coversheet" ),
		Use for all sheets( 1 ),
		Concatenate Worksheets( 0 ),
		Create Concatenation Column( 0 ),
		Worksheet Settings(
			1,
			Has Column Headers( 1 ),
			Number of Rows in Headers( 1 ),
			Headers Start on Row( 2 ),
			Data Starts on Row( 3 ),
			Data Starts on Column( 3 ),
			Data Ends on Row( 3 ),
			Data Ends on Column( 5 ),
			Replicated Spanned Rows( 1 ),
			Replicated Spanned Headers( 0 ),
			Suppress Hidden Rows( 1 ),
			Suppress Hidden Columns( 1 ),
			Suppress Empty Columns( 1 ),
			Import Cell Colors( 0 ),
			Limit Column Detect( 0 ),
			Column Separator String( "-" )
		)
	);

//This part tests if the date (first column, first row) is empty.
	If(Is Missing(Column(Current Data Table(),1)[1])==1,
		Close(Current Data Table(), no save),
                Continue()
	);

/////*****HERE IS WHERE I WANT IT TO GO BACK TO THE TOP IF NO DATE IS PRESENT BUT CONTINUE DOWN BELOW IF A DATE IS PRESENT*****
/////I don't think the Continue() is in the right place, but then if I don't have it present, the code just stops because all the stuff below can't be done.


//This part ensures the date is Numeric and Continuous datedt = Current Data Table(); datedt:Date << Data Type( Numeric, Format( "m/d/y" ) ) << Modeling Type( "Continuous" ); //Next steps would bring in the data from different tabs as the date information (again, I have to do multiple opens on the same file.) //Then join all the data, clean things up in the new data tables

); //This last ");" closes the big For loop that runs through a total of NLotFiles times.

//Then I concatenate the multiple tables (each table represents a lot now) into my final data table and close all unwanted tables.

  It would be great if I could tell JMP which exact tabs and cells to import, like using a list of tab and cell numbers as these are always the same in the Excel files. This would eliminate the multiple looping needed to bring everything in. 

 

  Any help is greatly appreciated, especially if someone has a better idea of how to extract out the important information and make the code easier/simpler than it currently is. The actual code that I have is a total of about 300 lines (including empty lines), so it's not too bad. But simpler code is often easier to troubleshoot or someone else to code into aswell.

 

Thanks!,

DS

1 ACCEPTED SOLUTION

Accepted Solutions
SDF1
Super User

Re: Importing messy Excel files & testing if null

Hi @gzmorgan0,

 

  Thanks for your feedback. I think I had tried that and it still didn't work. The problem wasn't so much the "glue" than the big outer For loop continues after exiting the "If(Is Missing..." part and doesn't know what to do at later parts in the code because the data table that would have been opened doesn't exist anymore.

 

  I just found a workaround yesterday that I tested this morning, and it works perfectly. It ran exactly like I thought it would.

 

  I found that if I changed the "Supress Empty Columns(1)" (which is the default when opening Excel files) to "Supress Empty Columns(0)", it would open an empty data table where I could set the the date to 1/1/1904 (i.e. 0), then, after I've built the big data table from the multiple hundred files, I searched for all rows where date=0 and just deleted them.

 

  It might not be the most elegant solution, but since JSL doesn't have any GoTo option, this is the best I could do -- at least what I'm capable of doing. So, it makes a bunch of empty entries, but these are easy to search for rather than close a table in mid-for loop.

 

Thanks!,

DS

View solution in original post

2 REPLIES 2
gzmorgan0
Super User (Alumni)

Re: Importing messy Excel files & testing if null

@SDF1 ,

 

I have not thoroughly digested your described problem, however, I spotted and error in your code, that might provide you some progress.

 

Your JSL:

//This part tests if the date (first column, first row) is empty.
	If(Is Missing(Column(Current Data Table(),1)[1])==1,
		Close(Current Data Table(), no save),
                Continue()
	);

The error is the comma after the close. I think you want to close AND continue. The commas should be a semicolon. The semicolon is glue for each JSL command/action you want to take, i.e., it defines the block of code to be executed.

 

Replace it with this and your Continue() should work:

//This part tests if the date (first column, first row) is empty.
	If(Is Missing(Column(Current Data Table(),1)[1])==1,
		Close(Current Data Table(), no save);
                Continue()
	);

 

SDF1
Super User

Re: Importing messy Excel files & testing if null

Hi @gzmorgan0,

 

  Thanks for your feedback. I think I had tried that and it still didn't work. The problem wasn't so much the "glue" than the big outer For loop continues after exiting the "If(Is Missing..." part and doesn't know what to do at later parts in the code because the data table that would have been opened doesn't exist anymore.

 

  I just found a workaround yesterday that I tested this morning, and it works perfectly. It ran exactly like I thought it would.

 

  I found that if I changed the "Supress Empty Columns(1)" (which is the default when opening Excel files) to "Supress Empty Columns(0)", it would open an empty data table where I could set the the date to 1/1/1904 (i.e. 0), then, after I've built the big data table from the multiple hundred files, I searched for all rows where date=0 and just deleted them.

 

  It might not be the most elegant solution, but since JSL doesn't have any GoTo option, this is the best I could do -- at least what I'm capable of doing. So, it makes a bunch of empty entries, but these are easy to search for rather than close a table in mid-for loop.

 

Thanks!,

DS