cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Discussions

Solve problems, and share tips and tricks with other JMP users.
Choose Language Hide Translation Bar
Yass
Level IV

Handling two different Excel header formats

Hi everyone,
 
I'm working on a JSL script in JMP to automate the import of BOM (Bill of Materials) Excel files. These files come from different sources and unfortunately don't follow a consistent format.
 
I’m dealing with two specific formats:
- Format 1: headers are on row 5, data starts on row 6
- Format 2: headers are on row 6, data starts on row 7
 
Right now, my script is hardcoded to use Format 1 (Headers Start on Row(5)), but I need to manually adjust it whenever I receive files in Format 2.
 
Goal: I’d like to make the script smart enough to automatically detect the correct format and apply the appropriate import settings (Headers Start on Row, Data Starts on Row), without manual intervention.
 
Here’s a snippet from the current script:
For( i = 1, i <= N Items (namefiles), i++,
	//Transformation du fichier en table de données
	dt = Open(
		namefiles[i],
		Worksheets( "BOM" ),
		Use for all sheets( 1 ),
		Concatenate Worksheets( 0 ),
		Create Concatenation Column( 0 ),
		Worksheet Settings(
			1,
			Has Column Headers( 1 ),
			Number of Rows in Headers( 1 ),
			Headers Start on Row( 5 ),
			Data Starts on Row( 6 ),
			Data Starts on Column( 1 ),
			Data Ends on Row( 0 ),
			Data Ends on Column( 0 ),
			Replicated Spanned Rows( 1 ),
			Replicated Spanned Headers( 0 ),
			Suppress Hidden Rows( 1 ),
			Suppress Hidden Columns( 1 ),
			Suppress Empty Columns( 1 ),
			Treat as Hierarchy( 0 ),
			Multiple Series Stack( 0 ),
			Import Cell Colors( 0 ),
			Limit Column Detect( 0 ),
			Column Separator String( "-" )
		)
	);
...
...
...

Has anyone faced a similar issue? I'm looking for:

- A way to inspect the first few rows of the Excel file (maybe using a temporary import) to infer where the header actually starts. For example, by checking for known column names.

- Or any JSL techniques to handle this kind of variability more flexibly.

2 ACCEPTED SOLUTIONS

Accepted Solutions
jthi
Super User

Re: Handling two different Excel header formats

Jim's suggestion is great option, other option would be to pull in a bit more data and then do cleanup in JMP

Names Default To Here(1);

path = "$DOWNLOADS/BOM_191910000_Sample_Row5.xlsx";
//path = "$DOWNLOADS/BOM_212100000-A2_Sample_Row6.xlsx";

dt = Open(
	path,
	Worksheets("BOM"),
	Use for all sheets(1),
	Concatenate Worksheets(0),
	Create Concatenation Column(0),
	Worksheet Settings(
		1,
		Has Column Headers(0),
		Data Starts on Row(5),
		Data Starts on Column(1),
		Data Ends on Row(0),
		Data Ends on Column(0),
		Replicated Spanned Rows(1),
		Replicated Spanned Headers(0),
		Suppress Hidden Rows(1),
		Suppress Hidden Columns(1),
		Suppress Empty Columns(1),
		Treat as Hierarchy(0),
		Multiple Series Stack(0),
		Import Cell Colors(0),
		Limit Column Detect(0),
		Column Separator String("-")
	)
);

header_row = Contains(dt[0, 1], "PART NUMBER");
If(header_row > 1,
	dt << Delete Rows(1::header_row-1);
);
dt << Move up;

//empty_rows = dt << Get Rows Where(IsMissing(AsColumn(dt, 1)));
//dt << delete rows(empty_rows);
-Jarmo

View solution in original post

txnelson
Super User

Re: Handling two different Excel header formats

In your spreadsheets, f the name of the first column that is read in using headers of length 6 contains a valid number then the data start row is 6, else it is 6.

Here is an example that you can start with


For( i = 1, i <= N Items( namefiles ), i++, 
	//Transformation du fichier en table de données
	dt = Open(
		namefiles[i],
		Worksheets( "BOM" ),
		Use for all sheets( 1 ),
		Concatenate Worksheets( 0 ),
		Create Concatenation Column( 0 ),
		Worksheet Settings(
			1,
			Has Column Headers( 1 ),
			Number of Rows in Headers( 6 ),
			Headers Start on Row( 1 ),
			Data Starts on Row( 7 ),
			Data Starts on Column( 1 ),
			Data Ends on Row( 7 ),
			Data Ends on Column( 0 ),
			Replicated Spanned Rows( 1 ),
			Replicated Spanned Headers( 0 ),
			Suppress Hidden Rows( 1 ),
			Suppress Hidden Columns( 1 ),
			Suppress Empty Columns( 1 ),
			Treat as Hierarchy( 0 ),
			Multiple Series Stack( 0 ),
			Import Cell Colors( 0 ),
			Limit Column Detect( 0 ),
			Column Separator String( "-" )
		)
	);

	If( Is Missing( Num( Word( -1, Column( 1 )[1] ) ) ) == 0,
		whatStartRow = 7;
		whatHeaderStartRow = 6;
	,
		whatStartRow = 6;
		whatHeaderStartRow = 5;
	);
	Close( dt, nosave );

	dt = Open(
		namefiles[i],
		Worksheets( "BOM" ),
		Use for all sheets( 1 ),
		Concatenate Worksheets( 0 ),
		Create Concatenation Column( 0 ),
		Worksheet Settings(
			1,
			Has Column Headers( 1 ),
			Number of Rows in Headers( 1 ),
			Headers Start on Row( whatHeaderStartRow ),
			Data Starts on Row( whatStartRow ),
			Data Starts on Column( 1 ),
			Data Ends on Row( 0 ),
			Data Ends on Column( 0 ),
			Replicated Spanned Rows( 1 ),
			Replicated Spanned Headers( 0 ),
			Suppress Hidden Rows( 1 ),
			Suppress Hidden Columns( 1 ),
			Suppress Empty Columns( 1 ),
			Treat as Hierarchy( 0 ),
			Multiple Series Stack( 0 ),
			Import Cell Colors( 0 ),
			Limit Column Detect( 0 ),
			Column Separator String( "-" )
		)
	);
);
Jim

View solution in original post

4 REPLIES 4
txnelson
Super User

Re: Handling two different Excel header formats

Could you provide a sample of each of the 2 different formatted files.  That would really help to determine the approach to take.  Basically what I typically do, is to read in the first 7 rows as a single column.  The parse through the table to determine what format it is, and then read in the full table using the determined format.

Jim
Yass
Level IV

Re: Handling two different Excel header formats

Thanks a lot for your reply.

I'm attaching the two samples, one for each format.

I’ll look into how to implement what you suggested to see if I can successfully read the first 7 rows as a single column and Then parse through the table to determine its format and read the full table using the determined format.

I really appreciate your help!

 

jthi
Super User

Re: Handling two different Excel header formats

Jim's suggestion is great option, other option would be to pull in a bit more data and then do cleanup in JMP

Names Default To Here(1);

path = "$DOWNLOADS/BOM_191910000_Sample_Row5.xlsx";
//path = "$DOWNLOADS/BOM_212100000-A2_Sample_Row6.xlsx";

dt = Open(
	path,
	Worksheets("BOM"),
	Use for all sheets(1),
	Concatenate Worksheets(0),
	Create Concatenation Column(0),
	Worksheet Settings(
		1,
		Has Column Headers(0),
		Data Starts on Row(5),
		Data Starts on Column(1),
		Data Ends on Row(0),
		Data Ends on Column(0),
		Replicated Spanned Rows(1),
		Replicated Spanned Headers(0),
		Suppress Hidden Rows(1),
		Suppress Hidden Columns(1),
		Suppress Empty Columns(1),
		Treat as Hierarchy(0),
		Multiple Series Stack(0),
		Import Cell Colors(0),
		Limit Column Detect(0),
		Column Separator String("-")
	)
);

header_row = Contains(dt[0, 1], "PART NUMBER");
If(header_row > 1,
	dt << Delete Rows(1::header_row-1);
);
dt << Move up;

//empty_rows = dt << Get Rows Where(IsMissing(AsColumn(dt, 1)));
//dt << delete rows(empty_rows);
-Jarmo
txnelson
Super User

Re: Handling two different Excel header formats

In your spreadsheets, f the name of the first column that is read in using headers of length 6 contains a valid number then the data start row is 6, else it is 6.

Here is an example that you can start with


For( i = 1, i <= N Items( namefiles ), i++, 
	//Transformation du fichier en table de données
	dt = Open(
		namefiles[i],
		Worksheets( "BOM" ),
		Use for all sheets( 1 ),
		Concatenate Worksheets( 0 ),
		Create Concatenation Column( 0 ),
		Worksheet Settings(
			1,
			Has Column Headers( 1 ),
			Number of Rows in Headers( 6 ),
			Headers Start on Row( 1 ),
			Data Starts on Row( 7 ),
			Data Starts on Column( 1 ),
			Data Ends on Row( 7 ),
			Data Ends on Column( 0 ),
			Replicated Spanned Rows( 1 ),
			Replicated Spanned Headers( 0 ),
			Suppress Hidden Rows( 1 ),
			Suppress Hidden Columns( 1 ),
			Suppress Empty Columns( 1 ),
			Treat as Hierarchy( 0 ),
			Multiple Series Stack( 0 ),
			Import Cell Colors( 0 ),
			Limit Column Detect( 0 ),
			Column Separator String( "-" )
		)
	);

	If( Is Missing( Num( Word( -1, Column( 1 )[1] ) ) ) == 0,
		whatStartRow = 7;
		whatHeaderStartRow = 6;
	,
		whatStartRow = 6;
		whatHeaderStartRow = 5;
	);
	Close( dt, nosave );

	dt = Open(
		namefiles[i],
		Worksheets( "BOM" ),
		Use for all sheets( 1 ),
		Concatenate Worksheets( 0 ),
		Create Concatenation Column( 0 ),
		Worksheet Settings(
			1,
			Has Column Headers( 1 ),
			Number of Rows in Headers( 1 ),
			Headers Start on Row( whatHeaderStartRow ),
			Data Starts on Row( whatStartRow ),
			Data Starts on Column( 1 ),
			Data Ends on Row( 0 ),
			Data Ends on Column( 0 ),
			Replicated Spanned Rows( 1 ),
			Replicated Spanned Headers( 0 ),
			Suppress Hidden Rows( 1 ),
			Suppress Hidden Columns( 1 ),
			Suppress Empty Columns( 1 ),
			Treat as Hierarchy( 0 ),
			Multiple Series Stack( 0 ),
			Import Cell Colors( 0 ),
			Limit Column Detect( 0 ),
			Column Separator String( "-" )
		)
	);
);
Jim

Recommended Articles