Martin Demel's Blog

martindemel · May 13, 2020 05:25 AM

In the big data era we would like to have all data in e.g. databases, hadoop servers, or access it directly from streams. However, still many data is saved or written in spreadsheets, updated manually or automatically by machines, engineers, lab people or other kind of operators. In JMP it is easy to import any kind of data from a spreadsheet to JMP. You can even automate this easily using the automatically generated "Source" script. But, if you run this script again a new data table opens and does not update the current data table.

Usually this is not exactly what you want. Why? You may have done some data cleaning or recoding already. You may have also done some analysis saved into the data table you want to use further on also for the new data. I will show you in this article a way to update your current data table with new data from a spreadsheet, preserving the rows you currently have and only adding the new rows from the spreadsheed.

There are two scenarios I want to cover:

In this article I cover the easy case of vertical structured data. That means you have one row with column names, and in the other rows beyond is the data.
In a second article I will cover the horizontal structured data case. That is first row/first column is the parameter name, in the following columns is the data. In the second row are the units, in the third row the batch number. Then an empty row and after that the same thing again for another parameter. This is easier to read in a spreadsheet but actually not very easy to analyze. I often see these kind of data, although it would be much better to train people to enter data in the vertical way than trying to workaround with the horizontal structure.

So let's take a look at the spreadsheet data, it's weather data, and it is simplistic for demonstrating the concept. Attached you will find the script in the JMP Data Table and the Excel spreadsheets for the first case.

Let's start with the situation

We have already imported an earlier version of the spreadsheet (3 rows of data to that point of time). Now we want to update the data table. You see an update script in the left top window of the data table. This script will do the following:

Access the current data table name and the number of rows in that data table
Set the spreadsheet file name and worksheet name we want to get the new data from
Opens the spreadsheet and read the worksheet data into a help data table.
1. We rely on the source script generated when we imported the data the first time.
2. But changed the "Data Starts on Row" entry from (2) to (2+n_Rows), so we only get the data we have not already imported
Now we only have to concatenate the two tables using the "Append to first table" option.

	cdt = Current Data Table();
	excelFile = "WeatherData.xlsx";
	name_worksheet = "WeatherData"; // or "Sheet1" for this Excel book example
	n_Rows = N Rows();

	dt_h1 = Open(
		excelFile,
		Worksheets( name_worksheet ),
		Use for all sheets( 1 ),
		Concatenate Worksheets( 1 ),
		Create Concatenation Column( 0 ),
		Worksheet Settings(
			1,
			Has Column Headers( 1 ),
			Number of Rows in Headers( 1 ),
			Headers Start on Row( 1 ),
			Data Starts on Row( 2 + n_Rows ),
			Data Starts on Column( 1 ),
			Data Ends on Row( 0 ),
			Data Ends on Column( 0 ),
			Replicated Spanned Rows( 1 ),
			Suppress Hidden Rows( 1 ),
			Suppress Hidden Columns( 1 ),
			Suppress Empty Columns( 1 ),
			Treat as Hierarchy( 0 )
		),
		Invisible
	);

	cdt << Concatenate  (dt_h1, Append to first table);

	Close(dt_h1);

Conclusion

Does not seem to difficult. The nice thing is that you can use e.g. data cleaning features like recode, save it as a formula column, and the new data will be handled automatically. Or you have created a predictive model with the previous data, saved the prediction formula back to the data table and see the predictions for the new data (and e.g. compare it to a new model, or the actual predicted data). I'm sure you will find plenty of use cases, and hope this will be useful for you.

Of course you can use this and extend it by adding analysis to the data, like updating a control chart or other trend analysis. Using Windows Scheduler you might want to then update your report and send it to our new product JMP Live, so other people (not necessarily licensed JMP users) can access the reports in a web browser with the interactivity of JMP like data filter, column switcher, and so forth.

What to expect next

In my next post I will talk about the second case which is slightly different, but not much more difficult.

I'm interested to hear if you have this kind of problem, or how your data is structured (in spreadsheets). Please leave your comments.

All the best,

Martin

PS: Two comment to the files:

In WeatherData.jmp you will find a script for updating the data table with the attached Excel File. Both files have to be saved in the same folder (this was for simplicity, you can easily change the hard coded File Name with e.g. Pick File(), and also extract the name.
The second script is "Remove last two rows" which allows to remove rows in the data table to update again from the Excel Sheet, without having to manually delete the rows. So all for your convenience :)

hogi · ‎03-11-2023

Alternatively, one can use the Join Platform ...

First load both tables:

Open( "$DOWNLOADS/WeatherData.jmp" );
Open( "$DOWNLOADS/WeatherData.xlsx", Worksheets( "Backup_NewData" ), );

Then the easiest way is to join them "by Row Number":

Data Table( "WeatherData" ) << Join(
	With( Data Table( "Backup_NewData" ) ),
	Merge Same Name Columns,
	By Row Number
);

Alternatively, the Date column can be used to identify individual entries independent of the row number.
This allows a more robust way to joining the tables via Match Columns:
(NB: due to the 1 min time offset, Oct. 15 & 16 will show up twice)

Data Table( "WeatherData" ) << Join(
	With( Data Table( "Backup_NewData" ) ),
	Merge Same Name Columns,
	By Matching Columns( :Date = :Date ),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 1, 1 )
)

hogi · ‎03-11-2023

I just wondered what's the difference between "Update main table with data from second table" and "merge same name columns" - here they do exactly the same, right?

After fidling around for some while , I found a difference when using a second table with additional columns:

- "Update main table with data from second table" updates the main table (by merging same name columns), no additional columns

- "Merge Same Name columns" does the same, but in addition, it adds the new columns from the second table.

Are there more differences?

martindemel · ‎03-13-2023

Hi hogi, thanks for your additional options, very nice.

and you are right:

"Update main table with data from the second table" will just use the columns in the current table and update the columns if the column names match. It will not add any of the other columns from the second table.

"merge same name columns" will also merge/overwrite/fill empty spots with the data in similar named columns. But it will add the other columns f the second table, too.

For the first one could also use the Tables->Update functionality, especially if you do not want to create a new data table with this operation but update the existing one with data from the second.

I'm not aware that there is any other difference.

hogi · ‎04-01-2023

another approach:

1) load the new data via Source Script

2) remove all rows from the main table

3) append the new data via concatenate

dt = Current Data Table();
dtnew = dt << Run Script( "Source" );
dt << Select All Rows << Delete Rows;
dt << Concatenate( dtnew, Append to first table );
Close( dtnew, NoSave );

martindemel · ‎04-04-2023

Hi hogi,

this would be an option if you do not make any changes to your previous data. E.g you might did already some quality checks of your data and change or excluded data or made additional comments or whatever. As soon you changed something this way of appending the data will eliminate all of your changes which usually is not desired.

jwiltsie · ‎05-05-2025

@martindemel, you mentioned above:

For the first one could also use the Tables->Update functionality, especially if you do not want to create a new data table with this operation but update the existing one with data from the second.

Which is exactly what I want to do, but 'Update' does not have an option to add rows from the new table, it only updates existing data. And 'Join' does not have an option to NOT create a new data table. This is very frustrating since I am using this data table with a virtual join and the virtual join link breaks with the new data table.

I found a JMP Wishlist item for this: Tables join: option to update main table. Please upvote the Wishlist item if you would like to see this implemented in JMP.