Discussions

Ethanol_Guy · Feb 6, 2025 05:00 PM

Hello,

I'm not sure if there's a great solution for my specific problem, but I'd love to hear thoughts from the experts...

In my utilization of JMP 17, I often add new data to an existing file. Each row will have its own unique "batch number" assigned to it. When I get a new set of data, there will be new batch numbers that I need to add to my existing file. To date, the easiest way to do this would be just to copy the new batch numbers over to the master file and run a Table Update function to bring all of the necessary data over, but I'm looking for an option to do this in a more automated fashion to reduce the "data processing" time.

Essentially the bottom line is, am I stuck using one of the methods below, or is there a more creative way I can accomplish this task through JSL?

What I've tried:

Add rows and then highlight the last couple batch numbers and Fill -> Continue Sequence to End of Table
- Adding rows can be accomplished through JSL, but I've struggled to find an autonomous way to fill to the end of the table...
Copy and paste batch numbers from the data import into the data master, then run an Update Table function to pull the rest of the data in
- Tried-and-true method that works well every time, just need it to be a little more automated than this.
Concatenate data tables and then delete the duplicates
- This worked really well, but sometimes the data in the last handful of rows are incomplete in the master, but would be complete with the data update. Using this method deletes the older duplicates when it would be more ideal to keep the newer duplicates. This method could work really well if I could keep the newer duplicates.
Utilize a Row() formula in an adjacent column or for the batch number column itself.
- We almost never start on Batch #1, but this is easily accommodated by adding Row()+(a number). i.e. if the first batch is 6543, then my formula would be [Row() + 6542] and the correct count would be calculated.
- This also would work well, but sometimes batch numbers are skipped. This would cause the calculated column not to match the batch number column unless I want to have gaps in the data at times

txnelson · Feb 7, 2025 10:58 AM

You can use

Tables=>Join

which will create a new table with all of the data from the original table along with the new rows from the new inputted table. You can then simply delete the original table and rename the new joined table.

Jim

View solution in original post

txnelson · Feb 6, 2025 05:35 PM

If you can define the rules and the exceptions then JSL can be written to solve your issue. I am having a hard time following everything you have stated, but it seems that if you can define where to get the first batch number, and what to do for irregularities, and then how to detect and then handle the case where there are skipped batch numbers, that can all be handled in JSL.

Jim

Ethanol_Guy · Feb 7, 2025 5:46 AM

It's really a simple exercise...say I have one "master" data set with batches 19257 to 19640. I received a new data set with batches 19620 to 19665 that I now need to add to the "master" data set so they're all in one file. Normal practice is to copy these new batch numbers over to the master and then run Tables -> Update, match the two tables Batch = Batch and then the update brings all of the new data over.

My biggest hang-up is utilizing JSL to bring the batch numbers over from the new data to the master data, or getting them to fill to the end of the table autonomously. Adding rows, running Table Update, and deleting unused rows is the easy part.

txnelson · Feb 7, 2025 10:58 AM

You can use

Tables=>Join

which will create a new table with all of the data from the original table along with the new rows from the new inputted table. You can then simply delete the original table and rename the new joined table.

Jim

Ethanol_Guy · Feb 28, 2025 03:00 PM

This works well, but the new table then loses the row states that may be applied in the first data table. I know I can use a Row State column, but I need all of the actions to be recorded in the Log and "copy from row states" doesn't record in the log for some reason. The idea is for all of this to be recorded in the log so my peers can copy from there to create this data update script without needing to learn the coding behind it.

Concatenate might be the option I'm ultimately looking for, but I need a way to delete duplicates while keeping the SECOND duplicate. Most of the information across dozens of columns will be identical, but the first duplicate will have some missing data toward the end of the column list. Therefore the second duplicate will provide a more complete data set. As such:

Batch #

1

2

3 (duplicate, delete)

4 (duplicate, delete)

5 (duplicate, delete)

[below would be the concatenated data with some duplicate batches]

3 (duplicate, keep)

4 (duplicate, keep)

5 (duplicate, keep)

6

7

8

txnelson · Feb 28, 2025 11:27 PM

I cannot envision what you are describing when you say: "The idea is for all of this to be recorded in the log so my peers can copy from there to create this data update script without needing to learn the coding behind it.". Is there some way that you could step through exactly what is being provided in the log that the users are copying and using?

Concerning the duplicate batches, I suggest you look in the Scripting Index for Select Duplicate Rows. It will find all of the rows that have previously been seen. You could then loop through the found duplicates and from that find the previous row that has been duplicated and then delete that row.

Jim

Ethanol_Guy · Mar 4, 2025 11:48 AM

When I make that statement, I'm referring to the series of steps that are involved with bringing new data into an existing data set, including:

Open and import a new Excel data set that was previously saved with a specific filename in a specific location
Add that data to the existing data set to make it more complete
"Clean up" the data (i.e. delete columns that the new data set adds but aren't needed, reformat other columns, whatever else might be needed)

All of these actions are added to the Log, and the JSL that the Log generates can be copied into a new script on the data table. Not many of my peers are proficient with JSL, so we need this simplicity to allow the entire team to make this project repeatable for everyone. If I need to add any JSL code to a script manually that the Log does not capture, it makes the end product much more difficult to implement for the team.

How would one go about the "loop through the found duplicates and from that find the previous row that has been duplicated and then delete that row" part of your suggestion? I think that could be the last missing piece to putting this all together.

Discussions

Need to get creative with continuing sequence to end of table

Re: Need to get creative with continuing sequence to end of table

Re: Need to get creative with continuing sequence to end of table

Re: Need to get creative with continuing sequence to end of table

Re: Need to get creative with continuing sequence to end of table

Re: Need to get creative with continuing sequence to end of table

Re: Need to get creative with continuing sequence to end of table

Re: Need to get creative with continuing sequence to end of table

Recommended Articles