Functional DOE: Tips and Tricks for Design and Organizing Data (2022-US-45MP-1152)


Andrea Coombs, Sr. Systems Engineer, JMP

The Functional Data Explorer (FDE) in JMP Pro allows for analysis of a DOE where the response is a curve. The entire functional DOE analysis workflow can be done within FDE - from smoothing response curves all the way to fitting the functional DOE model and optimization with the Profiler. But what about setting up the design and organizing your data for functional DOE analysis? This presentation will help you understand options around functional DOE design using the Custom Design platform and organizing your data using table manipulations such as Stack, Split, Join, and Update.

Hi, everyone. My name is Andrea Coombs.

I'm a Senior Systems Engineer, supporting customers

from major accounts in the eastern part of the US.

Today I'm going to be talking about functional DOE

and specifically around design of your functional DOEs

and how to prepare your data for analysis.

I'm going to turn off my video for the presentation here.

Let's go in and look at the goals.

Really, the goals are very simple here.

I want to cover some tips and tricks for setting up your functional DOE

using the Custom Design Platform

and also give you some tips and tricks

for adding functional data to your DOE data table.

I am going to be using JMP Pro 16.2 during this presentation.

Let's start off by defining what is functional data,

what is functional data analysis, and specifically, what is functional DOE.

Functional data is really curve data.

It's any data that unfolds over a continuum,

and there's a lot of data that is inherently functional in form.

You can think about time series data,

sensor streams from a manufacturing process,

spectral data that's produced by lots of different types of equipment,

measurements taken over a range of temperatures.

And the Functional Data Explorer in JMP Pro makes it really easy

to solve many kinds of problems with functional data.

Here we have an example of some functional data.

Here we have a plot of Home Price Index for New Jersey, from 1990 to 2021.

Like many kinds of functional data,

we don't get it as this smooth curve, like we see here,

but rather we get a series of discrete Index values.

So each observation gives us one value for X, the year,

and one value for Y, the Home Price Index.

With functional data analysis,

it isn't typically just one point of the curve that we are interested in,

or even the collection of points from a single curve.

We typically have a collection of curves.

Here you can see we have the Home Price Index over time,

for the 50 states, plus the District of Columbia.

And when we're doing functional data analysis,

we want to understand the variability around these curves.

Often we want to understand what are the variables

that drive the variability in our curves.

Or maybe we want to use the variability in our curves to predict another outcome.

Functional data analysis

is going to use all of the information contained in the curves.

We're not going to leave any information behind.

To model the curves directly,

we can treat the curves as first-class data objects,

in the same way that JMP will treat traditional types of scalar data.

When I'm thinking about functional data analysis,

I like to break this down into four steps.

The first step is to take the collection of curves

and to smooth the individual curves.

The next thing is we'll determine the mean of those curves and the shape components.

These shape components represent the variability around the mean.

Next, we extract the magnitude of each of these shape components.

Knowing the magnitude of the shape components,

the function that describes the shape component,

the function that describes the mean, we are able to reproduce all of our curves

by knowing just the two shape component scores for each curve.
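In symbols, each curve is approximated as f_i(x) ≈ μ(x) + s_i1·φ_1(x) + s_i2·φ_2(x), where μ is the mean function, φ_1 and φ_2 are the shape components, and s_i1 and s_i2 are that curve's shape component scores. The short Python sketch that follows only illustrates this arithmetic on a shared grid of x values; the mean, components, and scores are made-up numbers, not FDE output, and FDE itself works with smoothed functions rather than four hand-typed points.

```python
def reconstruct(mean, components, scores):
    """Rebuild one curve as mean + sum(score_k * component_k), evaluated on a shared x grid.

    mean: mean-curve values at each x
    components: list of shape components, each a list of values at the same x points
    scores: this curve's score for each shape component
    """
    if len(components) != len(scores):
        raise ValueError("need exactly one score per shape component")
    curve = list(mean)  # start from the mean curve
    for component, score in zip(components, scores):
        for j, value in enumerate(component):
            curve[j] += score * value  # add this component, scaled by its score
    return curve


# Hypothetical values on a 4-point grid, for illustration only.
mean = [10.0, 8.0, 6.0, 5.0]
components = [
    [1.0, 0.5, 0.0, -0.5],  # shape component 1
    [0.0, 1.0, 1.0, 0.0],   # shape component 2
]
print(reconstruct(mean, components, [2.0, -1.0]))  # [12.0, 8.0, 5.0, 4.0]
```

Two numbers per curve (the scores) plus the shared mean and components are enough to regenerate the whole curve, which is why the scores can stand in for the curves in later analyses.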

Now we can use the shape component scores in an analysis.

Here I've done a cluster analysis,

and I've defined four groups of my curve shapes.

In the Functional Data Explorer itself,

there are two primary questions that can be answered.

The first is about how to adjust process settings and product inputs

to achieve a desired function or spectral shape.

We call this Functional DOE analysis, or FunDoE, for short.

The second question we can ask is, how can I use entire functions

to predict something like yield or quality attributes?

We call this Functional Machine Learning, or FunML, for short.

Today we're going to be focusing on Functional DOE.

Let's take a closer look at the Functional DOE workflow.

The first thing you want to do is to set up your design

using the Custom Design Platform.

Then you can go out, run your DOE,

collect the results, and you want to organize your data

to get it ready to put into the Functional Data Explorer.

The remaining steps in our Functional DOE workflow

are all done within the Functional Data Explorer.

In the Functional Data Explorer, we can process our data,

we can smooth our individual curves, we can extract our shape components,

and then we'll use our shape component scores for DOE modeling,

and we can use our profiler to address the goals of our DOE.

All of this is done within the Functional Data Explorer.

Now, there are many presentations at this Discovery Summit,

at previous Discovery Summits,

even in our Mastering JMP series on jmp.com,

that will go over lots of details around the Functional Data Explorer.

I'm not going to be talking specifically about the Functional Data Explorer today.

What I want to talk about is how to set up your design

for a functional DOE and how to organize your data.

To do this, I'm going to use this Bead Mill Case Study.

In this example, what we have is

we're essentially milling pigment particles for LED screens.

You start off with beads and pigment in a slurry in this holding tank.

The slurry flows through this milling chamber

and comes back to the holding tank in a continuous process.

So if we were doing a DOE on this process, some factors that we could look at

are the percent of beads we're starting off with here in the holding tank,

the percent of pigment particles we're starting off with.

We can look at the flow rate through the system

and also the temperature.

When we're looking at the goal of this DOE,

we essentially want to achieve an optimal size over time curve.

So let's take a look at that optimal curve.

The optimal curve is represented by this green curve here.

So essentially, we want our pigment particle sizes to decrease,

so they fall within specification quickly.

And our specification range is represented by this green shaded area.

We want those particles to remain within specification

throughout the duration of the run.

That is our optimal curve.

Let's go ahead and take a look at data prep.

I'm going to talk about data prep first,

and then we'll move backwards and talk about the DOE design.

For data prep, there are three main tips and tricks I want to share with you.

First of all, I want you to understand

that the Functional Data Explorer accepts data in different formats.

The Stacked Data Format is the default format, and it's the most versatile.

But you can also use Rows as Functions.

I'm going to go over some table manipulations,

such as Stack, Split, Join, and Update,

to show you how you can get your data ready for analysis.

And then I'll also show you how you can quickly import multiple files

if your curve data is stored in separate files.

What data format is FDE expecting?

Well, there are actually three different formats.

There's Rows as Functions, Stacked Data and Columns as Functions.

Let me open up a data table here and launch the FDE platform

to show you that there are different tabs up here for these different formats.

The Stacked Data format is the default.

We have Rows as Functions and Columns as Functions.

This example here happens to be Rows as Functions.

Each row contains a full function.

Here we have the first run from our DOE,

and the function is represented here in these columns.

Each column represents an X variable or an input,

and then the value within the cell is a Y variable.

When we go to populate the Functional Data Explorer,

we can come in here, go to Rows as Functions,

our Y output is represented in these columns,

we can put in our DOE factors,

and our function ID, and then you can go ahead and analyze that data.

So this is Rows as Functions.

One thing to know is that Rows as Functions

assumes that observations are equally spaced in the input domain,

unless you have an FDE X Column Property.

The FDE X Column Property is something that comes into play

when we design our DOE,

which we're going to talk about here in a second.

But I just want to show you here,

next to each of these columns,

I have a Column Property associated with it.

And you can see

here's the FDE X Column Property, and the X input value will be two here.

If you want to use the FDE X Column Property,

I'll show you here at the end

how you can use the JMP scripting language to assign that.

So that's Rows as Functions.

Now let's look at Stacked Data.

Here's an example of Stacked Data,

where I have one run or one curve over multiple rows,

and each row is an observation of the curve.

So in row one here,

I have a value for X and a value for Y, and that continues over multiple rows.

This is the most common and the most versatile way

of organizing your curve data.

And when we populate the Functional Data Explorer,

we're here in the Stacked Data format,

we'll put in our X and our Y of our function,

put in our ID of the function,

and then we can put in our DOE Factors here as supplementary variables.

The last format that the Functional Data Explorer can use

is Columns as Functions.

I've never seen data organized this way, and it's a little perplexing.

It's hard for me to get my mind around why you would organize your data this way,

but I'll show it to you even though it's not very common.

In this example, each row is the level of your X variable of the function.

So here we have a column for time,

and each row represents the X measurement, and then each column represents a run.

Let's go ahead and launch the Functional Data Explorer.

We'll come over here to Columns as Functions.

We can put in our X variable,

which is time, and all of our output variables, which are each of our batches.

And you'll notice in here we cannot input supplementary variables

because we don't have any way of

defining which factor or which treatment is associated with each of these runs.

So you cannot do Functional DOE with Columns as Functions.

Now let's talk about getting your data into your DOE data table.

To do this, we're going to use the Tables menu.

We have lots of different platforms here where we can manipulate our data.

The two things that we may want to do with our data

is reshape it, using Stack or Split,

or we may want to add data, by using Join, Update, or Concatenate.

And especially if you're new to JMP, some of these table manipulation platforms

can be a little confusing when you start using them.

The little icons next to each of the platforms

can be very helpful to know which platform does what.

So what I've done in my journal

is I've taken each of these icons and I've blown them up here,

so we can take a closer look at these icons

to understand what each of these platforms does.

Let's first talk about reshaping with Stack and Split.

We'll start with Stack.

Stacking is going from wide to tall data.

In this example, you have data in multiple columns,

and you want to combine that data into one column.

Let's look at an example here.

Here we have wide data,

we have Rows as Functions.

Let's say we want to Stack this data,

so we can use it in the Stacked data format,

in the Functional Data Explorer.

We're going to come up here to the Tables menu, go to Stack.

I'm going to pick all of those columns I want to stack.

And here I have 50 measurements in each of my functions.

I'm going to select all 50 of those columns and say I want to stack them.

I can come down here and define what my new column names are going to be.

The data that I'm stacking is actually my size data.

My label column, which happens to be my column name here,

this refers to my time.

Now, two tips when I'm doing table manipulations.

I always give my output table an explicit name.

Otherwise, JMP will call it Untitled,

and it will iterate through untitled numbers.

So I like to give each of my tables an explicit name.

The second tip is to keep the dialog open.

You can check this box to keep this dialog open,

so when you hit Apply and see your results,

if you didn't get the results you're expecting,

you have your dialog here to review what you did

and maybe fix what you need to fix to get the desired output.

Now, this data is stacked and ready to go.
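The Stack step just shown can be sketched in plain Python to make the wide-to-tall reshaping concrete. This is a minimal sketch, not JMP's implementation; the column names ("Run Order", "Time", "Size") and values are hypothetical stand-ins for the bead mill data.

```python
def stack(rows, id_col, stack_cols, label_name, data_name):
    """Wide -> tall: emit one output row per (input row, stacked column) pair."""
    tall = []
    for row in rows:
        for col in stack_cols:
            tall.append({
                id_col: row[id_col],  # carry the run identifier along
                label_name: col,      # label column: the original column name
                data_name: row[col],  # data column: the value from that cell
            })
    return tall


# Hypothetical Rows-as-Functions table: one row per run, one column per time point.
wide = [
    {"Run Order": 1, "2": 5.1, "4": 3.9, "6": 3.2},
    {"Run Order": 2, "2": 6.0, "4": 4.4, "6": 3.5},
]
tall = stack(wide, "Run Order", ["2", "4", "6"], label_name="Time", data_name="Size")
for r in tall:
    print(r)
```

In this sketch the label values are the original column names, so they stay as text; converting them to numbers before using them as the X variable is a separate step.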

Let's go through an example of Split.

Split is when you're starting with tall data or stacked data,

and you want to split it out into different columns.

In this example here, I have stacked data,

and let's say I want to split it out so I can use Rows as Functions in FDE.

I'm going to come up here to the Tables menu.

I'm going to use Split.

And this Split dialog is probably the most confusing of all of them.

Even after using JMP for many, many years,

I always have to step back and think about how to populate this.

But the Split by Columns

is essentially what's going to be your new column headers.

So I want Time as my new column headers, and I want to Split out my size data,

and I want to be sure to group this by Run Order.

Again, give this an explicit name,

and I can keep the dialog open to see how I split this data.

Now my data is wide, and I can use Rows as Functions in FDE.
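The Split step above is the reverse of Stack. A minimal Python sketch, assuming the same hypothetical column names: the Split By column supplies the new column headers, the Split column supplies the cell values, and the Group column keeps each run on its own row.

```python
def split(rows, group_col, split_by_col, split_col):
    """Tall -> wide: one output row per group, one new column per split_by value."""
    wide = {}
    for row in rows:
        key = row[group_col]
        out = wide.setdefault(key, {group_col: key})    # one row per run
        out[str(row[split_by_col])] = row[split_col]  # header from Time, value from Size
    return [wide[k] for k in sorted(wide)]


# Hypothetical stacked data: one row per observation.
tall = [
    {"Run Order": 1, "Time": 2, "Size": 5.1},
    {"Run Order": 1, "Time": 4, "Size": 3.9},
    {"Run Order": 2, "Time": 2, "Size": 6.0},
    {"Run Order": 2, "Time": 4, "Size": 4.4},
]
result = split(tall, "Run Order", "Time", "Size")
for row in result:
    print(row)
```

Leaving out the grouping column would collapse all runs into one row, which is why grouping by Run Order matters.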

That is reshaping your data.

Going from wide to tall or from tall to wide.

Now let's talk about adding data.

A lot of times you're starting out with a DOE data table that you created,

such as this.

Let me just delete this column out.

I want to be able to join my curve data to this table.

Here's my curve data in a separate table.

Essentially, what I want to do is,

I want to add the columns in the second table to my first table.

I'm going to use Join. Join adds columns.

I'm going to start here with my DOE table.

I'm always going to start with my DOE table

because my DOE table has all of these scripts in here

that I can use to analyze my data.

These are very important.

So you always want to start with your DOE table.

And we're going to use Join to join it with our curve data.

You always want to make sure that you're matching up based on your row numbers,

so the right curve for the right run goes with the correct factors for that run.

And I'm going to select all the columns

in my DOE table and my Functions from my Curve Table.

Again, I can use an explicit name here when I create this table.

Now, I have my table ready for analysis.

That's an example of Join.
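The Join just shown adds columns by matching rows on a key. Here is a minimal Python sketch of that idea, assuming Run Order as the matching key and hypothetical factor and curve columns. Raising an error when a run has no curve data is a choice made for this sketch, not a description of JMP's options for unmatched rows.

```python
def join(doe_rows, curve_rows, key, curve_cols):
    """Add curve_cols from curve_rows to each DOE row with the same key value."""
    by_key = {row[key]: row for row in curve_rows}
    joined = []
    for doe_row in doe_rows:
        match = by_key.get(doe_row[key])
        if match is None:
            raise KeyError(f"no curve data for {key} = {doe_row[key]}")
        new_row = dict(doe_row)  # keep every DOE column (factors, etc.)
        for col in curve_cols:
            new_row[col] = match[col]
        joined.append(new_row)
    return joined


doe = [
    {"Run Order": 1, "Temperature": 30, "Flow Rate": 2.0},
    {"Run Order": 2, "Temperature": 40, "Flow Rate": 1.0},
]
# Curve table deliberately listed in a different order: matching on the key still pairs them correctly.
curves = [
    {"Run Order": 2, "2": 6.0, "4": 4.4},
    {"Run Order": 1, "2": 5.1, "4": 3.9},
]
joined = join(doe, curves, "Run Order", ["2", "4"])
print(joined)
```

Starting from the DOE rows mirrors the advice to always begin with the DOE table, so its factors stay attached to the right curve.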

Let's talk about Concatenate.

I don't use concatenate too much for my DOE data prep.

You use Concatenate when you want to add rows to a data table.

In a DOE,

we typically have all of our rows, all of our runs, in our data table.

We don't need to concatenate,

but I just want to run through this example real quick.

Let's say I have my data for my first 16 runs.

I have 10 observations per run, so I have 160 rows.

Then I run my 17th run.

It looks like this.

That's 160.

Here's my 17th run with the 10 observations from that run.

Essentially, what I want to do is concatenate this data table.

I want to add these rows at the bottom of this data table.

I can start here, come to Concatenate.

We're going to add this.

With concatenate, you have this option to append to first table.

I'm just going to add these rows by appending them to the first data table.

Now, we have 170 rows of data.

That's Concatenate.
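The row arithmetic in the Concatenate example (16 runs × 10 observations = 160 rows, plus 10 rows from run 17 = 170) can be checked with a minimal Python sketch. The Time values (every 2 hours) are placeholders, and the `append_to_first` flag is this sketch's stand-in for the "append to first table" option, not JMP's API.

```python
def concatenate(first, second, append_to_first=False):
    """Stack the rows of `second` under the rows of `first`."""
    if append_to_first:
        first.extend(second)  # modify the first table in place
        return first
    return first + second     # otherwise build a new table


# Hypothetical tall tables: 10 observations per run.
first_16_runs = [{"Run Order": run, "Time": 2 * t} for run in range(1, 17) for t in range(1, 11)]
run_17 = [{"Run Order": 17, "Time": 2 * t} for t in range(1, 11)]

combined = concatenate(first_16_runs, run_17, append_to_first=True)
print(len(combined))  # 170
```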

I want to end up here with Update.

Update can be a very handy tool

when you're populating your DOE table with curve data.

Here's an example of the DOE data table I created.

I have columns here to populate my curve data.

Here's my curve data.

Here's my DOE data table.

Essentially, what I want to do in Update is I want to be able to populate

my blank cells with the information I have in this data table.

I can do that by matching run order,

and then JMP will automatically match up

the columns with the same names and update this data table.

Let's come here to Tables, Update.

Select my table that has my curve data.

Match on Run Order, say OK.

And now, this data table is updated.
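The Update behavior described above (match rows on Run Order, then fill columns that have the same name) can be sketched in Python. This is a simplified model, assuming hypothetical column names; it only writes into columns that already exist in the DOE table, which illustrates why the column names in the two tables have to match.

```python
def update(doe_rows, new_rows, key):
    """Fill DOE cells in place from new_rows, matching rows on `key` and columns by name."""
    by_key = {row[key]: row for row in new_rows}
    for row in doe_rows:
        source = by_key.get(row[key])
        if source is None:
            continue  # no new data for this run: leave the row unchanged
        for col, value in source.items():
            if col in row and col != key:  # only columns with identical names
                row[col] = value
    return doe_rows


# DOE table with empty (None) cells waiting for curve data.
doe = [
    {"Run Order": 1, "Temperature": 30, "2": None, "4": None},
    {"Run Order": 2, "Temperature": 40, "2": None, "4": None},
]
curves = [
    {"Run Order": 1, "2": 5.1, "4": 3.9},
    {"Run Order": 2, "2": 6.0, "4": 4.4},
]
update(doe, curves, "Run Order")
print(doe)
```

If the curve table's headers were, say, "2" while the DOE table's were "Size 2", nothing would be filled, which matches the column-naming issue discussed later in the talk.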

Those are some table manipulations

you can do to get your data ready for analysis.

The last thing I want to talk about is importing multiple files.

Let's say that your curve data gets stored as separate files for each batch.

In this example,

I have my curve data in 17 different files,

and they happen to be CSV files.

I want to be able to import each of these CSV files

and concatenate them together so I have one data table.

You can easily do this by using

the Import Multiple Files function under the File menu.

When you use Import Multiple Files,

you can click on this button here

to select the folder that contains all of those files.

Here's a list of all those files.

Now, the file name itself actually contains my batch number,

and this is data that I actually want to pull out of the file name.

I'm going to add the file name as a column.

We'll import.

Here's my curve data with time and size,

and here's my file name.

Now I can come up to the Cols menu

and use the Text to Columns utility to convert my text to multiple columns.

I just have to put in the delimiter I want.

I'm going to use the underscore that's before the batch number

and the dot that's after the batch number, and I can say OK.

That gave me three columns:

the curve, the batch number, and the file extension.

This is the data that I want.

I'm just going to delete these other columns here.

And now I have all of my curve data

for all my 17 runs in one data table with the batch number.
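The import-and-parse workflow above can be mirrored with Python's standard library. This is a minimal sketch, not a description of Import Multiple Files: it assumes file names shaped like "Curve_12.csv" and CSV columns named Time and Size (both hypothetical), and splits the file name on "_" and "." just as the Text to Columns step did.

```python
import csv
import re
import tempfile
from pathlib import Path


def import_multiple_csv(folder):
    """Read every CSV in `folder` into one list of rows, tagging each row with its batch number."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        # Split on "_" and "." like Text to Columns: "Curve_12.csv" -> ["Curve", "12", "csv"]
        parts = re.split(r"[_.]", path.name)
        batch = int(parts[1])  # keep only the batch number
        with path.open(newline="") as f:
            for record in csv.DictReader(f):
                rows.append({"Batch": batch,
                             "Time": float(record["Time"]),
                             "Size": float(record["Size"])})
    return rows


# Demo: write two small hypothetical files to a temporary folder, then import them.
with tempfile.TemporaryDirectory() as folder:
    for batch in (1, 2):
        with open(Path(folder) / f"Curve_{batch}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["Time", "Size"])
            writer.writerow([2, 5.0 + batch])
            writer.writerow([4, 4.0 + batch])
    data = import_multiple_csv(folder)

print(len(data), data[0])
```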

That is what I wanted to show you for data prep.

Now, let's talk about setting up your DOE design.

There are a couple of tricks that I want to show you.

In your DOE dialog, there are two things to think about.

First of all, we want to make sure we're removing this default response,

and then we're going to talk about how to define the functional response

based on the format of your curve data.

Let's go ahead and launch our DOE Dialog.

We're going to come up here to DOE, go to Custom Design.

Here's our DOE dialog.

Now, the DOE Dialog, like I said, will have this default Y response.

If we just have a functional response in our DOE,

we don't need this default response, so we need to get rid of this.

What we don't want to do is just delete the name

because that response is still there.

What we want to do is select that default response

and actually use Remove to get rid of it.

Then we want to add a functional response.

I'm going to come here and add a functional response.

When we're defining our functional response,

we can give it a name.

We can specify the number of measurements per run and their values.

Let's go ahead and do this for our DOE.

Our response is size.

This is what's on the y-axis of our function.

Then we can tell the DOE platform what our X values look like.

We can define the number of measurements

with the number of X values and what those X values are.

Let's say I'm going to measure the size every 2 hours.

I'm just going to type in here every 2 hours up to 20 hours.

That looks good.
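"Every 2 hours up to 20 hours" gives the list of X values to type into the functional response. A minimal Python sketch, assuming the first measurement is at hour 2 (consistent with the 10 measurements per run used elsewhere in the talk); it also applies the talk's rule of thumb that typing values in is reasonable below roughly 20 values. The helper name is hypothetical.

```python
def measurement_grid(start, stop, step):
    """Evenly spaced X values from start to stop, inclusive (integers avoid float drift)."""
    return list(range(start, stop + 1, step))


times = measurement_grid(2, 20, 2)
print(len(times), times)  # 10 [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

# Rule of thumb from the talk: type the values in when there are fewer than about 20.
print("type values into the dialog" if len(times) < 20 else "too many values to type in")
```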

The next thing I need to do is add my factors.

I have saved my factors and my factor ranges to this factor table.

I'm just going to load in these factors.

I have my factors up here.

Next thing I want to do is specify my model.

I'm going to choose a response surface model

which will add all my two-way interactions and all my quadratic terms.
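It helps to count what a response surface model with the four bead mill factors contains: 1 intercept + 4 main effects + 6 two-way interactions + 4 quadratics = 15 parameters. So 17 runs is enough to estimate the model, with 2 runs to spare. The Python sketch that follows enumerates those terms; the factor labels are paraphrased, and this does not reproduce how JMP arrives at its default of 21 runs.

```python
from itertools import combinations


def response_surface_terms(factors):
    """List the terms of a full quadratic (response surface) model."""
    main = list(factors)
    two_way = [f"{a}*{b}" for a, b in combinations(factors, 2)]  # all pairwise interactions
    quadratic = [f"{f}*{f}" for f in factors]                     # one squared term per factor
    return ["Intercept"] + main + two_way + quadratic


factors = ["% Beads", "% Pigment", "Flow Rate", "Temperature"]
terms = response_surface_terms(factors)
print(len(terms))        # 15
print(17 - len(terms))   # 2 runs beyond the number of parameters
```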

Finally, I can enter in the number of runs.

JMP is recommending a default number of 21,

but let's say I only have enough time and resources to do 17.

I'll say 17.

Then I'll click Make Design to create the design using all of those inputs that I entered.

It just takes a couple of seconds for JMP to create this design for me.

Here's my design.

Here are my 17 runs with the treatment I want to apply for each of those runs.

When I'm creating my DOE table,

I always want to use this Make Table button.

I always like to include the run order column

because the order that these runs are executed is very important.

I'm going to include that run order column and make our DOE data table.

Here's our DOE data table.

We have our treatment.

We have a place for us to enter the results for our function,

and I have my run order column here at the end.

I also have my scripts

that reflect the functional DOE and also the model I specified.

Whenever I'm adding data, my curve data, like I said before,

I always want to add it here to my DOE data table

because it contains information about the functional data analysis

and the DOE model that was specified.

That's a quick overview, but I want to give you some tips about

defining the functional response based on the format of your curve data.

Let's come back here.

We'll go back.

Let's come back up here to Responses.

The way that you populate this information here

will define how your DOE data table looks.

I want to give you a couple tips for what to enter here,

depending on what your curve data looks like

because we want the data prep part to be as easy as possible.

There's a couple of things to consider.

First of all, is your curve data wide or tall?

We talked a lot about this, right?

Do you have wide data?

Or do you have tall data?

Is it stacked or are you going to have rows as functions?

The other thing to consider is whether your data is equally spaced,

if you have the same X measurements,

or whether your measurements are asynchronous.

What do I mean here?

Well, let me pull up a couple examples.

In this example here, I just have a few measurements per run.

I have 10 measurements per run,

and they are all equally spaced,

and I have the same measurements for each run.

When I go to enter this information, there are just a few values to populate here.

It's not that difficult.

But you might have a scenario where you have asynchronous data.

In other words,

you have different measurements for each of your runs,

and you might have a situation where

you're collecting a lot of data points.

My rule of thumb is

if you have around 10 values, and fewer than 20, go ahead and populate your values here.

But once you get above 20, and certainly into the hundreds,

that's a lot of information to add here to your response.

The other thing to consider is

how your data will be entered:

are you going to manually enter the responses,

or are you going to use Join or Update?

Let's run through some scenarios.

Let's say you have rows as functions.

You have a few measurements,

and you're going to manually enter your data.

Well, if you're going to do that, then set up your functional response like this.

This is what your DOE data table is going to look like.

You can manually enter your results in here,

and then you can use this script here to run your functional data analysis.

That's the first scenario.

Let's say you have rows as functions, you have a few measurements,

but you want to use Update to update your data.

Again, let's come back and take a look at this example here.

In this example,

since I defined the name of my response, I get time in here with my column header.

Let's say when I bring in my data,

I just have the number here in my column header.

So if I were going to use Update, these column names would not match.

To make these column names match,

what I want to do is come back here, remove the name,

and then when I create my DOE data table, I just have the number.

And then I can use Update to update this data table.

Let's go ahead and do that.

Here's my data.

Let's go ahead and update with my curve data.

I'm going to update based on matching the row numbers,

and now my data is in here.

I can use this script here to go ahead

and go into the Functional Data Explorer to start analyzing this data.

For the next scenario, let's say I have rows as functions.

Again, I have wide data, but I have many measurements.

I have many more than 10.

Let's say I have 50.

Entering the 50 values in here doesn't make a whole lot of sense.

What I'm going to do is I'm going to set the number of measurements to one,

and I can just set the values to one as well.

When I go to make my DOE data table,

it will look like this.

I will get my run order.

I will get my factors, and I'll get this blank column.

All I need to do is delete that column.

Here are my 50 measurements for each of my curves.

Again, I'm going to use Join, like I showed you up above.

I'm going to match based on run order,

bring in everything from my DOE table, my functions from my curve table.

I can give it an explicit name and say OK.

In this example here, since I've used Join,

I'm essentially ignoring what I set up as the details around my functional response.

This script here is not going to work.

In this scenario, I will have to come back here,

open up the Functional Data Explorer, choose Rows as Functions,

and enter my supplementary variables, my run ID, and my curves.

When I go to do the functional DOE analysis,

it will use the model that's specified here

in the Generalized Regression script,

so it will remember the model that I specified when I set up my DOE.

That is that example.

Let's talk about stacked format.

With stacked format, we typically are going to be adding the data using Join.

Again, what I populate here doesn't really matter,

just as long as I have a functional response entered in here.

Again, I get this same data table.

I can delete that response column.

Now, I can use Join to bring in my stacked curve data

by matching on run order,

bring in everything from my DOE table and my function data from my curve table.

Again, for this example here, running this script is not going to work,

so I'm going to have to manually launch the Functional Data Explorer,

bring in my X, my Y,

my supplementary variables, and my run order,

and then I can go ahead and execute the Functional Data Explorer

when I go to do functional DOE.

Again, as before,

it will look at the model that's included in the Generalized Regression script

that is based on the model that you specified when you designed your DOE.

The last thing I want to mention real quick is this FDE X Column Property

that I talked about before.

Here's a scenario where I want to bring in my curve data,

and I have rows as functions.

So I have rows as functions, I have many measurements.

I want to add the curve data using Join, but my column headings contain text.

In this case, I have the units of measurement

for each of my X values here in my column headers.

I can join this data together,

bring in my curve data by matching on run order.

I then bring in all my data from my DOE table,

bring in all of my curve data.

Let's say I want

the Functional Data Explorer to recognize the number in my column header.

Well, to do that, I need that FDE X Column Property.

But when I go in here to Column Properties,

you're not going to find FDE X Column Property here.

What I can do is I can use a script to define my FDE X Column Property.

The script is going to be based on the numbers in these column headers.

What I can do is run this script, and now I have a column property assigned

that contains the number FDE will recognize as the X value.
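The script used in the talk is not shown in the transcript, so its exact contents are unknown. Conceptually, its core step is pulling a number out of each column header so that number can be stored as that column's X value. Here is a minimal Python sketch of just that parsing step, assuming headers like "2 hours" (a hypothetical format); actually attaching the FDE X property to a JMP column would be done in JSL and is not shown here.

```python
import re


def fde_x_from_header(header):
    """Return the first number in a column header, e.g. '2 hours' -> 2.0."""
    match = re.search(r"-?\d+(?:\.\d+)?", header)
    if match is None:
        raise ValueError(f"no numeric X value in header {header!r}")
    return float(match.group())


headers = ["2 hours", "4 hours", "6 hours"]  # hypothetical headers with units
x_by_column = {h: fde_x_from_header(h) for h in headers}
print(x_by_column)  # {'2 hours': 2.0, '4 hours': 4.0, '6 hours': 6.0}
```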

That was my last tip or trick.

Let's just do a quick wrap-up, a review of the tips and tricks

starting with your DOE design.

You always want to remove that default Y response

before you add your functional response.

You're going to define your functional response

based on the format of your curve data

because you want to make your data prep as easy as possible.

You always want to add your curve data to the DOE data table

to take advantage of not only the FDE script,

but also the model script that is created for you by the Custom Design Platform.

When you're preparing your data for analysis,

when you're bringing in your curve data, just know that

the Functional Data Explorer accepts different formats,

stacked data and rows as functions.

You can use Stack, Split, Join, and Update to get your data ready for analysis.

And if your curve data is stored in separate files,

use Import Multiple Files.

I just want to acknowledge a couple of people.

Ryan Parker is the developer of the Functional Data Explorer.

I want to acknowledge him for all of his help with understanding

all of the wonderful things that FDE can do.

I also want to thank Chris Gotwalt for his leadership,

and for some of the slides that I used at the beginning of my presentation.

With that, I thank you very much.

If you have any questions, I'd love to hear about them in the chat.

Thank you.

Laurenh14 · 06-19-2023

Hi there, I would like to use this tutorial with my students. However, we are on JMP version 17, which doesn't have the option for adding a functional response (as you show at minute 26:00). Is there a workaround? Thank you,

Lauren