I'm Jim Wisnowski, along with Andrew Karl from Adsurgo,
and we're here to talk a little bit about operationalization
and how you can effectively employ some JMP capabilities to do so.
The genesis of this presentation came from engagement with a customer last year
around this time who said,
"Our company just loves JMP as a sandbox tool
and maybe in the lab,
but it really doesn't do anything for operationalization for us."
At which point, it struck me as,
"I'm not really sure I understand what the word operationalization
nor really know how to say it necessarily."
Somehow there's a disconnect, because on all of our projects,
it seems like we deliver a product and it's finished.
What we did is we figured out, "What exactly is operationalization
and then how can we do some neat things in JMP with it?"
What we're going to do is I'm going to talk about what it is
and then give you some quick wins in the form of the depot.
Then Andrew, as always, will do the work where it takes the high intellect,
and he'll also show you the beautiful part of operationalization.
Key here is we all know all models are wrong and some are useful,
but the other side of that is your boss or your boss's boss,
she is thinking all models are useless, whether she says it explicitly or not.
They're useless unless somehow they're supporting a decision of mine
or we get them into the production environment.
We need to make sure that even though we have a good model,
that it can go to the next step and be implemented.
By the way, I do want to show George Box over here on the right,
and then Stu Hunter on the left, who just celebrated his 100th birthday
on the 3rd of June.
By definition, what is operationalization?
When we think about this, we can think of the leadership asking the usual questions:
What is the problem? What do I do about it?
How does it impact me?
And then the now what: What is the solution?
The solution isn't necessarily all the things that we do
in the data analytics, data science,
world of the discovery, data prep, and all that.
It really lies in the operationalization piece
for that senior leader to understand how you're going to solve the problem.
In other words, it's really how do we get from that lab environment
to the production line
where we have a solution that's going to be useful to us.
As we do that, let's not forget SAS's recommendation here
that we'd better make sure we have a good, disciplined approach
that is also automated in that world.
Next up, we can dig a little bit deeper
into what operationalization is on a micro level kind of thing.
I asked ChatGPT-4, I said,
"Can you give me an image of what operationalization
looks like in data science?"
This is essentially what they did.
I had to do the graphics, but they gave me all the words.
If we think about our usual data, do our data development,
and then we figure out what model we like,
and then we have to figure out how we're going to employ or deploy that,
what language?
Is it going to be JavaScript, C, or Python?
Then we do the deployment,
and then perhaps we do an API integration.
Good news is JMP has a lot of tools for us to do that.
We're not left in just that lab environment as suggested.
Then on the bottom here,
we got the idea that once we have that model out there,
it's not a launch-and-leave kind of thing.
We have to babysit it and perhaps update hyperparameters
or add new data and see if it's still valid.
Then we have this idea here that you know what?
Not only are our users liking it,
they want to add more capabilities, so we start scaling up.
We have to make sure that we continue our good configuration management
and data compliance and documentation, ultimately resulting in business value.
The bottom line is how do I go from this lab and sandbox environment
to having business value?
That's what we're looking for in our operationalization.
Forbes gives the five steps here.
Important for us is to think about, first, you have to have a good story
so that management believes that you have a credible way
to approach this and solve the problem.
Then the last part here is, once you do have a model
deployed and operationalized,
make sure that you have some metrics to confirm
that it is in fact performing.
But this is that last-mile idea: we take all of this work
that we do to create the models,
but getting it to that operationalization piece
is the tough part.
In fact, we can see that the data out there
doesn't suggest that we're doing all that great.
Overall, maybe even fewer than half of these models make it.
Then if they do, it takes upwards of three months or so to do so.
DevOps, we're all familiar with in terms of a good disciplined approach
for software development.
When we take that step into our model deployment world,
we'll call it ModelOps, where we want to have a culture,
processes, and technologies to effectively employ our models.
If we look at these three circles here,
it's really this intersection between two and three
that we're focused on
to make sure that the deployment is influencing those business decisions.
I'd like to go and do a demonstration here in JMP.
Before I do so, I do want to point out two fantastic presentations
from previous discoveries
that do talk about this idea of deployment and operationalization
by Nascif and Dan Valente in 2016 as well as 2017.
You can see over here, they have this whole idea
that if you give a man a fish, he eats for a day,
and if you teach him to fish, he'll eat for a lifetime,
that's this operationalization piece,
which they also call the dark side of analytics.
That's what we're going to get into.
Meijian also wrote a decent paper on how you can do that.
But for us, what I want to show you is using the Formula Depot,
and I got a little plus there because it's going to go a little bit beyond
just using the Formula Depot because that is not enough.
We'll use our well-traveled Boston housing data.
We'll look at the price as a function of rooms, distance,
and Lstat, which is an economic indicator.
We'll create a main effects model as well as a Bootstrap Forest.
Then we'll look at the profiler, and I'll show you a quick trick
that could be helpful for you.
Then we'll look at how do I convert this to an operational solution
and being able to deploy it in a Python environment.
Certainly, this is a very small data set, but we could easily have done this
plugging into some data source
and using query builder and things like that.
But I just want to show you some quick wins so that you can go to the next step.
Because often we hear that it's great that you do all this work,
but the actual implementation has to be on this server,
and this server can only use JavaScript or C++,
whatever it happens to be, Python, maybe.
How can we take our good work and transport it into use in operation?
I'm going to transition over to JMP.
Here's a journal that we have for you,
and it goes through a few of our different options
and what we're doing here.
But here's Boston Housing that we know and love.
Here is my Least Squares model.
What you do in the Least Squares model, or any model for that matter,
is under Save Columns,
you're going to publish that prediction formula.
Then the Formula Depot comes up for you.
Let's go ahead and do the Bootstrap Forest as well.
Now we have the Bootstrap Forest, which we can also, under Save Columns,
publish that prediction formula.
If we come back up here
and we check out that Formula Depot report,
we can see that it is something that we are familiar with in this fashion.
That if I come down here
off of the red triangle next to Formula Depot,
I can look at the profiler.
I want to look at the profiler of both these models.
I do OK,
and there it is.
The main effects model, of course, doesn't have any interaction,
but maybe you want to make the point somehow
that when you have a very short distance,
you have a really steep economic impact.
What we could do is we could show this dynamically in PowerPoint,
because often the endpoint of some of our work
and analysis is a PowerPoint presentation,
and we'd like it to have some sort of dynamic display.
There are certainly many screen capture capabilities
that you're probably all familiar with.
But if we just go back to PowerPoint,
we're able to do this inherently in the software, and it's pretty quick.
Here is a blank slide.
What I want to do is I want to show that dynamic display.
I just come under Insert,
and I'll do this screen recording right here.
What I'll do is I will get out of that and I'll come back to here,
and then I'll come back to JMP, go to PowerPoint.
Now I do the screen recording, and I want to be at the right spot.
If it's a problem,
I'll just go ahead and take a snapshot of what I want.
I will go ahead and hit the Record button,
and it'll give me the 3-second countdown.
Then now I can narrate and say that we don't see any interaction
in our linear model,
but you can see the Bootstrap Forest does have quite a bit of interaction.
You're happy with that.
You go ahead and stop the recording.
Then now you have this embedded in your PowerPoint that easily.
You can go ahead and run it here, and you can hear me in the background
that I was explaining it, so the audio is in there.
Then clearly, as you're in presentation mode,
it will run automatically as well.
Now, back to the task at hand: what we want to do
is deploy this solution as a model in Python.
What we can do under the Formula Depot options here
is we can go ahead and generate the Python code,
and we'll do it for the sake of argument for both of these.
We can see that the Bootstrap Forest here
has Python code that consists of 55,000 lines.
Good to know. But we'll put that aside for now.
What we're going to ultimately decide on, we'll say,
is this main effects model only; that is what we're going to use
and what we want to deploy to our system.
I'll go ahead and hit...
By the way, before I do that,
there is this one particular line of code here
that says import jmp_score.
That is a Python file that comes with JMP;
all of you have it in your install directory,
and you're going to need it to use this.
What it is, it's really just a helper module.
It tells you here's how you do vectorization,
here's how you do power and things like that,
but important that it's there.
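We can't reproduce the shipped jmp_score.py here, but to give a feel for what a helper for vectorization and power means, here is a purely hypothetical sketch of that kind of helper; the name and signature below are ours, not JMP's.

```python
# Purely illustrative -- NOT the shipped jmp_score.py. A hypothetical example of
# the kind of vectorized helper such a module provides, so that generated scoring
# code can apply a formula to a whole column at once instead of row by row.
import numpy as np

def vec_power(x, p):
    """Raise every element of x to the power p, accepting lists or NumPy arrays."""
    return np.power(np.asarray(x, dtype=float), p)

print(vec_power([1, 2, 3], 2))  # [1. 4. 9.]
```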
I'll go ahead and do File, Save.
What that creates is this Least Squares,
and then we'll call it BH for Boston House.
Now I've got that Python code.
What would be nice is if I could just go to Spyder off of Anaconda
and just run it and then score all my new data.
The problem is it's not that easy.
It doesn't run yet.
It takes you a good way there,
but you still have to put some wrappers and things around it.
You have to import your Excel file or whatever it is.
I'm not a Python coder.
In fact, people say, "What do you do?" "I'm a data scientist."
"Oh, you must be good at Python." "No, I've never actually used it."
I'm an impostor in that fashion.
But maybe there's a way that we could use this latest technology, ChatGPT-4,
and have it create a Jupyter Notebook for me.
If I come under my ChatGPT-4 here,
I have this question I ask it.
"I have Python code generated from a model in the JMP Formula Depot.
Go ahead and create a Jupyter Notebook for me
that's going to import data from Excel."
I say, "Can it do it?" And it says, "Absolutely."
Then what it does is it gives me the code right here.
I copy the code,
and I put it in my instantiation of Jupyter from Anaconda3.
Then I run it, and lo and behold, it doesn't work.
Nothing works for me the first time.
But I say, "It didn't work, and here was the error message I got."
It says, "Oh, well, try this." Then I tried that, and then it worked.
How did it work?
If I come back over here, this is my Jupyter Notebook
that was given to me by ChatGPT-4.
Again, I know nothing about Python, but I do know that it gave me these lines.
I just go ahead and say, I'm going to import Pandas
because I need that to get my Excel file in.
Then here is that Fit Least Squares. That's what I got from the Formula Depot.
It does that.
I'm running each one of these, by the way. Now it says go ahead and import.
I'm going to import an Excel file that has what I want to score,
and that's going to be under Boston Housing data.
It's new data that I want to score.
Then here's this outdata thing that it told me was my error.
I said, " I'll do that."
Then this says, "Hey, just let me know that I brought in some data," and it does.
Then now I'm going to go ahead and score it.
I go ahead and score it.
Then did it score? Sure enough, it did.
There's the first five values of it.
Then I can go ahead and save that, and we'll just call this For Demo.
Maybe I'll even add 100 in there.
I'll go ahead and put that out.
Then I'll say, "W here does that live?"
Maybe I'll see it right here.
Here it is. Here's the scored data Excel file.
There are all 800 or so predicted values from that linear model.
It's as easy as that.
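To make that read-score-write pattern concrete, here is a minimal sketch of the kind of wrapper ChatGPT produced for us, assuming the Formula Depot export has been pasted in as the scoring function; the function name, coefficients, column names, and file names below are placeholders for illustration, not the actual exported formula.

```python
# A minimal sketch of the wrapper notebook, assuming the Formula Depot export has
# been pasted in as score_least_squares(). The function name, coefficients, column
# names, and file names are placeholders -- your exported formula will differ.
import pandas as pd

def score_least_squares(row):
    # Placeholder main-effects formula: intercept plus rooms, distance, and lstat terms.
    return 0.0 + 1.0 * row["rooms"] + 1.0 * row["distance"] - 1.0 * row["lstat"]

new_data = pd.read_excel("BostonHousingNew.xlsx")        # new rows to score
new_data["Predicted mvalue"] = new_data.apply(score_least_squares, axis=1)
print(new_data.head())                                   # first five scored rows
new_data.to_excel("ScoredForDemo.xlsx", index=False)     # write the scored file out
```

The actual notebook calls the exported scoring code rather than a placeholder formula, but the import-score-export shape is the same.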
Next up, what we want to do is set Andrew up
to explain why everything is beautiful in this world.
Coming back to PowerPoint here, the scenario is this.
It's that we were working with a group of doctors across the US,
and they wanted reports, for every one of them,
on how productive they were in 2022.
They thought that perhaps they could be more productive
by changing some of the codes that they use for certain procedures.
They gave us millions of rows of data,
and we came up with exactly what they asked for.
We created a template of 10 pages or so of the report
with nice JMP graphics in there,
and it was virtually perfect, except for one thing.
The one thing is that this data table at the very end that gave the results,
we couldn't get it sized properly or put the borders on it.
It's as simple as just selecting it, right-clicking to do AutoFit,
and then hitting the border symbol next to it.
That's what I told Dr. Jay right here.
This is for Dr. No. You can see his annual report.
Essentially,
Andrew swam the English Channel
and gave them the 99.8% solution to their problem,
but they weren't quite happy.
It wasn't until we took the step to make this automated that they were.
Again, this is a two-second process,
but because it had to be done across hundreds of reports,
they weren't happy.
But then we ended up fixing that,
and that's when the customer said, "This is absolutely beautiful,"
hence we have beautiful operationalization.
With that, I'm going to turn it over to Andrew
to let you in on a few secrets
of how you can get some massively increased productivity.
Thanks, Jim.
I'm Andrew,
and I'm going to show you how we put together this report.
The assumption is we have some suppliers, and here are our suppliers.
We've got four different suppliers,
and we've got some metrics that we track
in this data set that we update frequently.
We want to be able to report this to the suppliers
so we can track how they're doing
and have a record of long-term performance.
What we'd like to do is get something like this:
we have this template we like, where we have our header up top and our logo.
We have the company name in bold.
Then for each of these, we substitute in the calculated values from the data set.
With the standard JMP functions, when we export a report to Word,
we get everything, with all the outline boxes open,
but it doesn't follow our template, and we have to fill in the blanks.
We can manually copy out the pictures, right-click those little gray triangles,
and say Edit, Copy Picture,
and get a nice version of the pictures out.
But it's still a manual process that has to be done.
In this little application here, I've got four suppliers.
What if you have 400, and you're doing that once a month?
That becomes unwieldy.
How can we do this?
Not natively within JMP or JSL.
You can sometimes get close, depending on your application,
but a more flexible approach
is to take something like this, where we have a template file;
we write the report once the way we want it.
Every time we have something we want substituted in by JMP,
we have these double brackets,
and we're going to put a little keyword in there.
In this case, team_p is going to be the team name.
Then down here, I've got mill_time_p with an underscore,
and I've got that bold and highlighted.
We put whatever format we want,
and anything we want substituted in, we just put in brackets.
It's a natural solution, so it'd be great if we get this to work.
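As a rough illustration of what that substitution step can look like, here is a minimal python-docx sketch, assuming the double brackets are written as {{team_p}}-style placeholders, that each placeholder stays inside a single run in the template, and hypothetical file names; the script shared with this talk is more involved.

```python
# A minimal sketch of keyword substitution with python-docx, assuming placeholders
# such as {{team_p}} and {{mill_time_p}}, each kept inside a single run.
from docx import Document

substitutions = {                      # values calculated in JMP and passed to Python
    "{{team_p}}": "Supplier A",        # hypothetical example values
    "{{mill_time_p}}": "12.3",
}

doc = Document("template.docx")        # hypothetical template path
for paragraph in doc.paragraphs:
    for run in paragraph.runs:
        for key, value in substitutions.items():
            if key in run.text:
                # Replacing inside the run keeps the bold/highlight formatting
                # that was applied to the placeholder in the template.
                run.text = run.text.replace(key, value)
doc.save("temp_report.docx")           # intermediate output, shown later in the talk
```

One caveat: Word sometimes splits a placeholder across several runs, so a production version has to handle that case as well.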
Then also, in addition to text, we can also do that with pictures.
We've got the q_pic11.
If we want a JSL Lineup Box equivalent,
then we can get this table structure within Word,
and we just put in our keywords where we want to substitute things in.
Also, we're going to have a table at the end that doesn't have a keyword,
that doesn't have a placeholder here.
I'll get to that in a second.
Come back to my journal.
At the end, what we want it to look like, the intended document result,
is we also have this table at the end that goes through Tabulate.
We have this nice color gradient we can put on it in JMP,
and we want to get this into Word.
But also, as Jim mentioned, we want to AutoFit this table
and we want it to look nice,
because a lot of times, the tables don't fit automatically.
We can go through all the work and create the tables,
but if we don't make them fit, then we're going to have a lot of work
ahead of us to go through and do that manually.
It's not something we can program natively within JMP.
What we can do is have a script,
which is shared on the Discovery website,
where we can open up the...
We have a reference for the data table that contains the data, and this gets updated.
Every time we run this to generate the reports,
it pulls in the new data.
We have the output path,
which is a folder where all the temporary files get written to
that we normally clean up plus the output reports.
Then also template file, that Word file that contains those keywords.
All the rest of the script is going to be going down and calculating
the individual values that get substituted in.
At the end, we have this Python script
that does the actual substitution
and then also AutoFits and pulls in this table.
If you're saying, "I don't know Python,
I don't know how applicable this is going to be to me,"
we don't know Python either, but we got this to work pretty quick
because GPT-4 wrote the Python code for us.
I'm going to show you an example of how we did that.
What this script will do is we'll write the temporary files
to the output folder.
For example, here's our q12 graph, and the temporary files get written here.
The images get written to the output folder.
Then with the static code in Python, it knows to pull these files in.
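As an illustration of that step, here is a minimal python-docx sketch, assuming a picture placeholder such as {{q_pic11}} sitting in a cell of the Word table and an image file that JMP has already written to the output folder; the placeholder text, paths, and picture width are hypothetical.

```python
# A minimal sketch of dropping the exported graphs into the template, assuming
# python-docx, a placeholder like {{q_pic11}} inside the Word table, and image
# files already exported by JMP. Paths and sizes here are hypothetical.
from docx import Document
from docx.shared import Inches

doc = Document("temp_report.docx")                  # intermediate report file
pictures = {"{{q_pic11}}": "output/q_pic11.png"}    # placeholder -> exported image

for table in doc.tables:                            # the grid of graphs is a Word table
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                for key, image_path in pictures.items():
                    if key in paragraph.text:
                        paragraph.text = ""         # clear the placeholder text
                        run = paragraph.add_run()
                        run.add_picture(image_path, width=Inches(3))
doc.save("temp_report.docx")
```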
The individual calculations will be run within JMP,
and then saved within the script,
and then passed along to Python at the end.
Here we run all of our JMP code.
We load any of our values into variables.
An example of that is, here we have a tabulate function.
We're going to make it into a data table.
From that data table, we're going to take the mean mill time column
and take the first entry in that column,
and then that's going to become our mill_time_p variable.
That is what will get sent down in the Python code.
When we initiate Python,
we tell it whatever variables we wanted to send over from JMP to Python.
Here's that mill_time_p.
That will hold any of the mill time calculations,
and that is what gets substituted in
to the mill_time_p area within the template.
I'm going to go back to my template file.
Here's my mill_time_p area. That's what gets substituted in.
The intermediate output from the Python code for doing this
is the temp report.
We can see these values get substituted in.
The graphs get placed in.
We get our nice grid of graphs.
At the end, we don't have our table yet.
The reason we don't have our table yet
is because we like the way that
if we move a table from JMP using Get As Report to a journal
and then export that journal to Word with Save MS Word,
we like the way it keeps the table structure,
but we still need the AutoFit.
What we do is, in addition to the report that gets written out from the template,
is we also write out this other temporary table file.
We get the table import.
Here it is, and what we need to do
when we want to automate is this AutoFit to Window,
and then also reducing the font size so it actually fits.
What we need to do is,
after the Python code that substitutes out into the template,
what we found is we have to take that DOC file created by JMP,
convert it to a DOCX file,
and then we have Python code that will open up that DOCX file,
take the first table,
apply AutoFit, change the font size to 7,
and then it will append it to the original report file.
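Here is a hedged sketch of that post-processing step with python-docx, assuming the table file has already been converted to DOCX and that the file names below stand in for the real ones; note that appending the table at the element level leans on python-docx internals (a common workaround, not a public API).

```python
# A sketch of the table post-processing, assuming python-docx, a tables.docx file
# holding the exported table, and report.docx as the substituted report. The
# element-level append uses python-docx internals (_tbl, element.body), which is
# a widely used workaround rather than a documented API.
from copy import deepcopy
from docx import Document
from docx.shared import Pt

table_doc = Document("tables.docx")          # hypothetical file with the exported table
report = Document("report.docx")             # hypothetical substituted report

table = table_doc.tables[0]                  # take the first table
table.autofit = True                         # let Word AutoFit the columns
for row in table.rows:
    for cell in row.cells:
        for paragraph in cell.paragraphs:
            for run in paragraph.runs:
                run.font.size = Pt(7)        # shrink the font so the table fits

report.element.body.append(deepcopy(table._tbl))   # append the table to the report
report.save("final_report.docx")
```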
How did we know to create this?
Again, you can copy what we've done. We've got this file available.
But how can you reproduce this for your own
and create your own Python to do this?
I recorded a video of going through on GPT-4, how I did this,
and I'm going to show that now and narrate that.
The prompt I give is that I've got a Word document that I want to use.
I'm so bad at Python.
I'm going to go ahead and give GPT-4 my actual path here
because I don't know how to substitute out the backslashes,
and the spaces, and everything.
I say, "G o to this path, open it up,
take the first table you see in here, and then give me back that table with AutoFit applied."
It's going to tell us, "You need to install this other package."
I've just got a base Anaconda installation.
You can ask it for instructions and help.
You make sure you install that package and then you'll be able to run this code.
Whenever it gives me this thing, the first thing I noticed is it tells me,
"Hey, I'm going to overwrite your report file, be careful."
I say, "No, I don't want that."
It's interactive the way it gives you the code.
I say, "F ix this code.
I want to write to a different file in the same directory."
It's going to modify the code
to give me a different output file in the same directory.
What I'm going to do is I just copy-paste this over to Spyder,
which is basically the Python equivalent of the JSL editor.
It's the Python editor.
I hit Run, I hit the green arrow,
and I get this note about, "There's an error."
I don't know what the error means.
I don't want to spend time on Stack Exchange or Stack Overflow
or anything looking that up.
I paste the error back to GPT-4,
and it's nice enough that it apologizes and explains what it did wrong.
I'm not that worried about what it did wrong.
I just want something that works and gives me my report.
I'm going to copy-paste the code it gives me
and then go back to the Spyder,
run that.
I get one more error. It says, "Table is not subscriptable."
Not really sure what that means.
I tell GPT-4 about that. It apologizes again.
It thought that the table index was zero-based.
It turns out it's one-based.
It fixes that.
I'm going to copy this code over.
This time, it runs without error.
I go to the folder I specified to it, and here's the modified file.
Now you can see AutoFit has been applied to this table.
We just made Python code without really knowing Python.
You don't have to stop there.
If there's any other modification that you want to give it,
you can change the font size.
Here I'm going to ask it to make it Wingdings.
Let's see if it knows how to do that.
It gives me some new code, and I run it. Yeah, I can get Wingdings output.
Just to make sure it didn't just give me gibberish,
I'm going to copy it all and make sure that it translates back
into more easily readable text, and it does.
That's what we're looking for.
Not only Python, but also Visual Basic.
Another thing that we run into is when we output things:
you might have a 500-page report, and you want the headers mapped
to Heading 1 and Heading 2 styles in Word,
so that way you can get a table of contents.
What we told GPT-4 is, I'm going to use these delimiters.
I'm going to script these into my outline box titles.
I get h1, close h1, h2, close h2 for Heading 1 and Heading 2.
I want to substitute those in.
When we gave that to GPT-4,
what it gave us is this macro file,
which I'm going to copy.
I'm not sure if it was retained in my Microsoft Word.
We'll find out. If not, I'll paste it in.
You have to enable this Developer tab. That's a Word option.
When you do that, we get this.
We just paste in our macro here, which was generated by GPT-4.
I don't like programming in VBA, but it's good at it.
When I hit Run on this, it takes everything,
all these headers and fills them in.
That way I can go up here, I could say,
References, Table of Contents, and put on my table of contents.
If I generated a 500-page report,
now I've got the right structure for all this.
It's easy to go in here, too.
If you want to add numbers to these headers and stuff,
you just right-click in here, and you go to Modify,
and you can turn on that numbering structure if you want.
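The macro GPT-4 gave us was VBA, but if you would rather stay in Python, here is a rough python-docx sketch of the same header-style mapping, assuming the outline box titles were scripted out with <h1>...</h1> and <h2>...</h2> delimiters and that Word's built-in Heading styles are available in the document; this is our own illustration, not the code from the talk.

```python
# A sketch of mapping delimited titles to Word heading styles with python-docx,
# assuming <h1>...</h1> and <h2>...</h2> delimiters and built-in Heading styles.
from docx import Document

doc = Document("report.docx")                # hypothetical report with delimited titles
for paragraph in doc.paragraphs:
    text = paragraph.text
    for tag, style_name in (("h1", "Heading 1"), ("h2", "Heading 2")):
        open_tag, close_tag = f"<{tag}>", f"</{tag}>"
        if text.startswith(open_tag) and text.endswith(close_tag):
            # Strip the delimiters and promote the paragraph to a heading style,
            # which is what the table of contents picks up.
            paragraph.text = text[len(open_tag):-len(close_tag)]
            paragraph.style = style_name     # assumes the style exists in the document
doc.save("report_with_headings.docx")
```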
The last thing I'm going to show for GPT-4, specifically for JSL,
is you might have noticed in my script back here,
I'm really bad about commenting my script.
I've got a couple of comments in here.
But for the most part, I don't have a lot of comments describing it.
If you have code (you don't want to give it anything proprietary),
and you give that code to GPT-4, even JSL, it'll add a bunch of nice comments for you
and explain what each code block is doing, so that other readers can follow it.
Also for Python, if you don't know Python and you're taking a look at our script
that we've provided, and you're saying, "What does this do?"
you can provide this code to GPT-4 and say, "Explain this Python code to me."
It'll give you a long narrative story
and say, "Here is exactly the workflow of what's happening.
Here are the key functions that are doing it."
That's my favorite part.
You can say, "Do you have any suggestions for this code?"
It'll say,
"Y ou don't have any error handling. You've got some memory leak issues."
It'll go through and make a bunch of suggestions.
Then it's nice enough to go ahead and give you some new code
with all the suggestions implemented.
With all this in place,
you can go from doing all of your calculations in JMP
to actually getting this nice output file that has the right format you want.
Everything looks nice.
You're not going through making manual changes.
With this in place, the customer took a look at it
and said, "This is beautiful."
With that, we hope that you can take this same idea
and go make some beautiful reports yourself.