Are your organization’s data practices inefficient and delivering inflexible reports? The ingredient company Amyris has built out the capabilities to generate an unprecedented scale of data in the synthetic biology industry by integrating automation, data capture, and analytics.

A review of current data operations established that JMP is a powerful tool for scientific analysis methodologies and can replace alternative tools such as Spotfire and Excel, even for an infrastructure as sophisticated as Amyris'. By taking advantage of Workflow Builder and automation in JMP, Amyris was able to create replacement methods that offered dramatically faster performance with more desired features and functionality. 

By attending this presentation, participants can learn how simple workflow building and automation integration tools in JMP can help improve their own analytic work practices. We also touch on more advanced scripting and workflow features. This presentation is all shared by someone who is self-taught, demonstrating that anyone can get started with automation in JMP to deliver an impact to their organization.

All right. Thanks for tuning in today. I'll be talking about JMP, of course, but specifically about using it to automate analytic workflows at Amyris, and also about looking at more complex processes and generalizing automation, or data automation, using the native JMP platforms. Before I jump in, I want to briefly introduce myself. My name is Stefan. I'm Associate Director of Data Science at Amyris, and I've got 15 years of life science industry experience across a variety of backgrounds. I've worked in a lot of chemistry labs, fermentation labs, and high-throughput screening teams. I've worked in quality, a lot of statistics, data analytics, and now, of course, data science. I really love all things data, from the way you store it to the way people consume it and access it. I currently support the broader Amyris scientific community with their own tooling as well as analysis methodologies. Today, I'm going to cover three main parts. I'll give a little bit of background and context about the case study I'm going to use to walk through some of these automation examples. We'll really start simple.

How do we use JMP to automate your own analysis workflows? These might be analyses you're doing day to day and want to make a little more efficient. In the final piece, we'll get into more advanced automation, where we're looking at more complex workflows. How do you automate workflows that are being used by a broader technical team? How do you generalize that to support both the data processing and ingestion as well as the analysis piece? Our case study today is from the company I work at, which is Amyris. A little background: Amyris is a synthetic biology company. We specialize in programming yeast and in industrial fermentation, feeding that yeast sugar and converting that sugar into pure ingredients or chemicals. We're based out of Emeryville, California, and do most of our manufacturing in Brazil. Our case study today is not going to be on the manufacturing side, but rather on the research side. In Emeryville, we have a number of highly automated labs, which we call our screening and analytics labs, where we sift through hundreds of thousands of strains per month in these 96-well plates.

You can see a picture there of some barcoded plates. That's what we use as scale-down fermentors. That's our first place of testing to see how much these strains are producing and what they're producing. We're looking for specific phenotypes of those strains. We grow them in these 96-well plates, and we then use a whole suite of different analytics to generate measurements, often multidimensional measurements, on these plate wells. We also have a lot of in-house databases; a lot of the data, as well as the related metadata, is stored in application databases and a data warehouse that we can then access with tools like JMP. That's where JMP comes in. The final step of this screening: we screen a lot of strains and generate a lot of data. We now need to make sense of that data and make a decision about which strain we're going to promote to bioreactor fermentation, and potentially to manufacturing in Brazil.

We'll start with simple automation in JMP. What I want to do today, if you've never done automation in JMP, if you've never used those platforms, is to introduce them to you in an accessible way, in a way that is relatively code-free, and give you a basic understanding of how these platforms are built up and what I've learned in teaching myself and learning from others in using these automation platforms.

I like to say that if you do something more than 10 times, it's probably worth automating. I think the perfect place to start, if you're new to automation, is with your own analyses. The nice thing about those is you're very familiar with them, so you can give yourself feedback. And of course, they're usually simpler than trying to look at a broader analysis for a larger team. In our example, it's going to be a very simple measurement: a single measurement on a plate. We have this serial propagation of plates from pre-culture to what we call production. On that second plate, we're taking a single measurement. In this mock example, we would say, "Hey, we're running this exact same type of experiment over and over." Every time we do this, we pull the data into JMP. We add a calculated column. We then filter on that column, and then we generate some visuals and statistical tests to help us make a decision. Doing it once, it's not a ton of time, but if you do that over and over, every day, repeatedly, it starts to add up.

Before we jump into the automation piece, one thing I want to touch on briefly is structured data, which is really a prerequisite for automation. Your life will be a lot easier if a lot of that standardization and structuring happens upstream, where you're gathering the data and ingesting it into a database, or it could be CSV files. In our case, the data comes from a database and already arrives structured. Of course, a lot of this data munging you can do in JMP, and what's needed will often depend on the application or the analysis you're trying to do. But really think about what the source of the data is; the further upstream you can address it, the better. If you are using database data, I would recommend using Query Builder. We're not going to go into that much today; that's just my general recommendation. At the base, and really the foundation, of scripting in JMP is the log. Whether you know it or not, by default JMP is always recording what you're doing: what JSL, what script, is being run behind what you're doing as you click around the UI. If you go to the View menu and click on Log, it pulls up the log, and you'll see most of the steps you've done captured there.

If you click on any one step, it's going to show you the JSL that was used to execute that step. In the example I'm showing, we're doing that first step of adding a calculated column, and I've clicked on that and highlighted the JSL there. It's a quick first way to expose what JSL is being executed for a step, and you can, from the log, directly run that code or rerun previous code. With that being said, the log is not where I would recommend you try to really automate things; it's just often the source of where this code comes from. One way you can pull things out of the log and into, say, a data table is using data table scripts. This was, for a long time, the method of making stepwise automation in JMP. In this example, I went to this data table, clicked on the little red hotspot, and generated a new script, which just gave me an empty window, and I pasted in directly from the log two steps of code: one creating the new column and the second one setting a formula for that column.
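As a rough sketch, those two pasted steps look something like this (the table and column names here are just illustrative, not our actual production script):

// First step pasted from the log: create the new column on the table
dt = Data Table( "Plate Data" );            // illustrative table name
dt << New Column( "Column 21", Numeric, Continuous );

// Second step pasted from the log: attach the calculated formula to that column
Column( dt, "Column 21" ) << Set Formula( :Measurement / :Reference );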

You could go further and clean up the code. It's completely optional. But in the later sections, when we start to talk about more complex examples, you might want to think about how to make your code the most maintainable or the cleanest. In this case, we've done a two-step cleanup, where in the first step we've removed the redundant data table reference. In the second step, we've compressed it further by not creating our column as Column 21 and then renaming it; we're just naming it the right thing from the beginning. Each of those boxes is going to work. They're going to do the exact same thing. You won't notice, in this case, any difference in performance, but it can lead to cleaner and more maintainable code. If you're new to JSL and you see this code and think, what is this even doing? you can paste these things into your favorite LLM. We use Copilot, ChatGPT, whatever it is. You have the JMP learn bot, if you have access to that. Ask for an explanation of, "Hey, what is this code doing?" It'll generally give you a good idea of what it's doing there.
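For reference, the fully compressed form of that same step ends up as a single call that names the column correctly up front and attaches the formula directly (names again illustrative):

// One step, no rename needed: the column is created with its final name and formula
New Column( "Normalized Measurement", Numeric, Continuous,
	Formula( :Measurement / :Reference )
);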

So we pulled code from the log and pasted it into data table scripts. That's the most manual way of auto-generating JSL. You can see it here; we've named it step one, add calculated column. In most platforms, there are more straightforward ways to generate these table scripts and save the script that generated a certain output. Many platforms have these options under the red hotspot. Under the red hotspot, if you look all the way down, there's a Save Script option that gives you a number of choices. You can save it to your clipboard, or directly to the data table, which is what we're going to do, as well as other options. This becomes a really easy way to auto-generate JSL directly from the UI. We saved it to the data table. All we've done in this case is rename that script to step number two, data filter, and gone on with the analysis. We can do the same for our final step, which is generating a Fit Y by X plot, where we're also applying a Dunnett's test with a reference group.
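The script that Save Script captures for a step like this looks roughly like the following; take the exact Dunnett's option syntax from what your own platform saves, since the saved form specifies the reference group and can differ by JMP version (column names here are illustrative):

// Oneway (Fit Y by X) analysis with a comparison against a control group
Oneway(
	Y( :Normalized Measurement ),
	X( :Strain ),
	With Control Dunnett( 1 )   // Dunnett's test; the reference group appears in the saved script
);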

Again, here we go to the red hotspot, go to Save Script, and save that to the data table. What we've created is this pseudo-workflow. We have this three-step automation: we're adding a calculated column to this table, applying a data filter, and then doing a one-way analysis with a Dunnett's test. That's the space of using table scripts, and that was the approach for a long time until, I think in JMP 16 or 17, sometime around then, the workflow came along. The workflow really lets you level up these table scripts with additional functionality. The workflow is a standalone platform in JMP. You can access it by going to File, New, Workflow. There are a number of features that really amplify not only the ease of development of these workflows, but also the end-user experience of using them. For example, there's a single button you can click that will run all steps in a workflow. There's step recording, which will record what you're doing in JMP and automatically add it to the workflow. You can annotate scripts directly in the workflow, and you can add things like logic gates, interactivity, user prompts, and a lot more that we certainly don't have time to go into today.

But I'll show some examples, especially in the more advanced version. What we can now do is take these table scripts, this pseudo-workflow (imagine we created it before workflows existed), and now that JMP has added the workflow platform, we've moved it into Workflow Builder. All I've done here is add three steps. I've pasted over the JSL from each step, and you can see we have the exact same three steps in the workflow as we had in the data table scripts. I'll show a brief demo here of what these look like with our demo data. In this first example, we have our structured data. We have a number of measurements, and we have those measurement values. What we want to do, again, is those three steps. With the data table script approach, all you do is run them one by one. I press that, and we've now added this calculated column. We apply a data filter, so that pops up and applies this filter, and then we generate a one-way analysis for this specific experiment.

Really simple, right? These work fine. They work great for something simple like this. Do you need to move to a workflow? Do you need to do something more sophisticated? Not necessarily. If this works, that's great. We can reset this, and you'll see I'm having to do that manually here: we're going to clear our row states and make sure our data filter is closed, and then we can look at the workflow. This is, again, three steps. What's nicer here is that now I can see the code right away. I can't do that with the table scripts; there, I would have to right-click, go to Edit, and see the definition in a new window. Whereas in this other option, I'm seeing the code for each step. What I can do here is press play, and it's going to do all three steps together. I get these nice little green check marks telling me it ran without errors. You'll see we have our new column, we've applied a filter, and it's opened the same analysis. It's a bit cleaner. In the next section, we'll go a bit more into the other bells and whistles you get with using the workflow.

One thing I want to touch on very briefly in the demo is the record button, which is great, especially, again, if you're not familiar with JSL, or even if you're not familiar with coding at all. This is the easiest way to start making your own automation. All you do here is hit record, and now anything I do in this platform is going to be captured in that workflow. I'll just do a dummy example. I'll draw a distribution of my measurements, which generates some distribution here. We'll assume that means something really meaningful. When I go back to my workflow, you'll now see I have a new step here. We have our JSL script here; I can rename it, add notes, modify it, et cetera. A really nice feature of the workflow builder is this record button. Let's level it up one more time. Maybe you've automated something for yourself for a very specific application. Maybe it works for one experiment type. How do we go from that to applying it to more complex processes? That's the question we had at Amyris.

The reality is we have a lot of different experiment types, and ideally we want to be able to support those with as few workflows as possible. We don't want to make a unique workflow for every possible iteration, because that becomes impossible to maintain and very difficult to develop. When you get into more advanced development, one thing I would suggest, and I found this in a video somewhere on the JMP community, is using JMP projects to organize what I would call the development environment. One of the pain points of starting to automate these complex things is that you're going to have all these data tables, windows, and things popping up, and it can be difficult to manage. If you use a project, you can use it as a wrapper for your development, as well as for the eventual use of that tool.

Here I have some of the default project panels open, things like Workspace and Contents. I've added a journal as a placeholder. Anything the workflow opens, any new windows, will open as new tabs instead of as new windows on my desktop. You can also use that journal, of course, for documentation or instructions. On the right side, I have the actual workflow that I've built out. On the bottom, I've also added my log.

If I need to do some troubleshooting, or look into more details on error messages, I have the log right there, and I can do all of that directly from a single window. That's also how I demo this workflow, within that project. General user feedback I've received is very positive about not having that proliferation of tables and visuals. When we talk about generalizing automation, we want to target high-impact applications, because it's not a trivial effort to generalize automation, whether we're talking about data automation or physical automation, if you've ever done that in a lab or in manufacturing. It takes a lot of effort, money, and time. In general, if you're looking to automate data workflows, prioritize applying them to standardized workflows, or use them as a way to push for standardization. Say, "Hey, look, if we standardize this, we can also automate this really painful data munging that you're doing every day."

Often data workflows require some flexibility, whether that's because of the process or because of how people are looking at the data. That's where you can automate piece by piece. This is really the powerful piece about JMP, because I've had people ask me: if you're automating, why not just use Python? Why not just use a purely code-based platform to do that automation? The problem is, as soon as you make something purely code-based, it adds a big barrier to entry for people. Unless you can automate it end to end, if it gets to a certain point and the user has to take over, and now you're saying, "Hey, I also need you to be Python-fluent," it becomes a huge barrier to entry, even in very technical communities like the scientists I'm working with.

JMP is really nice because you can automate just the first piece, and then the end user is in JMP. They have the UI, and they can run with that. They can do whatever they want; they have that freedom and still leverage the automation. Of course, we want to identify high-impact opportunities, because it does take effort. In our case, we were looking at these plate experiments, and we talked to the scientists, asking, "How much time are you spending to retrieve this data?" We noticed people were doing this very manually: copy-pasting, repetitive steps.

We had 17 scientists on the team. We did a little bit of a survey and estimated that we were spending about one whole person's time, a pretty expensive person's time, to do all of this. That, to us, was an opportunity to say, "Hey, a lot of this looks manual, repetitive, and automatable. Let's give it a shot." The biggest challenge you'll definitely encounter is complexity. When we think about automation, there are two axes: the total number of steps in the workflow, and the number of variations at each step. A variation is really a fork in the path. I like the diagram on the right because it shows how this can very quickly become an exponential problem, where you're trying to solve for too many possible iterations or variations of a certain process. Walking in, you want to ask yourself: is it manageable? And if it's not manageable, what can you do to make it manageable? You can either eliminate variations through standardization, as we mentioned, or by saying, "Hey, this corner case is not going to be supported by this data analysis."

You can even add an error message to that effect, as you'll see in our workflow. You can also eliminate steps with an automation off-ramp. This is what I was mentioning: having the user take over. If there's a very artisanal, unique quality control step, you can pause a workflow, let the user do whatever they need to do, and then they can resume the workflow from there. You have options to think about, but planning is definitely going to be in your favor when you're thinking about automation. In our case, we were looking at these plate processes, and this is the original example I gave. We have a single plate propagation, we take a single measurement, and we'll call this process variation one.

When we started to look at the breadth of how this differs from experiment to experiment, we saw, "Hey, actually, sometimes we take two measurements from this plate, not just one." Sometimes we're also measuring from the first plate in the serial propagation. Sometimes we're taking two measurements from both. Sometimes we're actually splitting the plates; we're not just going one to one, we're going one to two or one to many.

We're taking measurements on those. Sometimes we're splitting them and actually taking different measurements on each of them. I don't expect you to keep track of this, but what I'm trying to demonstrate is that there's a lot of spread in our process, and I'm just showing a few examples here. We have dozens of different variations of our process. And we were only looking at the complexity in the process itself, the process in our labs that generates the data. We hadn't even started thinking about, okay, how standard is the way people look at their data and analyze their data? What statistical tests are they using? Whether they're doing summary metrics, or using median or mean.

The complexity can become very overwhelming pretty quickly, so it's important to do your research and gather that intelligence beforehand. What we've been working on is a work in progress: it's deployed, but it is something that we are iterating on continuously. That's the other thing I'd recommend: figure out the minimum thing that's going to be useful, and then get frequent feedback from the end users. We came up with this generalized approach for our plate analysis.

This is what a more complex workflow looks like. What we're doing is actually ingesting data from our database directly; it prompts the user to filter that data based on the specific experiment they're looking for. We're doing a bunch of data munging: fixing data types and naming conventions, making those things consistent. We're splitting things based on that process we saw, the serial propagation; we need to understand which measurement is associated with which plate. We do some checks, and that's where we're asking: is this a process that we're not supporting? Is this a measurement that isn't supported by this automation? If so, let's throw an error and give the user a message. We do more data munging around pivoting. We even give the user an option to add offline data; people might have manual annotations of their genotypes. Finally, we generate a set of default visualizations and analyses.
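One of those munging steps, the pivot, is the kind of thing a workflow step captures as a Tables operation. A minimal sketch, with illustrative table and column names, looks like this:

// Pivot long measurement data to one column per measurement type
Data Table( "Plate Data" ) << Split(
	Split By( :Measurement Type ),   // values become the new column headers
	Split( :Value ),                 // measurements that fill those columns
	Group( :Plate Barcode, :Well ),  // one row per plate well
	Output Table( "Plate Data Wide" )
);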

Rather than talking through this more, I'll show you what it looks like. Here, as I showed earlier, we're in the project. We have our workspace and contents. We have our journal. We have our log down here. I'm actually going to close the log because we don't need it for the demo, and it's taking up valuable real estate on our screen.

But we have our workflow here. I've made these steps as human-readable as possible, so you understand what each step does. You can also see that you can activate or deactivate steps; in this case, for the purposes of this conference, we're just importing static data, because we're not going to connect to our company databases to pull real experimental data. We just have that raw data natively in here. You'll also notice some of these logic gates. There are these if/else statements that allow you to have logic switches or different paths, different forks in the road, to say: if the data looks like this, do this thing; if not, do this other thing.
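As a minimal sketch of what one of those gates can contain (column names and the message are illustrative): check a property of the data, and stop with a clear message when we hit a case the workflow doesn't support.

// Logic gate: only a single measurement type per pre-culture plate is supported here
dt = Current Data Table();
measTypes = Associative Array( Column( dt, "Measurement Type" ) << Get Values ) << Get Keys;   // unique values
If( N Items( measTypes ) > 1,
	Throw( "Unsupported experiment: more than one measurement type found for the PC plates." )
);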

I'll go ahead and run this. We're importing the data. I have a maybe slightly passive-aggressive statement here; you can have little pop-ups to give the user feedback. It says, great, you only have Measure B measurements for your PC plates; that's the only thing we allow. Good job. They hit OK, and it continues to run. Now we're giving them a prompt. It's saying, hey, would you like to join manually created metadata?

This is commonly genotype data joined to this data set. On the first run-through, I'll just hit no; I don't actually have any data there. It says, okay, we'll just run through this. Now we've run through everything, and we have all green check marks. On the left here, you'll see a number of data tables as well as some default visualizations of performance, plus a normalization we're doing on the plates for strain selection.
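That yes/no prompt is the kind of thing you can build with a modal window; as a sketch (the wording and buttons here are illustrative, not our exact workflow step), the return value tells you which button was pressed.

// Simple prompt: a modal window returns Button(1) for OK, Button(-1) for Cancel
choice = New Window( "Offline metadata",
	<<Modal,
	Text Box( "Would you like to join manually created metadata (e.g., genotype annotations)?" ),
	H List Box( Button Box( "OK" ), Button Box( "Cancel" ) )
);
joinOffline = choice["Button"] == 1;   // 1 if the user clicked OK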

Now, the really neat thing here, again, with the workflow, is that I can hit this button here, this reset, and it's going to close everything the workflow opened and reset it to the beginning. Now it has reset. You'll notice we're just back to the journal and our little arrows at the beginning. I can rerun this again now. In that same prompt, this time I'm going to say yes, which is going to lead me down a slightly different path. I'm going to say, yes, I want to add manual data. Now it adds another prompt asking which column I want to use to join the data. I'm going to use this X number, which is a unique identifier.

We use it for strains. Hit OK. It's going to say, "Hey, I don't see this in your data set." Of course it doesn't, because that's my own offline data. I'm going to go to my computer and find where that is. In this case, I've stored it in a CSV, and we're going to join that, and it's going to run through the script. You'll notice we actually get to the same endpoint. We have the normalization. We have the Graph Builder box plot visualization of the measurement. But we've also now added some additional columns, Promoter Mutations, other info, that were in that offline data and have been joined to the main data, which wasn't there otherwise.
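A sketch of the join itself: let the user point at their offline CSV, open it, and update the main table by matching on the identifier column (the file filter, table name, and column name are illustrative).

// Let the user locate their offline metadata file
csvPath = Pick File( "Select the offline metadata CSV", "", {"CSV Files|csv", "All Files|*"} );
offline = Open( csvPath );

// Join it onto the main table by the unique strain identifier
Data Table( "Plate Data" ) << Update(
	With( offline ),
	Match Columns( :X Number = :X Number )
);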

These generalized workflows can become really powerful. When I publish these, I save them in a project as well; it's the workflow in a project. I add user documentation here, as well as any hyperlinks to other internal documentation that might be relevant. A user would come here, press play, and it would prompt them to enter their experiment ID; that would dynamically change the data that's pulled in, and then it would run through all of this.

You'll also notice I have some things here like throwing errors. That's for specific cases we don't support, or maybe an unexpected experiment type. The one other tip I'd give, if you're looking to generalize: by default, a lot of the table and column references that are auto-generated, whether you're using the record button or the red hotspot on a platform, are explicit. It's going to say a data table with this name, a column with this name. That doesn't always scale when you generalize.

One thing you might find yourself learning, and having to manually modify the JSL for, is column and data table references: how to manage those and how to make them dynamic. The number of columns I'm importing might change depending on what a user is bringing in, and I need to be able to support that. Once you do get into this realm, you will almost certainly have to write some JSL. But again, JMP is going to do 90% of the work for you with the workflow recording, the log, and all of those red hotspots and Save Script options.
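A small sketch of what that dynamic handling can look like: instead of hard-coding column names, discover whatever columns actually came in and reference them by name (the cleanup applied here is just an example of such a step).

// Work with whatever columns the user's import actually contains
dt = Current Data Table();
colNames = dt << Get Column Names( String );
For( i = 1, i <= N Items( colNames ), i++,
	// Example cleanup: strip stray whitespace from imported column names
	Column( dt, colNames[i] ) << Set Name( Trim( colNames[i] ) )
);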

If you find yourself having to write JSL, where do you go? I've worked with a lot of different data tools, data software, and coding languages, and the JMP learning resources are really great. Natively, you have the Scripting Index in the application itself. These days, you can use LLM assistance for JSL troubleshooting or even generation. I don't think it's as good as it is for something like Python, partially because the training data set is far smaller than what exists for Python, but it can still get you on your way.

The really unmatched part is the JMP community. Use the forums. Here's a specific example from earlier this year. I wrote this question, and the one thing I want to call out is the timing. I submitted it at 7:15 PM. Jim here, the legend, responded at 9:45 PM the same day. The next morning, Mark responded, in about 12 hours. I got two responses from experts to my question in less than 24 hours. That's been my experience every time I've posted on the forum.

Shout out to everyone, to Mark, to Jim; I see your names all the time. But really, everyone, use those forums. They're invaluable for your learning. With that, I want to wrap up and urge you to look within your organization, within your teams, within your labs, whatever your work is, for high-impact opportunities where you could apply generalized automation. In our case, we saw an area where we were spending hours per experiment, running 50-plus of those types of experiments per month. That was one person's entire time really spent dealing with that data.

Once you commit to that, manage and plan for complexity. We had a lot of variability in our experimental design and our plate screening process. We addressed that by giving the user options and constraining what we're supporting, but also by leveraging the JMP UI to say, "Hey, from here, it's up to you. You can still visualize this any way you want." That leads to implementing these advanced workflows, which can eliminate repetitive tasks, and you can also use them as a way to standardize analysis. What I've often seen is that we like to standardize things, but when it comes to data analysis, we consider it artisanal.

There's truth to that, but there are also areas with opportunities to set best practices. How should we be looking at this data? What is the right statistical test for the characteristics this data takes? This gives you an opportunity to codify those in something like a JMP workflow.
