Developing and manufacturing biopharmaceuticals involve many standardized experiments and reports. Most reports contain information from the associated protocol, product quality data from a laboratory information management system (LIMS), information from lab documentation, and performance data from different devices. Currently, these data sources are processed with JMP, Excel, and other software, and then assembled in PowerPoint presentations and Word documents.

I demonstrate an automation of this complete process that was developed in-house with JMP. With the help of a journal, the user is guided through an automated workflow. It uses LIMS data and lab documentation to automatically fill in, for example, the responses of a design of experiments (DoE) series, and it proposes illustration options once the subject matter expert (SME) has built a model. These illustrations, chromatography traces, and information from the protocol are then assembled into a PowerPoint presentation and a Word report following a preset template structure.

Generating this automated report saves a significant amount of time for the SME, allowing more time to focus on interpreting the results. Furthermore, by keeping the automation within a single software environment, data integrity is maintained and layout and evaluation standards are followed in every report.

 

 

Hello and welcome. Today I would like to show you a workflow for gathering, analyzing, and reporting experimental data. Because I'm presenting at the Discovery Summit, I decided on a DOE report as an example, but of course this tool is very general and can be used for all kinds of reports.

As a short background to what we are doing, Rentschler Biopharma is a contract development and manufacturing organization for biopharmaceuticals. That means we develop purification processes for all kinds of biopharmaceuticals.

Most of them are, of course, antibodies. We have over 120 of those, but we also produced the COVID-19 mRNA vaccine for BioNTech/Pfizer. And we have a rather large portfolio with over 140 therapeutic modalities: mostly monoclonal antibodies, which are kind of the simple, boring ones, but also multispecific antibodies, fusion proteins, recombinant enzymes, and other things which are a little bit more exciting.

The disadvantage of having 140 different modalities is that you have to write 140 different reports, and when you produce something every year, for example, then you also write yearly reports. For each of these projects we write reports, and in addition, because we work for customers, we also prepare PowerPoint presentations, sometimes weekly.

Of course, that can take quite a lot of time to prepare, so the idea is to reduce this time by doing it automatically with JMP. For that I brought one example, and that is a DOE report. Whenever you develop a process, you have to run quite a few DOEs to understand how the process works, whether it is robust, how to optimize it, and so on. We use a DOE approach for this, and every process has quite a few process steps, so every process can end up with maybe 5 or 10 reports, which is quite a lot of work. On the other hand, it is very standardized, so it is a nice example for harnessing the power of automation.

The only things you need for this are a JMP table, the typical DOE table which JMP generates once you ask it to make an experimental design, and a protocol, which you need anyway because every customer requires it. Finally, you also need at least JMP 18. That is simply because JMP 18 has a nice Python integration which makes everything a little bit easier, so a lot of what you will see today runs on Python. The user never notices that, but it has the advantage that you can do a few more exciting things with Python.

I will first show you how that looks in theory, and afterwards give you a practical example. So how does it look? You start with an empty DOE table, which you can see here on the left side. It has a run ID, a pattern, different factors, and responses, and the responses are filled automatically with a few clicks from LIMS. The advantage is that instead of copy-pasting data from LIMS and possibly making errors, the computer does it automatically.
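As an illustration of that fill step (a minimal sketch, not the in-house implementation; the database, table, and column names are all invented), the idea in Python looks roughly like this:

    # Minimal sketch of the LIMS fill step; a local SQLite file stands in for the
    # LIMS SQL database, and all table and column names are invented.
    import sqlite3
    import pandas as pd

    doe = pd.read_csv("doe_table.csv")                      # empty DOE table exported from JMP

    with sqlite3.connect("lims_export.db") as con:          # stand-in for the LIMS database
        lims = pd.read_sql_query(
            "SELECT run_id, response_name, result_value "
            "FROM results WHERE project = 'DOE-001'",
            con,
        )

    # one column per response, one row per run
    responses = lims.pivot(index="run_id", columns="response_name", values="result_value")
    filled = doe.merge(responses, left_on="Run ID", right_index=True, how="left")
    filled.to_csv("doe_table_filled.csv", index=False)      # re-imported into the JMP table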

The tool also automatically locks the columns to ensure that there are no errors, and it tracks every single step the user takes during this process. That is very helpful when you think about data integrity: it is very clear where the data came from, it cannot be manipulated by accident, and that makes our quality assurance very happy.
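The step tracking behind this can be pictured as a small audit log; here is a hedged sketch of the idea (the file name, fields, and entries are invented, not taken from the actual tool):

    # Sketch of the step-tracking idea: every automated action is appended to an
    # audit log together with user, timestamp, and a short description, so the
    # origin of the data stays traceable. File name and fields are invented.
    import csv
    import getpass
    from datetime import datetime, timezone

    def log_step(action: str, detail: str, logfile: str = "audit_trail.csv") -> None:
        with open(logfile, "a", newline="") as fh:
            csv.writer(fh).writerow([
                datetime.now(timezone.utc).isoformat(timespec="seconds"),
                getpass.getuser(),
                action,
                detail,
            ])

    log_step("LIMS import", "filled responses for runs 1-16 of project DOE-001")
    log_step("Lock columns", "Response A, Response B, Response C")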

After the data is copied into the DOE table, the subject-matter expert has to build their model. Nothing has changed there; you can do it however you like, and JMP already has all the capabilities you need for that. You just save the model, and then you go back to the tool to build all kinds of plots which we usually like to have in our reports.

These include, for example, a replicate plot where you can see how your data spread over the different runs; a summary of fit where you can estimate the quality of your model; parameter estimates telling you which parameter is responsible for a specific response; different kinds of profiler plots to give you an overview of which factors influence which response; and finally some contour plots if you have two- or three-dimensional interactions, so you can see them directly in a single plot to find an optimum or localize an issue at the edge of failure.

Then this data is assembled, or can be assembled, into a presentation. The idea is that our customers like to be updated about our progress, and all the plots the tool has generated can easily be assembled into a PowerPoint presentation. In essence, that means all figures are added, a header is added automatically, and then the SME just has to write the interpretation.
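One way to do such an assembly is with the python-pptx package; this is a minimal sketch assuming a corporate template file and a folder of exported figures (both names, and the layout index, are placeholders):

    # Minimal sketch of the PowerPoint assembly with python-pptx.
    # Template file, figure folder, and layout index are placeholders.
    from pathlib import Path
    from pptx import Presentation
    from pptx.util import Inches

    prs = Presentation("customer_template.pptx")            # corporate template (placeholder)

    for figure in sorted(Path("doe_plots").glob("*.png")):
        slide = prs.slides.add_slide(prs.slide_layouts[5])  # layout 5 is "Title Only" in the default template
        slide.shapes.title.text = figure.stem               # header, e.g. "Prediction plot - Response A"
        slide.shapes.add_picture(str(figure), Inches(1), Inches(1.5), width=Inches(8))

    prs.save("doe_update.pptx")                             # SME adds the interpretation afterwards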

The second option, which I will show you in a second, is that everything is assembled into a report. This report then contains data from LIMS, which is an SQL database, and from our electronic lab documentation, which includes data like hold times. It can also contain device exports like chromatograms or filtration plots, and finally data from the protocol: experimental plans, naming conventions, all these kinds of things.

How that looks, you can see a little bit on the right side. You have tables added from the protocol or from LIMS, and you have plots from the devices, which are then also shaded in whatever way you need, like pump performance, chromatograms, and analytical plots.
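The Word side can be handled in the same spirit with python-docx; again only a sketch, with invented template name, figure folder, and table contents:

    # Minimal sketch of the Word report assembly with python-docx.
    # Template name, figure folder, and the run-scheme rows are invented.
    from pathlib import Path
    from docx import Document
    from docx.shared import Cm

    doc = Document("doe_report_template.docx")              # preset template structure (placeholder)

    doc.add_heading("Results", level=1)
    for figure in sorted(Path("doe_plots").glob("*.png")):
        doc.add_heading(figure.stem, level=2)
        doc.add_picture(str(figure), width=Cm(15))

    # a small table copied from the protocol, e.g. the run scheme (dummy rows)
    runs = [("1", "-+-"), ("2", "+--"), ("3", "000")]
    table = doc.add_table(rows=1, cols=2)
    table.rows[0].cells[0].text = "Run"
    table.rows[0].cells[1].text = "Pattern"
    for run, pattern in runs:
        row = table.add_row()
        row.cells[0].text = run
        row.cells[1].text = pattern

    doc.save("doe_report_draft.docx")                       # SME writes evaluation and conclusions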

The aim of all this is to go from a state where you have to do all these things manually, which is of course error-prone, to a target state where the SME just builds the model, JMP generates a draft report, and the SME only has to interpret and review it, saving a lot of time and, of course, a lot of frustration.

I hope you will like that, and now I would like to show you how it looks in practice. What you see here is a typical JMP journal; for my automation tasks I usually use journals. They have a short introduction and links, and then a lot of these outline items which guide you through the process.

You start at the beginning, and in essence it tells you what to do: you load the LIMS data, since logically you have to start with that, and you will always find a button for it. So let's just click on that. It asks you for an empty DOE table, which I have already prepared; it looks like this.

That's a typical table you get: all the factors are in there, along with some run numbers and the scripts which JMP adds automatically, and of course the responses are still empty. Then it asks you to get the data from LIMS. I cannot show that part directly because I have to use random data here.

However, the effect is the same: the responses are filled automatically, and in addition they are colored green. That is just to indicate that this data is verified. First of all, it means the columns are locked, so you cannot change them accidentally. But the data is also saved, so even if you were to manipulate the data now, the tool would notice the discrepancy afterwards and tell you that you have a verification issue.

It also saves all the data extraction steps as table variables, which you can see here. What you can do then is start making all kinds of plots. For example, the replication plot is always nice. All you have to do is click on Make Replication Plot.

The first thing it asks you for is a table; it will only ask once per session. In essence, this data verification works not only on the basis of tables but also on plots and all kinds of figures. We have no figure yet, so let's cancel that, and it will just make a new one. What it now does is produce this nice replication plot, which we can see here with all the responses and how they change over the replicates; the blue points are the center points.

What it also did on the side is create a plot verification table. There you can directly see which data went into the plot: the file and its file name, which steps somebody did, and a note about manual verification, because this is random data I invented; I created the file just now.

Then there is the MD5 hash value, which is in essence a value you can calculate for every single file. In this way, you can check that the file you are using does not only have the same name but is really an identical file. It is just a safeguard so that nobody can save a similar file with the same name and have it considered verified even though it is not. As I mentioned, this is random data, so the source is not verified, and the tool tells you that this is some kind of issue, but it doesn't really matter for this demonstration.
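Such a fingerprint is straightforward to compute, for example with Python's standard hashlib module; the file name and the stored value below are only examples:

    # Compute an MD5 fingerprint for a file so the report can later verify that a
    # referenced figure or table is byte-for-byte identical, not just same-named.
    import hashlib

    def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
        digest = hashlib.md5()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    stored = "5d41402abc4b2a76b9719d911017c592"             # hash recorded at extraction time (example value)
    if md5_of_file("replicate_plot.png") != stored:         # placeholder file name
        print("Verification issue: file content changed since extraction")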

You have seen that; you can close it. All these plots are automatically saved in the folder. Then you have to build a model. You can do that however you like. Let's click on Run, and then you can remove a few factors. We can politely ignore this; I'm always surprised how great the models you can generate from random data look, just like that.

You can also remove a few factors here; it just looks random. You see some differences, and you can, of course, do that however you like. Let's just leave it at that. You can save the model to the data table if you like.

And then you can start making some plots. For example, you can make a summary of fit plot that looks like this. That gives you an overview of how good your model is for each response. With random data the model is, of course, terrible, but at least you have a nice overview of how bad your model is. In addition, you can see which parameters play a role in your model.

You just click again, and it makes the plot automatically for every single response. This is then also saved, so you can see which of these factors are responsible for which response. The next thing would be the prediction plots. Those give you a nice overview of how the model behaves.

Just click on Make Plots as well, and you get a bunch of them. Depending on the factors you have, you can get every response plotted against every individual factor, if you like that. And if, for example, three factors are responsible for response D, you also get an overview plot where you can see them all in one. That's a nice overview.

Of course, you can close that as well. Finally, we can do a contour plot if you like. You pick the prediction formulas, say for response 1, or maybe 3, you can choose your color scheme, just leave it like that, and then you get a nice contour plot where you can clearly see how the response is distributed and how the factors affect each other.

That is in essence all. You can, of course, generate more of them when you have more data, but let's leave it at that; we can just close this and save it. Maybe I'll show you briefly: for each of these figures we created, it added one more row, and you can again see which data extraction steps we did, the file name, and then the hash value. We save that.

Okay, you now have tons of data points. The only thing left to do is add them to a report. Of course you could copy-paste all of this, because everything is saved, but it's much more fun if you just have to click and let JMP do the work. So you need to find your folder. This is, as I said, a tool which works for all kinds of reports.

So for every kind of report you need your own table; I'll show you that in a second. You can see that you have different sources here. This table is what instructs JMP, when it later builds the report, which data points to take from which source. From the protocol it copies a few tables: the intermediate step coding, the run scheme, some parameters, an overview of batches, all your filters, and so on.

Then from the DOE plotting tool it copies the residual plots, parameter estimates, prediction plots, and contour plots, and from the LIMS data tool it gets you the list of the analytical methods used. Then you need, of course, a template; every report has its own template. I prepared one for privacy reasons. You need a folder, and, of course, you also need a protocol.
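Conceptually, that instruction table is just a mapping from report element to data source; here is a hedged sketch of how it could be represented (all element names, sources, and paths are invented):

    # Sketch of the idea behind the data source table: each report element declares
    # where it comes from, and the assembly step works through the list.
    import csv

    DATA_SOURCES = [
        {"element": "Intermediate step coding", "source": "protocol", "path": "protocol.docx"},
        {"element": "Run scheme",               "source": "protocol", "path": "protocol.docx"},
        {"element": "Analytical methods",       "source": "LIMS",     "path": "lims_methods.csv"},
        {"element": "Prediction plots",         "source": "DOE tool", "path": "doe_plots/"},
    ]

    with open("data_source_table.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["element", "source", "path"])
        writer.writeheader()
        writer.writerows(DATA_SOURCES)                      # the overview table that ends up in the report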

If you like, you can manipulate the process steps and adjust some settings; the tool needs to know which tables are the DOE tables. You just click on OK, wait a few seconds, and then you see that 22 items have been added, all without you having to spend any time on it.

And then you can directly see here that it generates this data source table, which tells you what it did: it copied the intermediate step coding from the protocol, this is the file it used, and so on. It does that with a bunch of tables from the protocol, and then with quite a few figures which were just created in the other tool.

It writes them into this data source table, and it tells you, yes, they are not verified because it's random data, but in general you get a full list of what was used. And then, of course, you also get a full report. This is what you get as a draft. The report contains quite a lot of information and, of course, still has to be filled in. I had to shorten it a little bit for privacy reasons, but I can show you how it looks in general.

Of course, you have to write the summary yourself, but you can start with an overview of the predictions, that is, which factor influences which response. If it's green, then it is significant; if not, it's not. That depends on your [inaudible 00:17:06] model, which we built. Then, of course, there is some data I deleted.

Then you have a few buffers which were extracted from the protocol, filters which were also found in the protocol, and the step coding from the protocol, then a list of analytical methods and which versions were used for which run, which comes directly from LIMS. The run scheme also comes from the protocol, along with some more data, and then later on come the results: the table we just made and of course all the plots we just made, the whole list including the residual plots.

You then get the summary of fit, and for each single response you get the parameter estimates and the prediction plots for the factors which are significant, and if you have one, also the contour plot. It does this automatically for every single response, so you can imagine you end up with quite a full table. Then, of course, you have to write the evaluation and the conclusions yourself. The report finishes with the abbreviations. This is something it also does automatically: if an abbreviation is used in the report, it is added to this list.
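The automatic abbreviation list can be imagined as a simple scan of the report text against a central glossary; here is a minimal sketch with a made-up glossary:

    # Sketch of the abbreviation list: find acronym-like tokens in the report text
    # and keep those that appear in a central glossary. Glossary entries are examples.
    import re

    GLOSSARY = {
        "DOE": "design of experiments",
        "LIMS": "laboratory information management system",
        "SME": "subject matter expert",
    }

    report_text = "The SME reviewed the DOE responses extracted from LIMS."
    found = sorted(set(re.findall(r"\b[A-Z]{2,}\b", report_text)))

    for abbr in found:
        if abbr in GLOSSARY:
            print(f"{abbr}\t{GLOSSARY[abbr]}")              # rows for the abbreviations section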

So all of that is now a very short report with, I don't know, 30 pages and 20 elements, but of course it also works for much larger reports with many more elements. Especially something like the overview of the data sources saves a lot of time, because otherwise somebody would have to check all of that manually.

Let me finish. I hope I have convinced you that you can save significant time by using JMP and Python to automate your workflows. It also minimizes errors because no copy-pasting or anything else is necessary, and at the same time every single step is recorded: who did it, when it was done, and with which tool.

And with this last module you get a standardized report format, which is important. For example, I work in the downstream purification department, meaning that I'm responsible for purifying what we get from the upstream department. Of course, it is important that when a customer gets reports from us, they look more or less the same: same colors and so on.

With that, I am at the end of my presentation. I would be very happy to see you all at the Discovery Summit in Berlin, and if you have any questions, feel free to send me a message. Thank you very much.




