Data quality assurance in high-throughput R&D projects to enable fast and smart decision-making (2020-US-30MP-642)
Carlos Ortega, Project Leader, Avantium
Daria Otyuskaya, Project Leader, Avantium
Hendrik Dathe, Services Director, Avantium
Creativity is at the center of any research and development program. Whether it is a fundamental research topic or the development of new applications, solid research rests on robust data that you can trust. Within Avantium, we focus on executing tailored catalysis R&D projects, which vary from customer to customer. This requires a flexible solution for judging the large amount of data obtained in our high-throughput catalyst testing equipment with up to 64 parallel reactors. We use JMP and JSL scripts to improve the data workflow and its integration. In any given project, data is generated by different sources, including our proprietary catalyst testing equipment, Flowrence®, on-line and off-line analytical equipment (e.g., GC, S&N analyzers and SimDis) and manual data records (e.g., MS Excel files). The data from these sources are automatically checked by our JSL scripts, and with the statistical methods available in JMP we are able to calculate key performance parameters, create key performance plots and generate automatic reports that can be shared directly with clients. The use of scripts guarantees that the data handling process is consistent, as every data set in a given project is treated the same way. This provides seamless integration of results and reports, which are ready to share on a software platform known to our customers.
Hendrik Dathe: Hi, and welcome to our presentation at the JMP Discovery Summit.
Of course, we would have liked to give this presentation in person, but under the current circumstances, this is the best way we can still share how we are using JMP in our day-to-day work and how it helps us rework our data. A presentation in video form also has an advantage for you as a viewer: if you want to grab a coffee right now, you can just hit pause and continue when the coffee is ready.
Looking at the time, I guess the summit is well under way by now, and most likely you have already heard quite some exciting presentations on how JMP can help you make more sense of your data and use statistical tools to gain deeper insight and dive into more parts of your data. However, what we want to talk about today (and this is hidden under the title about data quality assurance) is the scripting engine: everything that has to do with JSL scripting, because this helps us a lot in our day-to-day work to prepare the data so that they are ready to be used for data analysis. And by "we" I mean Carlos Ortega, Daria Otyuskaya, and myself, whom I now want to introduce briefly so you get a better feeling for who is doing this. But of course, as usual, there are some rules to this, namely the disclaimer about the data we are using. If you are a lawyer, you will surely press pause now to study it in detail; for everyone else, let's dive right into the presentation.
And of course there is nothing better than to start with a short introduction of the people you see. You already see one thing we all have in common, the location, which is Amsterdam in the Netherlands, and we also have in common that we work at Avantium, a provider of sustainable technologies. The places we come from, however, are spread all over the world. On the left side we have Carlos Ortega, a chemical engineer from Venezuela, who has lived in Holland for about six years and has worked at Avantium for about two years as a project leader in services. On the right side we have Daria Otyuskaya from Russia, who has also been working here for about two years and spent the last five years in the Benelux area, where she did her PhD in chemical engineering. And myself: I have the one advantage that I can travel home by car, as I originate from Germany. I have lived in Holland for about 10 years and joined Avantium about three years ago.
But now let's talk a bit more about Avantium. I just want to briefly lay out what we are doing.
Avantium, as I mentioned before, is a provider of sustainable technologies and has three business units. One is Avantium Renewable Polymers, where we develop a biodegradable polymer called PEF, which is one hundred percent plant-based and recyclable. Second, we have a business unit called Avantium Renewable Chemistries, which offers renewable technologies to produce chemicals like MEG or industrial sugars from non-food biomass. And last but not least, there is a very exciting technology where we turn CO2 from the air into chemicals via electrochemistry.
But I will not talk too much about those two business units, because Carlos, Daria and myself all work in Avantium Catalysis, which was founded 20 years ago and is still the foundation of Avantium's technology innovations. We are a service provider that accelerates the research in your company, in catalyst research, to be more specific. As you can see on the right-hand side, we offer systems, services and a service called refinery catalyst testing, and we really help companies develop their R&D, as you see at the bottom.
But this is enough about Avantium. Let's talk a bit about how we work in projects and how JMP helps us there to accelerate things and get better data out of them, which Carlos will show later in a demo.
As mentioned before, we are a service provider, and as such we get a lot of requests from customers to develop better catalysts or better processes. Now you might ask yourself: what is a catalyst? A catalyst is a material that participates in a reaction, when you transform A into B, but does not get consumed in that reaction. The most common example, which you can see in your day-to-day life, is the exhaust gas catalyst installed in your car, which turns the off-gases from your engine into CO2 and water in the exhaust.
And these are the kinds of requests we get. People come to us and say, "I would like to develop a new material," or, "I have this process, and I want to accelerate my research and develop a new process for it." When we set up an experiment in our team, we are designing experiments and trying to optimize the testing, and for all of this we use JMP. But that is not what we want to talk about today, because, as I said before, we also use JMP to merge our data, process them and make them ready for use, which covers the two parts you see at the bottom of the presentation.
We execute research projects for customers in our proprietary tool called Flowrence, where the trick is that we don't execute tests one after another, but in parallel. Traditionally (I remember this from my own PhD) you execute a test in one reactor after another, after another, after another. But we run up to 64 reactors in parallel, which makes the execution more challenging but enables data-driven decisions. It actually produces more reliable data and makes the results statistically significant. We then report this data to our customers, who can either continue in their own tools to gain further insights or rely on us completely for executing the tests and extracting the knowledge. But that is enough about the company. Now let me hand over to Carlos, who will explain how JMP and JSL scripts make our lives significantly easier.
Carlos Ortega: Thank you, Hendrik, for the nice introduction, and thank you also to the organizers for this nice opportunity to participate in the JMP Discovery Summit.
As Hendrik was mentioning, we develop and execute research projects for third parties. If we think about it, we need to go all the way from design of experiments (which is of course one very powerful feature of JMP) to managing information. Here we want to focus on the JSL scripts that allow us to easily handle information and create seamless integration of our process workflows.
I'm a project leader in the R&D department, so a regular day in my life here would look something like this, in a very simplistic view: clients who are interested in a research question come to us, I design the experiments, and we execute them in our own proprietary technology called Flowrence. In this simple view, the data generated in the Flowrence unit go through me and, after some checks and interpretation, back to the client. The reality is somewhat more complex. On the one hand, we also have internal customers, for example our business development team. On the other hand, we have our own staff who interact directly with the unit: they control how the unit operates and monitor that everything goes according to plan.
And as you see here with the broken lines, the data cannot be extracted directly from the unit. The data is actually sent to a data warehouse, and then we need a set of tools that allows us to first retrieve information, merge information that comes from different sources, execute a set of tasks that goes from cleaning to processing to visualizing information, and eventually export the data to the client, so that the client gets the information they actually need and that is most relevant for them.
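To give a flavor of that retrieve-and-merge step in JSL, here is a minimal sketch. The DSN, SQL query, file name and column names are hypothetical placeholders, not our actual warehouse schema:

    // Pull raw data from the warehouse and merge one manual-input table
    Names Default To Here( 1 );

    // Retrieve one project's data via ODBC (connection string and SQL are placeholders)
    rawDt = Open Database(
        "DSN=DataWarehouse;",
        "SELECT * FROM unit_data WHERE project = 'P001'",
        "Raw Data"
    );

    // Merge the manual inputs kept in Excel, e.g., a reactor loading tab
    loadingDt = Open( "inputs.xlsx", Worksheets( "Reactor loading" ) );
    rawDt << Update( With( loadingDt ), Match Columns( :Reactor = :Reactor ) );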
If you allow me to focus for one second on these different tasks: what we see initially, in the retrieve-and-merge step, is that data can come from various sources. In the data warehouse we collect data from the Flowrence unit, but we also collect data from the analyzers. Those of you performing tests in a laboratory might be familiar with mass spectrometry or gas chromatography, for example. And we also collect data on the unit performance, so we verify that the unit is behaving as expected. As in any laboratory, we also have manual inputs. These could be, for example, information on the catalysts we are testing or calibrations of the analytical equipment.
Those manual inputs are of course always stored in a laboratory notebook, but we also enter that information into an Excel file. And this is where JMP helps us take the workflow of information to the next level. What we have developed is a combination of an easy-to-use, widely known Excel file with the powerful features of a JSL script. Not only do we include the manual data available in laboratory notebooks, but we also include in this Excel file formulas that are then interpreted and executed by the JSL script. That allows us to calculate key performance parameters that are tailored, or specifically suited, to different clients.
If we look in more detail into the JSL script (in a moment I will go into a demo), you will observe that it has three main sections. One section prepares the local environment: on one side we clear all symbols and close tables, but probably the most important feature is that we define "Names Default To Here". That allows us to run scripts in parallel without any interference between variables that have the same name in different scripts.
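A minimal sketch of what such an environment-preparation section can look like (the exact statements in our script may differ slightly):

    Names Default To Here( 1 );          // keep names local, so parallel scripts don't clash
    Clear Symbols();                     // start from a clean set of variables
    Close All( Data Tables, No Save );   // close tables left over from a previous run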
Then we have a section, collapsed here so that we can show the rest, that creates a graphical user interface. The user does not interact with the script itself but works through a simple graphical user interface with descriptively named buttons.
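As an illustration, a stripped-down version of such a GUI could be built like this; the window title and button names are made up for this sketch, and each button simply evaluates one of the task expressions described below:

    // Placeholder task expressions (the real ones do the actual work)
    retrieveExpr = Expr( Write( "retrieving...\!N" ) );
    plotExpr = Expr( Write( "plotting...\!N" ) );
    exportExpr = Expr( Write( "exporting...\!N" ) );

    gui = New Window( "Project data workflow",
        V List Box(
            Text Box( "Choose an action:" ),
            Button Box( "Retrieve and merge data", retrieveExpr ),
            Button Box( "Create update plots", plotExpr ),
            Button Box( "Export and e-mail", exportExpr ),
            Button Box( "Close",
                gui << Close Window;                 // closing the GUI...
                Close All( Data Tables, No Save );   // ...also closes all open tables
            )
        )
    );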
And then we have the set of tasks that are coded in the script, in this case in the form of expressions. That has two main advantages: first, expressions are easy to hook up to the graphical user interface later on; and second, once you have an expression, you can reuse it several times in your code.
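For example, a task wrapped in an expression is defined once and can then be run from several places, including from a GUI button (the task body here is just a placeholder):

    retrieveExpr = Expr(
        Write( "Retrieving data from the warehouse...\!N" );  // placeholder for the real task
    );
    retrieveExpr;           // runs the task here...
    Eval( retrieveExpr );   // ...and it can be evaluated again anywhere else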
OK, moving on to the demo. I mentioned earlier that we have different sources of data. On one side we have data that is in fact stored in our database, and this database contains different sources of information, like the unit or the different analyzers. Here you see an example Excel table. This is only for illustration: the data is actually taken from the data warehouse directly by our JSL script, so we never look at this Excel table as such. We let the software collect the information from the data warehouse. Probably most important is that this data, as you see here, can come from different analyzers, and it is structured so that the first column contains the variable names. In this case we have made up some dummy names for reasons of confidentiality. You will also see that all the observations are arranged in rows, so every single row is an observation.
Depending on the type of test and the unit we are using, we can collect up to half a million data points in one single day. That depends of course on the analyzer, but you are immediately confronted with the amount of data you have to handle, and this is where a JSL script that processes the information for you really helps.
Then we also use another Excel file, which is also very important: the input table file. These files, together with the JSL script, are what creates the synergy that allows us to process data easily. What you see in this case, for example, is a reactor loading table, with different reactors holding different catalysts. This information is not quantitative, but its qualitative value is important. If we move to a second tab (and these tabs are all predefined across our projects), we see the response factors for the analyzers. Different analyzers have different response factors, and it is important to log this information and use it in the calculations to be able to get quantitative results.
In this case, we see that the response factors are arranged by condition.
Then we have a formula tab, and this is probably the key tab for our script. You can input formulas in this Excel file; you just make sure that the variable names are enclosed in square brackets. You can use essentially any formula you would write in Excel. Anyone can use Excel; we are all very much used to it. So if you type a formula here that follows the usual Excel syntax, it will be executed by our JSL script. We also included an additional feature, because we thought it was interesting to have conditionals. For the JSL script to read a conditional, the only requirement is that it is enclosed in braces.
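One plausible way to translate such a bracketed formula into a JMP column formula looks like this; it is a simplified guess at the mechanism, with made-up column names, not our production code:

    Names Default To Here( 1 );
    dt = Current Data Table();   // assumes columns "Flow out" and "Flow in" exist
    excelFormula = "[Flow out] / [Flow in] * 100";   // as typed in the formula tab

    // Turn each [Variable] into a JSL column reference, then build the column.
    // A {conditional} from the sheet could be mapped onto a JSL If() the same way.
    jslText = Regex( excelFormula, "\[([^\]]+)\]", ":Name(\!"\1\!")", GLOBALREPLACE );
    Eval( Parse( "dt << New Column( \!"Conversion %\!", Numeric, Formula( " || jslText || " ) )" ) );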
There are two other tabs I would like to show you, which are highly relevant. One is the export tables tab. The reason we have this tab is that we generate many columns, or variables, from a unit, probably 500 of them, but the client is actually only interested in 10, 20 or 50, the ones that really add value to their research. So we can list those variables here and send only them to the client. And last but not least: I think many of us have been in the situation where we send an email to a wrong address, which can be frightening when you are dealing with confidential information. So we always double- and triple-check the email addresses. But is that really necessary? What we do here is keep one Excel file that contains all manual inputs, including the email addresses of our clients. These email addresses are fixed, so there is no room for error: whenever you run the JSL script, the right email addresses are read and the email is created, as we will see in a minute.
Now, going into the JSL script, I would like to highlight the following. The JSL script lives in one single file in one single folder, and to run it only needs the one Excel file with the different tabs we just saw on the previous slide. Once you open the JSL script, you can click on the run script button, and that opens the graphical user interface you see on the right. Here we have different options. In this case we want to highlight the option to retrieve data for a project in a given period. Here we have selected just one particular day this year, and then we see the different buttons that allow us to create updates, for example.
Once we have clicked on this button, you see on the left, in the folder, that two directories were created. The fact that we create these directories automatically helps us keep harmony, that is, to standardize the folder structure across our projects.
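Creating a standard folder structure takes only a few lines of JSL; a sketch, with a hypothetical project path:

    projectDir = "C:/Projects/ClientX/";   // placeholder path
    If( !Is Directory( projectDir || "Raw database data" ),
        Create Directory( projectDir || "Raw database data" )
    );
    If( !Is Directory( projectDir || "Exported files" ),
        Create Directory( projectDir || "Exported files" )
    );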
If you look into the raw database data folder, you see that two files were created. One contains the raw data that comes directly from the data warehouse, and the second data table contains all the merged information from the Excel file and the different tables available in the data warehouse. The exported files folder does not contain anything at this moment, because we have not yet evaluated and assessed whether the data generated in our unit is actually relevant and valuable for the client. We do that now, and you see here that we have created a plot of reactor temperature versus local time. The different reactors are plotted (we have up to 64 in one of our units), and in this case we color the reactors depending on their location in the unit.
Another tab we have here, as an example, is about the pressure. You see that you can also script maximum, target and minimum values and define alerts, for example, to see if a value is drifting away.
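Reference lines like these can be scripted directly onto a plot axis. A sketch, assuming a table with Local Time and Pressure columns and made-up pressure limits (the real limits are project-specific):

    gb = dt << Graph Builder(
        Variables( X( :Local Time ), Y( :Pressure ) ),
        Elements( Points( X, Y ) )
    );
    // Add max / target / min lines; the Axis Box index can differ per report layout
    Report( gb )[Axis Box( 1 )] << Add Ref Line( 12.5, "Dashed", "Red", "max" );
    Report( gb )[Axis Box( 1 )] << Add Ref Line( 12.0, "Solid", "Green", "target" );
    Report( gb )[Axis Box( 1 )] << Add Ref Line( 11.5, "Dashed", "Red", "min" );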
The last tab I want to show is the conversion, and here we see the different conversions, grouped by catalyst.
Once we click the export button, we see that our file is attached to an email, and the email already contains the email addresses we want to use. Again, I want to highlight how important it is to send the information to the right person. This data set is now also located in the exported files folder, which was not there before, so we can always keep track of what information has been exported and sent to the client.
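The export step itself can be sketched like this. The address and file names are placeholders; in the real workflow the address is read from the manual-inputs Excel file rather than typed by hand:

    clientEmail = "client@example.com";   // in practice, read from the inputs file
    exportPath = "C:/Projects/ClientX/Exported files/client_update.csv";
    exportDt = Current Data Table();      // the reduced, client-ready table
    exportDt << Save( exportPath );
    Mail( clientEmail, "Project update", "Please find attached today's data set.", exportPath );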
With this email, it is then only a matter of filling in the information. In this case it was a very simple test, so the message is just "here is your data set," but of course we would normally add some interpretation or maybe some advice to the client on how to continue the tests. And once you have covered all these steps, you close the graphical user interface, and that also closes all open tables and the JSL script.
Something I would like to highlight at this point is that this workflow using a JSL script is rather fast. What you saw just now was of course a bit accelerated, because it is only a demonstration, but you do not spend time looking for data in different sources and trying to merge them on the right columns. All these processes are integrated into a single script, and that allows us to report to the client on a daily basis amounts of data that would otherwise not be possible. The client can then make data-driven decisions at a very fast pace. That is probably the key message I want to deliver with the script you have just seen.
Now I would like to wrap up the presentation with some concluding remarks. On one side, we have developed a distinctive approach to data handling and processing. We say distinctive because we have created a synergy between an Excel file, which most people can use since everyone is familiar with Microsoft Office, and a JSL script, which takes no effort to run: you click run, you get a graphical user interface and a few buttons to execute tasks.
Then we have a standardized workflow, which is highly relevant when you work with multiple clients, and also from a practical point of view. For example, if one of my colleagues went on holiday, it would be easy for another project leader, myself for example, to take over the project, knowing that all the folder structures are the same, all the scripts are the same, and the buttons execute the same actions.
Finally, we can guarantee seamless integration of data, and these fast updates of information, with thousands or even half a million data points per day, can be sent to clients quickly, which allows them to make almost online, data-driven decisions.
In the end, our purpose is to maximize customer satisfaction through a consistent, reliable and robust process.
With this, I would like to thank again the organizers of this Discovery Summit, and of course all our colleagues at Avantium who have made this possible, especially those who worked intensively on the development of these scripts. If you are curious about our company or the work we do in catalysis, please visit one of the links you see here. And with that I would like to conclude. Thank you very much for your attention, and we look forward to your questions.