The Power Behind Data: Historical Data Analysis of a Biopharmaceutical Process (2021-EU-30MP-759)

Level: Beginner

 

Anne-Catherine Portmann, USP Development Scientist, Lonza
Sven Steinbusch, Senior Project & Team Leader, Microbial Development Service USP, Lonza

 

Often, the analysis of big data is considered essential to the fourth big industrial revolution – the “Data-Based Industrial Revolution” or “Industry 4.0.” However, the challenges of unstructured data and superficial data investigation prevent the full potential of existing knowledge from being used. In this presentation we offer a structured data handling approach based on the “tidy data principle,” which allowed us to efficiently study the data from more than 80 production batches. The results of different statistical analyses (e.g., predictor screening, machine learning, or PLS) were used in combination with existing process knowledge to improve the overall product yield. With the newly created knowledge, we were able to identify certain process steps that have a significant impact on the product yield. Additionally, several models demonstrated that the overall product yield can be improved by up to 26 percent by adapting different process parameters.

 

 

Auto-generated transcript...

 


Anne-Catherine Portmann: Hello, today I will present to you the power behind data. This presentation is based on the tidy data principle, which allowed us to efficiently study the data of more than 80 production batches.
We were able to improve the product yield by up to 26% based on process knowledge and statistical analysis. The statistical analysis also allowed us to identify the key process steps that have an impact on the product yield.
So I will first introduce Lonza Pharma Biotech, and then we will go to the historical data analysis.
Lonza was first founded in 1897 and shortly thereafter was transformed into a chemical manufacturer. Today we are one of the world's leading suppliers to the pharmaceutical, healthcare, and life sciences industries.
Here in Visp, we have one of Lonza's biggest sites, and the most significant one for R&D, development, and manufacturing. We also have a new part of the company, the Ibex Solutions, where we are able to cover the complete biopharmaceutical cycle from the preclinical to the commercial stage, and from drug substance to drug product, all in one location.
You have probably heard about this lately in connection with the Moderna vaccine against COVID-19, but that is not the only product we are producing here in Visp. We also produce small molecules, mammalian and microbial biopharmaceuticals, highly potent APIs, peptides, and bioconjugates, including antibody-drug conjugates.
Now that you know a little bit more about Lonza, I will go to the historical data analysis.
So, first of all, I will present to you the process on which the 80 batches were run.
First, the upstream part. The upstream part starts with the fermentation, where the product is generated by the microorganism: during fermentation, the microorganisms produce the product from the DNA. Then we have the cell lysis, where we disrupt the cell membrane to release the product, together with everything else that is in the cell, and gain access to it. Then comes the separation. In the separation part, we remove the cell fragments, such as the cell membrane or the DNA.
Then we come to the downstream part, which is based on three different chromatography steps and allows the purification of the product. The product is shown in yellow in the lower part of the slide, and we can see that with each chromatography step we are able to purify the product a little bit more. At the end, we perform a sterile filtration of the product.
So the goal was to increase the overall product yield. To do that, we first collected the data of the 80 batches and ordered them in a way that we could analyze them. Then we performed the yield analysis, and we discussed the results with the process experts, the SMEs (subject matter experts).
Then we went to the data analysis for the upstream part and performed the four analyses shown on the left of the slide. Then we went to the downstream part and focused on Chromatography 1. At the end, we drew a conclusion from everything we saw in the analyses and from what the subject matter experts advised, and we recalculated the yield.
Let's see how we organized our data. We based the data layout on the tidy data principles. This is a big part of the work before the analysis, and it takes time, but it is really important to have clean data in order to make the analysis efficient afterwards. First, we have one table per observational unit; for example, one for the fermentation. Each row of the file contains one batch. Each column holds one parameter; for fermentation, for example, the pH, the temperature, and the titer (that means the amount of the product at the end). Each value in the table then corresponds to the correct combination of parameter (column) and batch (row). With this structure, we can go to JMP and perform the analysis.
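(As an illustration of this layout, here is a minimal sketch in Python/pandas; the column names and values are hypothetical, and the authors' actual tables were built for JMP.)

```python
import pandas as pd

# One table per observational unit (here: fermentation),
# one row per batch, one column per parameter.
fermentation = pd.DataFrame({
    "batch_id":  ["B01", "B02", "B03"],
    "ph":        [6.8, 7.0, 6.9],        # initial pH
    "temp_c":    [30.0, 30.5, 29.8],     # temperature, deg C
    "titer_g_l": [4.2, 5.1, 4.7],        # amount of product at the end
})

# Each cell holds exactly one value for one batch and one parameter,
# so the table can be imported into an analysis tool as-is.
print(fermentation)
```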
So let's see how we calculated the yield. First, we calculated the yield for each step, beginning at 100% at the fermentation, and followed how it decreases along the process. What we observed is that we have a big variation at the fermentation step, and then a decrease in the product amount at the separation step as well as at the Chromatography 1 step.
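(A minimal sketch of such a step-yield calculation, with hypothetical amounts; the analysis itself was done in JMP.)

```python
import pandas as pd

# Product amount measured after each process step, per batch (hypothetical data).
amounts = pd.DataFrame({
    "fermentation":     [100.0, 120.0, 90.0],
    "cell_lysis":       [ 95.0, 113.0, 86.0],
    "separation":       [ 80.0,  94.0, 70.0],
    "chromatography_1": [ 62.0,  75.0, 55.0],
})

# Express every step as a percentage of the fermentation amount,
# so each batch starts at 100% and the yield decreases along the process.
yield_pct = amounts.div(amounts["fermentation"], axis=0) * 100
print(yield_pct.round(1))
```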
We went with these data to the subject matter experts, and they told us that the variability of the complex media impacts the final titer of the fermentation, so we had to explore this spot. For the separation, the strategy that was chosen could have an impact on the mass ratios. And for Chromatography 1, the pooling strategy most probably has an impact.
Then let's see what the data said. We looked at the upstream part and performed different analyses. The first analysis was a multivariate analysis of each of the USP process stages. We focused on the fermentation, the cell lysis, and the separation, and looked at how all the parameters correlate with the amount of product at the end. For the fermentation, the amount added to Reactor 1 had a medium correlation with a good significance probability. For the separation, the final mass ratio and the mass ratio at the intermediate separation both had a major correlation with a significant probability. Other parameters, such as the initial pH of Reactor 2, are very close to the medium correlation threshold with a significant probability, and we will see this parameter again in the next analyses. We also selected here only the parameters that are scientifically meaningful, for the other analyses as well.
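(Roughly, a correlation screen of this kind could look like the following Python sketch. The threshold value and the pearsonr-based approach are assumptions, not the authors' exact JMP multivariate method.)

```python
import pandas as pd
from scipy.stats import pearsonr

def correlation_screen(df: pd.DataFrame, response: str, r_threshold: float = 0.3):
    """Correlate every parameter with the response and flag medium-or-stronger,
    significant correlations (threshold values are illustrative)."""
    rows = []
    for col in df.columns.drop(response):
        r, p = pearsonr(df[col], df[response])
        rows.append({"parameter": col, "r": r, "p_value": p,
                     "flag": abs(r) >= r_threshold and p < 0.05})
    return pd.DataFrame(rows).sort_values("r", key=abs, ascending=False)

# usage: correlation_screen(fermentation, response="titer_g_l")
```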
Then we went to the partial least squares. For the partial least squares, we see that for the fermentation we have a positive correlation for all these parameters. Again we see the amount of Reactor 2, the initial pH of Reactor 2, and the initial amount of Reactor 2, as well as a new parameter, the hold time.
We also see that the amount of Reactor 2 has a positive correlation in this analysis but a negative one in the MVA. This can be explained by the fact that the 80 batches were simply run in production; they were not designed to answer the question of a positive or negative correlation with the final product. That could be done in the future, in another analysis with a proper design. Still, we can say that these parameters have an impact on the final product.
For the other parameters, at the other steps of the upstream part, we also see that the prediction matches the multivariate results. And we see a possibility to improve the titer: with the prediction profiler, we can also optimize the product yield in the future.
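(A sketch of a partial least squares fit using scikit-learn on synthetic data; the talk's PLS was run in JMP, so the shapes, coefficients, and data here are purely illustrative.)

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

# X: batches x process parameters, y: final titer (synthetic stand-in data).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 6))                  # 80 batches, 6 parameters
y = X @ np.array([0.8, 0.5, 0.3, 0.0, -0.4, 0.1]) + rng.normal(scale=0.5, size=80)

# Standardize the parameters, then fit a two-component PLS model.
Xs = StandardScaler().fit_transform(X)
pls = PLSRegression(n_components=2).fit(Xs, y)

# The sign and size of the coefficients indicate the direction and strength
# of each parameter's association with the titer.
print(pls.coef_.ravel().round(2))
```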
Then we tested the predictor screening. Here we ran the predictor screening 10 times, and the parameters that were always found in the top 10 were selected. What we see here are the initial pH of Reactor 2, the mass ratio at the end of the separation, the mass ratio at the intermediate separation, the initial amount of Reactor 2, the amount of Reactor 2, and the amount of Reactor 1. So again, the same parameters appear to have an impact.
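(JMP's Predictor Screening is based on bootstrap forests; as a rough analogue, the sketch below repeats a random forest fit and keeps only the features that rank in the top 10 on every run. Function and variable names are hypothetical.)

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def stable_top_features(X, y, names, n_runs=10, top_k=10):
    """Fit a random forest n_runs times with different seeds and keep the
    features that appear in the top_k importances every single time."""
    top_sets = []
    for seed in range(n_runs):
        rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
        order = np.argsort(rf.feature_importances_)[::-1][:top_k]
        top_sets.append({names[i] for i in order})
    # Only parameters that survived all runs are considered stable hits.
    return set.intersection(*top_sets)
```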
Then we went to the machine learning. This machine learning analysis, XGBoost, is a decision-tree-based machine learning algorithm. To avoid ending up with parameters in the result that are not truly among our top parameters, we included a fake parameter, which gives us a kind of threshold in the parameter importance. All the parameters that appear above this threshold were considered to have an impact; the others fall below this random parameter and are considered to have no impact, or no significant impact, on the final product.
Here we can see that a negative correlation appears for Reactor 1; for the pH of Reactor 2 and the initial amount of Reactor 2 we have a positive correlation; and for the mass ratio we have a negative correlation. Again, as I explained before, distinguishing a negative from a positive correlation was not the goal and the batches were not designed for it, so we know there is an impact, but we do not yet know whether it is positive or negative.
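(The fake-parameter thresholding could be sketched like this with the XGBoost scikit-learn API; data and names are hypothetical.)

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

def importances_with_noise_threshold(X: pd.DataFrame, y, seed: int = 0):
    """Fit XGBoost with an extra random 'fake' column; only features whose
    importance exceeds the fake column's importance are treated as real."""
    rng = np.random.default_rng(seed)
    Xn = X.copy()
    Xn["fake_random"] = rng.normal(size=len(Xn))   # pure noise column
    model = XGBRegressor(n_estimators=300, random_state=seed).fit(Xn, y)
    imp = pd.Series(model.feature_importances_, index=Xn.columns)
    threshold = imp["fake_random"]                 # importance of pure noise
    # Everything at or below the noise column is treated as random.
    return imp[imp > threshold].drop("fake_random", errors="ignore") \
              .sort_values(ascending=False)
```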
Then we go to the downstream part, specifically to Chromatography 1. Here we used neural network predictive modeling. In this predictive modeling, we used the different fractions of the chromatography. On the graph on the right, we see that Fraction 8 is the main fraction, where we find most of the product at the highest purity. Going down from Fraction 7 to Fraction 1, we still have product, but also more impurities.
Until now, we were taking the fractions down to Fraction 4 into account in our analysis, and we wanted to see if we could also include Fractions 3, 2, and 1. What we saw is that by increasing the number of fractions, we increase the yield while decreasing the purity only very little. On the graph on the left, we see that if we go down to Fraction 2, we decrease the purity by less than 1% while the yield increases by about 5%. When we also include Fraction 1, we have a bigger decrease of the purity, a little more than 1%, while on the other side the yield increases by about 10%.
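(The pooling trade-off can be illustrated with a cumulative sum over fractions; the per-fraction numbers below are hypothetical and do not reproduce the slide.)

```python
import pandas as pd

# Per-fraction product and impurity amounts from Chromatography 1
# (hypothetical numbers; Fraction 8 is the main, purest fraction).
fractions = pd.DataFrame({
    "fraction": [8, 7, 6, 5, 4, 3, 2, 1],
    "product":  [40.0, 15.0, 10.0, 7.0, 5.0, 3.0, 2.5, 2.0],
    "impurity": [ 0.2,  0.3,  0.4, 0.5, 0.7, 1.0, 1.5, 3.0],
})

# Pool fractions cumulatively (8; 8-7; 8-6; ...) and track the trade-off:
# every added fraction raises the yield but lowers the pool's purity.
pool = fractions[["product", "impurity"]].cumsum()
pool.insert(0, "pooled_down_to", fractions["fraction"])
pool["purity_pct"] = 100 * pool["product"] / (pool["product"] + pool["impurity"])
print(pool.round(2))
```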
With these results, we tried to summarize everything together. For the fermentation, we have the final volume of the tanks and reactors, which was identified by most of the methods. The initial pH of the fermenter was also identified by the different analysis methods, and the complex compound variability by the process experts; to see the effect of the complex compound variability, we will need further investigation in the lab.
For the separation, we have the mass ratio, which was identified by some of the data analysis methods, but also by the process experts. The strategy is very interesting, and the process experts decided to look at it and to run some tests to improve the yield in production.
For Chromatography 1, the pooling strategy was identified by the process experts and by the neural network analysis. Here, the method can easily be implemented in the lab and also in production, and the yield really increases a lot with this method.
Then we recalculated with the prediction profiler how much we can increase the yield at the different steps. For the fermentation, we were able to increase the yield by up to 16%. For the separation, we were able to increase it by up to 5%, and for Chromatography 1 also by up to 5%. On the earlier slide we wrote up to 10%, but here we take the worst-case scenario and say up to 5%. At the end, we get a total increase of 26%, so this is a good way to improve our process and to focus exactly on the parts where we can have a big impact, just based on the data, without doing a lot of experiments in the lab. It is also cheaper to do these analyses with JMP than to run a lot of experiments in the lab, so we have a lot of gains at the end.
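(A back-of-the-envelope check on how the per-step numbers combine: the talk quotes the simple sum, 16% + 5% + 5% = 26%; if the step yields instead multiplied independently, the gains would compound to roughly 28%. The sketch below shows both, purely as an illustration.)

```python
# Reported best-case yield improvements per step (fractions of current yield).
step_gains = {"fermentation": 0.16, "separation": 0.05, "chromatography_1": 0.05}

# Simple sum, as quoted in the talk: 16% + 5% + 5% = 26%.
additive = sum(step_gains.values())

# If the step yields multiplied along the process, the gains would compound:
compounded = 1.0
for g in step_gains.values():
    compounded *= 1.0 + g

print(f"additive: {additive:.0%}, compounded: {compounded - 1.0:.1%}")
```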
So, thank you very much to all of you for listening to me today, and also a big thank you to my colleagues Ludovic, Helge, Lea, Nichola, and Sven for their help with the ??? of this presentation.