Cultivated meat offers a sustainable, slaughter-free alternative to traditional animal agriculture by growing animal cells in bioreactors for human consumption. This emerging field draws from traditional biopharmaceutical manufacturing but requires lower input costs to achieve commodity-scale pricing. At UPSIDE Foods, we developed high-throughput experimental workflows to test and optimize formulations rapidly. However, large data sets created a new bottleneck, where manual data processing slowed iteration cycles and delayed process decisions.

To overcome this, we set up real-time access to key instrument data (e.g., ViCell XR/Blu, Nova Flex 2). We then developed tools using the JMP Scripting Language (JSL) to pull data into JMP and automate our data analysis workflows. This presentation demonstrates the script's functionality, such as built-in data visualizations (Graph Builder), automated trend analysis (Bivariate Fit), and operator-facing tools to guide process interventions (Tabulate). Our approach also unlocked automation for advanced calculations requiring numerical integration and conditional relationships. As a case study, I use cell-specific glucose consumption qGluc (pg/(cell*day)) to highlight a few critical aspects of our methodology and demonstrate scripting strategies for beginners:

  • Developing standardized sample naming conventions
  • Establishing real-time access to instrument data with Open Database Connectivity (ODBC)
  • Automatically organizing and cleaning data using operations like JOIN and SORT
  • Creating formula columns using key functions (e.g., IF, LAG, IS MISSING)

Integrating live data streams with automated analytics in JMP significantly accelerated bioprocess development and reduced decision latency at UPSIDE Foods.

 

 

My name is Ben Richardson. I'm excited to present today on some of the work that we've been doing at UPSIDE Foods to automate bioprocess data analytics in JSL for cultivated meat. I want to first shout out some major contributions from Megan and Julia, and especially UPSIDE Foods, the company that I'm representing here. That's some of our product there shown to the right.

First, here's a quick table of contents for what we're going to cover. We'll start with a little background in case some people aren't familiar with what cultivated meat is, as well as what Cell Culture Media is. That's the department I work in, and it's going to be pretty important for why JMP has been such a powerful tool for us.

I'm also going to talk about how our JSL script has accelerated bioprocess development here at UPSIDE Foods. Then I'm going to go into a script demonstration, showing off some of the functionality of this tool that we've developed: real-time access to key instrument data, some data visualization tools, trend analysis, and even some cool bells and whistles like operator-facing tools built on functions like Tabulate.

The final section is where I'm going to go into a beginner's guide for automating some of these advanced calculations in JMP. In order to do this, I'm going to focus on a case study. In bioprocessing, cell-specific glucose consumption is a pretty important parameter. It's also a notoriously difficult parameter to calculate.

What I'll do is talk about four different things that we've done in order to break down this advanced calculation. The first one is standardizing the sample naming; that's a really critical and foundational piece of our approach here. I'll also cover how we actually access the data using Open Database Connectivity, how we automatically clean, sort, and organize the data, and lastly, how we actually create those formula columns using some key functions.

Let's dive right into some background. What is cultivated meat? What is UPSIDE Foods? UPSIDE Foods was the first company founded in the cultivated meat space back in 2015. It's really been a leader in the space since, gaining regulatory approval from the USDA and FDA, and that was in 2022.

What we do is we take animal cells. This is not plant-based meat. This is animal meat, and we isolate cells. This is slaughter-free. You don't have to kill animals to do this. Then we develop cell lines from those isolations in Cell Culture Media. Again, that's where I've been living.

Cells and Cell Culture Media are combined in a bioreactor, very similar to how you might brew beer with yeast. We grow those animal cells in a bioreactor and then formulate them into a product for human consumption. What's really exciting about this is that bioprocessing could make meat production more efficient, lowering costs for consumers and limiting negative environmental impacts like land and water use and even greenhouse gas emissions, while also reducing animal cruelty. That's something I'm really excited about.

Why is Cell Culture Media so important? This is the realm I'm going to be talking about here: the cell food. You can see some solutions in the corner; those fun colors mostly come from vitamins, actually. Cell Culture Media is really important because it provides the cells the nutrients and the surrounding environment necessary to grow outside of the body.

Cell Culture Media is really nothing special. It's just little pieces of the common food ingredients that we would feed to chickens or cows or even eat ourselves: amino acids, sugars, vitamins, minerals. In order to accelerate low-cost media development, especially for commodity-scale pricing, UPSIDE Foods has vertically integrated our Cell Culture Media production. This means we've developed proprietary in-house media formulations from the ground up with food-grade components.

If you're familiar with Cell Culture Media or bioprocessing, you know designing media formulations from the ground up is no easy task. That's where JMP statistical software comes into this. Just as an example to illustrate how difficult this is, take a full factorial experimental design with 50 media components; that's actually a lowball estimate, since a lot of media formulations can have upwards of 100. If we just take a low, medium, and high concentration for each of these, the design space is 3 to the 50th power, which is about 7.2 times 10 to the 23rd. That's more than a mole, 6.02 times 10 to the 23rd, which is practically infinite. We can't test all of these conditions.

This is where design of experiments comes in. DOE is an applied statistics methodology that allows you to probe design spaces more efficiently, especially relative to one-factor-at-a-time (OFAT) approaches. Here's an illustration courtesy of JMP, where we look at three different variables and visualize them in three-dimensional space: variable one, two, and three.

With a traditional one-factor-at-a-time approach, we'd go optimize variable one, find the best condition, move on to variable two, find the best condition, and then variable three. We're going to find a local maximum eventually. But what if the actual best condition is somewhere over here and we miss it? We're taking a lot more time and effort to maybe not even find the best condition.

The reason why JMP is so powerful here, and why it's so important for our approach, is because JMP has industry-leading DOE software. It's really a gold standard for Cell Culture Media development and bioprocess development. JMP has been widely adopted across the different bioprocess development pipeline groups at UPSIDE Foods, so it's a natural tool to look at and to use for some of this scripting.

Let's talk about how this script was actually able to accelerate bioprocess development here at UPSIDE Foods. I've got a little schematic here showing parts of the bioprocess development pipeline, all the way from upstream activities like cell line development down to manufacturing science and technology and actual production. I've been talking a lot about media development; we live in here, but process development also has a lot of overlap.

The thing that I want to point out is that dynamic feedback loops across this development pipeline are a really important piece of driving scientific innovation and breakthroughs. Basically, you might get a better cell line that gets passed to media development, which allows us to then get a breakthrough on some limiting nutrient. Iterative cycles like this are really where those exciting scientific breakthroughs happen.

I was just talking about how DOE allowed us to probe our design spaces more efficiently, so you might think that our experiments are getting smaller. Actually, because we're trying to design media from the ground up, our experiments are still massive. Even though we're testing the space more efficiently, we still have this absurd design space that we need to investigate.

We have huge numbers of samples, sometimes experiments with over 100 conditions, in order to really build these media from scratch. You can see a picture of my colleague Dasha there printing out labels. It's just an illustration of how many samples we're working with here and how this actually created a bottleneck between teams.

To further illustrate this, I have some cell counts per year that the company has done, which are pretty staggering: over 40,000 cell counts per year for the media team here at UPSIDE Foods. Necessarily, this is going to change over the course of the pipeline; as you get to larger and larger scales, you have fewer and fewer vessels and degrees of freedom. You can see PD, the step below us, still runs something like 7,000 or 8,000 samples per year.

This is just a huge amount of data that we're trying to work up, and we're moving at a lightning-speed pace to get it. What we did was develop this JSL script to reduce the average time for our routine bioprocess data workup, to disseminate that information more quickly, and to get it into the hands of other teams to accelerate those feedback loops. We were able to reduce that routine data workup time from about one week to about two days, roughly a 3X speedup, which is pretty exciting.

Now I'm going to get into a bit of a script demonstration and show off some of the features of this tool. Here's the script. It's pretty large and has definitely grown over time; we've got a lot of code in here. During this short talk, I'm not going to have time to get into all of it, but I will demonstrate some of the functionality. If I run this script, the first thing that happens is we get a little warning window that it's going to close some other windows you might have open. I'm okay with that.

I'm going to use an example here, maybe 5109, so one of our experiments. This is real data. What's happening is the script is running our open database connection and pulling data into JMP for me. This speeds up the process, and it's something we can do on a routine basis, as needed. While the experiment is running, data is getting routinely uploaded to our database, and we can pull it and look at the graphs in real time. You can see different instruments populating; it's making some data tables for us. Then it's going to ask me if I want to add a condition key.

The condition key is something that I'm going to talk about in just a minute here and open up. We're going to say yes so that I can show some of that functionality. It's also going to ask if I want to add additional data that's not tied to an instrument. This is additional metadata that you might want to add; the example here is pack_cell_volume. This is really anything that isn't being run by an instrument and uploaded automatically to the database. We're going to say yes again so that I can show off some functionality. Then the script is going to build us a data table. We've got a nice data table all built out.

Before I get into it, I'm going to open a normalization script. This is just a little script that I wrote in order to normalize the data. It's going to scale all of the numbers from zero to one so that I can talk about it without revealing intellectual property. What I'm going to do is take this script here, run it alongside my data table, and we can see what happens.

The process time there has just been scaled. Now our maximum value is one, and everything else is relative to that. This way I don't have to show all the important data that might be hidden in here; I can just show you some of the functionality.
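
As a rough illustration of what a normalization script like that can look like, here is a minimal JSL sketch of my own, assuming a current data table with numeric columns; it simply divides each numeric column by its column maximum so the largest value becomes one. This is a simplified stand-in, not the production script.

    dt = Current Data Table();
    colNames = dt << Get Column Names( Numeric, "String" );  // names of numeric columns
    For( i = 1, i <= N Items( colNames ), i++,
        col = Column( dt, colNames[i] );
        vals = col << Get Values;                            // column values as a matrix
        maxVal = Max( vals );
        If( !Is Missing( maxVal ) & maxVal != 0,
            col << Set Values( vals / maxVal )               // scale so the maximum becomes 1
        );
    );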

Just circling back to the condition key, I'm going to open this up to explain a little bit of how it works. We have numeric conditions: 202, 203, 204. Then all the variables that we're actually changing are linked to those through this condition key. This is where we put all that information, because we need a standardized sample naming system; I'll get into that a little bit later. This is just illustrating how we actually tie information to those condition codes.

We have a nice big data table. You can see we've got 1,206 sample measurements here, along with all our parameters and calculated values: maximum lactate, specific lactate generation, doubling times, and some other fun things. We've got all of our data in one nice place. The next thing that I want to highlight is some of these scripts that we have saved in the top left.

For example, we have this Graph Builder. This Graph Builder is cool because it automatically has a column switcher and gets all our parameters and conditions graphed. We can really quickly look and see, oh, 202 is declining over time, that's not doing well, or 209. Because we added that condition key, we can basically say, okay, I don't want to see 201, 202; what I'd like to see is the condition.

Let's say these are some different conditions. The other thing that's really nice about JMP and Graph Builder is I can quickly say, well, I want to see what variable 1 is doing. We can put variable 1 here and see, oh, okay, all the controls aren't doing well, and we've got a couple of different strategies here that might be working. We can similarly just drag variable 2 over here and interact with the data in a really nice, user-friendly way and get an idea of what's going on.

We can look at lactate. We could look at something like doubling time down here. What we can do is really quickly interact with our data, get a feel for it, and get it ready to present to other people within the company in order to speed that up. Let's say I also want to get an idea of variable 2. I can just put that on the overlay and get an idea: the doubling times are increasing, this is making sense, and strategy 1 is keeping those low, which is something that we want. That's the Graph Builder portion, which is pretty exciting.
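
For context, a saved Graph Builder script of the kind described here might look roughly like the sketch below: a response plotted against process time, overlaid by condition, with a Column Switcher so one graph serves every parameter. The column names are placeholders, not our actual table schema.

    dt = Current Data Table();
    gb = dt << Graph Builder(
        Variables(
            X( :Process Time ),
            Y( :Viable Cell Density ),
            Overlay( :Condition )                 // color/overlay by experimental condition
        ),
        Elements( Points( X, Y ), Line( X, Y ) )
    );
    // One saved graph can page through every measured parameter.
    gb << Column Switcher( :Viable Cell Density,
        {:Viable Cell Density, :Glucose, :Lactate, :Doubling Time}
    );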

We also have a bunch of other functionality. I'm going to try to show off a couple of these really quickly before we get into the next section. One of them is the cycle analysis. Basically, I'm going to click this, and it's going to read out all of the different batches that we have, by passage number, and build out a checkbox list.

We're not going to include n minus 1 and n minus 2, because those are scale-up and I don't really want them in this analysis. What I'm going to do is just select the ones I want and hit OK. It's going to pop up with some of these bivariate fits, and we're going to get these trends. You can see it's actually going to quantify what's going on here, get that data for us, run these trend analyses, and add them directly to our table, all automatically via scripting. We don't have to do this manually, and it speeds up our ability to get data out.

You can see here's the trend analysis. Now we have a graph showing, okay, what's the slope? What are these looking like over time? Again, we can just take our variable and we can look at it. We're actually going to put that on color here. You can see, again, really easy ways to interact with your data and to analyze it.
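
A minimal sketch of that kind of trend analysis in JSL is shown below: a Bivariate fit of a response against passage number with a linear fit per condition, so the slopes quantify drift across cycles. Column names are illustrative, and the production script goes further by writing the fitted slopes back into the data table.

    dt = Current Data Table();
    dt << Bivariate(
        Y( :Doubling Time ),
        X( :Passage Number ),
        Fit Line(),              // linear trend; slope and intercept appear in the report
        By( :Condition )         // one fit per experimental condition
    );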

The last feature that I'm going to show off is some operator-facing tools. For example, say we want to do a feed or an add-back, and an operator just really quickly wants to know, hey, what am I supposed to do today to take care of these reactors? What we can do is say, okay, let's take a day.

For example, maybe we want to do this on, say, August 19, 2024. I'm going to run this script here and enter that date. We're going to say we're working with a small reactor, maybe a working volume of something like 225 milliliters, and let's say we feed 2%. We just hit OK, and it's going to build this table for us that tells us how much each of these reactors is going to get fed and whether they need certain add-backs.

In this case, basically everything is going to need both glutamine and glucose, but you could imagine a situation where, if they weren't all below the add-back threshold on this day, you might get a couple of no's here. You can get a really quick read on what needs to be done to intervene with these reactors. That's the script in a nutshell. I'm going to close some of these out and go back to the presentation.
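
As a hedged sketch of the operator-facing idea, the snippet below subsets a chosen sample day and builds a Tabulate summary of glucose and glutamine by reactor. The date code, column names, and statistics are hypothetical, and the real tool also computes feed volumes from the working volume and feed percentage the operator enters.

    dt = Current Data Table();
    feedDay = "240819";                                   // hypothetical YYMMDD date code
    dt << Select Where( :Date Code == feedDay );
    dtDay = dt << Subset( Selected Rows( 1 ), Output Table Name( "Feed Day" ) );
    dtDay << Tabulate(
        Add Table(
            Column Table( Analysis Columns( :Glucose, :Glutamine ), Statistics( Mean ) ),
            Row Table( Grouping Columns( :Reactor ) )     // one row per reactor
        )
    );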

Let's get into a little bit about how to do this, especially if you're a beginner just starting out with JMP. How do we do this? Let's start with the case study, this notoriously difficult calculation, cell-specific consumption. The four steps that I'm going to cover are developing a standardized sample naming system, establishing a connection to your data, automatically organizing, cleaning, and sorting the data, and lastly, creating a formula column that calculates this parameter for us.

Let's get right into it. First, in order to frame the task at hand, I'm going to have to do a little bit of a lecture here about what cell-specific consumption is and why it's so difficult to calculate. You can think of cell-specific consumption like cellular horsepower. Every cell needs to consume an amount of nutrients, which we can think of like energy, every day, so per unit time.

Therefore, our consumption q is going to be a delta C divided by a delta iVCD. You can see this is viable cell density, which is basically cell counts; it's a very important measurement in bioprocessing. Similarly, we can measure the amount of a nutrient in the solution. These are typical curves that might be a good example of a fed-batch bioprocess. What we're going to do is look at two time points, the one we're interested in and the previous time point, in order to define this parameter.

There's a time in between them. You can see that with the Delta T here. Why is the area under the curve important? I said iVCD instead of VCD. I was talking about this graph of viable cell density, but what we're really interested in is the integral of that. The reason why that's important is because a cell that's alive at the beginning of this cultivation cycle, maybe it's four days long, is going to eat something like four times as much food as a daughter cell that divided right here and only existed for one day before it was harvested.

What we can do is approximate that. We can do a little bit of numerical math and use the trapezoid rule to estimate that integral. This is the piece in the denominator. We also need the change in concentration, but that's also a little bit more complicated than it seems at face value.
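
Reconstructing that idea from the description above (this is my notation, not the exact slide formula), the denominator between the previous sample at t-1 and the current sample at t is approximately:

    \Delta iVCD \;=\; \int_{t-1}^{t} VCD \, dt \;\approx\; \frac{VCD_{t-1} + VCD_{t}}{2}\,\Delta t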

Since we want the change in concentration due to cellular consumption, we need to subtract the amount that got fed. We're not normally measuring this, because we know what we're adding; we don't usually have a measured data point showing the concentration going up and then coming back down, but we need to factor it in if we want our consumption rates to be accurate.

There's an amount that they got fed, and then when we actually remeasure the next day, they've consumed not just this delta but also the amount that was added. What that means is we have to subtract out that additional amount. What we end up with is this difficult formula, where we've got a lot of parameters that are conditionally related to each other. That's the crux of the issue and what makes some of these calculations so difficult to automate.
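
Putting the pieces together, a plausible form of the consumption described here, with C the measured nutrient concentration and C_feed the add-back amount included only when a feed actually happened between the two samples, is:

    q \;\approx\; \frac{\left(C_{t-1} + C_{\text{feed}}\right) - C_{t}}{\tfrac{1}{2}\left(VCD_{t-1} + VCD_{t}\right)\Delta t}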

I'm going to talk about how we do it anyway. The first piece is developing standardized sample naming. This is really foundational to the automation strategy we've employed here. I'm going to use an example sample string and break down some of the pieces. Each of these pieces is an important handle that lets us treat the data automatically: cleaning, sorting, and calculating.

We've got, for example, a group ID, that date code, an experiment number, and some underscores to separate things and make it a little more visually appealing. We've got a batch or cycle, a replicate handle, and a condition or reactor code here. This long character string might appear a little intimidating, but these pieces are typically sequential and numeric.

If you're thinking this would frustrate some operators: usually they just have to type this in once per day, and then you can use auto-numbering functions on a lot of these analytical instruments to just click down the list and populate. What this really does is pay dividends once you actually go about trying to automate and auto-calculate some of these parameters.

My advice for adapting this approach would be to identify what your team uses to manipulate and display data, then consolidate and codify the minimum number of parts that you need. You really want to keep this as simple as possible. There are some pieces you just need, independent of any specific experiment, and those are the handles you really need to manipulate this data.

You can always add additional functionality later. For example, we've got some optional features here, like an instrument-error replacement flag that you can append at the end. You can also mark intentional resamples, for example that feed I was talking about, if we did want to measure it and check that it's what we think it should be. There are other bells and whistles that you can build in, but you do want a really rigid naming scheme for your default analysis that you get buy-in on and get everybody to start using.
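
To make the payoff concrete, here is a small JSL sketch of how a standardized name can be split back into its handles with Word(); the example string and the order of the parts are hypothetical, not the actual UPSIDE Foods convention.

    sampleName = "MD_240819_5109_B2_R1_202";        // hypothetical sample string
    groupID    = Word( 1, sampleName, "_" );        // group ID
    dateCode   = Word( 2, sampleName, "_" );        // date code
    expNumber  = Word( 3, sampleName, "_" );        // experiment number
    batch      = Word( 4, sampleName, "_" );        // batch or cycle
    replicate  = Word( 5, sampleName, "_" );        // replicate handle
    condition  = Word( 6, sampleName, "_" );        // condition / reactor code
    Show( groupID, dateCode, expNumber, batch, replicate, condition );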

What's the second step? You need access to this data. The way that we did that is with an open database connection. We've got JMP on the right here that we're trying to get our data into. We worked with our data and IT teams to set up our instruments so that they export CSV files as soon as they've run, and then these get uploaded to our database. You can see data from a cell counter, say, and a chemistry analyzer both getting uploaded to this database.

ODBC is the connection; that's the piece here that connects the database to JMP. It's an application programming interface (API), and it uses SQL, or Structured Query Language. As you saw during the demo, you can also add metadata directly in JMP, or you can feed it through the database. Both work.

How do you actually do this? There's an SQL query feature in JMP. You can go through manually, and there's a nice user interface where you can select tables and columns. Once you get the open database connection set up, you can build your query, and you can combine this with a really cool feature that I want to plug called the Workflow Builder, which will track all the steps you're doing and build JSL code for them.

Now, when you're doing some more advanced scripting, you'll probably need to modify that scaffold. But it gives you an example of the syntax, and it speeds up the coding, especially for beginners and for people who aren't classically trained in computer science. It's a really powerful tool that I want to make sure people are aware of.

This is a snippet that's illustrative of what the code would actually look like. We've got a connection; in this case, we're using that user interface to add an ELN. That's some of the bells and whistles that we're going to need to add after the fact. It gives you a starting place to work with, to test, and to start iterating on.
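
For reference, a bare-bones version of that kind of pull looks something like the sketch below; the DSN, credentials, table, and column names are placeholders, and in practice the query would be generated with the query interface or Workflow Builder first and then edited.

    // Open Database( connection string, SQL statement, output table name )
    dtCounts = Open Database(
        "DSN=BioprocessDB;UID=readonly;PWD=*****;",
        "SELECT sample_name, sample_date, vcd, viability
           FROM cell_counts
          WHERE experiment_id = '5109'",
        "Cell Counts"
    );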

All right, step 3: we need to clean and sort the data. This is a big step, because a lot of the time what people do for a cell-specific glucose consumption calculation is pull the data into, say, Excel and then manually move the data around, organizing, deleting, and referencing different cells. Cleaning and joining the data automatically is going to be really critical to making sure we get accurate results.

The first two operations I'm going to talk about are Join and Sort. In the user interface, these are under the Tables menu and can similarly be combined with the Workflow Builder. This is an example of a join, where we're taking those chemistry analytes and the cell counts and joining the tables together. Next, we want to sort by our parameters so that the data is all organized and in the order that our functions are going to expect.
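
A minimal sketch of those two steps in JSL, assuming hypothetical table and column names, would be: join the cell counts to the chemistry results on the shared sample name, then sort so each reactor's samples are in time order for the row-wise formulas that come later.

    dtJoined = Data Table( "Cell Counts" ) << Join(
        With( Data Table( "Chemistry" ) ),
        By Matching Columns( :sample_name = :sample_name ),
        Drop Multiples( 0, 0 ),
        Include Nonmatches( 0, 0 ),
        Output Table( "Combined Data" )
    );
    dtJoined << Sort(
        By( :Reactor, :Process Time ),
        Order( Ascending, Ascending ),
        Replace Table                      // sort in place rather than making a copy
    );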

Lastly, we can do all kinds of cleaning. This is a little bit more involved, but it can be accomplished fairly easily using logical expressions. The example I'm using here is deleting duplicate rows from instrument errors; that's that underscore-R parameter I mentioned earlier.

What we're doing here is going through and labeling what is and is not a duplicate. Then we hide and exclude anything that's not the most recent sample. Again, these scripts build a lot of functionality into cleaning, getting your data really close to ready for prime time.
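
One way to express that kind of cleanup is sketched below, under a couple of stated assumptions: reruns carry a literal "_R" suffix on the sample name, and the table is sorted so a rerun immediately follows the row it replaces. The column names and suffix handling are illustrative, not the production logic.

    dt = Current Data Table();
    // Strip the rerun suffix so an original and its rerun share a base name.
    dt << New Column( "Base Name", Character,
        Formula(
            If( Ends With( :sample_name, "_R" ),
                Left( :sample_name, Length( :sample_name ) - 2 ),
                :sample_name
            )
        )
    );
    For Each Row( dt,
        If( Row() < N Rows( dt ),
            // If the next row is a rerun of this sample, hide and exclude this one.
            If( :Base Name == :Base Name[Row() + 1],
                Hidden( Row State() ) = 1;
                Excluded( Row State() ) = 1
            )
        )
    );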

All right, let's get into the last section here. How do we actually build a column with key functions to calculate this parameter? We've got our crazy equation; it's pretty brutal. The first step is going to be using functions like If and Is Missing to allow the calculation to depend on which parameters are available.

In this example window here, we can see that for some of these rows we do have a glucose add-back, that feed parameter we were talking about, and for some of them we don't. We need a way to set up an equation that says, okay, if this value is present, include it in the calculation; if it's not present, don't. That's how we do that: with the If and Is Missing functions.

We have another problem: we now have values that we need for this calculation that appear in different rows. In a column-based format, which has its advantages for data management, it's harder to reference something outside the current row; formulas are built more along the lines of referencing things in the same row. What we can do is leverage a really powerful function that I want to call out here, which is the Lag function.

When your data is properly sorted and organized, this allows really accurate references to data that's not in the same row. We have our C of t here and our C of t minus 1 that we can refer to using the Lag function. This is what that code will look like. We're going to generate a new column and give it a formula. We'll have an If statement where we check that we're comparing rows within the same batch, not two different batches, using this unique ID handle. Then we look for whether or not that add-back is happening, and we perform the calculation. The iVCD, that other chunk, is happening in another column, but it works very similarly.
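
Below is a hedged sketch of what such a formula column can look like, using placeholder column names (:Unique ID, :Glucose, :Glucose Add Back, :Delta iVCD). It assumes the table is sorted by unique ID and time, that the add-back column is empty when no feed happened and is recorded on the row where the feed was given, and that the trapezoidal iVCD increment lives in its own column, as mentioned above.

    dt = Current Data Table();
    dt << New Column( "qGluc", Numeric, Continuous,
        Formula(
            If(
                // First row of a batch: nothing to compare against, so leave missing.
                Row() == 1 | :Unique ID != Lag( :Unique ID, 1 ), .,
                // A feed happened since the last sample: add it back to the previous concentration.
                !Is Missing( Lag( :Glucose Add Back, 1 ) ),
                    ( ( Lag( :Glucose, 1 ) + Lag( :Glucose Add Back, 1 ) ) - :Glucose )
                        / :Delta iVCD,
                // No feed: plain concentration drop over the integrated cell density.
                ( Lag( :Glucose, 1 ) - :Glucose ) / :Delta iVCD
            )
        )
    );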

What I'm going to do now is switch back to JMP and show that in action. All right, now I have another normalized data set. You can see we've got some samples here, we've got our unique IDs, we've got some glucose values, and we have some add-backs. Some of these are happening; in some cases they're different values, and they're not evenly spaced. We need something that's smart enough to figure out how to calculate this and do it right every time.

What we're going to do here is run the script that I showed at the bottom of that slide. We're going to run this script to create a new column where the specific glucose consumption is calculated, and then we're just going to move the selected column so that we can see it and don't have to scroll. I'm going to run this little excerpt of the script here.

I have to run the entire thing, not just the section highlighted. Once I run it, what we can see is that this popped up with a new column here, and we're getting calculations. It's making breaks in the calculation the way that we want, and we're getting accurate values that are scaled the way we want.

If we look at our formula, we've built out a formula. It's going through this conditional logic, and we're getting the calculation how we want it. We can preview the data here, look at the different pieces, see how they add up, and troubleshoot. This is how we're getting advanced, accurate calculations automated.

All right, that is the end of the talk. I just want to end quickly with some acknowledgments: people across the board at UPSIDE Foods have been involved with this, specifically Megan and Julia. Here are some of my teammates when we were doing a Top Chef competition, the entire company, our production facility, and some of our product.

Special shout-out to our IT and data teams for doing some of the heavy lifting around the open database connection and helping troubleshoot systems with me, and to my manager, Cameron, who originally introduced me to JMP and showed me just what a powerful tool it can be. Thank you, and I will end my talk.




