Optimization of subsample size for soil pests on sugarcane experiments (2021-US-...

Paulo Peloia, Data Analytics Lead LATAM, Syngenta Crop Protection

Field experiments with products against soil pests on sugarcane in Brazil evaluate the percentage of stalks attacked in four subsamples of 0.5 row-meter per plot. However, only 40% of these experiments detect a significant difference between treatments (p-value < .05). Increasing the number of subsamples (X) initially lowers the experimental error up to an optimum point (X₀), beyond which the gain in detecting differences between treatments is very small. To find this point, two blank experiments with 11 treatments, 4 blocks and 8 subsamples/plot were conducted. To estimate the mean experimental error, 1000 bootstrap samples were drawn for each value of X, from 8 to 1. To determine X₀, the quadratic plateau model (QPM) was fitted by the least squares Gauss-Newton method for nonlinear models using JMP 16. In the following season, three real (validation) experiments were conducted with X₀ subsamples, and their capacity to detect significant differences was compared with what it would have been with X = 4, using 1000 bootstrap samples for each experiment. The QPM estimated X₀ = 6 subsamples (R² = .972 and .971). Experiments with X₀ = 6 detected a significant difference between treatments in 54% of the cases and those with X = 4 in 41%, which represents an increment of 13 percentage points.

Auto-generated transcript...

Speaker	Transcript
Paulo Peloia	Hello, my name is Paulo Peloia. I am the data analytics lead for Latin America at Syngenta Crop Protection.
	The research that I'm going to present is about the optimization of subsample size for soil pests on sugar cane experiments.
	Here in Brazil, we conduct field experiments to test products against soil pests on sugar cane with four replicates and four subsamples per plot.
	In each subsample, we have to dig the soil to remove the roots of the sugar cane and then evaluate the damage caused by these soil pests. We do that in 0.5 row-meters.
	With this design, unfortunately, and this is our problem, we can detect a significant difference between treatments in only 40% of the cases, considering an alpha of 5%.
	Before increasing the number of replicates in our trials, which would dramatically increase the cost of our experiments, we decided to first evaluate the number of subsamples that we collect.
	It is known that the higher the number of subsamples, the lower the experimental error. It is also known that this decrease in the experimental error gets smaller as the number of subsamples grows. So we have a nonlinear trend, and
	at a certain point, this reduction in the experimental error becomes close to zero. For this reason, we decided to use the quadratic plateau model, the methodology that we are going to talk about in a few seconds.
	So, as I said, the objective of this research was to determine the optimum number of subsamples per plot
	for soil pests on sugar cane, which ultimately contributes to a more efficient experimental process. Okay, now let's talk about the methodology that we used.
	Well, we started this study by conducting two blank experiments with 11 treatments, four blocks and eight subsamples per plot. Again, eight, not four,
	subsamples per plot. With the results of these two blank experiments, we applied the bootstrap methodology
	to estimate the mean standard error, or the experimental error, for each subsample size from eight to one. After that, we fit the quadratic plateau model. Now let's take a look at this methodology, the quadratic plateau model.
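The bootstrap step described above can be sketched in Python as follows. This is a minimal illustration and not the JMP workflow used in the study: the data are synthetic, the function names `rcbd_rmse` and `mean_bootstrap_rmse` are hypothetical, and drawing subsamples with replacement within each plot is an assumption, since the talk does not say how the resampling was done.

```python
import numpy as np


def rcbd_rmse(plot_means):
    """Root mean square error of a randomized complete block design.

    plot_means has shape (treatments, blocks): one mean per plot.
    """
    t, b = plot_means.shape
    grand = plot_means.mean()
    trt_eff = plot_means.mean(axis=1, keepdims=True) - grand
    blk_eff = plot_means.mean(axis=0, keepdims=True) - grand
    resid = plot_means - grand - trt_eff - blk_eff
    df_error = (t - 1) * (b - 1)
    return float(np.sqrt((resid ** 2).sum() / df_error))


def mean_bootstrap_rmse(data, x, n_boot=1000, rng=None):
    """Mean RMSE over n_boot resamples that keep x subsamples per plot.

    data has shape (treatments, blocks, subsamples).
    Assumption: subsamples are drawn with replacement within each plot.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, b, s = data.shape
    rmses = np.empty(n_boot)
    for i in range(n_boot):
        # Pick x subsample indices independently for every plot.
        idx = rng.integers(0, s, size=(t, b, x))
        resampled = np.take_along_axis(data, idx, axis=2)
        rmses[i] = rcbd_rmse(resampled.mean(axis=2))
    return float(rmses.mean())


# Synthetic blank experiment (made-up values, not the study data):
# 11 treatments, 4 blocks, 8 subsamples per plot, no treatment effect.
rng = np.random.default_rng(0)
plot_effect = rng.normal(0.0, 5.0, size=(11, 4, 1))
subsample_noise = rng.normal(0.0, 15.0, size=(11, 4, 8))
blank = 30.0 + plot_effect + subsample_noise

# Mean experimental error for each subsample size, from eight to one.
curve = {x: mean_bootstrap_rmse(blank, x, rng=rng) for x in range(8, 0, -1)}
for x, rmse in curve.items():
    print(x, round(rmse, 2))
```

The resulting mean RMSE per subsample size plays the role of the red dots that are later passed to the quadratic plateau model.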
	Okay, so here on the left, we have the equations related to this model, but I will focus on the right here, on the chart.
	Basically, this model starts with a quadratic model up to a breaking point, where it becomes a plateau, or a horizontal line.
	In the case of this study, on the X axis, we have the number of subsamples per plot, and on the y axis, we have the root mean
	square error, or the standard deviation of our trial. This breaking point is called the optimum point. In our case, this is the
	optimum number of subsamples per plot. Just an additional comment: if you want to implement that in JMP,
	this formula here on the top becomes this formula on the bottom part of the slide in JMP.
	Then you can go to the Analyze menu, then Specialized Modeling, and finally the Nonlinear platform. So if you want to do that in JMP, this is the path that you have to follow. Let's go back to the first slide.
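The equations on the slide are not captured in this transcript. For reference, one common way to write a quadratic plateau model is given below. The coefficient names a, b and c are assumptions, and the slide may use a different parameterization.

```latex
\[
\hat{y}(X) =
\begin{cases}
a + bX + cX^{2}, & X < X_0,\\[4pt]
a + bX_0 + cX_0^{2}, & X \ge X_0,
\end{cases}
\qquad
X_0 = -\frac{b}{2c}.
\]
```

Placing X₀ at the vertex of the parabola makes both the curve and its slope continuous at the breaking point, and the plateau value simplifies to a − b²/(4c). For an error that decreases with the number of subsamples, one would expect b < 0 and c > 0.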
	So here we are still in the methodology. Knowing the optimum number of subsamples
	per plot that we identified using the quadratic plateau model, we conducted in the following season
	three real trials using this number of subsamples, and then we applied the bootstrap method again to compare the capacity of these trials to detect significant differences between treatments with what it would have been if they had been conducted with four subsamples, our previous standard.
	So let's take a look at the results. First, here are the results related to the blank experiments, used to identify the optimal number of
	subsamples per plot.
	So here on the left, we have one chart for each blank experiment. On the X axis, we have the number of subsamples per plot, from eight to one. On the y axis, we have the standard deviation of the trial, and here we can see that we have 1,000 points for each number of subsamples.
	Here in Blank Experiment 1, we can see a very symmetric distribution of the points, and the red dot here represents the mean of
	the standard deviation. In the second blank experiment, we can see a nonsymmetric distribution. Actually, it is a right-skewed distribution.
	For this reason, I think it is important to use more than one blank experiment to estimate that, because we do not always have a normal distribution.
	So basically the red dots here are transferred to the charts on the right side of the screen. Then we fit the quadratic plateau model. As I said, we start with a quadratic model and then,
	at the breaking point, it becomes a horizontal line. In the case of both experiments, the R-squared is really good, 97%. We can see that the model and the
	data are close, and we can see here the optimal number of subsamples: in the case of the first blank experiment, 5.6, and in the case of the second one, 5.4.
	Well, since it is not possible to have 5.5 subsamples per plot, we adopted six subsamples per plot as the optimal number, and this is the number that we used to conduct the real trials in the following season.
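The fitting and rounding step just described can be sketched in Python with SciPy instead of the JMP Nonlinear platform. This is only an illustration under stated assumptions: the coefficients are made up so that the recovered breaking point can be checked, the function name `quadratic_plateau` is hypothetical, and `curve_fit` uses a Levenberg-Marquardt (damped Gauss-Newton) solver, which is not necessarily identical to what JMP runs.

```python
import math

import numpy as np
from scipy.optimize import curve_fit


def quadratic_plateau(x, a, b, c):
    """Quadratic a + b*x + c*x**2 up to x0 = -b/(2c), constant afterwards."""
    x = np.asarray(x, dtype=float)
    x0 = -b / (2.0 * c)
    plateau = a + b * x0 + c * x0 ** 2
    return np.where(x < x0, a + b * x + c * x ** 2, plateau)


# Made-up "true" coefficients (an assumption, not the study's estimates).
true_a, true_b, true_c = 20.0, -4.0, 0.36

# Mean error for 1 to 8 subsamples, generated from the model itself.
x = np.arange(1, 9, dtype=float)
y = quadratic_plateau(x, true_a, true_b, true_c)

# Starting values from an ordinary quadratic fit to all points.
c0, b0, a0 = np.polyfit(x, y, deg=2)
(a_hat, b_hat, c_hat), _ = curve_fit(quadratic_plateau, x, y, p0=(a0, b0, c0))

# Breaking point, rounded up because subsamples must be whole numbers.
x0_hat = -b_hat / (2.0 * c_hat)
subsamples = math.ceil(x0_hat)
print(f"X0 = {x0_hat:.2f}, rounded up to {subsamples} subsamples per plot")
```

With real data, the `y` values would be the mean bootstrap errors for each subsample size, and the fit would be repeated for each blank experiment.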
	So, back to the first slide. We covered the first part of the methodology, and now we are going to validate the results, I mean, to evaluate the benefit of increasing the number of subsamples per plot. Let's go there.
	So, again on the left part of the screen, we have on the y axis the p-value for treatments, and this red line represents our alpha, 5%. So experiments
	with a p-value for treatments below this red line can identify a significant difference between treatments, and those above it cannot. On the top part, we have the three different validation experiments, and within
	each experiment, on the left, we have the experiment with four subsamples, and on the right, with six subsamples.
	We can clearly see that the number of data points below the 5%, or this upper specification limit, is higher with six subsamples than with four. And if we summarize these results, we have
	almost 41% of the trials detecting a significant difference between treatments with four
	subsamples. With six subsamples, our optimum number, it becomes almost 54%. This is not perfect, but there is a good increment here, an increment of 13 percentage points. So let's go back to the first slide.
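The comparison of detection rates described above can be sketched as follows. This is a minimal sketch under several assumptions: the trial data are synthetic, six treatments are assumed because the talk does not give the number in the validation trials, subsamples are resampled with replacement within each plot, and the p-value comes from a randomized complete block ANOVA on plot means. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def rcbd_treatment_pvalue(plot_means):
    """P-value of the treatment F test in a randomized complete block design.

    plot_means has shape (treatments, blocks): one mean per plot.
    """
    t, b = plot_means.shape
    grand = plot_means.mean()
    ss_trt = b * ((plot_means.mean(axis=1) - grand) ** 2).sum()
    ss_blk = t * ((plot_means.mean(axis=0) - grand) ** 2).sum()
    ss_tot = ((plot_means - grand) ** 2).sum()
    ss_err = ss_tot - ss_trt - ss_blk
    df_trt, df_err = t - 1, (t - 1) * (b - 1)
    f_stat = (ss_trt / df_trt) / (ss_err / df_err)
    return float(stats.f.sf(f_stat, df_trt, df_err))


def detection_rate(data, x, n_boot=1000, alpha=0.05, rng=None):
    """Share of bootstrap resamples with x subsamples per plot where p < alpha.

    data has shape (treatments, blocks, subsamples).
    """
    rng = np.random.default_rng() if rng is None else rng
    t, b, s = data.shape
    hits = 0
    for _ in range(n_boot):
        # Assumption: x subsamples drawn with replacement within each plot.
        idx = rng.integers(0, s, size=(t, b, x))
        plot_means = np.take_along_axis(data, idx, axis=2).mean(axis=2)
        hits += rcbd_treatment_pvalue(plot_means) < alpha
    return hits / n_boot


# Synthetic validation trial (made-up values): 6 treatments, 4 blocks,
# 6 subsamples per plot, with a moderate treatment effect.
rng = np.random.default_rng(1)
treatment_effect = np.linspace(0.0, 12.0, 6).reshape(6, 1, 1)
block_effect = rng.normal(0.0, 3.0, size=(1, 4, 1))
noise = rng.normal(0.0, 12.0, size=(6, 4, 6))
trial = 30.0 + treatment_effect + block_effect + noise

rates = {x: detection_rate(trial, x, rng=rng) for x in (4, 6)}
for x, rate in rates.items():
    print(f"{x} subsamples: {rate:.1%} of resamples significant")
```

The printed rates depend entirely on the synthetic data and should not be read as a reproduction of the 41% and 54% reported in the talk.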
	So, to conclude this study, as I said,
	the increment from four to six subsamples per plot increased our capacity to detect significant differences between treatments by 13 percentage points.
	We can say that the quadratic plateau model worked well because the R-squared was good, and it was quite easy to identify the optimum point.
	For future research, we have to investigate whether this improvement is enough or whether we need to increase the number of replicates as well. To do that, it is always good to add
	some costs to this model to identify the optimum experimental design. Well, this is what I had to show you. I hope you liked it. If you have any questions, please feel free to contact me. Thank you and bye-bye.

Presented At Discovery Summit Americas 2021

Presenter

Paulo Peloia

Files

2021-US-EPO_894.pptx

Optimization of subsample size for soil pests on sugarcane experiments (2021-US-EPO-894)
