
Optimization of subsample size for soil pests on sugarcane experiments (2021-US-EPO-894)

Level: Intermediate

 

Paulo Peloia, Data Analytics Lead LATAM, Syngenta Crop Protection

 

Field experiments with products against soil pests on sugarcane in Brazil evaluate the percentage of stalks attacked in four subsamples of 0.5 row-meter per plot. Nevertheless, only 40% of the experiments detect a significant difference between treatments (p-value < .05). Increasing the number of subsamples (X) initially lowers the experimental error up to an optimum point (X0), beyond which the gain in detecting differences between treatments is very small. Thus, two blank experiments with 11 treatments, 4 blocks and 8 subsamples/plot were conducted. To estimate the mean experimental error, 1000 bootstrap samples were drawn for each value of X, from 8 to 1. To determine X0, the quadratic plateau model (QPM) was fitted by least squares, using the Gauss-Newton method for nonlinear models, in JMP 16. In the following season, three real (validation) experiments were conducted with X0 subsamples, and their capacity to detect significant differences was compared with what it would have been with X = 4, using 1000 bootstrap samples for each experiment. The QPM estimated X0 = 6 subsamples (R² = .972 and .971). Experiments with X0 = 6 detected a significant difference between treatments in 54% of the cases and those with X = 4 in 41%, which represents an increment of 13 percentage points.

 

 

Auto-generated transcript...

 


Speaker

Transcript

Paulo Peloia Hello, my name is Paulo Peloia. I am the data analytics lead for Latin America at Syngenta Crop Protection. The research
  that I'm going to present is about the optimization of subsample size for soil pests on sugar cane experiments.
  Here in Brazil, we conduct field experiments to test products against soil pests on sugar cane with four replicates and four subsamples per plot.
  And in each subsample, we have to dig the soil to remove the roots of the sugar cane and then evaluate the damage caused by these soil pests. Okay, we do that in 0.5 row-meters.
  With this design, unfortunately, and this is our problem, we can detect a significant difference between treatments in only 40% of the cases, considering an alpha of 5%. Okay.
  Before increasing the number of replicates in our trials, which would dramatically increase the cost of our experiments, we decided to first evaluate the number of subsamples that we collect.
  It is known that the higher the number of subsamples, the lower the experimental error. It is also known that this decrease in the experimental error gets smaller as the number of subsamples grows. So we have a nonlinear trend and
  at a certain point, this reduction in the experimental error becomes close to zero. Okay, for this reason, we decided to use the quadratic plateau model, the methodology that we are going to talk about in a few seconds.
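
A short worked equation may help with the diminishing-returns argument above. It assumes a simple nested model in which the subsamples within a plot are independent. The symbols \sigma_p^2 (plot-to-plot variance) and \sigma_s^2 (subsample-to-subsample variance within a plot) are notation introduced here for illustration and do not come from the talk.

```latex
\operatorname{Var}\left(\bar{y}_{\text{plot}}\right) = \sigma_p^2 + \frac{\sigma_s^2}{X},
\qquad
\operatorname{Var}_{X} - \operatorname{Var}_{X+1}
  = \sigma_s^2\left(\frac{1}{X} - \frac{1}{X+1}\right)
  = \frac{\sigma_s^2}{X(X+1)}
```

Under these assumptions, going from 1 to 2 subsamples removes \sigma_s^2/2 of variance, while going from 7 to 8 removes only \sigma_s^2/56. The gain never reaches exactly zero in this model, so a plateau is a practical approximation rather than an exact property.
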
  So, as I said, the objective of this research was to determine the optimum number of subsamples per plot
  for soil pests on sugar cane, which ultimately contributes to a more efficient experimental process. Okay, now let's talk about the methodology that we used.
  Well, we started this study by conducting two blank experiments with 11 treatments, four blocks and eight subsamples per plot. Again, eight, not four,
  subsamples per plot. Okay, with the results of these two blank experiments, we applied the bootstrap methodology
  to estimate the mean standard error, or the experimental error, for each subsample size from eight to one. After that we fit the quadratic plateau model. Now let's take a look at this methodology, the quadratic plateau model.
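
The resampling step described above could be sketched roughly as follows. This is only one plausible reading of the procedure, not the author's JMP workflow. It assumes that subsamples are drawn with replacement within each plot and that the experimental error is the root mean square error of a randomized complete block ANOVA on the plot means. The data are simulated, and every function name and parameter (`simulate_blank_trial`, `bootstrap_mean_rmse`, `rcbd_rmse`, the variance settings) is hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def rcbd_rmse(plot_means):
    """RMSE of a randomized complete block ANOVA; plot_means[t][b] is one plot."""
    n_trt, n_blk = len(plot_means), len(plot_means[0])
    grand = mean(v for row in plot_means for v in row)
    trt = [mean(row) for row in plot_means]
    blk = [mean(plot_means[t][b] for t in range(n_trt)) for b in range(n_blk)]
    # Residual = observation minus treatment mean minus block mean plus grand mean.
    sse = sum(
        (plot_means[t][b] - trt[t] - blk[b] + grand) ** 2
        for t in range(n_trt)
        for b in range(n_blk)
    )
    return sqrt(sse / ((n_trt - 1) * (n_blk - 1)))


def bootstrap_mean_rmse(trial, x, n_boot=1000, seed=0):
    """Mean RMSE over n_boot resamples that use x subsamples per plot.

    trial[t][b] is the list of subsample values of one plot. Drawing with
    replacement within each plot is an assumption of this sketch.
    """
    rng = random.Random(seed)
    rmses = []
    for _ in range(n_boot):
        means = [[mean(rng.choices(plot, k=x)) for plot in row] for row in trial]
        rmses.append(rcbd_rmse(means))
    return mean(rmses)


def simulate_blank_trial(n_trt=11, n_blk=4, n_sub=8, seed=1):
    """Made-up blank trial: block and plot effects only, no treatment effect."""
    rng = random.Random(seed)
    block_effect = [rng.gauss(0, 3) for _ in range(n_blk)]
    trial = []
    for _ in range(n_trt):
        row = []
        for b in range(n_blk):
            plot_level = 30 + block_effect[b] + rng.gauss(0, 2)
            row.append([plot_level + rng.gauss(0, 10) for _ in range(n_sub)])
        trial.append(row)
    return trial


trial = simulate_blank_trial()
# 200 resamples keep the demo quick; the study used 1000 per subsample size.
mean_rmse = {x: bootstrap_mean_rmse(trial, x, n_boot=200) for x in range(8, 0, -1)}
for x, value in mean_rmse.items():
    print(x, round(value, 2))
```

The printed mean RMSE for each subsample size plays the role of the red dots that the quadratic plateau model is later fitted to.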
  Okay, so here on the left, we have the equations related to this model, but I will focus on the right here, on the chart, okay.
  Basically, this model starts with a quadratic model, okay, up to a breaking point where it becomes a plateau or a horizontal line, okay.
  In the case of this study, on the X axis, we have the number of subsamples per plot, and on the y axis, we have the root mean
  square error, or the standard deviation of our trial, and then this breaking point, okay. This breaking point is called the optimum point. In our case, this is the
  optimum number of subsamples per plot, okay. Just an additional comment, if you want to implement that in JMP,
  this formula here on the top becomes this formula on the bottom part of the slide in JMP, okay.
  And then you can go to the Analyze menu, then Specialized Modeling, and finally the Nonlinear platform, okay. So if you want to do that in JMP, this is the path that you have to follow. Let's go back to the first slide.
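
The slide equations are not reproduced in the transcript. For reference, the quadratic plateau model is usually written in the following textbook form, which may differ in notation from the slide. The coefficients a, b and c are symbols introduced here; X and X0 follow the abstract.

```latex
y =
\begin{cases}
a + bX + cX^2, & X < X_0 \\
a + bX_0 + cX_0^2, & X \ge X_0
\end{cases}
\qquad
X_0 = -\frac{b}{2c},
\qquad
y_{\text{plateau}} = a - \frac{b^2}{4c}
```

Choosing X0 = -b/(2c) places the join at the vertex of the parabola, so the curve meets the horizontal line with zero slope. Substituting X0 into the quadratic gives the plateau height a - b^2/(4c).
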
  So here we are still in the methodology. Knowing the optimum number of subsamples
  per plot that we identified using the quadratic plateau model, we conducted in the following season
  three real trials using this number of subsamples, and then we applied the bootstrap method again to compare the capacity of these trials to detect significant differences between treatments with what it would have been had they been conducted with four subsamples, our previous standard, okay.
  So let's take a look at the results. First, here are the results related to the blank experiments, to identify the optimal number of
  subsamples per plot.
  So here on the left, we have one chart for each blank experiment. On the X axis, we have the number of subsamples per plot from eight to one. On the y axis, we have the standard deviation of the trial, and here we can see that we have 1,000 points for each number of subsamples.
  Okay, and here in Blank Experiment 1, we can see a very symmetric distribution of the points, and the red dot here represents the mean of
  the standard deviation. In the second blank experiment, we can see a nonsymmetric distribution. Actually it is a right-skewed distribution, okay.
  And again, for this reason, I think it is important to use more than one blank experiment to estimate that, because we do not always have a normal distribution, okay.
  So basically the red dots here are transferred to the charts on the right side of the screen here. Then we fit the quadratic plateau model. As I said, we start with a quadratic model and then
  at the breaking point, it becomes a horizontal line, okay. In both cases, I mean, in the case of both experiments, the R squared is really good, 97%, okay. We can see that the model and the
  data are good, I mean they are close, and we can see here the optimal number of subsamples. In the case of the first blank experiment, 5.6, and in the case of the second one, 5.4, okay.
  Well, since it is not possible to have 5.5 subsamples per plot, we adopted six subsamples per plot as the optimal number, and this is the number that we used to conduct the real trials in the following season, okay.
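
The fitting and rounding steps above can be illustrated with a minimal Python sketch. It is not the Gauss-Newton iteration that JMP's Nonlinear platform runs. Instead it uses an equivalent reparameterization, y = plateau + c * max(X0 - X, 0)^2, which is linear in `plateau` and `c` once X0 is fixed, and scans a grid of candidate X0 values. The function name `fit_quadratic_plateau` is hypothetical, and the demo data are generated from the model itself with a known X0 of 5.5. They are not the study's RMSE values.

```python
import math


def fit_quadratic_plateau(xs, ys, grid_step=0.01):
    """Least-squares fit of y = plateau + c * max(x0 - x, 0)**2.

    For a fixed x0 the model is linear in (plateau, c), so each candidate x0
    is solved by simple linear regression, and the candidate with the
    smallest residual sum of squares is returned.
    """
    n = len(xs)
    y_bar = sum(ys) / n
    lo, hi = min(xs), max(xs)
    steps = int(round((hi - lo) / grid_step))
    best = None
    for i in range(1, steps + 1):
        x0 = lo + i * grid_step
        z = [max(x0 - x, 0.0) ** 2 for x in xs]
        z_bar = sum(z) / n
        szz = sum((zi - z_bar) ** 2 for zi in z)
        if szz == 0:
            continue  # regressor is constant, slope not identifiable
        c = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, ys)) / szz
        plateau = y_bar - c * z_bar
        sse = sum((yi - plateau - c * zi) ** 2 for zi, yi in zip(z, ys))
        if best is None or sse < best["sse"]:
            best = {"x0": x0, "plateau": plateau, "c": c, "sse": sse}
    sst = sum((yi - y_bar) ** 2 for yi in ys)
    best["r2"] = 1 - best["sse"] / sst
    return best


# Demo data generated from the model with known parameters (not study data).
xs = list(range(1, 9))
ys = [10.0 + 0.5 * max(5.5 - x, 0.0) ** 2 for x in xs]

fit = fit_quadratic_plateau(xs, ys)
print(round(fit["x0"], 2), round(fit["plateau"], 2), round(fit["r2"], 3))

# A fractional optimum is rounded up to a whole number of subsamples.
print(math.ceil(5.6), math.ceil(5.4))  # 6 6
```

Rounding up rather than to the nearest integer keeps the chosen design at or beyond the estimated breaking point, which matches the choice of six subsamples for both estimates reported in the talk.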
  So, back to the first slide. Again, we covered the first part of the methodology and now we are going to validate the results, I mean to evaluate the benefit of increasing the number of subsamples per plot, okay. Let's go there.
  So, again on the left part of the screen, we have on the y axis the p-value for treatments, and here, this red line represents our alpha, 5%. So experiments
  with a p-value for treatments below this red line can identify a significant difference between treatments and those above cannot, okay. On the top part we have the three different validation experiments and within
  each experiment, on the left, we have the experiment with four subsamples, and on the right, with six subsamples, okay.
  We can clearly see that the number of data points below the 5% line, or this upper specification limit, is higher with six subsamples than with four subsamples, okay. We can clearly see that. And if we summarize these results, we have
  almost 41% of the trials detecting a significant difference between treatments with four
  subsamples. And with six subsamples, our optimum number, it becomes almost 54%, okay. This is not perfect, but there is a good increment here, okay, an increment of 13 percentage points. So let's go back to the first slide.
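
The summary above comes down to counting how many resampled trials fall below the alpha line. A minimal sketch of that count follows. The p-value lists are made-up placeholders, not the study's bootstrap output, and `detection_rate` is a hypothetical helper name.

```python
def detection_rate(p_values, alpha=0.05):
    """Share of resampled trials whose treatment p-value is below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical p-values standing in for bootstrap results of one experiment.
p_four = [0.01, 0.20, 0.04, 0.30, 0.08]
p_six = [0.01, 0.06, 0.03, 0.04, 0.07]

rate_four = detection_rate(p_four)
rate_six = detection_rate(p_six)

# The gain is a difference in percentage points, not a relative change.
gain_pp = 100 * (rate_six - rate_four)
print(rate_four, rate_six, round(gain_pp, 1))

# The figures reported in the talk: about 54% versus about 41%.
reported_gain_pp = 54 - 41
print(reported_gain_pp)  # 13
```

Expressing the gain in percentage points keeps it distinct from the relative improvement, which for the reported figures would be 13/41, or roughly 32%.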
  So, to conclude this study, as I said,
  the increment from four to six subsamples per plot increased our capacity to detect significant differences between treatments by 13 percentage points.
  We can say that the quadratic plateau model worked well because the R squared was good, and it was quite easy to identify the optimum point, okay. It is good.
  And for future research, we have to investigate if this improvement is enough or if we need to increase the number of replicates as well. And to do that, it is always good to add
  some value, some costs, to this model to identify the optimum experimental design, okay. Well, this is what I had to show you. I hope you liked it. If you have any questions, please feel free to contact me. Thank you and bye bye.