High-speed breeding cycles

Biotechnology has grown rapidly in recent years, and new products now offer nearly endless possibilities to generate genomic data for a wide range of plant species quickly and at reduced cost. Orphan plants (those that have not yet been deeply explored by humans) are finally benefiting from this technological revolution. I’m confident that much more remains to be uncovered and used for the benefit of humanity.

Until a few years ago, phenotyping was considered a bottleneck in the study of complex traits in plants because it relied on intensive manual labor, which is costly and time-consuming. Like genotyping, phenotyping has entered a new high-tech era in which instruments produce high-throughput measurements: image acquisition and processing; sensors that capture temperature, humidity, and other variables; greenhouses with automated conveyor belts that capture images every few hours or days; and so on. Together, these new phenotyping capabilities have given rise to what has been termed phenomics.

Scientists have been mastering ways to handle and analyze big data coming from genomics, proteomics, transcriptomics, epigenomics, and other omics fields. As a scientist, I am delighted to finally see phenotyping catch up with these fields and join their elite group. Phenomics will allow not only more variables but also more samples to be measured.

Breeders and analysts need easy-to-use tools with methods that can analyze these data. Why? To harness the tremendous potential of omics data and help breeding programs not only shorten their breeding cycles but also produce better food for humanity.

I would like to introduce the Cross Evaluation tool in JMP Genomics, which was designed to help breeders evaluate thousands of plants (lines or varieties) using genomics and phenomics data, with the goal of identifying the lines with the greatest potential to continue in the breeding pipeline.

I’ll start with a data table that has genetic markers (genomics data) in the columns and individual plants in the rows (partially shown in Figure 1). This table has 5,131 columns of genomic data (SNP genotypes) and 181 rows, each representing an inbred line identified by its ID in the Lines column.

Figure 1
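Genotype tables like this are often converted to numeric allele dosages before any modeling. The sketch below is a minimal illustration, assuming each cell holds a two-letter call such as "AG" and that a reference allele is known for each SNP; the functions `to_dosage` and `encode_table` and the toy table are hypothetical and are not part of JMP Genomics.

```python
def to_dosage(call: str, ref: str) -> int:
    """Return the number of non-reference alleles (0, 1, or 2) in a two-letter SNP call."""
    if len(call) != 2:
        raise ValueError(f"expected a two-letter genotype call, got {call!r}")
    return sum(1 for allele in call if allele != ref)


def encode_table(rows: dict[str, list[str]], refs: list[str]) -> dict[str, list[int]]:
    """Convert each line's SNP calls (one per marker column) to allele dosages."""
    return {line: [to_dosage(c, r) for c, r in zip(calls, refs)]
            for line, calls in rows.items()}


# Hypothetical toy table: 3 inbred lines x 4 SNPs (the real table is 181 x 5,131).
refs = ["A", "C", "G", "T"]
rows = {
    "L1": ["AA", "CC", "GG", "TT"],
    "L2": ["GG", "CC", "AA", "TT"],
    "L3": ["GG", "TT", "AA", "CC"],
}
dosages = encode_table(rows, refs)
print(dosages)
```

Because inbred lines are mostly homozygous, their dosages are almost always 0 or 2; real data would also need a rule for missing calls, which this sketch does not handle.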

Now, let’s look at the Cross Evaluation tool (Figure 2). The data table from Figure 1 is loaded into the SAS Input Data Set field, and two score files are loaded into the Scoring Code Files field. Score files contain mathematical equations that link the phenomics data (traits) to the information provided by the genomic data.

Figure 2
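The post does not show what the score files contain. As a minimal sketch, assuming a linear marker-effect model (such as one fitted by ridge-regression BLUP), a score equation is an intercept plus a weighted sum of allele dosages. The trait names, intercepts, and effect values below are hypothetical and are not taken from the example score files.

```python
def predict(dosages: list[int], intercept: float, effects: list[float]) -> float:
    """Linear genomic prediction: intercept + sum of (marker effect * allele dosage)."""
    if len(dosages) != len(effects):
        raise ValueError("one effect is needed per marker")
    return intercept + sum(d * e for d, e in zip(dosages, effects))


# Hypothetical models for the two traits used in this post, each over 3 markers.
models = {
    "yield_regular": (10.0, [0.5, -0.25, 1.0]),
    "yield_restricted": (6.0, [0.25, 0.5, -0.5]),
}
line = [0, 2, 1]  # allele dosages of one plant at the 3 markers
predictions = {trait: predict(line, b0, b) for trait, (b0, b) in models.items()}
print(predictions)
```

Keeping one equation per trait mirrors the setup described here, where two score files are loaded, one for each yield trait.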

There are many options in the Cross Evaluation tool. For instance, you can define the type of cross (self or outcross), the number of progenies to simulate for each cross, and whether to compute a selection index (also known as a super trait). You can also produce multiple generations of progenies: Cross Evaluation first makes pairwise crosses of the individuals in the data table to produce the first generation of progenies (first crossing cycle), then makes new crosses among these progenies to produce the second generation (second crossing cycle), and so on. I will show some of these options in this post.
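To make the self versus outcross options concrete, here is a minimal sketch of progeny simulation. It assumes each plant is stored as two haplotypes of 0/1 alleles and that markers assort independently (no linkage map, which a realistic simulator would account for). The function names are hypothetical; this is not how JMP Genomics implements its simulation.

```python
import random

Haplotype = list[int]
Plant = tuple[Haplotype, Haplotype]


def gamete(parent: Plant, rng: random.Random) -> Haplotype:
    """Pick one allele per marker from the parent's two haplotypes (independent assortment)."""
    h1, h2 = parent
    return [a if rng.random() < 0.5 else b for a, b in zip(h1, h2)]


def outcross(p1: Plant, p2: Plant, rng: random.Random) -> Plant:
    """One progeny from two parents: one gamete from each."""
    return (gamete(p1, rng), gamete(p2, rng))


def self_cross(p: Plant, rng: random.Random) -> Plant:
    """One progeny from selfing: both gametes come from the same parent."""
    return outcross(p, p, rng)


rng = random.Random(1)
inbred_a: Plant = ([0, 0, 0, 0], [0, 0, 0, 0])
inbred_b: Plant = ([1, 1, 1, 1], [1, 1, 1, 1])
f1 = outcross(inbred_a, inbred_b, rng)         # heterozygous at every marker
f2 = [self_cross(f1, rng) for _ in range(5)]   # segregating progenies
print(f1)
```

Crossing two inbred lines always yields the same F1; variation among progenies only appears in the next generation, which is why simulating several generations is informative.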

In this post, the goal of the analyses is to identify pairs of plants that, when crossed, would produce offspring with desirable traits. Specifically, I want plants with high yield under both regular and restricted water supply.
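Because this goal combines two traits, the selection index option mentioned above is a natural fit. A common textbook form is a weighted sum of standardized trait values. The sketch below assumes equal weights and hypothetical means and standard deviations, and it is not necessarily the index that Cross Evaluation computes.

```python
def selection_index(values: list[float], means: list[float],
                    sds: list[float], weights: list[float]) -> float:
    """Weighted sum of standardized trait values: sum of w * (x - mean) / sd."""
    if not (len(values) == len(means) == len(sds) == len(weights)):
        raise ValueError("all inputs need one entry per trait")
    return sum(w * (x - m) / s for x, m, s, w in zip(values, means, sds, weights))


# Hypothetical predicted yields of one progeny: (regular water, restricted water).
index = selection_index(values=[12.0, 8.0], means=[10.0, 6.0],
                        sds=[2.0, 1.0], weights=[0.5, 0.5])
print(index)  # 1.5
```

Standardizing first keeps the trait with the larger spread (here, yield under regular water) from dominating the index purely because of its scale.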

In my first analysis, I set up Cross Evaluation to make all pairwise crosses among individuals present in the example data table and to produce progenies from each cross (Figure 3).  


Figure 3
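For a sense of scale: assuming all pairwise crosses exclude selfs and treat A x B the same as B x A, the 181 lines in this example give 181 x 180 / 2 = 16,290 crosses. A quick check in Python follows; the number of progenies per cross is a hypothetical value, since the post does not state the setting used.

```python
import itertools
import math

n_lines = 181
crosses = list(itertools.combinations(range(1, n_lines + 1), 2))
assert len(crosses) == math.comb(n_lines, 2) == 16_290

progenies_per_cross = 100  # hypothetical setting, for illustration only
print(len(crosses) * progenies_per_cross)  # 1629000
```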

One of the outputs is shown in Figure 4. The scatterplots show the bivariate distribution of the statistics computed for each cross (mean, min, max, range, and standard deviation) from the predicted yield under regular and restricted water supply.

Figure 4
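Each dot summarizes the simulated progenies of one cross. The sketch below computes those five statistics for one trait with Python's `statistics` module. It assumes the sample standard deviation (n - 1 denominator), which may differ from what the tool reports, and the progeny values are hypothetical.

```python
import statistics


def summarize(predicted: list[float]) -> dict[str, float]:
    """Mean, min, max, range, and sample standard deviation of one cross's progenies."""
    lo, hi = min(predicted), max(predicted)
    return {
        "mean": statistics.mean(predicted),
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "std_dev": statistics.stdev(predicted),  # n - 1 denominator
    }


# Hypothetical predicted yields for the progenies of one cross under regular water supply.
print(summarize([1.0, 2.0, 3.0, 4.0]))
```

Running this once per trait gives the two coordinates of a cross in each scatterplot, for example the mean under regular water against the mean under restricted water.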

The Data Filter dashboard, seen toward the left side of Figure 4, can be used to select those crosses that have higher potential to produce offspring that have the desired trait values, in this example, high yield under regular and restricted water supply.

For this purpose, I moved the threshold bars for the simulated trait means of yield under regular and restricted water supply; the resulting changes are shown in Figure 5. Comparing Figures 4 and 5 reveals that most dots in the scatterplots have turned gray. The gray dots represent unwanted crosses based on the thresholds I set in the Data Filter dashboard. The crosses that passed the selection thresholds are in black, and the pairs of inbred lines that produced those crosses are listed under the Data Filter dashboard. We can easily see that inbred line 41 crosses well with other inbred lines to produce progenies with high yield under regular and restricted water supply.


Figure 5
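Outside JMP, the same selection can be expressed as a filter on the cross-level means: keep the crosses whose mean predicted yield meets a threshold under both water regimes, then count how often each parent line appears among the survivors. The line names, means, and thresholds below are made up for illustration.

```python
from collections import Counter

# Hypothetical cross-level means: (parent_1, parent_2) -> (regular, restricted).
cross_means = {
    ("A", "B"): (9.1, 5.2),
    ("A", "C"): (8.7, 6.4),
    ("A", "D"): (8.9, 6.1),
    ("B", "C"): (7.2, 6.6),
    ("C", "D"): (8.8, 4.9),
}
min_regular, min_restricted = 8.5, 6.0  # hypothetical thresholds

# Keep only crosses that pass both thresholds.
selected = [pair for pair, (reg, res) in cross_means.items()
            if reg >= min_regular and res >= min_restricted]
# Count how often each parent line appears among the selected crosses.
parent_counts = Counter(line for pair in selected for line in pair)

print(selected)                      # [('A', 'C'), ('A', 'D')]
print(parent_counts.most_common(1))  # [('A', 2)]
```

A parent that keeps showing up among the surviving crosses plays the same role that inbred line 41 does in this example.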

In my second analysis, I set up Cross Evaluation to run four cycles of selection (Figure 6) on the best crosses identified in my first analysis.

Figure 6
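The post does not describe the exact selection scheme used between cycles. As one hypothetical illustration, the sketch below runs recurrent truncation selection from a single cross: each cycle randomly mates the current parents, scores the progenies with an additive model, and keeps the best `n_keep` as the next parents. It assumes independent assortment and purely additive marker effects; all names are hypothetical.

```python
import random
import statistics

Plant = tuple[list[int], list[int]]


def gamete(parent: Plant, rng: random.Random) -> list[int]:
    """One allele per marker, chosen at random from the parent's two haplotypes."""
    h1, h2 = parent
    return [a if rng.random() < 0.5 else b for a, b in zip(h1, h2)]


def genetic_value(plant: Plant, effects: list[float]) -> float:
    """Additive value: marker effect times allele dosage, summed over markers."""
    h1, h2 = plant
    return sum(e * (a + b) for e, a, b in zip(effects, h1, h2))


def recurrent_selection(p1: Plant, p2: Plant, effects: list[float], n_cycles: int,
                        n_progeny: int, n_keep: int, seed: int = 0) -> list[float]:
    """Return the mean progeny value in each cycle of cross-then-select."""
    rng = random.Random(seed)
    parents = [p1, p2]
    history = []
    for _ in range(n_cycles):
        progeny = []
        for _ in range(n_progeny):
            # Random mating among the current parents; self if only one is left.
            a, b = rng.sample(parents, 2) if len(parents) > 1 else (parents[0], parents[0])
            progeny.append((gamete(a, rng), gamete(b, rng)))
        history.append(statistics.mean(genetic_value(c, effects) for c in progeny))
        # Truncation selection: the best n_keep progenies become the next parents.
        progeny.sort(key=lambda c: genetic_value(c, effects), reverse=True)
        parents = progeny[:n_keep]
    return history


inbred_a: Plant = ([1, 1, 0, 0], [1, 1, 0, 0])
inbred_b: Plant = ([0, 0, 1, 1], [0, 0, 1, 1])
effects = [1.0, 1.0, 1.0, 1.0]
print(recurrent_selection(inbred_a, inbred_b, effects, n_cycles=4, n_progeny=50, n_keep=10))
```

Truncation selection tends to push the mean upward over cycles as favorable alleles accumulate, although small stochastic runs like this one are noisy from cycle to cycle.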

One of the outputs is shown in Figure 7. The plots show the trend of the simulated trait values (y-axis) across four breeding generations for each cross (x-axis). We can quickly see that selection was effective in increasing trait values across generations for all crosses. Because the crosses on the x-axis are ordered from lower (left) to higher (right) trait values, we can easily see that cross 41x148 has higher breeding potential specifically for regular water supply (upper plot), cross 41x2 has higher breeding potential specifically for restricted water supply (lower plot), and cross 41x138 has higher breeding potential in both conditions.

Figure 7
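Picking the best cross for each water regime, plus one that does well in both, amounts to ranking the crosses by their final-cycle values. Below is a minimal sketch with made-up numbers. For the both-conditions pick it assumes both traits are yields on the same scale, so maximizing the worse of the two values is a reasonable criterion; a selection index would be another option.

```python
# Hypothetical final-cycle mean yields: cross -> (regular, restricted).
final_means = {
    "AxB": (9.0, 4.0),
    "AxC": (6.0, 7.0),
    "AxD": (7.5, 6.5),
}

# Order crosses from lower to higher value, as on the x-axis of the plots.
by_regular = sorted(final_means, key=lambda c: final_means[c][0])
by_restricted = sorted(final_means, key=lambda c: final_means[c][1])

best_regular = by_regular[-1]
best_restricted = by_restricted[-1]
# Both conditions: maximize the worse of the two yields (assumes a shared scale).
best_both = max(final_means, key=lambda c: min(final_means[c]))

print(best_regular, best_restricted, best_both)  # AxB AxC AxD
```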

In this post, you saw several options in the Cross Evaluation tool from JMP Genomics that can help breeders in fast-paced breeding programs. There are, however, many other options in this tool that breeders can explore to design the best breeding strategies for their specific needs.

I hope this information is helpful, and I look forward to your comments!