Apr 12, 2010 11:34 AM | Last Modified: Aug 20, 2017 1:50 PM
This week we celebrate the 35th anniversary of SAS user group meetings with SAS Global Forum (SGF) 2010, formerly known as SUGI. SAS has grown extraordinarily since that first meeting of five users in 1975. Over the same period, modern statistics has advanced considerably, especially in computer-intensive methods. One of the leading methods, which coincidentally originated around the same time as SAS, is the bootstrap (Efron, 1979). As Wikipedia’s bootstrapping page explains, the basic bootstrap resamples the data with replacement to form an empirical sampling distribution of a statistic of interest. Like SAS, the bootstrap has become a remarkable tool for statistical inference; Efron’s works on this method alone have been cited more than 30,000 times.
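For readers new to the method, here is a minimal Python sketch of the resampling step described above, using only the standard library. The function name `bootstrap_replicates` and the sample size of 30 are illustrative choices, not anything taken from JMP or from R’s boot package.

```python
import random
import statistics


def bootstrap_replicates(data, stat, n_boot=1000, seed=None):
    """Return n_boot bootstrap replicates of stat(data).

    Each replicate draws len(data) values from data with replacement
    and recomputes the statistic on that resample.
    """
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_boot)]


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(30)]  # hypothetical N(0, 1) sample
    reps = bootstrap_replicates(sample, statistics.mean, n_boot=2000, seed=2)
    # The spread of the replicates estimates the standard error of the mean.
    print("sample mean:", statistics.mean(sample))
    print("bootstrap SE:", statistics.stdev(reps))
```

The collection of replicates is the empirical sampling distribution; confidence intervals are then read off it in different ways, which is exactly where the methods compared below differ.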
Fast-forward a decade: John Sall (one of the co-founders of SAS) and a small group of developers began working on JMP in 1989. Like many SAS products, JMP continues to evolve and grow in popularity worldwide. A few years later, Ross Ihaka and Robert Gentleman began developing R, the now widely used open-source statistical language whose syntax is based on the earlier S and S-PLUS languages.
SAS Global Forum is an opportunity for us to learn how our users employ SAS products, and for users to learn about the latest advancements in upcoming SAS software. One such announcement for JMP 9 (to be released in fall 2010), and one I am proud to be part of, is its new capability to integrate with R. I’d like to show the basic elements of this integration through a simulation of bootstrap confidence intervals.
JMP is a wonderful complement to R. The integration is exposed through several new JSL commands that let you connect to an installation of R on your desktop, send data to and from R, and run functions from R packages. JMP dialogs can easily be built as a front end that collects parameters for R, and, more significantly, JMP’s interactive and dynamic statistical platforms and graphics make a perfect back end for R functions. The dialog below is an application that connects to R to generate data from a given distribution and run simulations that test the coverage of bootstrap confidence intervals for a few common statistics.
You may ask why such a simulation would be interesting or relevant. The bootstrap method (and the bootstrap confidence interval) is widely used today to estimate properties of a statistic, especially when the statistic’s sampling distribution is not well known. But if the statistic is biased or its sampling distribution is skewed (as with the sample variance), bootstrap confidence intervals for it can be inaccurate: their actual coverage can differ from the nominal level. This application lets you quickly and easily evaluate the coverage of bootstrap confidence intervals for several common distributions and statistics.
The output below shows the results of a simulation (1000 runs) of 95% confidence intervals for the bootstrapped mean (1000 bootstrap replications) of a standard normal distribution (mean 0, variance 1). The simulation loads the R package “boot”, whose boot.ci() function computes bootstrap intervals by several different methods. Using the Distribution and Graph Builder platforms in JMP, we see that the 95% bootstrap confidence intervals from the Basic, Normal, Percentile, and BCa methods all cover the true mean accurately. (The Coverage column equals 1 when the confidence interval contains 0, so the Prob entry for Level 1 is the empirical coverage.) The Graph Builder output shows that the confidence interval widths are roughly the same both within and between methods.
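The coverage experiment can be sketched in plain Python, as below. This is only an approximation of what boot.ci() does: the intervals follow the usual Normal (bias-corrected estimate ± z·SE), Basic (2·t0 minus the upper and lower bootstrap quantiles), and Percentile definitions, but the quantiles use simple linear interpolation rather than boot’s own rule, BCa is left out, and the per-run sample size `n=20` is an assumption because the post does not state one. The names `quantile`, `bootstrap_intervals`, and `coverage` are hypothetical.

```python
import random
import statistics


def quantile(sorted_vals, p):
    """Empirical quantile of an already sorted list, by linear interpolation."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_intervals(data, stat, n_boot, rng, alpha=0.05):
    """Normal, Basic and Percentile bootstrap intervals for stat(data)."""
    t0 = stat(data)
    reps = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_boot))
    q_lo, q_hi = quantile(reps, alpha / 2), quantile(reps, 1 - alpha / 2)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    bias = statistics.fmean(reps) - t0
    se = statistics.stdev(reps)
    return {
        "Normal": (t0 - bias - z * se, t0 - bias + z * se),
        "Basic": (2 * t0 - q_hi, 2 * t0 - q_lo),
        "Percentile": (q_lo, q_hi),
    }


def coverage(stat, target, n_runs=1000, n_boot=1000, n=20, seed=0):
    """Empirical coverage per method over n_runs standard normal samples."""
    rng = random.Random(seed)
    hits = {"Normal": 0, "Basic": 0, "Percentile": 0}
    for _ in range(n_runs):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # N(0, 1) sample
        intervals = bootstrap_intervals(data, stat, n_boot, rng)
        for method, (lo, hi) in intervals.items():
            if lo <= target <= hi:  # the post's Coverage column would be 1 here
                hits[method] += 1
    return {method: count / n_runs for method, count in hits.items()}
```

For example, `coverage(statistics.fmean, 0.0)` mirrors the 1000-run, 1000-replication setup for the mean. In pure Python that takes a while, so smaller `n_runs` and `n_boot` values are handy for a quick look.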
Here’s the R integration JSL code used to run the bootstrap for the above scenario:
The code calls the R Connect() JSL command and assigns the result to the scriptable object “rconn”. It then sends messages to “rconn”, using the Submit() command to run R code and the Get() command to retrieve the R matrices that contain the bootstrap confidence intervals.
Now let’s see what happens when we test the coverage of bootstrap confidence intervals for the variance of a standard normal distribution. If you change the Target Value in the dialog to 1, choose Variance as the bootstrap statistic, and click Run Simulation, you get the results below. In this case, all four methods under-cover the true variance (their empirical coverage falls below the nominal 95%), including the bias-corrected and accelerated (BCa) method, which was proposed specifically to correct for bias and skewness in the distribution of the bootstrapped statistic (Efron, 1987).
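To make the BCa adjustment concrete, the sketch below computes a BCa interval in plain Python and estimates its coverage for the variance of a standard normal sample. It is a simplified sketch, not boot.ci()’s implementation: the bias correction z0 comes from the share of bootstrap replicates below the original estimate, the acceleration a uses a jackknife estimate (boot.ci may estimate it differently), endpoints are taken as plain order statistics, and the sample size `n=20` and all function names are hypothetical.

```python
import random
import statistics

NORMAL = statistics.NormalDist()  # standard normal, for cdf and inv_cdf


def sample_var(x):
    """Sample variance with the n - 1 divisor (faster than statistics.variance)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


def bca_interval(data, stat, n_boot, rng, alpha=0.05):
    """Simplified BCa bootstrap interval for stat(data); data must be a list."""
    n = len(data)
    t0 = stat(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))

    # Bias correction z0: how far the replicates are shifted from the estimate.
    p_below = sum(r < t0 for r in reps) / n_boot
    p_below = min(max(p_below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))  # keep z0 finite
    z0 = NORMAL.inv_cdf(p_below)

    # Acceleration a: skewness-based estimate from leave-one-out (jackknife) values.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = sum(jack) / n
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def endpoint(p):
        # Shift the nominal percentile p, then read it off the sorted replicates.
        # (Not guarded against 1 - a * (z0 + z) <= 0, which needs extreme a.)
        z = NORMAL.inv_cdf(p)
        adj = NORMAL.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        k = min(max(round(adj * (n_boot - 1)), 0), n_boot - 1)
        return reps[k]

    return endpoint(alpha / 2), endpoint(1 - alpha / 2)


def bca_variance_coverage(n_runs=1000, n_boot=1000, n=20, seed=0):
    """Share of runs whose BCa interval contains the true variance of 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # N(0, 1) sample
        lo, hi = bca_interval(data, sample_var, n_boot, rng)
        if lo <= 1.0 <= hi:
            hits += 1
    return hits / n_runs
```

Both corrections move the percentile endpoints rather than the estimate itself, which is why BCa can still under-cover when the sample is small and the replicates cannot reach far enough into the right tail of the variance’s skewed distribution.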
With new features in JMP 9 that grey out unselected points, it is easy to see which Basic bootstrap confidence intervals fail to cover the target value: just select the “0” histogram bar in the Basic Coverage distribution. We can also see how the intervals that fail under the Basic method perform under the other methods. Other standard JMP tools, such as the Data Filter, help explore these results in ways that cannot be done as easily or quickly in R. Likewise, the comparison of coverage among methods is easy to see with a custom JSL graphics script that draws a Venn diagram of the counts of runs in which each combination of methods contained the true parameter (available on the JMP File Exchange and in JMP Genomics).
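The counts behind such a Venn diagram are simple to compute. Below is a hypothetical Python helper (not the JSL script from the File Exchange) that takes each method’s per-run 0/1 Coverage flags and counts how many runs fall into each combination of covering methods; the flags in the example are made up purely for illustration.

```python
from collections import Counter


def venn_counts(coverage_by_method):
    """Tally runs by the exact set of methods whose interval covered the target.

    coverage_by_method maps a method name to a list of 0/1 Coverage flags,
    one flag per simulation run, in the same run order for every method.
    """
    lengths = {len(flags) for flags in coverage_by_method.values()}
    if len(lengths) != 1:
        raise ValueError("every method needs one flag per run")
    (n_runs,) = lengths
    counts = Counter()
    for i in range(n_runs):
        covered = frozenset(m for m, flags in coverage_by_method.items() if flags[i])
        counts[covered] += 1  # one Venn region per combination of methods
    return counts


# Made-up flags for four runs, only to show the shape of the result.
flags = {"Basic": [1, 0, 1, 1], "Percentile": [1, 1, 1, 0], "BCa": [1, 1, 0, 0]}
for region, count in venn_counts(flags).items():
    print(sorted(region), count)
```

The empty set collects runs where no method covered the target, and the region containing every method counts runs where all of them did.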
This application gives just a taste of what you can do with JMP and R together. With a little JSL and the statistical and graphics platforms of JMP, coupled with the breadth and variety of packages and functions in R, you can build complete, easy-to-use applications for statistical analysis.
JMP can also integrate with SAS, which adds the ability to work with large-scale data through the file-based system as well as the depth and advanced capabilities of SAS procedures. With these seamless integrations, JMP can become a hub that connects you to both SAS and R while also providing unique statistical features such as the JMP Profiler and interactive graphics such as Graph Builder. The possibilities of what you can do with this are endless!
SAS, JMP, R and the field of statistical computing have come a long way over the last 35 years. I am excited to be a part of this company and this industry as we continue to provide software tools that allow you to accomplish your analyses with flexibility, style, and ease!
When JMP 9 is officially released, I will post the bootstrap simulation script, along with others, on the JMP File Exchange.
Now let me leave you with a question: if the bootstrap confidence intervals for the variance of a standard normal distribution under-cover the true value, how do you think the bootstrap confidence intervals for the mean of a chi-square distribution will behave?