Snapshot of Big Data: Sampling Approaches in JMP®
Sep 7, 2017 1:15 PM
Mandy Chambers (@Mandy_JMP), JMP Development Tester, firstname.lastname@example.org
Olivia Lippincott (@olippincott), JMP Systems Engineer, email@example.com
In the era of big data, we often lack the internal access, capacity, or ability to work with the entire data set of interest. Instead, we work with a sample, or snapshot, of the data and apply what we learn to the whole data set. In Medicare provider fraud detection, the sample must include all the repeated procedures used by a provider in order to capture anomalies for that provider type, which leads to a two-stage sampling approach. We explore the JMP workflow for capturing a sample, finding all relevant repeated procedures used by providers, and restructuring and cleaning the messy data for analysis. We also illustrate several ways to sample from larger data sets: sampling from data external to JMP, using Query Builder to connect to a database or SAS server, and working with data within JMP, using JMP Query Builder and Virtual Join. For each example, we cover the advantages, the disadvantages, and recommendations for use.
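
The two-stage idea can be illustrated outside JMP with a small Python sketch. The abstract does not spell out the stages, so this is one plausible reading and not the authors' workflow: stage 1 draws a random sample of providers, and stage 2 pulls every claim row for those providers so that repeated procedures are not lost. The field names `provider_id` and `procedure`, the function `two_stage_sample`, and the toy rows are all hypothetical.

```python
import random

def two_stage_sample(claims, n_providers, seed=0):
    """Return all claim rows for a random sample of providers.

    `claims` is a list of dicts with the hypothetical keys
    'provider_id' and 'procedure'.
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of distinct provider IDs.
    provider_ids = sorted({row["provider_id"] for row in claims})
    k = min(n_providers, len(provider_ids))
    chosen = set(rng.sample(provider_ids, k))
    # Stage 2: keep every row for the sampled providers, so each
    # provider's repeated procedures stay together in the sample.
    return [row for row in claims if row["provider_id"] in chosen]

# Toy data (made up for illustration only).
claims = [
    {"provider_id": "A", "procedure": "P1"},
    {"provider_id": "A", "procedure": "P1"},
    {"provider_id": "A", "procedure": "P2"},
    {"provider_id": "B", "procedure": "P1"},
    {"provider_id": "C", "procedure": "P3"},
    {"provider_id": "C", "procedure": "P3"},
]

sample = two_stage_sample(claims, n_providers=2, seed=42)
```

Sampling rows directly would split a provider's claims between the sample and the remainder; sampling providers first and then collecting all of their rows avoids that, at the cost of a sample size that varies with how many claims each chosen provider has.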