A 5-minute introduction to functional data analysis
Sep 13, 2019 12:40 PM
Functional data can be a mind-bending topic to think about or describe to others. Since SAS introduced Functional Data Explorer (FDE) in JMP 14, I've had many opportunities to discuss its abilities. However, the main issue in working with FDE (or functional data analysis in general) is just wrapping your head around the idea of "functional" data. I thought I'd share a little thought exercise that I've used effectively to help people understand what functional data is and what functional data analysis provides us. According to my little tracker plugin, this should take you about 4:30 to read.
Happy Birthday to You!
You can sing the "Happy Birthday" song, right? That's because you have a "functional" form of the song in your memory. Imagine for a moment that you are at a birthday party. You can expect to sing the traditional "Happy Birthday" song, and this doesn't bother you. You know the song and given any starting note, you know (vocal range notwithstanding) how the song should sound. Have you ever taken a second to think about why that is? It's reasonable to assume that you don't know an infinite number of variations on "Happy Birthday." How is it that you can be so confident in your knowledge of the song?
Simply put, your brain has stored the "shape" of the song in your memory. Given any starting note, your mind can then reconstruct the appropriate pitches to produce the "Happy Birthday" tune. In the context of this discussion, you have a "functional" form of the song in your memory. If you were to make a graph with frequency on the y-axis and time since you started the song on the x-axis, it might look something like this:
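To make the "shape plus starting note" idea concrete, here is a minimal Python sketch. It rests on a few assumptions of mine that are not in the original data: the melody is written as semitone offsets from the first note (my own transcription of the traditional tune), pitches use equal temperament, and the names OFFSETS and melody are made up for this illustration.

```python
# Semitone offset of each note from the starting note.
# Assumption: my transcription of the traditional tune (25 notes).
OFFSETS = [0, 0, 2, 0, 5, 4,        # Hap-py birth-day to you
           0, 0, 2, 0, 7, 5,        # Hap-py birth-day to you
           0, 0, 12, 9, 5, 4, 2,    # Hap-py birth-day dear (name)
           10, 10, 9, 5, 7, 5]      # Hap-py birth-day to you


def melody(start_hz):
    """Return the frequency (Hz) of every note, given the first note.

    Assumes equal temperament: one semitone multiplies the frequency
    by 2 ** (1 / 12).
    """
    return [start_hz * 2 ** (s / 12) for s in OFFSETS]


g_version = melody(196.0)  # start on roughly G3
a_version = melody(220.0)  # start on A3

# The two versions differ note by note by one constant factor:
# same shape, different starting note.
ratios = [a / g for a, g in zip(a_version, g_version)]
```

The stored "shape" is the list of offsets. The starting note is the only thing that changes from one rendition to the next.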
Functional data analysis (FDA) does for other types of time-based data what your mind does with songs. (Note that "time" here has a relatively loose interpretation.) FDA starts with a data set consisting of multiple curves (the "Happy Birthday" melody in different keys) and uses small bits of simple mathematical formulas (lines, curves, etc.) to construct a general form of the data (the shape of the "Happy Birthday" melody). That general form can reproduce any of the curves in the original data set by adjusting a small number of constants in the equation. The result of these mathematical acrobatics is that we can reduce large, structurally complex data sets to a small handful of constants and a single (nightmarish) equation. The good news is that we don't usually have to work with that equation directly because the exciting information is in the constants. Cool, right?
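For the curious, one common way to write that "general form plus a few constants" idea is the expansion below. The notation is my own choice for illustration, not something taken from the JMP report: x_i(t) is the i-th curve, \mu(t) is the average curve, the \phi_k(t) are the shared shape functions, and the c_{ik} are the handful of constants that belong to curve i.

```latex
x_i(t) \;\approx\; \mu(t) + \sum_{k=1}^{K} c_{ik}\,\phi_k(t),
\qquad i = 1, \dots, N
```

Everything on the right-hand side except the c_{ik} is shared by all N curves, which is why the constants carry the interesting information.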
And for your information, yes -- I know I'm geeking out about math, but I wasn't always this way -- I just hung around with the wrong sort of people in college. Pick your friends wisely, people!
A Functional Happy Birthday
As I've hinted so far, you should be able to analyze the "Happy Birthday" song using functional data analysis. Spoiler alert -- you can. I'm going to use the Functional Data Explorer in JMP Pro to demonstrate how we do this and where all the bits and pieces of FDA are in the interface.
Let's start with the data itself (available on JMP Public with the figures). The interactive graph below shows the melody in all 12 keys of Western music. You can hover over each dot to see the word (or part of the word) that goes with the displayed frequency. Time is on the x-axis. The data table is set up with three data columns: Time (the x-axis), Frequency (the y-axis), and Key (the ID for each function). That's what you need to start doing FDA -- a family of curves with some time-like x-axis and a response.
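If you want to picture the layout of that table, here is a rough Python sketch that builds something similar: one row per note per key, with Time, Frequency, and Key columns. This is not the actual JMP Public data. The note index standing in for Time, the semitone transcription, the reference pitch of A4 = 440 Hz, and the helper name build_table are all assumptions made for illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

# Semitone offset of each note from the starting note
# (assumption: my transcription of the traditional tune).
OFFSETS = [0, 0, 2, 0, 5, 4,
           0, 0, 2, 0, 7, 5,
           0, 0, 12, 9, 5, 4, 2,
           10, 10, 9, 5, 7, 5]

# C4 derived from A4 = 440 Hz in equal temperament (assumption).
C4_HZ = 440.0 * 2 ** (-9 / 12)


def build_table():
    """Return a long-format table: one row per (key, note)."""
    rows = []
    for k, key_name in enumerate(NOTE_NAMES):
        tonic_hz = C4_HZ * 2 ** (k / 12)
        # The tune starts five semitones below the tonic of its key.
        start_hz = tonic_hz * 2 ** (-5 / 12)
        for t, offset in enumerate(OFFSETS):
            rows.append({
                "Time": t,  # note index as a stand-in for time
                "Frequency": start_hz * 2 ** (offset / 12),
                "Key": key_name,  # ID that groups rows into one curve
            })
    return rows


table = build_table()
```

Each distinct value of Key picks out one curve, which is the role the ID column plays when the data goes into a functional analysis.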
Now, since this is more about the bits and pieces of FDA (and not an FDE tutorial) I'm going to skip ahead to the bit where I've got my functional form of "Happy Birthday" set up. If you want the click-by-click instructions on Functional Data Explorer in JMP Pro, the documentation has an example. So, below is the JMP Live report of the FDE output for "Happy Birthday."
Remember from our little thought exercise -- there are a couple of things we need to locate to understand what we're looking at: the functional form (shape of the melody), and the coefficients that modify the functional form to produce the tune for a given key. Locate the little graph under the Functional PCA heading (you might need to scroll down in the window). That's the "functional form" (called the basis function in FDA jargon) of the "Happy Birthday" song. All the coefficients, etc., you need to recreate that form are in the report (or can be exported to a data table). The line you see is the visual representation of all that modeling work. Now, open the Functional Summaries outline box -- don't get nervous! -- a lot is going on here, but we're just interested in the FPC and Mean columns. JMP uses the Mean value and the FPCs to reconstruct the song for a given starting note (you can end up with multiple FPCs for more complex data). All the other information is useful too, but at its core, those are the pieces of information you need to reconstruct each version of the "Happy Birthday" song. And your brain does all that stuff on the fly … amazing.
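Here is a bare-bones Python sketch of the "mean plus FPC" reconstruction idea. To be clear about the assumptions: this is not how JMP computes its results (Functional Data Explorer fits smooth curves to the data first), the melody transcription and starting pitches are my own, and the helper names are invented. It runs ordinary principal component analysis on evenly sampled notes, using power iteration to find one shared shape, and then rebuilds every key from a mean curve plus a single score.

```python
import math

# Assumption: my semitone transcription of the traditional tune.
OFFSETS = [0, 0, 2, 0, 5, 4,
           0, 0, 2, 0, 7, 5,
           0, 0, 12, 9, 5, 4, 2,
           10, 10, 9, 5, 7, 5]


def melody(start_hz):
    """Frequencies (Hz) of the tune for a given starting note."""
    return [start_hz * 2 ** (s / 12) for s in OFFSETS]


# Twelve curves, one semitone apart, starting near G3 (assumption).
curves = [melody(196.0 * 2 ** (k / 12)) for k in range(12)]
n_pts = len(OFFSETS)

# Mean curve, and each curve's deviation from it.
mean_curve = [sum(c[t] for c in curves) / len(curves) for t in range(n_pts)]
centered = [[c[t] - mean_curve[t] for t in range(n_pts)] for c in curves]


def first_component(rows, iters=50):
    """Leading principal direction of `rows`, found by power iteration."""
    v = [1.0] * len(rows[0])
    for _ in range(iters):
        # Project every row onto v, then push the projections back.
        proj = [sum(r[t] * v[t] for t in range(len(v))) for r in rows]
        w = [sum(p * r[t] for p, r in zip(proj, rows)) for t in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


shape = first_component(centered)  # the shared shape
scores = [sum(r[t] * shape[t] for t in range(n_pts)) for r in centered]

# Rebuild every key from the mean curve plus one number per key.
rebuilt = [[mean_curve[t] + s * shape[t] for t in range(n_pts)] for s in scores]
max_error = max(abs(rebuilt[i][t] - curves[i][t])
                for i in range(len(curves)) for t in range(n_pts))
```

In this toy setup every key is just a scaled copy of the same melody, so a single component is enough and max_error comes out at floating-point noise. Messier real-world curves generally need more than one component.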
So, that's it! Functional data analysis is just another example of people trying to get computers to do what your brain does on its own. Cool, right? That said, the things you could do with functional data analysis are pretty exciting. You can use it to transform time series data into a format that other modeling methods can work with. You can use it as part of a design of experiments (DOE) exercise… lots of possibilities there.
You know, that sounds like an interesting series of articles to write... hmmm…