Our World Statistics Day conversations have been a great reminder of how much statistics can inform our lives. Do you have an example of how statistics has made a difference in your life? Share your story with the Community!
Choose Language Hide Translation Bar
Staff
Create cumulative sums with by groups to visualize data over time in JMP

Cumulative sums can be useful when working with a data set of measures that grow over a standard period of time and that repeat regularly. An example of this is measurements taken monthly that repeat yearly, such as home electricity usage.

While raw measurements can be informative, cumulative sums can help us to see whether measurements are on track, trending up, or trending down over the course of the year. Creating cumulative sum columns is easy to do in JMP using the New Formula Column functionality (available through right-click) in the data table.

This is the graph we will be creating:

Consider the following example. We have data on electricity use in a home, which is reported on a monthly basis. In order to create our proper grouping (year), we can take the Date Mailed column, right click -> New Formula Column -> Date Time -> Year.

This creates a new column automatically with just the year. I do want to change from a continuous year to an ordinal year in order to group by year properly.  That can be done from the columns list of the data table by right-clicking on the blue triangle of Year[Date Mailed] and changing it from Continuous to Ordinal.

To build our cumulative sum column, we will first right-click on the Year[Date Mailed] column -> New Formula Column -> Group By.

Now that we’ve identified the grouping column, we can right-click on the kWh Usage column -> right click -> New Formula Column -> Row -> Cumulative Sum. We now have our brand new column with the measure we need for visualization.

Since we want to visualize our data on a relative time scale (month) vs. an absolute time scale, we can again go to the Date Mailed column -> New Formula Column -> Date Time -> Month Abbr.

At this point, we are almost ready to build our graph. But first we will add a column property to our Year column in order to keep consistent colors in our graph when using a data filter. Right-click on that column -> Column Property -> Value Coloring. Then click apply.

Now we can go to Graph Builder and build our cumulative sum graph. Drag month to the X axis and cumulative sum to the Y axis. Drag Year to the Overlay and then drag the line element to the graph to maintain the data points as well. Changing the points’ summary statistics from None to Mean will line the points up with the lines.

In order to add a filter to this graph, click on the red triangle of Graph Builder and then click Local Data Filter. Add Year as a filter.

Click Done on Graph Builder.

Using the Local Data Filter, we can focus on just the most recent three years. We’ll now want to make this graph presentation-ready. We can edit the title of the graph, the Y axis labels and X axis labels by double-clicking.

Now we have a graph that we can share with others. It gives a sense of how our power utilization is tracking this year vs. historical years and enables us to make any interventions necessary to keep our power utilization on target.

Here is how to build the graph, in real time.

P.S. Cumulative Density Functions and CDF plots are a corresponding way that statisticians “smooth” out bumps in histograms to compare a distribution to a reference distribution or compare two distributions. I'll explore this a bit more in a future post.

Article Labels
Article Tags
Level I

Very cool stuff, thanks for sharing.

Would you mind explaining what's going on behind the scenes in JMP when you "right-click on the kWh Usage column -> right click -> New Formula Column -> Row -> Cumulative Sum."  I don't fully understand the values that JMP is calculating.

Thanks,

Joe

Level I

Hi,

I want to be able to build a graph using the cumulative sum but instead of sum of result I would like the column formula to add the number of instances of that result.  See example table below.  The data set I have is huge over 3 million rows with different over 100 tests.

Test     Time Test               Result          ModuleSN       Instance of test result by time
A        2019/4/18 11:00 am    5                       A1                  1
B        2019/4/18 11:15 am    1                       B1                  1
A        2019/4/18 11:17 am    6                       A1                  2
B        2019/4/18 11:30 am   1.5                     B1                  2
A        2019/4/18 11:18 am   5.5                     A1                  3
B        2019/4/18 11:31 am   1.8                      B1                 3

Thanks a lot for the help.

Vaiju

Level IV

Vaiju,

i think you would just want

``col cumulative sum(1, :Test)``

which would give you a rolling count by the Test column, or you could swap out any column you want.

Level I

Thanks for the quick response.

Is it possible to group by additional columns besides "Test" if needed?

Level IV

yes of course.  You would just add them as subsequent groups.

``col cumulative sum(1, :Test, :OtherColumn, :AnotherColumn)``