Subscribe Bookmark RSS Feed

Rolling t-test and F-test down a column

markschahl

Community Trekker

Joined:

Jun 18, 2012

I have one minute level data from a process vessel. The level data contains some noise. Using the level data, I want to determine is the vessel filling, emptying, or holding steady.


I have used simple one-row Dif(x,1)  > y and even two point (Lag(x,-1) - Lag(x,1)) > y to find the changes from one state to another. Even with the latter there are false positives. From one minute to another the vessel can be filled. Some of the fills/empties last ~10 min.


I spoke with a process control colleague about this and he suggested a method that is used to find (un)steady-states:


  • do a t-test to compare the means of row(1-10) with rows(11-20) to see if the means are same/different to determine fill/empty/hold
  • do an F-test of the same data to see the variances are different to determine un/steady state
  • repeat this down every row in the column (i have 785k rows)

He did this using Excel using the T.TEST() and F.TEST() functions.


I looked in the scripting index and could not find an obvious way to accomplish this.

Any suggestions?



2 REPLIES
ron_horne

Super User

Joined:

Jun 23, 2011

hi MarkSChahl

first i have to tell you that i think this violates quite a few assumptions of the t test. otherwise, try the following script.

once it is done you will have a connecting letter report which you can sort by the levels of column 2 in order to get the summary in the order of time.

best,

ron

New Table( "DATA",

       Add Rows( 500 ),

       New Column( "Column 1",

              Numeric,

              Continuous,

              Format( "Best", 12 ),

              Formula( Sin( Row() / 100 ) + Random Normal() )

       ),

       New Column( "Column 2",

              Numeric,

              Ordinal,

              Format( "Best", 12 ),

              Formula( Floor( Row() / 10 ) )

       )

);

output = Oneway(

       Y( :Column 1 ),

       X( :Column 2 ),

Each Pair( 1, Connecting Letters Report( 1 ) );

       Name( "Means/Anova" )(0),

       t Test( 0 ),

       Compare Densities( 0 ),

       Box Plots( 1 ),

       Mean Diamonds( 1 ),

       Points Jittered( 1 ),

      

);

jerry_cooper

Staff

Joined:

Jul 10, 2014

Hi Mark,

Another approach that might accomplish the objective would be to do a Bivariate plot and then select Flexible->Fit Spline->0.01,flexible from the bivariate report's red triangle menu. Use the slider to smooth out the curve to your liking (you may need to zoom in on various parts of the graph depending upon how dense your data are), and then select Save Predicteds from the fit menu. Then go to the formula column that was created and take the derivative with respect to your time variable. The result will be the slope of the curve at any given observation. From there, it's just a matter of deciding an appropriate interval for what you want to consider as 0 slope, i.e. "holding steady". Anything less would be "emptying" and anything more would be "filling".

Just a note about response times: in version 12 each action, opening the formula editor and each subsequent click, took ~20-30 seconds for a data table with ~45K rows. Response times in the version 13 beta were instantaneous.

Hope this helps.

-Jerry