Subscribe Bookmark RSS Feed

(JSL) Iterative Regression by Date, Create New Variable using Regression Results in Formula

terapin

Community Trekker

Joined:

Jun 23, 2011

I have some rather large growth datasets in which I want to detrend the growth data by:

  1. Determine max growth for each day and store datetime for each day's max growth
  2. Run a linear regression ( bivariate ( Y(growth), X(datetime), fit line  ) for each consecutive pair of datetimes at which max growth was determined.  E.g., regress max growth 1 & max growth 2 by datetime 1 & datetime 2, then max growth 2 & max growth 3 by datetime 2 & datetime 3, etc..
  3. Create a new variable (adj_growth) that is created from a formula that uses the linear regression results to calculate growth for datetime located between the consecutive pair of datetimes at which max growth was determined. E.g., If datetime >= datetime 1 & datetime <= datetime 2, then, adj_growth = m + b*growth, else  datetime >= datetime 2 & datetime <= datetime 3, then, adj_growth = m + b*growth, else ...............

I have been able to write some JSL (see below) that completes step 1, but I'm getting hung up on how to complete steps 2 & 3.  I would appreciate any suggestions on how to solve these issues.  Thanks in advance for your help!

Clear Log();

Names Default To Here(1);

dt_growth = New Table( "Growth",

  New Column( "Date & Time",

  Numeric,

  Continuous,

  Format( "m/d/y h:m", 12 ),

  Input Format( "m/d/y h:m" ),

  Set Values(

  [3225884400, 3225888000, 3225891600, 3225895200, 3225898800,

  3225902400, 3225906000, 3225909600, 3225913200, 3225916800,

  3225920400, 3225924000, 3225927600, 3225931200, 3225934800,

  3225938400, 3225942000, 3225945600, 3225949200, 3225952800,

  3225956400, 3225960000, 3225963600, 3225967200, 3225970800,

  3225974400, 3225978000, 3225981600, 3225985200, 3225988800,

  3225992400, 3225996000, 3225999600, 3226003200, 3226006800,

  3226010400, 3226014000, 3226017600, 3226021200, 3226024800,

  3226028400, 3226032000, 3226035600, 3226039200, 3226042800,

  3226046400, 3226050000, 3226053600, 3226057200, 3226060800,

  3226064400, 3226068000, 3226071600, 3226075200, 3226078800,

  3226082400, 3226086000, 3226089600, 3226093200, 3226096800,

  3226100400, 3226104000, 3226107600, 3226111200, 3226114800,

  3226118400, 3226122000, 3226125600, 3226129200, 3226132800,

  3226136400, 3226140000, 3226143600, 3226147200, 3226150800,

  3226154400, 3226158000, 3226161600, 3226165200, 3226168800,

  3226172400, 3226176000, 3226179600, 3226183200, 3226186800,

  3226190400, 3226194000, 3226197600, 3226201200, 3226204800,

  3226208400, 3226212000, 3226215600, 3226219200, 3226222800,

  3226226400, 3226230000, 3226233600, 3226237200, 3226240800,

  3226244400, 3226248000, 3226251600, 3226255200, 3226258800,

  3226262400, 3226266000, 3226269600, 3226273200, 3226276800,

  3226280400, 3226284000, 3226287600, 3226291200, 3226294800,

  3226298400, 3226302000, 3226305600, 3226309200, 3226312800,

  3226316400, 3226320000, 3226323600, 3226327200, 3226330800,

  3226334400, 3226338000, 3226341600, 3226345200, 3226348800,

  3226352400, 3226356000, 3226359600, 3226363200, 3226366800,

  3226370400, 3226374000, 3226377600, 3226381200, 3226384800,

  3226388400, 3226392000, 3226395600, 3226399200, 3226402800,

  3226406400, 3226410000, 3226413600, 3226417200, 3226420800,

  3226424400, 3226428000, 3226431600, 3226435200, 3226438800,

  3226442400, 3226446000, 3226449600, 3226453200, 3226456800,

  3226460400, 3226464000, 3226467600, 3226471200, 3226474800,

  3226478400, 3226482000, 3226485600, 3226489200, 3226492800,

  3226496400, 3226500000, 3226503600, 3226507200, 3226510800,

  3226514400, 3226518000, 3226521600, 3226525200, 3226528800,

  3226532400, 3226536000, 3226539600, 3226543200, 3226546800,

  3226550400, 3226554000, 3226557600, 3226561200, 3226564800,

  3226568400, 3226572000, 3226575600, 3226579200, 3226582800,

  3226586400, 3226590000, 3226593600, 3226597200, 3226600800,

  3226604400, 3226608000, 3226611600, 3226615200, 3226618800,

  3226622400, 3226626000, 3226629600, 3226633200, 3226636800,

  3226640400, 3226644000, 3226647600, 3226651200, 3226654800,

  3226658400, 3226662000, 3226665600, 3226669200, 3226672800,

  3226676400, 3226680000, 3226683600, 3226687200, 3226690800,

  3226694400, 3226698000, 3226701600, 3226705200, 3226708800,

  3226712400, 3226716000, 3226719600, 3226723200, 3226726800,

  3226730400, 3226734000, 3226737600, 3226741200, 3226744800,

  3226748400, 3226752000, 3226755600, 3226759200, 3226762800,

  3226766400, 3226770000, 3226773600, 3226777200, 3226780800,

  3226784400, 3226788000, 3226791600, 3226795200, 3226798800,

  3226802400, 3226806000, 3226809600, 3226813200, 3226816800,

  3226820400, 3226824000, 3226827600, 3226831200, 3226834800,

  3226838400, 3226842000, 3226845600, 3226849200, 3226852800,

  3226856400, 3226860000, 3226863600, 3226867200, 3226870800,

  3226874400, 3226878000, 3226881600, 3226885200, 3226888800,

  3226892400, 3226896000, 3226899600, 3226903200, 3226906800]

  )

  ),

  New Column( "Year", Numeric, Continuous, Format( "Best", 8 ), Formula( Year( :Name( "Date & Time" ) ) ), Lock( 1 ) ),

  New Column( "Month", Numeric, Continuous, Format( "Best", 8 ), Formula( Month( :Name( "Date & Time" ) ) ), Lock( 1 ) ),

  New Column( "Day", Numeric, Continuous, Format( "Best", 8 ), Formula( Day( :Name( "Date & Time" ) ) ), Lock( 1 ) ),

  New Column( "Hour", Numeric, Continuous, Format( "Best", 8 ), Formula( Hour( :Name( "Date & Time" ) ) ), Lock( 1 ) ),

  New Column( "Date", Numeric, Continuous, Format( "m/d/y", 10 ), Formula( Date MDY( :Month, :Day, :Year ) ), Lock( 1 ) ),

  New Column( "Growth",

  Numeric,

  Continuous,

  Format( "Fixed Dec", 12, 3 ),

  Set Values(

  [-0.00100011424000851, -0.000363872799278378, -0.000363871984284188,

  -0.00100005824000851, -0.00227235992075627, -0.00418081408649503, -

  0.00545309175689348, -0.00736156445270497, -0.00736136657166755, -

  0.00545294517616856, -0.00672516584616703, -0.00354446026036948, -

  0.00227219704017202, 0.00154456188025604, 0.00726960356524363,

  0.00790566809993933, 0.0129946043300728, 0.0142672814031039,

  0.0142680804082943, 0.0212680506560791, 0.021269003361861,

  0.0219050687116747, 0.0212692891739123, 0.014270349583592,

  0.0161791280461629, 0.0123615072003508, 0.014269902140721,

  0.0104520647031608, -0.0022725075312535, -0.00418091710202046,

  0.00663394656359483, 0.0123593477559727, 0.0212654783489747,

  0.0288989636199062, 0.0346241368109538, 0.0435298706416388,

  0.0734286369895607, 0.072790864489845, 0.0632482414590739,

  0.063884648195645, 0.0740617924812754, 0.10524374653983,

  0.105238560337556, 0.106520183190921, 0.095705821744203,

  0.0721597475663495, 0.0823432919531381, 0.0829770846230983,

  0.0944282992395635, 0.0918775750507814, ., ., ., ., ., ., ., ., ., .,

  ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., .,

  ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., .,

  ., ., ., ., ., ., ., 0.132598446121035, 0.142136100159265,

  0.144047485365236, 0.141510082527217, 0.133243284004832,

  0.136430823506921, 0.129433763704459, ., 0.116070160678918,

  0.115432357842292, 0.117340824227052, 0.114794572105785,

  0.114790201208717, 0.114151689739583, 0.113513458180752,

  0.112872468220974, 0.115423050522635, 0.143405851093052,

  0.138950940416588, 0.140217253219071, 0.144662268859769,

  0.14847441971641, 0.152290783704404, 0.155475963397883,

  0.164426756532424, 0.16760053536469, 0.162514344711853,

  0.159332058795417, 0.151068736269937, 0.149160923755755,

  0.137709192270229, 0.130708924366787, 0.123070306637102,

  0.117974687738857, 0.119245809936872, 0.117326106775782,

  0.122413756226772, 0.130047595709665, 0.145948812468612,

  0.150402142613181, ., ., 0.05, 0.04, 0.045, ., ., ., ., ., ., ., ., .,

  ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., .,

  ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., .,

  ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., .,

  ., 0.41763939536042, 0.415693459004884, 0.411893026189672,

  0.409996367624642, 0.406806168639344, 0.405541937379636,

  0.403607071498291, 0.409315222566001, 0.409944023628439,

  0.409934840470904, 0.413750645246311, 0.417566432921535,

  0.422655385641797, 0.42965365812232, 0.436654896453155,

  0.44364930329344, 0.45064765414928, 0.457646036355387,

  0.467820889024742, 0.481185887340017, 0.493284494492094,

  0.49456564669957, 0.499026564408861, 0.503492132922427,

  0.50285933023531, 0.506056129447622, 0.506702549723624,

  0.506067464368605, 0.507337632217846, 0.504137348451565,

  0.500946283675119, 0.504757712657227, 0.512386052830777,

  0.520015386838913, 0.529555420635595, 0.539096576517502,

  0.553087055836816, 0.567072203175747, 0.581062055491441,

  0.593783392145494, 0.603324019354862, 0.617957789218505,

  0.630681868813513, 0.640872945402754, 0.656807530753031,

  0.659356671575131, 0.660007636808086, 0.665113592325388]

  )

  )

);

Wait( 2 );

// Get the highest growth for each day

Summarize( growth_date_list = by( dt_growth:Date ), max_growth_list = Max( dt_growth:Growth ) );

// Converts max_growth_list, currently a matrix [ ], to a list { }

Is List( max_growth_list = As List( max_growth_list ) );

Wait( 10 );

// Now get the dates of the highest growth

For( i = 1, i <= N Items( max_growth_list ), i++,

  one_growth_date = growth_date_list;

  one_growth = max_growth_list;

  // Could have the same maximum growth on different hours

  one_growth_date_max_rows = dt_growth << get rows where(

  :Date == Num( one_growth_date ) & :Growth == one_growth

  );

  For( k = 1, k <= N Rows( one_growth_date_max_rows ), k++,

  m = one_growth_date_max_rows;

  one_date = :Date;

  one_date_time = :name( "Date & Time" );

  one_hour = :Hour;

  date_time_list = Insert( date_time_list, :name( "Date & Time" ) );

  hour_list = Insert( hour_list, :Hour );

  Print(

  "Maximum growth of " || Format( one_growth, "Fixed Dec", 3 ) || " occured on " ||

  Format( one_date_time, "m/d/y h:m" )

  );

  );

);

4 REPLIES
ms

Super User

Joined:

Jun 23, 2011

Here is one idea based on 1) identifying the rows and create a column with an index for each period, 2) fit linear regressions  (although maybe not the best model here) by each period and 3) save the prediction formulas to a new column.

Try replace everthing after "Wait(2)" in your example code with the code below:

// 1. Get the highest growth for each day and locate the corresponding datetime

Summarize( growth_date_list = by( dt_growth:Date ), max_growth_list = Max( dt_growth:Growth ) );

max_growth_list = As List( max_growth_list );

//locate rows with max growth

rows = dt_growth << get rows where(

  And(

  !Is Missing( :growth ),

  Contains( max_growth_list, :Growth ),

  Contains( max_growth_list, :Growth ) == Contains( growth_date_list, Format( :Date, "m/d/y" ) )

  )

);

//New column "Period" with an index for each time period metween max growths

dt_growth << New Column( "Period", numeric, nominal );

// set values

For Each Row( dt_growth:Period = Loc Max( Row() <= rows ) );

// for each row(if(min(rows)<=row()<=max(rows),dt_growth:Period=locmax(row()<=rows))); // Use this alternative if growth before first and after last max should be excluded

// 2. Fit linear regression for each period

regressions = Fit Model( Y( :Growth ), Effects( :Name( "Date & Time" ) ), by( :Period ), run() );

// 3. Save prediction formula

regressions << prediction formula;

Column( N Col( dt_growth )) << set name( "adjGrowth" );


terapin

Community Trekker

Joined:

Jun 23, 2011

Fascinating solution to the problem.  However, the regressions that are done include all data for a given period, instead of just the 2 max values.  That is, the regression should be done between max(growth) of period 1 & 2, period 2 & 3, etc.  Is there a command in Fit Model that lets us fit the model between the max(growth) values only instead of all data values for a given period?  I couldn't find such a switch/option in the scripting guides.  Thanks

ms

Super User

Joined:

Jun 23, 2011

OK, if I understand correctly you just need the equation for a straight line through two points. It's about interpolation rather than fitting a regression model, i.e. pure math.

If the equation is written y = mx + b, then for each pair of points (x1, y1) (x2, y2) the the slope is m = (y2 - y1) / (x2 - x1) and the intercept should be b = y1 - m * x1.

Here's a jsl example applied to the example table above (the table must be sorted after datetime for this to work):

Summarize( growth_date_list = by( dt_growth:Date ), g = Max( dt_growth:Growth ) );

max_growth_list = As List( g );

rows = dt_growth << get rows where(

  And(

  !Is Missing( :growth ),

  Contains( max_growth_list, :Growth ),

  Contains( max_growth_list, :Growth ) == Contains( growth_date_list, Format( :Date, "m-d-y" ) )

  )

);

// Connect max points with straight lines and find equation for each segment ("Fit each point")

y = g[Loc( g )];

x = Column( "Date & Time" )[rows];

n = N Rows( rows );

slopes = (y[2 :: n] - y[1 :: n - 1]) :/ (x[2 :: n] - x[1 :: n - 1]);

intercepts = y[1 :: n - 1] - slopes :* x[1 :: n - 1];

dt_growth << New Column( "adj_growth", numeric );

nr = Expr( Loc Max( Row() <= rows[2 :: n] ) );

For Each Row( dt_growth:adj_growth[] = slopes[nr] * :Name( "Date & Time" )[] + intercepts[nr] );

// View result

dt_growth << Graph Builder(

  Show Control Panel( 0 ),

  Variables( X( :Name( "Date & Time" ) ), Y( :Growth ), Y( :adj_growth, Position( 1 ) ) ),

  Elements( Line( X, Y( 2 ), Legend( 3 ) ), Points( X, Y( 1 ), Legend( 4 ) ) )

);

terapin

Community Trekker

Joined:

Jun 23, 2011

Hi MS,

I finally got back to this topic and tried your code out and really like the way it works on regressing just the max(growth) values.  It does exactly what I need it to do. I really liked the prediction formula feature in your earlier example (as shown below) and was wondering if that can still be achieved using the newest code?  I've played with the latest code and have tried to save the formula to the "adj_growth" column but to no avail.  Any ideas on how I could do this would be appreciated.   

// 2. Fit linear regression for each period

regressions = Fit Model( Y( :Growth ), Effects( :Name( "Date & Time" ) ), by( :Period ), run() );

// 3. Save prediction formula

regressions << prediction formula;