<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to calculate moving average of column with 100 million entries in JMP 11? in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210961#M42245</link>
    <description>&lt;P&gt;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/13585"&gt;@jojmp&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Either I do not understand your request or you do not understand Col Moving Avg().&amp;nbsp; You stated that you have 100 million rows and the span you specified in your script was 99,999,999.&amp;nbsp; So the last last row would be the mean of the entire column, and the previous rows would be all values up to and including the current row. Below is a picture of Big Class and a column moving average with a span of 5 (n before).&amp;nbsp; Note rows less than 6 just use all rows, and will result in a mean based on less than 6 values.&amp;nbsp; From row 6 and later, the mean is computed using the current row and the previous 5 rows.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So please provide more details. Now if your files have more than 100 million rows, then the custom formula would need to be modified.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.png" style="width: 999px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/17474iE3C63B880F8B5F0A/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 29 May 2019 07:42:52 GMT</pubDate>
    <dc:creator>gzmorgan0</dc:creator>
    <dc:date>2019-05-29T07:42:52Z</dc:date>
    <item>
      <title>How to calculate moving average of column with  more than 10000 million entries in JMP 11?</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210810#M42199</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I am trying to run the below formula on a column with 100 million rows to calculate the moving average.Could you please suggest an JMP11 alternative for&amp;nbsp;&lt;CODE class=" language-jsl"&gt;&amp;nbsp;Col Moving Average&amp;nbsp;&amp;nbsp;&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;CODE class=" language-jsl"&gt;Col Moving Average( :x, 1, 9999999, 0 ) is not supported by JMP11.&amp;nbsp;&lt;/CODE&gt;&lt;CODE class=" language-jsl"&gt;&lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;&lt;CODE class=" language-jsl"&gt;&lt;/CODE&gt;Please find below the code I have tried:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;NamesDefaultToHere(1);

// Use a function to open a .CSV file: Given the path to such a file, returns a reference to the resulting table
openMyCSV = 
function({path2File}, {Default Local},
			dt = Open(
					path2File,
					columns(
						New Column( "column 1", Numeric, "Continuous", Format( "Best", 12 ) ),
						New Column( "items", Numeric, "Continuous", Format( "Best", 12 ) )
						),
					Import Settings(
						End Of Line( CRLF, CR, LF ),
						End Of Field( Comma, CSV( 0 ) ),
						Strip Quotes( 1 ),
						Use Apostrophe as Quotation Mark( 0 ),
						Use Regional Settings( 0 ),
						Scan Whole File( 1 ),
						Treat empty columns as numeric( 0 ),
						CompressNumericColumns( 0 ),
						CompressCharacterColumns( 0 ),
						CompressAllowListCheck( 0 ),
						Labels( 0 ),
						Column Names Start( 1 ),
						Data Starts( 1 ),
						Lines To Read( "All" ),
						Year Rule( "20xx" )
					)
				)
	);
	
dt = New Table( "My data file");
dt &amp;lt;&amp;lt; New Column( "cc", Character, width( 40 ) );
dt &amp;lt;&amp;lt; New Column( "MovAvgMean", Numeric );
dt &amp;lt;&amp;lt; New Column( "MovAvgMax", Numeric );

// Put all the .csv files to process into a list
dir = Pick Directory(
	"Directory containing your csv files",
	"C:\Users\xxx\Documents\test\"
);
files = Files In Directory( dir );
For( f = N Items( files ), f &amp;gt;= 1, f--,
	If( !Ends With( Uppercase( files[f] ), ".CSV" ),
		Remove From( files, f )
	)
);

// Process each file - Incomplete!
For( f = 1, f &amp;lt;= N Items( files ), f++,
	thisTable = openMyCSV( dir || files[f] );
	thisTable &amp;lt;&amp;lt; New Column( "x",
		Numeric,
		Continuous,
		Formula( (87 *:items)-(0.03998113) ));
	thisTable &amp;lt;&amp;lt; New Column( "av", Numeric, Continuous, Formula( Col Moving Average( :x, 1, 9999999, 0 ) ) );
	
	// This will insure that all formula processing is complet before the next statement is run
	thisTable &amp;lt;&amp;lt; run formulas; 
	
	FileName = files[f];
	// Add the summary statistics to the summary table
	dt &amp;lt;&amp;lt; Add Rows( 1 );
    dt:cc[N Rows( dt )] = FileName;
	dt:MovAvgMean[N Rows( dt )] = Col Mean( thisTable:x );
	dt:MovAvgMax[N Rows( dt )] = Col Max( thisTable:x );
	
	Close( thisTable, nosave );
);
	&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;9&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 29 May 2019 08:01:52 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210810#M42199</guid>
      <dc:creator>jojmp</dc:creator>
      <dc:date>2019-05-29T08:01:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to calculate moving average of column with 100 million entries in JMP 11?</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210925#M42232</link>
      <description>&lt;P&gt;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/13585"&gt;@jojmp&lt;/a&gt;,&amp;nbsp; your formula for the moving average is a bit foreign&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;thisTable &amp;lt;&amp;lt; New Column( "av", Numeric, Continuous, Formula( Col Moving Average( :x, 1, 9999999, 0 ) ) );&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The third argument, 99999999, is creating some type of&amp;nbsp; (historical) mean by Sequence (row).&amp;nbsp; The third argument is the span of the average. More common is a span of 5 or 10 or ?? to provide a Mean for values near in time, that does not include the entire history.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Assuming that is what you want, an alternative formula can do the same thing&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;If( Row() == 1,
	:x,
	(:x + (Row() - 1) * Lag( :av, 1 )) / Row()
)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Attached is a version of sample data Big Class.jmp with two columns with formulas&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;Col Moving Average( :weight, 1, 9999, 0 )&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;and the comparable Custom Formula&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;If( Row() == 1,
	:weight,
	(:weight + (Row() - 1) * Lag( :Custom Formula, 1 )) / Row()
)&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 28 May 2019 22:18:17 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210925#M42232</guid>
      <dc:creator>gzmorgan0</dc:creator>
      <dc:date>2019-05-28T22:18:17Z</dc:date>
    </item>
    <item>
      <title>Re: How to calculate moving average of column with 100 million entries in JMP 11?</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210932#M42237</link>
      <description>&lt;P&gt;In this custom formula how do we know it is the moving average of 9999 items?&lt;/P&gt;</description>
      <pubDate>Wed, 29 May 2019 05:16:57 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210932#M42237</guid>
      <dc:creator>jojmp</dc:creator>
      <dc:date>2019-05-29T05:16:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to calculate moving average of column with 100 million entries in JMP 11?</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210961#M42245</link>
      <description>&lt;P&gt;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/13585"&gt;@jojmp&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Either I do not understand your request or you do not understand Col Moving Avg().&amp;nbsp; You stated that you have 100 million rows and the span you specified in your script was 99,999,999.&amp;nbsp; So the last last row would be the mean of the entire column, and the previous rows would be all values up to and including the current row. Below is a picture of Big Class and a column moving average with a span of 5 (n before).&amp;nbsp; Note rows less than 6 just use all rows, and will result in a mean based on less than 6 values.&amp;nbsp; From row 6 and later, the mean is computed using the current row and the previous 5 rows.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So please provide more details. Now if your files have more than 100 million rows, then the custom formula would need to be modified.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.png" style="width: 999px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/17474iE3C63B880F8B5F0A/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 29 May 2019 07:42:52 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210961#M42245</guid>
      <dc:creator>gzmorgan0</dc:creator>
      <dc:date>2019-05-29T07:42:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to calculate moving average of column with 100 million entries in JMP 11?</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210964#M42248</link>
      <description>sorry for not being clear I have changed the question to more than 10000 million entries and the number of entries are not fixed it can increase to any number. Looking for a generic formula to calculate the moving average over any number of items.</description>
      <pubDate>Wed, 29 May 2019 08:06:15 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210964#M42248</guid>
      <dc:creator>jojmp</dc:creator>
      <dc:date>2019-05-29T08:06:15Z</dc:date>
    </item>
    <item>
      <title>Re: How to calculate moving average of column with 100 million entries in JMP 11?</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210968#M42252</link>
      <description>&lt;P&gt;&lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/13585"&gt;@jojmp&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Given your original script where you did not even save your table and you reference a column that was never created, and given that the mean of a large number of values has a very small change by dropping one row and adding another, your question is very strange.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Below is a script, that creates a table with nr rows versus reading in a file where nr = nrow(dt). If you had 100 million rows, memory management is very important and calculations shoudl try to be order n, O(n) and not O(n*rng)&lt;/P&gt;
&lt;P&gt;If you show the embedded log the time to do the calculations will be printed out.&amp;nbsp; Also &lt;STRONG&gt;Begin Data Update()&lt;/STRONG&gt; and &lt;STRONG&gt;End Data Update()&amp;nbsp;&lt;/STRONG&gt;are used to save on IO (rendering) time.&amp;nbsp; On my PC here are some of my results:&lt;/P&gt;
&lt;TABLE style="border-collapse: collapse; width: 100%; height: 145px;" border="1"&gt;
&lt;TBODY&gt;
&lt;TR style="height: 29px;"&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;nr&lt;/TD&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;nrg&lt;/TD&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;seconds&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 29px;"&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;100,000&lt;/TD&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;100&lt;/TD&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;0.34&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 29px;"&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;10,000,000&amp;nbsp;&lt;/TD&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;1,000,000&lt;/TD&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;10.66&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 29px;"&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;100,000,000&lt;/TD&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;100,000,000&lt;/TD&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;85.74&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 29px;"&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;100,000,000&lt;/TD&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;10,000,000&lt;/TD&gt;
&lt;TD style="width: 33.3333%; height: 29px;"&gt;104.50&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note, this script assumes &lt;FONT color="#0000FF"&gt;&lt;STRONG&gt;no missing values&lt;/STRONG&gt;&lt;/FONT&gt;.&amp;nbsp; The script will need to be modified and performance will degrade immensely to accomodate missing values. Change the values of nr and nrg to test your PC's performance.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;Names Default to Here(1);

nr = 100000;
dt = new table("test", new column("x"), Add rows(nr));

dt:x &amp;lt;&amp;lt; set each value(Random Integer(100,200));

nrg = 100;

dt &amp;lt;&amp;lt; begin data update();
t0 = hp time();
mvcol = dt &amp;lt;&amp;lt; new column("MV");
mvcol[1] = :x[1];
for(j=2, j&amp;lt;=nrg, j++,
	mvcol[j] = (:x[j] + (j-1)*mvcol[j-1])/j; //assumes no missing values
);
if(nrg &amp;lt; nr,
	for(j=nrg+1, j&amp;lt;=nr, j++,
		mvcol[j] = (:x[j] + (nrg)*mvcol[j-1] - :x[j - nrg])/ (nrg)
	);
); //end if, 
wait(0);
dt &amp;lt;&amp;lt; end data update();
t1= hp time();

show((t1-t0)*1e-6);

&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Given all this, I still get the feeling what you described miht not be what you need.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 29 May 2019 10:48:11 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-calculate-moving-average-of-column-with-more-than-10000/m-p/210968#M42252</guid>
      <dc:creator>gzmorgan0</dc:creator>
      <dc:date>2019-05-29T10:48:11Z</dc:date>
    </item>
  </channel>
</rss>

