cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
rmthomas
Level III

Last 30 pieces for each row and delete the other data

Hello Everyone, 

 

I have a data table that has multiple columns and each column has a different amount of rows with data in it. I am trying to find a way to get the last 30 rows of the data in each column. For example column 1 has 50 rows and column 2 has 75 rows and I want each row to have the last 30 rows of info. Is there a script that can do that? 

 

2 ACCEPTED SOLUTIONS

Accepted Solutions
txnelson
Super User

Re: Last 30 pieces for each row and delete the other data

Deleting all but the last 30 rows in a data table is easy.

Names Default To Here( 1 );
dt = Current Data Table();
dt << select where( Row() > N Rows( dt ) - 30 );
dt << delete rows;

If you are asking to find the last 30 non missing data points for each column, and then save that as your n columns by 30 rows of data, that is a little more difficult and I will question the validity of doing that.  The result of that would be removing pretty much any relationship between data points between the different columns.

Names Default To Here( 1 );
dt = Current Data Table();

// Copy the data table retaining no rows;
dtCopy = dt << subset( selected rows( 0 ), selected columns( 0 ) );
dtCopy << select where( Row() >= 0 );
dtCopy << delete rows;
For( i = 1, i <= N Cols( dtCopy ), i++,
	theList = {};
	For( k = N Rows( dt ), k >= 1, k--,
		If( Is Missing( Column( dt, i )[k] ) == 0,
			Insert Into( theList, Column( dt, i )[k] );
			If( N Items( theList ) >= 30,
				Break()
			);
		)
	);
	Column( dtCopy, i ) << set values( Reverse( theList ) );
);
Jim

View solution in original post

Re: Last 30 pieces for each row and delete the other data

Here is another version to accomplish the same result as the second solution from @txnelson 

 

For( col = 1, col <= N Col( dt ), col++,
	values = Column( dt, col ) << Get Values;
	end = Contains( values, . ) - 1;
	If( end == -1, end = N Row( dt ) );
	view = end - 29 :: end;
	Column( dt, col ) << Set Values( values[view] );
);
dt << Select Where( Row() > 30 ) << Delete Rows;

View solution in original post

6 REPLIES 6
txnelson
Super User

Re: Last 30 pieces for each row and delete the other data

Deleting all but the last 30 rows in a data table is easy.

Names Default To Here( 1 );
dt = Current Data Table();
dt << select where( Row() > N Rows( dt ) - 30 );
dt << delete rows;

If you are asking to find the last 30 non missing data points for each column, and then save that as your n columns by 30 rows of data, that is a little more difficult and I will question the validity of doing that.  The result of that would be removing pretty much any relationship between data points between the different columns.

Names Default To Here( 1 );
dt = Current Data Table();

// Copy the data table retaining no rows;
dtCopy = dt << subset( selected rows( 0 ), selected columns( 0 ) );
dtCopy << select where( Row() >= 0 );
dtCopy << delete rows;
For( i = 1, i <= N Cols( dtCopy ), i++,
	theList = {};
	For( k = N Rows( dt ), k >= 1, k--,
		If( Is Missing( Column( dt, i )[k] ) == 0,
			Insert Into( theList, Column( dt, i )[k] );
			If( N Items( theList ) >= 30,
				Break()
			);
		)
	);
	Column( dtCopy, i ) << set values( Reverse( theList ) );
);
Jim

Re: Last 30 pieces for each row and delete the other data

Here is another version to accomplish the same result as the second solution from @txnelson 

 

For( col = 1, col <= N Col( dt ), col++,
	values = Column( dt, col ) << Get Values;
	end = Contains( values, . ) - 1;
	If( end == -1, end = N Row( dt ) );
	view = end - 29 :: end;
	Column( dt, col ) << Set Values( values[view] );
);
dt << Select Where( Row() > 30 ) << Delete Rows;
rmthomas
Level III

Re: Last 30 pieces for each row and delete the other data

@Mark_Bailey  This works great! I have come to realize I have data in columns that is less than 30 and greater than 30 is there an easy way to only show this? An example column 1 has 50 rows and column 2 has 75 rows and column 3 has 28 rows and I want each row to have the last 30 rows of info nothing that is below 30 so if it has 29 or less data points I would like to delete it also. Thanks for the help!

Re: Last 30 pieces for each row and delete the other data

Try this version.

 

Names Default to Here( 1 );

// create the example

dt = New Table( "Odd Lengths",
	New Column( "Column 1", Numeric, Continuous, Values( 1::50 ) ),
	New Column( "Column 2", Numeric, Continuous, Values( 1::75 ) ),
	New Column( "Column 3", Numeric, Continuous, Values( 1::28 ) )
);

Wait( 0 );

// process the data table

For( col = N Col( dt ), col > 0, col--,
	values = Column( dt, col ) << Get Values;
	not missing = Sum( Not( Is Missing( values ) ) );
	If( not missing < 30,
		dt << Delete Columns( Column( dt, col ) ),
		end = Contains( values, . ) - 1;
		If( end == -1, end = N Row( dt ) );
		view = end - 29 :: end;
		Column( dt, col ) << Set Values( values[view] );
		
	);
);

dt << Select Where( Row() > 30 ) << Delete Rows;
ian_jmp
Staff

Re: Last 30 pieces for each row and delete the other data

 FWIW, here's yet another version using matrices:

NamesDefaultToHere(1);

// Make a data table 'dt', to play with.
nr = 100;		// Number of rows
nc = 30;		// Number of columns
p = 0.7;		// Proportion of missing values
vals = J(nr*nc, 1, RandomInteger(0, 99));
makeMissing = RandomIndex(nr*nc, Floor(nr*nc*p));
vals[makeMissing] = .;
dt = AsTable(Shape(vals, nr, nc));
dt << setName("Random Integers with some missing");

// Start here with the current table . . .
lastN = 30;														// Number of non-missing values to retain
dt = CurrentDataTable();
vals = dt << getAsMatrix;
// Loop over columns . . .
newVals = [];
for(c=1, c<=NCol(vals), c++,
	NMV = vals[LocNonMissing(vals[0, c]), c];					// Non-mising values in this column
	if(NRow(NMV) >= lastN,										// Do we have enough non-missing values?
		lastNMV = NMV[(NRow(NMV) - lastN + 1)::NRow(NMV)];		// If so, get the 'lastN' of these  . . .
		newVals = newVals||lastNMV;								// . . . and add them into a new column
		);
);
dt2 = AsTable(newVals);
dt2 << setName("Last "||Char(lastN)||" non-missing Random Integers");
txnelson
Super User

Re: Last 30 pieces for each row and delete the other data

For the of future readers of this track, a correction to @Mark_Bailey statement about his code when compared to the second piece of code from @txnelson response needs to be clarified.  The two pieces of code will produce the same results if the assumption is made that once a missing value is found, that  no additional non missing values appear in the column.  @Mark_Bailey code requires that the minimum number of non missing values must exist, where @txnelson code does not

Jim