I have multi column (character, nominal) data table. I want to get the the entries/items in the last four columns into a new column in the same data table ignoring duplicate entries/items from the columns being stacked. How to achieve this in JSL?
Here are some ways to do this:
Names Default To Here( 1 );
// create table with 4 columns with random letters
nr = 20;
dt = New Table( "Example", addrows( nr ) );
letterlist = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q",
"R", "S", "T", "U", "V", "W", "X", "Y", "Z"};
dt << New Column( "C1", character );
Column( dt, "C1" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C2", character );
Column( dt, "C2" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C3", character );
Column( dt, "C3" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C4", character );
Column( dt, "C4" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
// Method 1
dtstack = dt << Stack(
columns( :C1, :C2, :C3, :C4 ),
Source Label Column( "Label" ),
Stacked Data Column( "Data" ),
Output Table( "Stacked Values" )
);
dtstack << Summary(
Group( :Data ),
Freq( "None" ),
Weight( "None" ),
output table name( "Unique Values" )
);
//Method 2
list_of_values = Column( dt, "C1" ) << get values;
Insert Into( list_of_values, Column( dt, "C2" ) << get values );
Insert Into( list_of_values, Column( dt, "C3" ) << get values );
Insert Into( list_of_values, Column( dt, "C4" ) << get values );
Print( list_of_values );
unique_values = (Associative Array( list_of_values )) << get keys;
dtunique = New Table( "Unique Values" );
dtunique << New Column( "Unique Values", character );
Column( dtunique, "Unique Values" ) << set values( unique_values );
Without knowing how your data looks and and how it should look like in the end my best guess is to use combination of Associative Array() << Get Keys and Concat Items()
@jthi Thanks. But does this drop duplicates?
My Columns (to be stacked) have common Char entries between them but also unique entries. I want to keep each in the new column dropping the duplicates.
Associative Arrays cannot have duplicated keys, so yes it will (it is also ordered so order might change).
@jthi OK. Thanks. Is this the correct syntax?
aa = Associative Array( As Column(lastCH), As Column(2ndLastCH), As Column(3rdLastCH), As Column(4rthLastCH)) << get keys();
// Where "lastCH", "2ndlastCH", "3rdlastCH" and "4rthlastCH" are column headers for the last four columns in my data table, supplied from above
Here are some ways to do this:
Names Default To Here( 1 );
// create table with 4 columns with random letters
nr = 20;
dt = New Table( "Example", addrows( nr ) );
letterlist = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q",
"R", "S", "T", "U", "V", "W", "X", "Y", "Z"};
dt << New Column( "C1", character );
Column( dt, "C1" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C2", character );
Column( dt, "C2" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C3", character );
Column( dt, "C3" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C4", character );
Column( dt, "C4" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
// Method 1
dtstack = dt << Stack(
columns( :C1, :C2, :C3, :C4 ),
Source Label Column( "Label" ),
Stacked Data Column( "Data" ),
Output Table( "Stacked Values" )
);
dtstack << Summary(
Group( :Data ),
Freq( "None" ),
Weight( "None" ),
output table name( "Unique Values" )
);
//Method 2
list_of_values = Column( dt, "C1" ) << get values;
Insert Into( list_of_values, Column( dt, "C2" ) << get values );
Insert Into( list_of_values, Column( dt, "C3" ) << get values );
Insert Into( list_of_values, Column( dt, "C4" ) << get values );
Print( list_of_values );
unique_values = (Associative Array( list_of_values )) << get keys;
dtunique = New Table( "Unique Values" );
dtunique << New Column( "Unique Values", character );
Column( dtunique, "Unique Values" ) << set values( unique_values );
Both these work fine, but appends an empty row if the number of rows is not the same for every column as in the case of my actual data set.
if you insert the code
remove from(unique_values, loc(unique_values,"")[1])
after the line where the unique_values list is created, that will remove the value from the list that is the empty string.
remove from(unique_values, 1)
should also work, since the keys are sorted alphabetically with they are requested.