Solved: How to stack several columns into a new column ignoring duplicate entries?

Neo · Sep 13, 2023 07:47 AM

I have multi column (character, nominal) data table. I want to get the the entries/items in the last four columns into a new column in the same data table ignoring duplicate entries/items from the columns being stacked. How to achieve this in JSL?

When it's too good to be true, it's neither

SamGardner · Sep 13, 2023 09:29 AM

Here are some ways to do this:

Table Operations:
- Stack the columns in to one column using Tables > Stack
- Use Tables > Summary with that stacked column as the Group role
JSL
- Get column values into a list
  - list_of_values = column(table name, column name) << get values
  - You can do this again on another column and insert those values into the list
    - list_of_values = Insert Into(list_of_values, column(table name, another column name) << get values)
    - repeat as necessary
  - Use an Associate Array to get the unique values
    - unique_values = (Associative Array(list_of_values)) << get keys
  - put that list of values into a new column in a data table

Names Default To Here( 1 );

// create table with 4 columns with random letters
nr = 20;
dt = New Table( "Example", addrows( nr ) );

letterlist = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q",
"R", "S", "T", "U", "V", "W", "X", "Y", "Z"};

dt << New Column( "C1", character );
Column( dt, "C1" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C2", character );
Column( dt, "C2" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C3", character );
Column( dt, "C3" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C4", character );
Column( dt, "C4" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );

// Method 1
dtstack = dt << Stack(
	columns( :C1, :C2, :C3, :C4 ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Output Table( "Stacked Values" )
);

dtstack << Summary(
	Group( :Data ),
	Freq( "None" ),
	Weight( "None" ),
	output table name( "Unique Values" )
);

//Method 2
list_of_values = Column( dt, "C1" ) << get values;
Insert Into( list_of_values, Column( dt, "C2" ) << get values );
Insert Into( list_of_values, Column( dt, "C3" ) << get values );
Insert Into( list_of_values, Column( dt, "C4" ) << get values );
Print( list_of_values );
unique_values = (Associative Array( list_of_values )) << get keys;

dtunique = New Table( "Unique Values" );
dtunique << New Column( "Unique Values", character );
Column( dtunique, "Unique Values" ) << set values( unique_values );

View solution in original post

jthi · Sep 13, 2023 08:22 AM

Without knowing how your data looks and and how it should look like in the end my best guess is to use combination of Associative Array() << Get Keys and Concat Items()

-Jarmo

Neo · Sep 13, 2023 6:07 AM

@jthi Thanks. But does this drop duplicates?

My Columns (to be stacked) have common Char entries between them but also unique entries. I want to keep each in the new column dropping the duplicates.

When it's too good to be true, it's neither

jthi · Sep 13, 2023 09:06 AM

Associative Arrays cannot have duplicated keys, so yes it will (it is also ordered so order might change).

-Jarmo

Neo · Sep 13, 2023 09:33 AM

@jthi OK. Thanks. Is this the correct syntax?

aa = Associative Array( As Column(lastCH), As Column(2ndLastCH), As Column(3rdLastCH), As Column(4rthLastCH)) << get keys();
// Where "lastCH", "2ndlastCH", "3rdlastCH" and "4rthlastCH" are column headers for the last four columns in my data table, supplied from above

When it's too good to be true, it's neither

SamGardner · Sep 13, 2023 09:29 AM

Here are some ways to do this:

Table Operations:
- Stack the columns in to one column using Tables > Stack
- Use Tables > Summary with that stacked column as the Group role
JSL
- Get column values into a list
  - list_of_values = column(table name, column name) << get values
  - You can do this again on another column and insert those values into the list
    - list_of_values = Insert Into(list_of_values, column(table name, another column name) << get values)
    - repeat as necessary
  - Use an Associate Array to get the unique values
    - unique_values = (Associative Array(list_of_values)) << get keys
  - put that list of values into a new column in a data table

Names Default To Here( 1 );

// create table with 4 columns with random letters
nr = 20;
dt = New Table( "Example", addrows( nr ) );

letterlist = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q",
"R", "S", "T", "U", "V", "W", "X", "Y", "Z"};

dt << New Column( "C1", character );
Column( dt, "C1" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C2", character );
Column( dt, "C2" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C3", character );
Column( dt, "C3" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );
dt << New Column( "C4", character );
Column( dt, "C4" ) << set values( letterlist[J( nr, 1, Random Integer( 26 ) )] );

// Method 1
dtstack = dt << Stack(
	columns( :C1, :C2, :C3, :C4 ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Output Table( "Stacked Values" )
);

dtstack << Summary(
	Group( :Data ),
	Freq( "None" ),
	Weight( "None" ),
	output table name( "Unique Values" )
);

//Method 2
list_of_values = Column( dt, "C1" ) << get values;
Insert Into( list_of_values, Column( dt, "C2" ) << get values );
Insert Into( list_of_values, Column( dt, "C3" ) << get values );
Insert Into( list_of_values, Column( dt, "C4" ) << get values );
Print( list_of_values );
unique_values = (Associative Array( list_of_values )) << get keys;

dtunique = New Table( "Unique Values" );
dtunique << New Column( "Unique Values", character );
Column( dtunique, "Unique Values" ) << set values( unique_values );

Neo · Sep 13, 2023 10:02 AM

@SamGardner

Both these work fine, but appends an empty row if the number of rows is not the same for every column as in the case of my actual data set.

When it's too good to be true, it's neither

SamGardner · Sep 13, 2023 03:11 PM

if you insert the code

remove from(unique_values, loc(unique_values,"")[1])

after the line where the unique_values list is created, that will remove the value from the list that is the empty string.

remove from(unique_values, 1)

should also work, since the keys are sorted alphabetically with they are requested.

How to stack several columns into a new column ignoring duplicate entries?

Re: How to stack several columns into a new column ignoring duplicate entries?

Re: How to stack several columns into a new column ignoring duplicate entries?

Re: How to stack several columns into a new column ignoring duplicate entries?

Re: How to stack several columns into a new column ignoring duplicate entries?

Re: How to stack several columns into a new column ignoring duplicate entries?

Re: How to stack several columns into a new column ignoring duplicate entries?

Re: How to stack several columns into a new column ignoring duplicate entries?

Re: How to stack several columns into a new column ignoring duplicate entries?