Discussions

JM13 · Jun 9, 2023 9:00 AM

I'm trying to take a list of things and make all possible pairs of combinations appear in a set of two columns. I currently have a script generated that does permutations (plus repeates of the same value in col a vs col b) but I'm struggling on how to reduce it down to combinations. For example, if the list of things was 1, 2, 3, 4, 5 then I don't need both 1/2 and 2/1, or 1/1. Below is my script I currently have and I think I'm close, I just need a bit more to get where I need to be. I just need to make this script strictly do the choose 2 combinations of an arbitrary list. This particular example is using 5 things for simplicity. thanks

Names Default To Here( 1 );

New Table( "Test",
	Add Rows( 5 ),
	New Column( "API 10", Numeric, "Continuous", Format( "Best", 15 ), Set Values( [4238939261, 4238939263, 4238939262, 4238939328, 4238939329] ) ),

);

Test = Data Table( "Test" );
APIs = Test:API 10 << get values;
working = New Table( "TestAPIComb", New Column( "API10" ), New Column( "API102" ), add rows( N Items( APIs ) ^ 2 ) );
APIs2 = Test:API 10 << get values;

theRow = 0;
	
For( APICNT = 1, APICNT <= N Items( APIs ), APICNT++,
	For( API2CNT = 1, API2CNT <= N Items( APIs2 ), API2CNT++,
		theRow++;
		:API10[theRow] = APIs[APICNT];
		:API102[theRow] = APIs[API2CNT];
	)
);

txnelson · Oct 12, 2022 11:02 AM

I have added a few lines of code that will eliminate the duplicate combinations. In addition, I would suggest that you look into

Tables=>Join

at the Cartesian join capability.

It produces the all possible combinations much faster than JSL code can do it.

Names Default To Here( 1 );

New Table( "Test",
	Add Rows( 5 ),
	New Column( "API 10", Numeric, "Continuous", Format( "Best", 15 ), Set Values( [4238939261, 4238939263, 4238939262, 4238939328, 4238939329] ) ),

);

Test = Data Table( "Test" );
APIs = Test:API 10 << get values;
working = New Table( "TestAPIComb", New Column( "API10" ), New Column( "API102" ), add rows( N Items( APIs ) ^ 2 ) );
APIs2 = Test:API 10 << get values;

theRow = 0;
	
For( APICNT = 1, APICNT <= N Items( APIs ), APICNT++,
	For( API2CNT = 1, API2CNT <= N Items( APIs2 ), API2CNT++,
		theRow++;
		:API10[theRow] = APIs[APICNT];
		:API102[theRow] = APIs[API2CNT];
	)
);	

dt = Data Table( "TestAPIcomb" );

// Arange data so order is the same for all rows
For( i = 1, i <= N Rows( dt ), i++,
	If( :API102[i] > :API10[i],
		hold = :API10[i];
		:API10[i] = :API102[i];
		:API102[i] = hold;
	)
);

// Find the duplicates
dt << Select Duplicate Rows( :API10, :API102 );

// Delete the duplicates
dt << delete rows;

Jim

jthi · Oct 14, 2022 11:13 AM

Most likely there are quite many ways of doing this (the cartesian join @txnelson is good starting point). Here is one option using just matrix calculations:

Names Default To Here(1);

// input list
m = [4238939261, 4238939263, 4238939262, 4238939328, 4238939329];
n_items = N Items(m);

// combination count
comb_count = Summation(i = 1, n_items - 1, i++);

m = Sort Ascending(m);

first_col = [];
second_col = [];
For(i = 2, i <= n_items, i++,
	first_col ||= Repeat(m[i - 1], n_items - i + 1)`;
	second_col ||= m[Index(i, n_items)]`;
);

dt = New Table("Combinations",
	Add Rows(comb_count),
	New Column("FirstCol", Numeric, Continuous, Values(first_col)),
	New Column("SecondCol", Numeric, Continuous, Values(second_col)),
);

-Jarmo

JM13 · Oct 20, 2022 04:17 PM

Thanks. I looked at what @txnelson suggested and also implemented the cartesian join and it returned 15 results instead of 10. I then just added a simple select condition at the end where if the first column = the second column then I also deleted them. Do either of you know which method is faster in JMP? i'm working with a large dataset and I'll try that matrix method as well but just didn't know if either of you know what should be more efficient.

txnelson · Oct 20, 2022 04:36 PM

I don't know which method would work fastest. My suggestion is to run both of them and see which is fastest.

Jim

nthai · Feb 7, 2023 03:10 AM

Hi @jthi, any ideas if the initial list is string instead of number? Since matrix only works for number data type.

jthi · Feb 7, 2023 07:49 AM

Without much changes you could just use initial list as numbers and then create conversion table to join character values based on those numeric values.

-Jarmo

brady_brady · Feb 7, 2023 12:29 PM

Hello,

The NChooseK Matrix( ) function is tailor-made for problems like this. It is also reasonably quick, and especially useful for k>2. Check out the code below for an example of 2 things, and 5 things, chosen from a set of 26.

Cheers,

Brady

names default to here(1);

lis = words("abcdefghijklmnopqrstuvwxyz","");

// original example: choose 2
dt = astable( nchoosekMatrix(nitems(lis), 2), << column names({"x", "y"}) );
dt << new column ("first", character, << set values ( lis[dt:x << get values]));
dt << new column ("second", character, << set values ( lis[dt:y << get values]));
dt << delete columns (1::2);

// a more interesting example: choose 5
dt = astable( nchoosekMatrix(nitems(lis), 5), << column names( words("x1,x2,x3,x4,x5",",") ) );
for each ({ i }, 1::5, //i = 1;
	eval (substitute( expr(dt << new column ( "y"||char(i), character, <<set values ( lis[_EX_ << get values]  ) ) ), 
								expr(_EX_), 
								parse(evalinsert("dt:x^i^"))
	));
);
dt << delete columns (1::5);

brady_brady · Feb 7, 2023 01:12 PM

As to performance.

The code below generates the almost 2 million row table of pairs from a set of 2000 in a little under 6 seconds, which is a bit less than half as long as looping took. I don't know whether such performance is suitable for your purposes. Hopefully it is a feasible option for you.

Names Default To Here( 1 );

letters = Words( "abcdefghijklmnopqrstuvwxyz", "" );

// create a large list of random strings of length 10. use an AA to remove (unlikely) duplicates
lis = Associative Array( Repeat( {Concat Items( letters[J( 10, 1, Random Integer( 26 ) )], "" )}, 2e3 ) ) << get keys;

t1 = HP Time();

// looping solution
dt = new table( "xx", add rows (2e6), << new column ("X", character), <<new column ("y", character));
n = nitems(lis);
r=0;
for  each ( { i }, 1::(n-1), 
	for each( { j }, ( i + 1):: n, 
		r++;
		dt[ r, 1] = lis[ i ]; dt[ r, 2 ] = lis [ j ]		
	)
);

t2 = HP Time();


// nChooseK matrix solution
dt = As Table( NChooseK Matrix( N Items( lis ), 2 ), <<column names( {"x", "y"} ) );
dt << New Column( "first", character, <<set values( lis[dt:x << get values] ) );
dt << New Column( "second", character, <<set values( lis[dt:y << get values] ) );
dt << delete columns( 1 :: 2 );

t3 = HP Time();

Print( Eval Insert( "Create table of pairs from loop: ^(t2-t1)/1e6^ seconds." ) );
Print( Eval Insert( "Create table of pairs from nChooseK matrix: ^(t3-t2)/1e6^ seconds." ) );

brady_brady · Feb 7, 2023 01:26 PM

I should also mention that the examples I used were with strings. If you are using numbers, you can just use straight matrix ops, which are pretty fast... the 2 million row table is generated in 0.3 seconds, and the 26choose5 table in 0.04 seconds.

Names Default To Here(1);

nums = J(2e3, 1, randominteger(1e9));

t1 = hptime();

astable(shape(nums[nChoosekMatrix( 2e3, 2 )], ., 2));

t2 = hptime();

Print( Eval Insert( "Create table of 2 numbers chosen from 2000: ^(t2-t1)/1e6^ seconds." ) );

// 26 choose 5
nums = J(26, 1, randominteger(1e9));

t3 = hptime();

astable(shape(nums[nChoosekMatrix( 26, 5 )], ., 5));

t4 = hptime();

Print( Eval Insert( "Create table of 5 numbers chosen from 26: ^(t4-t3)/1e6^ seconds." ) );

Discussions

Combinations of a list put into columns

Re: Combinations of a list put into columns

Re: Combinations of a list put into columns

Re: Combinations of a list put into columns

Re: Combinations of a list put into columns

Re: Combinations of a list put into columns

Re: Combinations of a list put into columns

Re: Combinations of a list put into columns

Re: Combinations of a list put into columns

Re: Combinations of a list put into columns

Recommended Articles