Discussions

ENTHU · Jun 3, 2021 01:35 PM

I need to summarize a complex data set.Data set contains a bunch of items that processed in last 2 weeks and how they performed.In order to gauge performance we have rating for different categories.Categories are from XZ001 to XZ100.I also have sub categories for each main category.Eg - XZ008 has YZ801 to YZ809 sub categories.

Data set is dynamic.meaning 1 week's data set might have YZ 899 but other week might not.

I have certain categories and subcategories classified into bucket1 and bucket2.All other categories would be part of bucket3.

Below is the script I attempted to write but doesnt seem to work.

Any help appreciated.

cdt = open("C:\Documents\Data\out.csv");

cdt=current data table();
sum_lst2 = {};
sum_lst3 = {};
sum_lst4 = {};
col_lst2 = cdt << get column names( Numeric );
For( i = 1, i <= N Items( col_lst2 ), i++,
colname = Column( col_lst2[i] ) << get name;
If( Contains( {"XZ008","XZ010","XZ015","XZ053","YZ1374","YZ1375","YZ9429","YZ9792","YZ9908", "YZ1930","YZ1933","YZ2744","YZ4621","YZ4625","YZ4630","YZ4640","YZ4681","YZ4830"}, colname ) |(colname >= "YZ901" & colname <= "YZ916") |(colname >= "YZ954" & colname <= "YZ965") |(colname >= "YZ1360" & colname <= "YZ1371") |(colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (colname >= "YZ3400" & colname <= "YZ3450"),
Insert Into( sum_lst2, col_lst2[i] )
); //Bucket1

If( Contains( {"XZ098","XZ090"}, colname ) | (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" & colname <= "YZ9799"),
Insert Into( sum_lst4, col_lst2[i] )
);
);//Bucket2


If(!Contains( {"XZ008","XZ010","XZ015","XZ053","XZ098","XZ090","YZ1374","YZ1375","YZ9429","YZ9792","YZ9908", "YZ1930","YZ1933","YZ2744","YZ4621","YZ4625","YZ4630","YZ4640","YZ4681","YZ4830"}, colname ) |(colname >= "YZ901" & colname <= "YZ916") |(colname >= "YZ954" & colname <= "YZ965") |(colname >= "YZ1360" & colname <= "YZ1371") |(colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (colname >= "YZ3400" & colname <= "YZ3450")| (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" & colname <= "YZ9799"),
Insert Into( sum_lst3, col_lst2[i] )
); //Bucket3

 

Eval( Eval Expr( cdt << New Column( "Bucket1", formula( Sum( Expr( sum_lst2 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket3", formula( Sum( Expr( sum_lst3 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket2", formula( Sum( Expr( sum_lst4 ) ) ) ) ) );

Sample data set:

Item	YZ0100	YZ0801	YZ0805	YZ0806	YZ0807	YZ0813	YZ0814	YZ0815	YZ0817	YZ0871	YZ0874	YZ0881	YZ1367	YZ1374	YZ1541	YZ1571	YZ1573	YZ1575	YZ1577	YZ1583	YZ1585	YZ1590	YZ1591	YZ1594	YZ1933	YZ2701	YZ2708	YZ2737	YZ2740	YZ2744	YZ2745	YZ2749	YZ2750	YZ3404	YZ3407	YZ3415	YZ3416	YZ3417	YZ3418	YZ3422	YZ3423	YZ3492	YZ3493	YZ3502	YZ3534	YZ4621	YZ4625	YZ4821	YZ9031	YZ9429	YZ9901	YZ9908	XZ001	XZ008	XZ013	XZ015	XZ019	XZ027	XZ034	XZ035	XZ046	XZ048	XZ090	XZ094	XZ099
AAA666	455						2																		1				1	5		1			1				1						5	3	1					22	455	2			1	7	2	5	4				22
AAA755	118																																																			6	118												6
AAA757	324	1					1																		7	1				33		3					1		3	1			1		3		1	1		1		8	324	2			7	37	6	3	1	1		1	8
BBA696	480									1		3													1							1			1										5		3	1				100	480	4			1	1	1	5	3	1			100
BBA786	155																										1												1							1						1	155					1	1		1				1
BBA787	197									1					1		1								1							2													7		1	1				59	197	1		2	1	2		7	1	1			59
CCA613	233											1																				1						1	1		1	1			2		1					12	233	1				1	4	2	1				12
CCA614	551				1																				1		2	1				7							2				1		4		4			1		69	551	1			1	10	3	4	4			1	69
CCA642	382					1					1	2												2								2					2		1		4			1	4	1	4					91	382	4		2		2	7	5	5				91
CCA642R	57																																										1									3	57						1						3
CCB613	165											1								1								1			1	2			2				1	1	1		1		1	1						52	165	1		1		4	6	1	1				52
DDA624	113	6		4				7	2																							1		1																		16	113	19				1	1						16
DDA631	561												1	1		5	6	3				3			2		1	1	1			3				1	1	1	1	1	1				1		3			1		30	561		2	17	2	6	6	1	3			1	30
DDA634	569															3	1		1		1	1	1		1							3				1	3		1	7	1	1	2		3		5		1		1	8	569			8	1	3	16	3	5		1		9
EEA624	228		1	1				1																								3							1	2	2	1								15	2	54	228	3				3	6					15	56
EEA627	495															12	15	7				3					1					1	2	2	1	1		1	3	5	2	1			7		3		1	2		28	495			37		4	16	7	3		1	2	28

txnelson · Jun 4, 2021 07:01 AM

The only error I found in your code, is that your closing ")" for your

For( i = 1, i <= N Items( col_lst2 ), i++,

was placed after Bucket 2, instead of after Bucket 3.

I made a couple of additional minor modifications, but the code works as far as I can tell

Names Default To Here( 1 );
cdt = Open( "C:\Documents\Data\out.csv" );

Current Data Table( cdt );

sum_lst2 = {};
sum_lst3 = {};
sum_lst4 = {};
col_lst2 = cdt << get column names( Numeric );
For( i = 1, i <= N Items( col_lst2 ), i++,
	colname = Column( col_lst2[i] ) << get name;
	If(
		Contains(
			{"XZ008", "XZ010", "XZ015", "XZ053", "YZ1374", "YZ1375", "YZ9429", "YZ9792", "YZ9908", "YZ1930", "YZ1933", "YZ2744",
			"YZ4621", "YZ4625", "YZ4630", "YZ4640", "YZ4681", "YZ4830"},
			colname
		) | (colname >= "YZ901" & colname <= "YZ916") | (colname >= "YZ954" & colname <= "YZ965") | (colname >= "YZ1360" &
		colname <= "YZ1371") | (colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (
		colname >= "YZ3400" & colname <= "YZ3450"),
		Insert Into( sum_lst2, col_lst2[i] )
	); //Bucket1

	If(
		Contains( {"XZ098", "XZ090"}, colname ) | (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" & colname
		 <= "YZ9799"),
		Insert Into( sum_lst4, col_lst2[i] )
	);//Bucket2

	If(
		!Contains(
			{"XZ008", "XZ010", "XZ015", "XZ053", "XZ098", "XZ090", "YZ1374", "YZ1375", "YZ9429", "YZ9792", "YZ9908", "YZ1930",
			"YZ1933", "YZ2744", "YZ4621", "YZ4625", "YZ4630", "YZ4640", "YZ4681", "YZ4830"},
			colname
		) | (colname >= "YZ901" & colname <= "YZ916") | (colname >= "YZ954" & colname <= "YZ965") | (colname >= "YZ1360" &
		colname <= "YZ1371") | (colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (
		colname >= "YZ3400" & colname <= "YZ3450") | (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" &
		colname <= "YZ9799"),
		Insert Into( sum_lst3, col_lst2[i] )
	); //Bucket3
);
 

Eval( Eval Expr( cdt << New Column( "Bucket1", formula( Sum( Expr( sum_lst2 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket3", formula( Sum( Expr( sum_lst3 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket2", formula( Sum( Expr( sum_lst4 ) ) ) ) ) );

Jim

View solution in original post

txnelson · Jun 4, 2021 07:01 AM

The only error I found in your code, is that your closing ")" for your

For( i = 1, i <= N Items( col_lst2 ), i++,

was placed after Bucket 2, instead of after Bucket 3.

I made a couple of additional minor modifications, but the code works as far as I can tell

Names Default To Here( 1 );
cdt = Open( "C:\Documents\Data\out.csv" );

Current Data Table( cdt );

sum_lst2 = {};
sum_lst3 = {};
sum_lst4 = {};
col_lst2 = cdt << get column names( Numeric );
For( i = 1, i <= N Items( col_lst2 ), i++,
	colname = Column( col_lst2[i] ) << get name;
	If(
		Contains(
			{"XZ008", "XZ010", "XZ015", "XZ053", "YZ1374", "YZ1375", "YZ9429", "YZ9792", "YZ9908", "YZ1930", "YZ1933", "YZ2744",
			"YZ4621", "YZ4625", "YZ4630", "YZ4640", "YZ4681", "YZ4830"},
			colname
		) | (colname >= "YZ901" & colname <= "YZ916") | (colname >= "YZ954" & colname <= "YZ965") | (colname >= "YZ1360" &
		colname <= "YZ1371") | (colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (
		colname >= "YZ3400" & colname <= "YZ3450"),
		Insert Into( sum_lst2, col_lst2[i] )
	); //Bucket1

	If(
		Contains( {"XZ098", "XZ090"}, colname ) | (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" & colname
		 <= "YZ9799"),
		Insert Into( sum_lst4, col_lst2[i] )
	);//Bucket2

	If(
		!Contains(
			{"XZ008", "XZ010", "XZ015", "XZ053", "XZ098", "XZ090", "YZ1374", "YZ1375", "YZ9429", "YZ9792", "YZ9908", "YZ1930",
			"YZ1933", "YZ2744", "YZ4621", "YZ4625", "YZ4630", "YZ4640", "YZ4681", "YZ4830"},
			colname
		) | (colname >= "YZ901" & colname <= "YZ916") | (colname >= "YZ954" & colname <= "YZ965") | (colname >= "YZ1360" &
		colname <= "YZ1371") | (colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (
		colname >= "YZ3400" & colname <= "YZ3450") | (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" &
		colname <= "YZ9799"),
		Insert Into( sum_lst3, col_lst2[i] )
	); //Bucket3
);
 

Eval( Eval Expr( cdt << New Column( "Bucket1", formula( Sum( Expr( sum_lst2 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket3", formula( Sum( Expr( sum_lst3 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket2", formula( Sum( Expr( sum_lst4 ) ) ) ) ) );

Jim

ENTHU · Jun 4, 2021 01:00 PM

After changing the barcket position my code is working,thank you.But the numbers in bucket3 are wrong.That is because for bucket3 values in item column are also getting added.How do I include only the columns with col name starting as XZ and YZ in bucket 3?

txnelson · Jun 4, 2021 02:45 PM

I don't understand the issue. The following columns are the columns that are being added up for Bucket 3.

{YZ0100, YZ0801, YZ0805, YZ0806, YZ0807, YZ0813, YZ0814, YZ0815, YZ0817, YZ0871,
YZ0874, YZ0881, YZ1367, YZ1541, YZ1571, YZ1573, YZ1575, YZ1577, YZ1583, YZ1585,
YZ1590, YZ1591, YZ1594, YZ2701, YZ2708, YZ2737, YZ2740, YZ2745, YZ2749, YZ2750,
YZ3404, YZ3407, YZ3415, YZ3416, YZ3417, YZ3418, YZ3422, YZ3423, YZ3492, YZ3493,
YZ3502, YZ3534, YZ4821, YZ9031, YZ9901, XZ001, XZ013, XZ019, XZ027, XZ034, XZ035,
XZ046, XZ048, XZ094, XZ099}

Which of those column should not be there?

Jim

ENTHU · Jun 4, 2021 02:52 PM

I have a !contains statement for bucket3.none of these columns need to be added.All columns starting from "XZ" and "YZ" excluding the ones added in bucket1 and 2 need to be added for bucket3.

txnelson · Jun 4, 2021 04:12 PM

Ah, given that logic, which I missed, I think how it needs to be stated, is:

If(
		Contains(
			{"XZ008", "XZ010", "XZ015", "XZ053", "XZ098", "XZ090", "YZ1374", "YZ1375", "YZ9429", "YZ9792", "YZ9908", "YZ1930", "YZ1933", "YZ2744",
			"YZ4621", "YZ4625", "YZ4630", "YZ4640", "YZ4681", "YZ4830"},
			colname
		) == 0,
		If(
			(colname >= "YZ901" & colname <= "YZ916") | (colname >= "YZ954" & colname <= "YZ965") | (colname >= "YZ1360" & colname <= "YZ1371") | (colname
			 >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (colname >= "YZ3400" & colname <= "YZ3450") | (colname
			 >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" & colname <= "YZ9799"),
			Insert Into( sum_lst3, col_lst2[i] )
		)
	); //Bucket3

Jim

brady_brady · Jun 4, 2021 2:35 PM

HI,

Watching this thread, I'm thinking that this script could be challenging to maintain / proofread later. I'll post an alternate approach in case anyone finds it useful... hopefully someone will.

1) Stack the original table, so it looks like the one below. I call this "MainTable"

2) Store the conditions in a data table just for this purpose. In your case, anything that is not in buckets 1 or 2 is in bucket 3, so those conditions need not be listed. I've only done this for a couple of cases, just for illustration. I call this "ConditionTable". Note that the dt1 column name :Label is hardcoded here... you can work around this of course, if needed or desired.

3) Now, the script becomes a lot shorter and easier to follow, assuming familiarity with expression handling, which it appears from your code that you possess. The sums can be easily obtained by using a summary table, as we now have a "bucket" column for grouping. Overall, the project should be easier to maintain, as the condition table is much easier to read than code. (Be sure to use the == operator in the condition table.)

I've attached sample tables in case you want to try this out.

Cheers,

Brady

Names Default To Here( 1 );

dt1 = Data Table( "MainTable" );

dtC = Data Table( "ConditionTable" );

//set all buckets to 3, and change as needed
dt1 << New Column( "Bucket", <<set each value( 3 ) );

//this will assign buckets 1 and 2 to appropriate rows in dt1 based on conditions 
//in the condition table
condExpr = Expr(
	dt1:Bucket[dt1 << get rows where(  _COND_  )] =  _BUCKET_
);

//loop through to apply the conditions
For( i = 1, i <= N Row( dtC ), i++,
	Eval(
		Substitute( Name Expr( condExpr ),
			Expr( _COND_ ), Parse( dtC:Condition[i] ),
			Expr( _BUCKET_ ), dtc:Bucket[i]
		)
	)
);

//produce a table of results
dtSum = dt1 << summary ( sum(:Data), group(:Item, :Bucket));

ENTHU · Jun 7, 2021 12:17 PM

Thanks this is a very simple and robust approach.Only caveat is stacking the main table.The column names beginning with YZ* is dynamic.How do I stack the table without knowing names of all the columns of interest and their data type?

ENTHU · Jun 7, 2021 02:12 PM

Okay so I was able to get one step further and select the columns starting with "YZ" using the script below.
My original table contains - item,total tested qty,week,FB*,IB* columns.
I would need total tested qty,week and FB* for my analysis.How do I use the stack function to prep the table so that Brady's approach can be used.

dt = Current Data Table();

Col_List = dt << Get Column Names( "String" );

For( i = 1, i <= N Items( Col_List ), i++,
    If( Contains( Col_List[i], "YZ" ),
        Column( Col_List[i] ) << Set Selected( 1 )
    )
);

Discussions

summary analysis for dynamic data

Re: summary analysis for dynamic data

Re: summary analysis for dynamic data

Re: summary analysis for dynamic data

Re: summary analysis for dynamic data

Re: summary analysis for dynamic data

Re: summary analysis for dynamic data

Re: summary analysis for dynamic data

Re: summary analysis for dynamic data

Re: summary analysis for dynamic data

Recommended Articles