cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
ENTHU
Level IV

summary analysis for dynamic data

I need to summarize a complex data set.Data set contains a bunch of items that processed in last 2 weeks and how they performed.In order to gauge performance we have rating for different categories.Categories are from XZ001 to XZ100.I also have sub categories for each main category.Eg - XZ008 has YZ801 to YZ809 sub categories.

Data set is dynamic.meaning 1 week's data set might have YZ 899 but other week might not.

 

I have certain categories and subcategories classified into bucket1 and bucket2.All other categories would be part of bucket3.

Below is the script I attempted to write but doesnt seem to work.

 

 

Any help appreciated.

cdt = open("C:\Documents\Data\out.csv");

cdt=current data table();
sum_lst2 = {};
sum_lst3 = {};
sum_lst4 = {};
col_lst2 = cdt << get column names( Numeric );
For( i = 1, i <= N Items( col_lst2 ), i++,
colname = Column( col_lst2[i] ) << get name;
If( Contains( {"XZ008","XZ010","XZ015","XZ053","YZ1374","YZ1375","YZ9429","YZ9792","YZ9908", "YZ1930","YZ1933","YZ2744","YZ4621","YZ4625","YZ4630","YZ4640","YZ4681","YZ4830"}, colname ) |(colname >= "YZ901" & colname <= "YZ916") |(colname >= "YZ954" & colname <= "YZ965") |(colname >= "YZ1360" & colname <= "YZ1371") |(colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (colname >= "YZ3400" & colname <= "YZ3450"),
Insert Into( sum_lst2, col_lst2[i] )
); //Bucket1

If( Contains( {"XZ098","XZ090"}, colname ) | (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" & colname <= "YZ9799"),
Insert Into( sum_lst4, col_lst2[i] )
);
);//Bucket2


If(!Contains( {"XZ008","XZ010","XZ015","XZ053","XZ098","XZ090","YZ1374","YZ1375","YZ9429","YZ9792","YZ9908", "YZ1930","YZ1933","YZ2744","YZ4621","YZ4625","YZ4630","YZ4640","YZ4681","YZ4830"}, colname ) |(colname >= "YZ901" & colname <= "YZ916") |(colname >= "YZ954" & colname <= "YZ965") |(colname >= "YZ1360" & colname <= "YZ1371") |(colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (colname >= "YZ3400" & colname <= "YZ3450")| (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" & colname <= "YZ9799"),
Insert Into( sum_lst3, col_lst2[i] )
); //Bucket3

 

Eval( Eval Expr( cdt << New Column( "Bucket1", formula( Sum( Expr( sum_lst2 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket3", formula( Sum( Expr( sum_lst3 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket2", formula( Sum( Expr( sum_lst4 ) ) ) ) ) );

 

Sample data set:

 

Item YZ0100 YZ0801 YZ0805 YZ0806 YZ0807 YZ0813 YZ0814 YZ0815 YZ0817 YZ0871 YZ0874 YZ0881 YZ1367 YZ1374 YZ1541 YZ1571 YZ1573 YZ1575 YZ1577 YZ1583 YZ1585 YZ1590 YZ1591 YZ1594 YZ1933 YZ2701 YZ2708 YZ2737 YZ2740 YZ2744 YZ2745 YZ2749 YZ2750 YZ3404 YZ3407 YZ3415 YZ3416 YZ3417 YZ3418 YZ3422 YZ3423 YZ3492 YZ3493 YZ3502 YZ3534 YZ4621 YZ4625 YZ4821 YZ9031 YZ9429 YZ9901 YZ9908 XZ001 XZ008 XZ013 XZ015 XZ019 XZ027 XZ034 XZ035 XZ046 XZ048 XZ090 XZ094 XZ099
AAA666 455           2                                   1       1 5   1     1       1           5 3 1         22 455 2     1 7 2 5 4       22
AAA755 118                                                                                                     6 118                       6
AAA757 324 1         1                                   7 1       33   3         1   3 1     1   3   1 1   1   8 324 2     7 37 6 3 1 1   1 8
BBA696 480                 1   3                         1             1     1                   5   3 1       100 480 4     1 1 1 5 3 1     100
BBA786 155                                                   1                       1             1           1 155         1 1   1       1
BBA787 197                 1         1   1               1             2                         7   1 1       59 197 1   2 1 2   7 1 1     59
CCA613 233                     1                                       1           1 1   1 1     2   1         12 233 1       1 4 2 1       12
CCA614 551       1                                       1   2 1       7             2       1   4   4     1   69 551 1     1 10 3 4 4     1 69
CCA642 382         1         1 2                       2               2         2   1   4     1 4 1 4         91 382 4   2   2 7 5 5       91
CCA642R 57                                                                                   1                 3 57           1           3
CCB613 165                     1               1               1     1 2     2       1 1 1   1   1 1           52 165 1   1   4 6 1 1       52
DDA624 113 6   4       7 2                                             1   1                                   16 113 19       1 1           16
DDA631 561                       1 1   5 6 3       3     2   1 1 1     3       1 1 1 1 1 1       1   3     1   30 561   2 17 2 6 6 1 3     1 30
DDA634 569                             3 1   1   1 1 1   1             3       1 3   1 7 1 1 2   3   5   1   1 8 569     8 1 3 16 3 5   1   9
EEA624 228   1 1       1                                               3             1 2 2 1               15 2 54 228 3       3 6         15 56
EEA627 495                             12 15 7       3         1         1 2 2 1 1   1 3 5 2 1     7   3   1 2   28 495     37   4 16 7 3   1 2 28







1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: summary analysis for dynamic data

The only error I found in your code, is that your closing ")" for your 

For( i = 1, i <= N Items( col_lst2 ), i++,

was placed after Bucket 2, instead of after Bucket 3.

 

I made a couple of additional minor modifications, but the code works as far as I can tell

Names Default To Here( 1 );
cdt = Open( "C:\Documents\Data\out.csv" );

Current Data Table( cdt );

sum_lst2 = {};
sum_lst3 = {};
sum_lst4 = {};
col_lst2 = cdt << get column names( Numeric );
For( i = 1, i <= N Items( col_lst2 ), i++,
	colname = Column( col_lst2[i] ) << get name;
	If(
		Contains(
			{"XZ008", "XZ010", "XZ015", "XZ053", "YZ1374", "YZ1375", "YZ9429", "YZ9792", "YZ9908", "YZ1930", "YZ1933", "YZ2744",
			"YZ4621", "YZ4625", "YZ4630", "YZ4640", "YZ4681", "YZ4830"},
			colname
		) | (colname >= "YZ901" & colname <= "YZ916") | (colname >= "YZ954" & colname <= "YZ965") | (colname >= "YZ1360" &
		colname <= "YZ1371") | (colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (
		colname >= "YZ3400" & colname <= "YZ3450"),
		Insert Into( sum_lst2, col_lst2[i] )
	); //Bucket1

	If(
		Contains( {"XZ098", "XZ090"}, colname ) | (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" & colname
		 <= "YZ9799"),
		Insert Into( sum_lst4, col_lst2[i] )
	);//Bucket2

	If(
		!Contains(
			{"XZ008", "XZ010", "XZ015", "XZ053", "XZ098", "XZ090", "YZ1374", "YZ1375", "YZ9429", "YZ9792", "YZ9908", "YZ1930",
			"YZ1933", "YZ2744", "YZ4621", "YZ4625", "YZ4630", "YZ4640", "YZ4681", "YZ4830"},
			colname
		) | (colname >= "YZ901" & colname <= "YZ916") | (colname >= "YZ954" & colname <= "YZ965") | (colname >= "YZ1360" &
		colname <= "YZ1371") | (colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (
		colname >= "YZ3400" & colname <= "YZ3450") | (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" &
		colname <= "YZ9799"),
		Insert Into( sum_lst3, col_lst2[i] )
	); //Bucket3
);
 

Eval( Eval Expr( cdt << New Column( "Bucket1", formula( Sum( Expr( sum_lst2 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket3", formula( Sum( Expr( sum_lst3 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket2", formula( Sum( Expr( sum_lst4 ) ) ) ) ) );
Jim

View solution in original post

8 REPLIES 8
txnelson
Super User

Re: summary analysis for dynamic data

The only error I found in your code, is that your closing ")" for your 

For( i = 1, i <= N Items( col_lst2 ), i++,

was placed after Bucket 2, instead of after Bucket 3.

 

I made a couple of additional minor modifications, but the code works as far as I can tell

Names Default To Here( 1 );
cdt = Open( "C:\Documents\Data\out.csv" );

Current Data Table( cdt );

sum_lst2 = {};
sum_lst3 = {};
sum_lst4 = {};
col_lst2 = cdt << get column names( Numeric );
For( i = 1, i <= N Items( col_lst2 ), i++,
	colname = Column( col_lst2[i] ) << get name;
	If(
		Contains(
			{"XZ008", "XZ010", "XZ015", "XZ053", "YZ1374", "YZ1375", "YZ9429", "YZ9792", "YZ9908", "YZ1930", "YZ1933", "YZ2744",
			"YZ4621", "YZ4625", "YZ4630", "YZ4640", "YZ4681", "YZ4830"},
			colname
		) | (colname >= "YZ901" & colname <= "YZ916") | (colname >= "YZ954" & colname <= "YZ965") | (colname >= "YZ1360" &
		colname <= "YZ1371") | (colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (
		colname >= "YZ3400" & colname <= "YZ3450"),
		Insert Into( sum_lst2, col_lst2[i] )
	); //Bucket1

	If(
		Contains( {"XZ098", "XZ090"}, colname ) | (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" & colname
		 <= "YZ9799"),
		Insert Into( sum_lst4, col_lst2[i] )
	);//Bucket2

	If(
		!Contains(
			{"XZ008", "XZ010", "XZ015", "XZ053", "XZ098", "XZ090", "YZ1374", "YZ1375", "YZ9429", "YZ9792", "YZ9908", "YZ1930",
			"YZ1933", "YZ2744", "YZ4621", "YZ4625", "YZ4630", "YZ4640", "YZ4681", "YZ4830"},
			colname
		) | (colname >= "YZ901" & colname <= "YZ916") | (colname >= "YZ954" & colname <= "YZ965") | (colname >= "YZ1360" &
		colname <= "YZ1371") | (colname >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (
		colname >= "YZ3400" & colname <= "YZ3450") | (colname >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" &
		colname <= "YZ9799"),
		Insert Into( sum_lst3, col_lst2[i] )
	); //Bucket3
);
 

Eval( Eval Expr( cdt << New Column( "Bucket1", formula( Sum( Expr( sum_lst2 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket3", formula( Sum( Expr( sum_lst3 ) ) ) ) ) );
Eval( Eval Expr( cdt << New Column( "Bucket2", formula( Sum( Expr( sum_lst4 ) ) ) ) ) );
Jim
ENTHU
Level IV

Re: summary analysis for dynamic data

After changing the barcket position my code is working,thank you.But the numbers in bucket3 are wrong.That is because for bucket3 values in item column are also getting added.How do I include only the columns with col name starting as XZ and YZ in bucket 3?
txnelson
Super User

Re: summary analysis for dynamic data

I don't understand the issue.  The following columns are the columns that are being added up for Bucket 3. 

{YZ0100, YZ0801, YZ0805, YZ0806, YZ0807, YZ0813, YZ0814, YZ0815, YZ0817, YZ0871,
YZ0874, YZ0881, YZ1367, YZ1541, YZ1571, YZ1573, YZ1575, YZ1577, YZ1583, YZ1585,
YZ1590, YZ1591, YZ1594, YZ2701, YZ2708, YZ2737, YZ2740, YZ2745, YZ2749, YZ2750,
YZ3404, YZ3407, YZ3415, YZ3416, YZ3417, YZ3418, YZ3422, YZ3423, YZ3492, YZ3493,
YZ3502, YZ3534, YZ4821, YZ9031, YZ9901, XZ001, XZ013, XZ019, XZ027, XZ034, XZ035,
XZ046, XZ048, XZ094, XZ099}

Which of those column should not be there?

Jim
ENTHU
Level IV

Re: summary analysis for dynamic data

I have a !contains statement for bucket3.none of these columns need to be added.All columns starting from "XZ" and "YZ" excluding the ones added in bucket1 and 2 need to be added for bucket3.
txnelson
Super User

Re: summary analysis for dynamic data

Ah, given that logic, which I missed, I think how it needs to be stated, is:

If(
		Contains(
			{"XZ008", "XZ010", "XZ015", "XZ053", "XZ098", "XZ090", "YZ1374", "YZ1375", "YZ9429", "YZ9792", "YZ9908", "YZ1930", "YZ1933", "YZ2744",
			"YZ4621", "YZ4625", "YZ4630", "YZ4640", "YZ4681", "YZ4830"},
			colname
		) == 0,
		If(
			(colname >= "YZ901" & colname <= "YZ916") | (colname >= "YZ954" & colname <= "YZ965") | (colname >= "YZ1360" & colname <= "YZ1371") | (colname
			 >= "YZ2581" & colname <= "YZ2586") | (colname >= "YZ2900" & colname <= "YZ2950") | (colname >= "YZ3400" & colname <= "YZ3450") | (colname
			 >= "YZ9700" & colname <= "YZ9791") | (colname >= "YZ9793" & colname <= "YZ9799"),
			Insert Into( sum_lst3, col_lst2[i] )
		)
	); //Bucket3
Jim

Re: summary analysis for dynamic data

HI,

 

Watching this thread, I'm thinking that this script could be challenging to maintain / proofread later. I'll post an alternate approach in case anyone finds it useful... hopefully someone will.

 

1) Stack the original table, so it looks like the one below. I call this "MainTable"

brady_brady_0-1622841386650.png

 

2) Store the conditions in a data table just for this purpose. In your case, anything that is not in buckets 1 or 2 is in bucket 3, so those conditions need not be listed. I've only done this for a couple of cases, just for illustration. I call this "ConditionTable". Note that the dt1 column name :Label is hardcoded here... you can work around this of course, if needed or desired.

brady_brady_1-1622841462499.png

 

3) Now, the script becomes a lot shorter and easier to follow, assuming familiarity with expression handling, which it appears from your code that you possess. The sums can be easily obtained by using a summary table, as we now have a "bucket" column for grouping. Overall, the project should be easier to maintain, as the condition table is much easier to read than code. (Be sure to use the == operator in the condition table.)

 

I've attached sample tables in case you want to try this out.

 

Cheers,

Brady

 

Names Default To Here( 1 );

dt1 = Data Table( "MainTable" );

dtC = Data Table( "ConditionTable" );

//set all buckets to 3, and change as needed
dt1 << New Column( "Bucket", <<set each value( 3 ) );

//this will assign buckets 1 and 2 to appropriate rows in dt1 based on conditions 
//in the condition table
condExpr = Expr(
	dt1:Bucket[dt1 << get rows where(  _COND_  )] =  _BUCKET_
);

//loop through to apply the conditions
For( i = 1, i <= N Row( dtC ), i++,
	Eval(
		Substitute( Name Expr( condExpr ),
			Expr( _COND_ ), Parse( dtC:Condition[i] ),
			Expr( _BUCKET_ ), dtc:Bucket[i]
		)
	)
);

//produce a table of results
dtSum = dt1 << summary ( sum(:Data), group(:Item, :Bucket));

 

ENTHU
Level IV

Re: summary analysis for dynamic data

Thanks this is a very simple and robust approach.Only caveat is stacking the main table.The column names beginning with YZ* is dynamic.How do I stack the table without knowing names of all the columns of interest and their data type?

ENTHU
Level IV

Re: summary analysis for dynamic data

Okay so I was able to get one step further and select the columns starting with "YZ" using the script below.
My original table contains - item,total tested qty,week,FB*,IB* columns.
I would need total tested qty,week and FB* for my analysis.How do I use the stack function to prep the table so that Brady's approach can be used.

dt = Current Data Table();

Col_List = dt << Get Column Names( "String" );

For( i = 1, i <= N Items( Col_List ), i++,
    If( Contains( Col_List[i], "YZ" ),
        Column( Col_List[i] ) << Set Selected( 1 )
    )
);