cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
david_sam
Level II

rename column names in a jmp file as per values in rows in a separate .csv file

Hello, 

I am newbie to JSL, so appreciate help.
I have three files.

file #1: It is the Reference file, [as ref.csv] holding 3 cols as below. 

 

Category, width, label
--------------------------------
A,0,L1.abc
A,1,L2.xt
A,2,L8.dr
A,3,L20.ws
A,4,L72.oz
B,0,L11.qq
....
B,6,L61.er

 

file #2: data file#1 in JMP, holding Category, Data and many other cols.
file #3: data file #2 in JMP, holding Category, Data and many other cols.
difference between the two data files [File #2 and file #3] is the category type and width of value in data col.

Data file #1:

Category|data|many cols.....
----------------------------------------
A 123.23 ....
A 321.45 .... 
.....
many rows

 

the width of values in the column "data" [excluding the dots] is matching to the # of rows in file #1 for a given category
example:
width of 123.23 in row 1 in data file#1 is 5.
the # of rows in reference file for category A would be 5. this need not be tested per say. this is how the tables are coming in [ie when pulled from the source]

In the future, in the reference file , say for category A , the # of rows increase to 6 or more, the width of the value held in the data column in file data file #1 would also increase when information is pulled from database.

Data file #2:

Category|data|many cols.....

----------------------------------------
B 1236.23 ....
B 3213.45 .... 
.....
many rows


Goal 1:create new cols dependent on the width of value held in the "data" col in the datafile #1 [and other data files]
I am done with goal # 1 !


ex:
updated data file#1 would look like this:
Category|data|c1|c2|c3|c4|c5|many cols.....
----------------------------------------
A 123.23 1 2 3 2 3 ....
A 321.45 3 2 1 4 5 .... 
.....


goal 2: rename the new cols in data file #1 [ie c1 to c5] to L1... to L72... as defined in the column "label" in the reference file. no change in the cells values, just update the selected col names.


expected output after goal 2 is done:
Update the data file #1 to look like this:
Category|data|L1.abc|L2.xt|L8.dr|L20.ws|L72.oz|many cols.....
----------------------------------------
A 123.23 1 2 3 2 3 ....
A 321.45 3 2 1 4 5 .... 
.....


I need help with goal #2.

For goal #2 I opened the reference file but not sure to move forward.

 

Thanks
David

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: rename column names in a jmp file as per values in rows in a separate .csv file

Sorry for my misinformation.  I misread the structure of the Categories data table.  Below is a script that creates a couple of data tables, and a categories table and then expands the data tables and finally renames the columns based upon the entries in the categories data table.  I hope this gets us back on track.  The script will work on as many data tables as necessary.  All that has to be done, is to add to the dataTableList list, all of the data tables to be processed

Names Default To Here( 1 );

// This example script opens up 2 data tables and 1 category table.  
// In reality, as many as necessary data tables could be used

// Create the tables
dt1 = New Table( "Table1",
	Add Rows( 3 ),
	New Column( "Category", Character, "Nominal", Set Values( {"ABC", "ABC", "ABC"} ) ),
	New Column( "DATA", Character( 33 ), "Nominal", Set Values( {"123.56.8.342", "431.02.0.567", "318.00.1.816"} ), Set Display Width( 182 ) )
);
dt2 = New Table( "Table2",
	Add Rows( 3 ),
	New Column( "Category", Character, "Nominal", Set Values( {"B", "B", "B"} ) ),
	New Column( "Data", Character, "Nominal", Set Values( {"13.24", "98.44", "63.89"} ) )
);
dtCat = New Table( "The Categories",
	Add Rows( 13 ),
	New Column( "Category",
		Character,
		"Nominal",
		Set Values( {"ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "B", "B", "B", "B"} ),
		Set Display Width( 67 )
	),
	New Column( "Width", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3] ) ),
	New Column( "Label",
		Character( 16 ),
		"Nominal",
		Set Values(
			{"label0.0", "label0.1", "label0.2", "label0.3", "label0.4", "label0.5", "label0.6", "label0.7", "label0.8", "thelabel.1", "thelabel.2",
			"thelabel.3", "thelabel.4"}
		),
		Set Display Width( 99 )
	)
);

// Below this line, is the script that will add the new columns to each of the data tables
// and will take the names from the dtCat (Categories) data table and change the names to
// the columns in each of the data tables

// Create a list of the pointers to the data tables.  This will allow for the using of the list 
// entries to point to the data tables using a subscript 
dataTableList = {};
insert into(dataTableList, dt1 );
insert into(dataTableList, dt2 );

// Loop across the data table list and expand the columns based upon the the column called Data
For( i = 1, i <= N Items( dataTableList ), i++,
	ncols_to_add = length( substitute((dataTableList[i]:Data)[1],".",""));
	For( k = 1, k <= ncols_to_add, k++, // the || operator joins two strings
		dataTableList[i] << New Column( "c" || Char( i ),
			Numeric,
			Continous,
			<<Set Each Value( Num( Substr( Regex( :data, "\D", "", GLOBALREPLACE ), k, 1 ) ) )
		)
	)
);

// Now rename the columns in each of the data tables based upon the labels in the Category table

For( i = 1, i <= N Items( dataTableList ), i++, 
	// Subset the dtCat table into the a new data table with just the one
	// category being processed
	dtCat << select where( :Category == (dataTableList[i]:Category)[1] );
	dtSub = dtCat << subset( selected rows == 1, selected columns( 0 ) );
	
	// set the offset of where to start the renaming
	offset = 2;
	
	// Loop
	// Change names in dt1 to match the Label in 
	For( k = 1, k <= N Rows( dtSub ), k++,
		Column( dataTableList[i], k + offset ) << set name( dtSub:Label[k] )
	);
	
	Close( dtSub, nosave );
);
Jim

View solution in original post

10 REPLIES 10
txnelson
Super User

Re: rename column names in a jmp file as per values in rows in a separate .csv file

Here is an untested Script, that should be pretty close to being able to meet your second goal

dtCat = Data Table( "file#1 name" );
dt1 = Data Table( "file#2 name" );
dt2 = Data Table( " file#3 name" );
// add dt3, dt4 etc. for each of the categories you have

// Find the number of data tables to process....A B C......
// Create a temporary new column in data table dtCat that has just the category
dtCat << New Column( "temp", character, formula( Word( 1, :data, "." ) ) );
// Summarize the data getting one element in the ByGroup list for each Category found
Summarize( dt, Categories = ByGroup( :temp ) );
numCategories = N Items( Categories );

For( i = 1, i <= numCategories, i++, 
	// Subset the dtCat table into the a new data table with just the one
	// category being processed
	dtCat << select where( :temp == Categories[i] );
	dtSub = dtCat << subset( selected rows == 1, selected columns( 0 ) );
	
	// set the offset of where to start the renaming
	offset = 3;
	
	// Loop
	// Change names in dt1 to match the Label in 
	For( k = 1, k <= N Rows( dtSub ), k++,
		Eval(
			Substitute(
					Expr(
						Column( __dt__, k + offset ) << set name( Word( 3, dtSub:Data[k], "." ) ),
						Expr( __dt__ ),
						Parse( "dt" || Char( i ) )
					)
			)
		)
	);
	
	Close( dtSub, nosave );
);
Jim
david_sam
Level II

Re: rename column names in a jmp file as per values in rows in a separate .csv file

Thanks Jim. It is not working.  

 

So for goal #2,I have broken down the whole logic into values steps.

 

step1: get the category names from data table [say dt1] and store it in a variable [say it is holding category of A]-  DONE

step2: read variable from step1 and find all the rows that match category = A] get label names from the column "Label" from the ref.csv, [which you have called as dtCat], and store it in a list or variable.[ say it called category_list]

so using from the example stated in the discussion. I am expecting category_list to hold below values. 

category_list = L1.abc, L2.xt, L8.dr, L20.ws, L72.oz

 

I dont know how to do this. Appreciate help. 

 

step 3: Check the count in list == no. of new cols created in the dt1. this is extra thing I am doing for robustness.

count in category_list is 5 elements

new cols created in dt1 is 5. [ I got this from goal 1 as I am holding a variable to find how many new cols are needed]. 

 

Step 4: replace the cols names in dt1 , ie c[i] with list [i] using loop where i = 5 

c1 replaced 

I am right now in step 2 and need some help. 

 

Thanks,

David

 

 

txnelson
Super User

Re: rename column names in a jmp file as per values in rows in a separate .csv file

  1. You indicate that my last code isn't working.....why not?
  2. Here is a way to create the list that you are asking for
    Category_List = {};
    For( i=1, i<= N Rows( dtCat ), i++,
         insert into(Category_List, word(3,:data[i],".") || "." || word(4,:data[i], "." )  );
    );
Jim
david_sam
Level II

Re: rename column names in a jmp file as per values in rows in a separate .csv file

I looked at the code you have provided.. There seems to be some confusion.

this is how I intercepted and correct me it is wrong. 

 

dtcat is the reference table with col name [Category, width, label]. 

there multiple data tables dt1,dt2 etc with same structure with col names [Category|data|many cols.....]

 when I run the code, it does nothing

 

there is no column named data in dtcat table and looked into the debug. 

May be you meant dt1 table as it is data column. 

I tried playing around and it would not help. 

 

I think we are align on the definitions on the table , use what I have described in the original post, I might help in my opinion. 

Again appreciate the help here.

 

Let me the recommendation for step 2

 

Thanks

David

 

david_sam
Level II

Re: rename column names in a jmp file as per values in rows in a separate .csv file

Jim, 

 

For step 2 with below modifications [shown below]

 

++++

 

this is my ref.csv :

 

category|width|label

 

ABC 0 label0.0
ABC 1 label0.1
ABC 2 label0.2
ABC 3 label0.3
ABC 4 label0.4
ABC 5 label0.5
ABC 6 label0.6
ABC 7 label0.7
ABC 8 label0.8
ABC 9 label0.9
ABC 10 label0.10
ABC 11 label0.11
ABC 12 label0.12
ABC 13 label0.13
ABC 14 label0.14
ABC 15 label1.15
ABC 16 label0.16
ABC 17 label0.17
ABC 18 label0.18
ABC 19 label0.19
ABC 20 label0.20
B 0 label11.22
C 1 label11.21
D 2 label11.23
E 0 label11.24

My code:

dt_ref = Open ("ref.csv");

For( i=1, i<= N Rows( dt_ref ), i++,
insert into(Category_List, word(1,:label[i],".") || "." || word(2,:label[i], "." ) );
);
show(Category_List);

+++

I am getting the following on the log: 

Category_List = {"label0.0", "label0.1", "label0.2", "label0.3", "label0.4", "label0.5", "label0.6", "label0.7", "label0.8", "label0.9", "label0.10", "label0.11", "label0.12", "label0.13", "label0.14", "label1.15", "label0.16", "label0.17", "label0.18", "label0.19", "label0.20", "label11.22", "label11.21", "label11.23", "label11.24", ".", "."};

 

I want it to select the labels only when category is ABC. how do I do that?

 

I tried the following but it is returned nothing for the Category_List:

 

prod_found ={"ABC"};

 

if (:product == prod_found,
      For( i=1, i<= N Rows( dt_ref ), i++,
      insert into(Category_List, word(1,:name[i],".") || "." || word(2,:name[i], "." ) );
       );
);

 

Thanks,

David

txnelson
Super User

Re: rename column names in a jmp file as per values in rows in a separate .csv file

Sorry for my misinformation.  I misread the structure of the Categories data table.  Below is a script that creates a couple of data tables, and a categories table and then expands the data tables and finally renames the columns based upon the entries in the categories data table.  I hope this gets us back on track.  The script will work on as many data tables as necessary.  All that has to be done, is to add to the dataTableList list, all of the data tables to be processed

Names Default To Here( 1 );

// This example script opens up 2 data tables and 1 category table.  
// In reality, as many as necessary data tables could be used

// Create the tables
dt1 = New Table( "Table1",
	Add Rows( 3 ),
	New Column( "Category", Character, "Nominal", Set Values( {"ABC", "ABC", "ABC"} ) ),
	New Column( "DATA", Character( 33 ), "Nominal", Set Values( {"123.56.8.342", "431.02.0.567", "318.00.1.816"} ), Set Display Width( 182 ) )
);
dt2 = New Table( "Table2",
	Add Rows( 3 ),
	New Column( "Category", Character, "Nominal", Set Values( {"B", "B", "B"} ) ),
	New Column( "Data", Character, "Nominal", Set Values( {"13.24", "98.44", "63.89"} ) )
);
dtCat = New Table( "The Categories",
	Add Rows( 13 ),
	New Column( "Category",
		Character,
		"Nominal",
		Set Values( {"ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "B", "B", "B", "B"} ),
		Set Display Width( 67 )
	),
	New Column( "Width", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2, 3] ) ),
	New Column( "Label",
		Character( 16 ),
		"Nominal",
		Set Values(
			{"label0.0", "label0.1", "label0.2", "label0.3", "label0.4", "label0.5", "label0.6", "label0.7", "label0.8", "thelabel.1", "thelabel.2",
			"thelabel.3", "thelabel.4"}
		),
		Set Display Width( 99 )
	)
);

// Below this line, is the script that will add the new columns to each of the data tables
// and will take the names from the dtCat (Categories) data table and change the names to
// the columns in each of the data tables

// Create a list of the pointers to the data tables.  This will allow for the using of the list 
// entries to point to the data tables using a subscript 
dataTableList = {};
insert into(dataTableList, dt1 );
insert into(dataTableList, dt2 );

// Loop across the data table list and expand the columns based upon the the column called Data
For( i = 1, i <= N Items( dataTableList ), i++,
	ncols_to_add = length( substitute((dataTableList[i]:Data)[1],".",""));
	For( k = 1, k <= ncols_to_add, k++, // the || operator joins two strings
		dataTableList[i] << New Column( "c" || Char( i ),
			Numeric,
			Continous,
			<<Set Each Value( Num( Substr( Regex( :data, "\D", "", GLOBALREPLACE ), k, 1 ) ) )
		)
	)
);

// Now rename the columns in each of the data tables based upon the labels in the Category table

For( i = 1, i <= N Items( dataTableList ), i++, 
	// Subset the dtCat table into the a new data table with just the one
	// category being processed
	dtCat << select where( :Category == (dataTableList[i]:Category)[1] );
	dtSub = dtCat << subset( selected rows == 1, selected columns( 0 ) );
	
	// set the offset of where to start the renaming
	offset = 2;
	
	// Loop
	// Change names in dt1 to match the Label in 
	For( k = 1, k <= N Rows( dtSub ), k++,
		Column( dataTableList[i], k + offset ) << set name( dtSub:Label[k] )
	);
	
	Close( dtSub, nosave );
);
Jim
david_sam
Level II

Re: rename column names in a jmp file as per values in rows in a separate .csv file

Thanks Jim. With minor changes [to my needs] it works ! Thank you !!

 

Question, can the list [datatableList] hold lot of info?

Each of my tables has lots of rows like in thousands and has many cols also. 

 

-David

 

 

txnelson
Super User

Re: rename column names in a jmp file as per values in rows in a separate .csv file

dataTableList does not hold the data tables, it only holds pointers to the tables. Create the list and then run

Show( dataTableList );

and you will see what is in the list.
You can also see what is in the list, by just hovering your mouse over the variable, and it will display it's value

Jim
david_sam
Level II

Re: rename column names in a jmp file as per values in rows in a separate .csv file

yup. got it. holds the name of the data table

 

-David