cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
doraemengs
Level III

getting column name with specific cut-off values

Hi,

I try to count and get specific column names from these data examples (my real data contained 180 columns and 20000 rows)

I want to select the column that cut-off > 1000 

 

No.A*1:01A*2:01B*9:02C*10:01DR*56:01DP*07_GGet name
110.6709520.2315000.190.5640.89B*9:02, C*10:01
2000009.56 
39040.3345.3345.9999.934510.5A*1:01
434.4456645700A*2:01
59190.2135673456678990880A*1:01,A*2:01,B*9:02,C*10:01,DR*56:01
6000000 
...       

I try the previous script like this 

Names Default To Here( 1 );

// Create a sample data table
dt = New Table( "Example",
	Add Rows( 2 ),
	New Column( "A*1:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "A*2:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "B*9:02", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "C*10:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "DR*56:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
//get wanted columns. Assumes that IMC columns start from column 1 and all IMC columns are one after another
colNames = dt << Get Column Names("String");
For(i = N Items(colNames), i > 3000 , i--,
);

//new column with IMC with value 1
dt << New Column( "Ag", Character, 
	<< Set Each Value(
		rowValues = dt[Row(), 0][1::N Items(colNames)]; //get values of row as matrix
		colIndex = Contains(rowValues, 1); //get index of
		If(colIndex > 0, //handle 
			colNames[colIndex], //get column name with the index
			""; //when no 1 found use ""
		);
	);
);

Could you please to correct my scripts, Thank you!

In addition, can I use matrix because of I need to generate the "new column" of 180 column. 

 

Best, 

25 REPLIES 25
doraemengs
Level III

Re: getting column name with specific cut-off values

Screenshot 2024-08-13 at 2.11.43 PM.png

 Thank you so much, I would like to clarify this. I have another column of interest as "alphabet of interest" and I want to get name and value of all columns with values> 1,000 and all columns match with the column alphabet of interest.

 

Thank you!

 

 

jthi
Super User

Re: getting column name with specific cut-off values

Most likely doing something like Jim suggested is faster and easier, but you can do something like this with the method I did demonstrate

Names Default To Here(1);

dt = Open("$DOWNLOADS/Examples_get column name_by cutoff.jmp", invisible);

Summarize(dt, alphs = By(:"Alphabet of interest"n));
aa = Associative Array(alphs, Repeat({{}}, N Items(alphs)));

collist = dt << Get Column Names("String");
For Each({colname}, collist,
	alph = Word(1, colname, "*");
	If(Contains(aa, alph),
		Insert Into(aa[alph], colname);
		Column(dt, colname) << Set Data Type("Numeric") << Modeling Type("Continuous");
	);
);


str_template = "^col^ (^val^)";
newcol = dt << New Column("getname", Character, Nominal, Formula(
	cols = aa[:"Alphabet of interest"n];
	vals = dt[Row(), aa[:"Alphabet of interest"n]];
	valid_idx = Loc(vals > 1000);
	
	If(N Items(valid_idx) > 0,
		res = {};
		For Each({{col, val}}, Across(cols[valid_idx], vals[valid_idx]),
			Insert Into(res, Eval Insert(str_template));
		);
		Concat Items(res, ", ");
	,
		""
	);
));

dt << run formulas;
newcol << delete formula;

dt << show window(1);

jthi_0-1723574578991.png

-Jarmo
doraemengs
Level III

Re: getting column name with specific cut-off values

Thank you so much. This script does work. However, I am sorry that my original data was complicated more than this. The Alphabet of interest has multiple "alphabet" and "numbers" separated with "," (A1, B9) as attached files. How can I get name and value column that correlated alphabet of interest column and column name. 

Screenshot 2024-08-13 at 4.12.46 PM.png

Thank you for your help.

 

jthi
Super User

Re: getting column name with specific cut-off values

How are those "alphabets" determined? Are they everything that is after last "_"? If that is the case you should be able to make very slight modification to my script and take last word instead of the first and use different separator, so this part requires slight modification 

alph = Word(1, colname, "*");
-Jarmo
doraemengs
Level III

Re: getting column name with specific cut-off values

Thank you for your suggestion. If I want to match the first word before ":" it should be this

alph = Word(1, colname, ":");

and if I want to match the last word after "_" it should be this

alph = Word(-1, colname, "_");

What if I need to match before ":" and after "_" simultaneously? After I joined the scripts I ran for each above, I could not manage with duplicate (exactly the same thing).

 

Thank you

jthi
Super User

Re: getting column name with specific cut-off values

Looking at your table start and end will be different (start A*1 and end A1) so you will have to add those "extra" keys to the associative array to perform the comparison AND handle duplicates later.

-Jarmo
doraemengs
Level III

Re: getting column name with specific cut-off values

In addition, if a column in the alphabet of interest had more than 1, the script did not run. Do you have any suggestion on this?  

 

Thank you!

jthi
Super User

Re: getting column name with specific cut-off values

My script handled the situations you did provide. Depending on the real question, the script might require total re-write or possibly just some small modifications.

-Jarmo
doraemengs
Level III

Re: getting column name with specific cut-off values

Thank you Jarmo, I would like to attach the file that is compatible with what I want to do here. I want to match Alphabet column that compose of DOA and DO with each column of DOA1*0x:0x,DOB1*0x:0x_DOx. The DOA0x matches with DOA1*0x (not include after ":") and DOx or DOB1*Ox matches with DOx. All of them need to have a value of more than 1000 to be showed in the result column. I cannot correct the existing formula as you said. Thank you in advance!

jthi
Super User

Re: getting column name with specific cut-off values

Why wouldn't DOA1*0x match with DOx?

-Jarmo