cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
doraemengs
Level III

getting column name with specific cut-off values

Hi,

I try to count and get specific column names from these data examples (my real data contained 180 columns and 20000 rows)

I want to select the column that cut-off > 1000 

 

No.A*1:01A*2:01B*9:02C*10:01DR*56:01DP*07_GGet name
110.6709520.2315000.190.5640.89B*9:02, C*10:01
2000009.56 
39040.3345.3345.9999.934510.5A*1:01
434.4456645700A*2:01
59190.2135673456678990880A*1:01,A*2:01,B*9:02,C*10:01,DR*56:01
6000000 
...       

I try the previous script like this 

Names Default To Here( 1 );

// Create a sample data table
dt = New Table( "Example",
	Add Rows( 2 ),
	New Column( "A*1:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "A*2:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "B*9:02", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "C*10:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "DR*56:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
//get wanted columns. Assumes that IMC columns start from column 1 and all IMC columns are one after another
colNames = dt << Get Column Names("String");
For(i = N Items(colNames), i > 3000 , i--,
);

//new column with IMC with value 1
dt << New Column( "Ag", Character, 
	<< Set Each Value(
		rowValues = dt[Row(), 0][1::N Items(colNames)]; //get values of row as matrix
		colIndex = Contains(rowValues, 1); //get index of
		If(colIndex > 0, //handle 
			colNames[colIndex], //get column name with the index
			""; //when no 1 found use ""
		);
	);
);

Could you please to correct my scripts, Thank you!

In addition, can I use matrix because of I need to generate the "new column" of 180 column. 

 

Best, 

3 ACCEPTED SOLUTIONS

Accepted Solutions
jthi
Super User

Re: getting column name with specific cut-off values

You will have to open your table, fix the columns and then run the part where new column is created

Names Default To Here(1);

dt = Open("$DOWNLOADS/Examples_get column name_by cutoff.jmp", invisible);

colnames = Remove(dt << Get Column Names("String"), 1); // drop No column

// Fix data and modeling types
For Each({colname}, colnames,
	Column(dt, colname) << Data Type("Numeric") << Modeling Type("Continuous");
);


newcol = dt << New Column("getname", Character, Nominal, Formula(
	valid_idx = Loc(dt[Row(), colnames] > 1000);
	If(N Items(valid_idx) > 0,
		Concat Items(colnames[valid_idx], ", ")
	,
		""
	);
));
dt << run formulas;
newcol << delete formula;

I strongly suggest you check out Scripting Guide from JMP Help

-Jarmo

View solution in original post

jthi
Super User

Re: getting column name with specific cut-off values

Most likely doing something like Jim suggested is faster and easier, but you can do something like this with the method I did demonstrate

Names Default To Here(1);

dt = Open("$DOWNLOADS/Examples_get column name_by cutoff.jmp", invisible);

Summarize(dt, alphs = By(:"Alphabet of interest"n));
aa = Associative Array(alphs, Repeat({{}}, N Items(alphs)));

collist = dt << Get Column Names("String");
For Each({colname}, collist,
	alph = Word(1, colname, "*");
	If(Contains(aa, alph),
		Insert Into(aa[alph], colname);
		Column(dt, colname) << Set Data Type("Numeric") << Modeling Type("Continuous");
	);
);


str_template = "^col^ (^val^)";
newcol = dt << New Column("getname", Character, Nominal, Formula(
	cols = aa[:"Alphabet of interest"n];
	vals = dt[Row(), aa[:"Alphabet of interest"n]];
	valid_idx = Loc(vals > 1000);
	
	If(N Items(valid_idx) > 0,
		res = {};
		For Each({{col, val}}, Across(cols[valid_idx], vals[valid_idx]),
			Insert Into(res, Eval Insert(str_template));
		);
		Concat Items(res, ", ");
	,
		""
	);
));

dt << run formulas;
newcol << delete formula;

dt << show window(1);

jthi_0-1723574578991.png

-Jarmo

View solution in original post

jthi
Super User

Re: getting column name with specific cut-off values

Here is some sort of a script, I'm still left with guess work for the correct rules so no idea if it does what it is supposed to do

Names Default To Here(1);

dt = Open("$DOWNLOADS/Examples_get column name_edit.jmp");

// fix data
For Each({colname}, dt << Get Column Names("Character", "String"),
	If(colname == "Alphabet",
		break();
	);
	Column(dt, colname) << Set Data Type("Numeric") << Set Modeling Type("Continuous");
);
dt << Delete Columns("get name and value");
dt << New Column("R", Numeric, Ordinal, Formula(Row()));

dt_stacked = dt << Stack(
	columns(
		:"DOA1*01:01,DOB1*01:01_DO1"n, :"DOA1*02:01,DOB1*01:01_DO1"n,
		:"DOA1*03:01,DOB1*01:01_DO1"n, :"DOA1*02:01,DOB1*02:01_DO2"n,
		:"DOA1*01:05,DOB1*02:01_DO2"n, :"DOA1*04:03,DOB1*03:01_DO3"n,
		:"DOA1*06:01,DOB1*04:01_DO4"n, :"DOA1*05:01,DOB1*03:01_DO7"n
	),
	Source Label Column("Label"),
	Stacked Data Column("Data"),
	Output Table("stacked")
);

rows_to_delete = dt_stacked << Get Rows Where(:Data <= 1000);
dt_stacked << Delete Rows(rows_to_delete);
dt_stacked << New Column("Match", Numeric, Nominal);

For Each Row(dt_stacked,
	alphs = Words(:Alphabet, ",");
	temp_label = Regex(:Label, "(\d\*)", "", GLOBALREPLACE);
	retval = 0;
	For Each({alph}, alphs,
		If(Contains(temp_label, alph),
			retval = 1;
			break();
		);
	);
	:Match = retval;
);

rows_to_delete = dt_stacked << Get Rows Where(:Match == 0);
dt_stacked << Delete Rows(rows_to_delete);

dt_stacked << New Column("NewStr", Character, Nominal, Formula(
	:Label || " (" || Char(:Data) || ")"
));

Summarize(dt_stacked, rs = By(:R));
fixed_rs = Transform Each({r}, rs, Num(r));
vals = Associative Array();

For Each({r}, fixed_rs,
	t = dt_stacked << Get Rows Where(:R == r);
	vals[r] = Concat Items(dt_stacked[t, "NewStr"], ",");
);
Close(dt_ stacked);

new_col = dt << New Column("Res", Character, Nominal, Formula(
	If(Contains(vals, :R),
		vals[:R]
	,
		""
	);
	
));
dt << run formulas;
new_col << Delete Formula;

 

-Jarmo

View solution in original post

25 REPLIES 25
txnelson
Super User

Re: getting column name with specific cut-off values

Here is how I would handle this

txnelson_0-1718733620444.png

names default to here(1);
dt = New Table( "Example",
	Add Rows( 6 ),
	New Column( "A*1:01",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values(
			[10.67, 0, 9040.2999999999993, 34.399999999999999, 9190.2000000000007, 0
			]
		)
	),
	New Column( "A*2:01",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [0, 0, 345.30000000000001, 4566, 13567, 0] )
	),
	New Column( "B*9:02",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [9520.2299999999996, 0, 345.89999999999998, 45, 3456, 0] )
	),
	New Column( "C*10:01",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [15000.1, 0, 999.89999999999998, 7, 6789, 0] )
	),
	New Column( "DR*56:01",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [90.560000000000002, 0, 345, 0, 9088, 0] )
	),
	New Column( "DP*07_G",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [40.890000000000001, 9.5600000000000005, 10.5, 0, 0, 0] )
	)
);

dt << New Column( "Get Name",character);

colNames = dt << get column names( continuous,string );

For Each Row(
	theList = {};
	For Each( {col, i}, colNames,
		If( As Column( col ) > 3000,
			Insert Into( theList, col )
		)
	);
	:Get Name = Concat Items( theList, "," );
);

Jim
doraemengs
Level III

Re: getting column name with specific cut-off values

Hi Jim,

 

Thank you for your help. However, I have 180 columns and 20000 rows in difference words and I cannot put all of them as following scripts. Could you please provide me with more suggestions?

 

Thank you

jthi
Super User

Re: getting column name with specific cut-off values

This might be fast enough

Names Default To Here(1);

// Create a sample data table
dt = New Table("Example",
	Add Rows(20000),
	invisible
);

colcount = 200;
For(i = 1, i <= colcount, i++,
	newcol = dt << New Column(Char(i), Numeric, Continuous, Formula(
		Random Normal() * 1000
	));
	dt << run formulas;
	newcol << delete formula;
);

start = Tick Seconds();
colnames = dt << Get Column Names("String");
newcol = dt << New Column("getname", Character, Nominal, Formula(
	valid_idx = Loc(dt[Row(), colnames] > 1000);
	If(N Items(valid_idx) > 0,
		Concat Items(colnames[valid_idx], ", ")
	,
		""
	);
));
dt << run formulas;
newcol << delete formula;

wait(0);
end = Tick Seconds();
show(end - start);
-Jarmo
doraemengs
Level III

Re: getting column name with specific cut-off values

Hi Jarmo,

 

Thank you so much. But the result is "end - start = 0.0333333333255723;" That's not the new table. could you please provide more solutions? please show me the following data table on JMP. 

 

Thank you

jthi
Super User

Re: getting column name with specific cut-off values

My script creates new column "getname" with the values

jthi_0-1718737059176.png

 

With your data, after you fix your columns to numeric (and drop No column)

jthi_1-1718737144803.png

 

-Jarmo
doraemengs
Level III

Re: getting column name with specific cut-off values

Hi Jarmo,

 

Thank you for your help but to be honest as in the following attachment, I cannot run the script as you did. when I checked the script (command + 1) shown below. Thank you for your help.

dt = New Table("Example",
	Add Rows(20000),
	invisible
);

colcount = 200;
For(i = 1, i <= colcount, i++,
	newcol = dt << New Column(Char(i), Numeric, Continuous, Formula(
		Random Normal() * 1000
	));
	dt << run formulas;
	newcol << delete formula;
);

start = Tick Seconds();
colnames = dt << Get Column Names("String");
newcol = dt << New Column("getname", Character, Nominal, Formula(
	valid_idx = Loc(dt[Row(), colnames] > 1000);
	If(N Items(valid_idx) > 0,
		Concat Items(colnames[valid_idx], ", ")
	,
		""
	);
));
dt << run formulas;
newcol << delete formula;

wait(0);
end = Tick Seconds();
show(end - start);
/*:

end - start = 0.683333333348855;

 

jthi
Super User

Re: getting column name with specific cut-off values

You will have to open your table, fix the columns and then run the part where new column is created

Names Default To Here(1);

dt = Open("$DOWNLOADS/Examples_get column name_by cutoff.jmp", invisible);

colnames = Remove(dt << Get Column Names("String"), 1); // drop No column

// Fix data and modeling types
For Each({colname}, colnames,
	Column(dt, colname) << Data Type("Numeric") << Modeling Type("Continuous");
);


newcol = dt << New Column("getname", Character, Nominal, Formula(
	valid_idx = Loc(dt[Row(), colnames] > 1000);
	If(N Items(valid_idx) > 0,
		Concat Items(colnames[valid_idx], ", ")
	,
		""
	);
));
dt << run formulas;
newcol << delete formula;

I strongly suggest you check out Scripting Guide from JMP Help

-Jarmo
doraemengs
Level III

Re: getting column name with specific cut-off values

Hi Jarmo, Do you have any idea for extracting the values of this that are more than the cutoff (if I have 50 columns) with each column name (getname column on an examples)

jthi
Super User

Re: getting column name with specific cut-off values

How do you wish to extract them? Doesn't my solution create new column which has the column names as strings separated by , and space?

jthi_0-1723564823980.png

Do you wish to check values by column instead of by row? Or something else?

 

Edit:

If you just want different column list, I would have to know how to filter it. If you just want all columns, remove the Remove() function

Names Default To Here(1);

dt = Open("$DOWNLOADS/Examples_get column name_by cutoff.jmp", invisible);

colnames = dt << Get Column Names("String");
For Each({colname}, colnames,
	Column(dt, colname) << Data Type("Numeric") << Modeling Type("Continuous");
);


For(i = 1, i < 100, i++,
	newcol = dt << New Column(Hex(i), Numeric, Nominal, Values(J(1, N Rows(dt), Random Normal()*1000)));
);

colnames = dt << Get Column Names("String");

newcol = dt << New Column("getname", Character, Nominal, Formula(
	valid_idx = Loc(dt[Row(), colnames] > 1000);
	If(N Items(valid_idx) > 0,
		Concat Items(colnames[valid_idx], ", ")
	,
		""
	);
));
dt << run formulas;
newcol << delete formula;
-Jarmo