cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
doraemengs
Level III

getting column name with specific cut-off values

Hi,

I try to count and get specific column names from these data examples (my real data contained 180 columns and 20000 rows)

I want to select the column that cut-off > 1000 

 

No.A*1:01A*2:01B*9:02C*10:01DR*56:01DP*07_GGet name
110.6709520.2315000.190.5640.89B*9:02, C*10:01
2000009.56 
39040.3345.3345.9999.934510.5A*1:01
434.4456645700A*2:01
59190.2135673456678990880A*1:01,A*2:01,B*9:02,C*10:01,DR*56:01
6000000 
...       

I try the previous script like this 

Names Default To Here( 1 );

// Create a sample data table
dt = New Table( "Example",
	Add Rows( 2 ),
	New Column( "A*1:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "A*2:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "B*9:02", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "C*10:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
	New Column( "DR*56:01", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0, 50000] ) ),
//get wanted columns. Assumes that IMC columns start from column 1 and all IMC columns are one after another
colNames = dt << Get Column Names("String");
For(i = N Items(colNames), i > 3000 , i--,
);

//new column with IMC with value 1
dt << New Column( "Ag", Character, 
	<< Set Each Value(
		rowValues = dt[Row(), 0][1::N Items(colNames)]; //get values of row as matrix
		colIndex = Contains(rowValues, 1); //get index of
		If(colIndex > 0, //handle 
			colNames[colIndex], //get column name with the index
			""; //when no 1 found use ""
		);
	);
);

Could you please to correct my scripts, Thank you!

In addition, can I use matrix because of I need to generate the "new column" of 180 column. 

 

Best, 

25 REPLIES 25
jthi
Super User

Re: getting column name with specific cut-off values

Here is some sort of a script, I'm still left with guess work for the correct rules so no idea if it does what it is supposed to do

Names Default To Here(1);

dt = Open("$DOWNLOADS/Examples_get column name_edit.jmp");

// fix data
For Each({colname}, dt << Get Column Names("Character", "String"),
	If(colname == "Alphabet",
		break();
	);
	Column(dt, colname) << Set Data Type("Numeric") << Set Modeling Type("Continuous");
);
dt << Delete Columns("get name and value");
dt << New Column("R", Numeric, Ordinal, Formula(Row()));

dt_stacked = dt << Stack(
	columns(
		:"DOA1*01:01,DOB1*01:01_DO1"n, :"DOA1*02:01,DOB1*01:01_DO1"n,
		:"DOA1*03:01,DOB1*01:01_DO1"n, :"DOA1*02:01,DOB1*02:01_DO2"n,
		:"DOA1*01:05,DOB1*02:01_DO2"n, :"DOA1*04:03,DOB1*03:01_DO3"n,
		:"DOA1*06:01,DOB1*04:01_DO4"n, :"DOA1*05:01,DOB1*03:01_DO7"n
	),
	Source Label Column("Label"),
	Stacked Data Column("Data"),
	Output Table("stacked")
);

rows_to_delete = dt_stacked << Get Rows Where(:Data <= 1000);
dt_stacked << Delete Rows(rows_to_delete);
dt_stacked << New Column("Match", Numeric, Nominal);

For Each Row(dt_stacked,
	alphs = Words(:Alphabet, ",");
	temp_label = Regex(:Label, "(\d\*)", "", GLOBALREPLACE);
	retval = 0;
	For Each({alph}, alphs,
		If(Contains(temp_label, alph),
			retval = 1;
			break();
		);
	);
	:Match = retval;
);

rows_to_delete = dt_stacked << Get Rows Where(:Match == 0);
dt_stacked << Delete Rows(rows_to_delete);

dt_stacked << New Column("NewStr", Character, Nominal, Formula(
	:Label || " (" || Char(:Data) || ")"
));

Summarize(dt_stacked, rs = By(:R));
fixed_rs = Transform Each({r}, rs, Num(r));
vals = Associative Array();

For Each({r}, fixed_rs,
	t = dt_stacked << Get Rows Where(:R == r);
	vals[r] = Concat Items(dt_stacked[t, "NewStr"], ",");
);
Close(dt_ stacked);

new_col = dt << New Column("Res", Character, Nominal, Formula(
	If(Contains(vals, :R),
		vals[:R]
	,
		""
	);
	
));
dt << run formulas;
new_col << Delete Formula;

 

-Jarmo
doraemengs
Level III

Re: getting column name with specific cut-off values

Thank you, Jarmo. It's work!

doraemengs
Level III

Re: getting column name with specific cut-off values

It is not matched by biology. 

txnelson
Super User

Re: getting column name with specific cut-off values

Here is an alternative method that may help with a large number of columns

Names Default To Here( 1 );

dt = Open("$DOWNLOADS/Examples_get column name_by cutoff.jmp");

colnames = Remove( dt << Get Column Names( "String" ), 1 ); // drop No column

// Fix data and modeling types
For Each({colname}, colnames,
	Column(dt, colname) << Data Type("Numeric") << Modeling Type("Continuous");
);

dtStack = dt << Stack(
	columns( colnames ),
	Source Label Column( "ColName" ),
	Stacked Data Column( "Value" ),
	Output Table( "The Stack" )
);

dtStack << select where( :value <= 1000 );
dtStack << delete rows;
dtStack << delete columns( Value );

dtSplit = dtStack << Split(
	Split By( :ColName ),
	Split( :ColName ),
	Group( :No ),
	Output Table( "Split of The Stack by ColName" ),
	Sort by Column Property
);

close( dtStack, nosave );
colnames = Remove( dtSplit << Get Column Names( "String" ), 1 );
dtSplit << Combine Columns(
	delimiter( "," ),
	columns( colnames ),
	Column Name( "Get Name" )
);
dtSplit << delete columns( colnames );
dt << Update( With( dtSplit ), Match Columns( :No = :No ) );

Close( dtSplit, nosave );
Jim
doraemengs
Level III

Re: getting column name with specific cut-off values

Thank you, Jim, but it does not work. please see the attached file. 

txnelson
Super User

Re: getting column name with specific cut-off values

If the script was written to run with a column called "No" followed by numerical columns.  Your supplied data table has 2 character columns in addition to the numeric.  If you remove the columns "Alphabet of interest" and "get name and value" and then run the script, it works fine.

You should be able to easily change the script to compensate for the additional columns.  Either deleting them in the script before beginning the getting the colnames retrieved, or else deleting them from the colnames list before processing the For Each should work. 

Jim