BookmarkSubscribe
Choose Language Hide Translation Bar
New Contributor

## Median Absolute Deviation --> Export To Table for further tSNE

Dear all,

I am currently working on a project using 57 samples and >800 000 variables (see image attached). Now I would like to reduce the variables by using only the ones with highest median absolute deviation (MAD). The goal would be to process the data further with PC Analysis, followed by a t-SNE (I am using the nice JMP add-in).

So far I am struggling with the identification of the variables with highest MAD and directly export these variables to a data table.

Pat

4 REPLIES 4
Highlighted
Super User

## Re: Median Absolute Deviation --> Export To Table for further tSNE

Here is a simple example of generating the MAD.  I suggest you turn it into a function, and then create a new data table with one row, and your >800000 columns.

``````Names Default To Here( 1 );
dt = Open( "\$SAMPLE_DATA/big class.jmp" );

theMatrix = Column( dt, "height" ) << get as matrix;
MAD = Quantile( .5, Abs( theMatrix - Quantile( .5, theMatrix ) ) );``````
Jim
Super User

## Re: Median Absolute Deviation --> Export To Table for further tSNE

Here is a more complete example that you may be able to directly use by just changing the reference to the "dt" data table, and the column in the output table to delete(i.e. Sample ID)

``````Names Default To Here( 1 );
dt = Open( "\$SAMPLE_DATA/semiconductor capability.jmp" );
dt << select where(row()==1);

Selected Rows( 0 ),
Rows( [1] ),
Selected columns only( 0 )
);
dtMAD << delete columns({"Lot_id","wafer","Wafer id in lot ID","site"});

For( i = 1, i <= N Cols( dtMAD ), i++,
theMatrix = Column( dt, Column( dtMAD, i ) << get name ) << get as matrix;
Column( dtMAD, i )[1] = Quantile( .5, Abs( theMatrix - Quantile( .5, theMatrix ) ) );
);``````
Jim
Staff

## Re: Median Absolute Deviation --> Export To Table for further tSNE

I initially mis-read 'median' as 'mean' and wrote the following: It takes a few seconds on my Mac, but almost all of that is in making the example table:

``````NamesDefaultToHere(1);

// Make a table with 'nc' columns and 'nr' rows to use . . .
nc = 1000;
nr = 57;
for(c=1, c<=nc, c++,
dt << NewColumn("Col "||Char(c), Numeric, Continuous, Formula(RandomNormal(0, 1)));
);
dt << runFormulas;

// Start Here: Calculate MAD for each column . . .
dt = CurrentDataTable();
mat = dt << getAsMatrix;
// (1): Get the mean of each column in a row vector
mean = VMean(mat);
// (2): 'Explode' this to make a matrix congruant with 'mat'
meanMat = Repeat(mean, nr);
// (3): Get the Absolute deviations
// Find the column numbers of the 'nc2' columns that have the biggest MAD values
nc2 = Round(nc/10, 0);
// Make a second table
dt2 = NewTable("Top "||Char(nc2)||" Largest MAD Values from "||(dt << getName),