Hi everyone.
1.) How can I extract the month in words from a date value for an entire column?
Say cell # 1 has the date "03/19/2009". I would like the next column to get the month - "March", in this example.
2.) Is there a way to change the default marker for outliers - like globally make the outlier markers an outlined-triangle?
Appreciate the help.
hi rabelardo,
txnelson is right on both topics. but, if you would like to set the marker for univariate outliers (bruteforce style) you can try the following.
it can be an advantage (or disadvantage) that any row that is an outlier in any column will now have the special mark. you could modify this to mark the outliers in specific columns as you go along with the analysis.
Good Luck!
Ron
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// get column list
cols = dt << get column names( Continuous ); //List all Numeric columns
For( i = 1, i <= N Items( cols ), i++,
out = Eval(
Parse(
"dt << get rows where(
dt:" || Char( cols[i] ) || " > Col Quantile( dt:" || Char( cols[i] ) || ", 0.75 ) + 1.5 * (Col Quantile( dt:" ||
Char( cols[i] ) || ", 0.75 ) - Col Quantile( dt:" || Char( cols[i] ) || ", 0.25 )) | dt:" || Char( cols[i] ) || " <
Col Quantile( dt:"
|| Char( cols[i] ) || ", 0.25 ) - 1.5 * (Col Quantile( dt:" || Char( cols[i] ) || ", 0.75 ) - Col Quantile( dt:" || Char( cols[i] ) ||
", 0.25 )));"
)
)`;
selected = dt << select rows( out );
selected << Markers( "*" );
selected << clear select;
);
// see what it looks like
Distribution(
Continuous Distribution( Column( :height ), Show Percents( 1 ), Customize Summary Statistics( Sum( 1 ) ) ),
Continuous Distribution( Column( :weight ), Show Percents( 1 ), Customize Summary Statistics( Sum( 1 ) ) ),
SendToReport(
Dispatch( {"Distributions", "height"}, "Distrib Histogram", FrameBox, {DispatchSeg( Hist Seg( 1 ), Histogram Color( 53 ) )} ),
Dispatch( {"Distributions", "weight"}, "Distrib Histogram", FrameBox, {DispatchSeg( Hist Seg( 1 ), Histogram Color( 53 ) )} )
)
);
The following results were achieved:
By applying the following formula to the Month Column:
Concerning you second question about assigning specific markers to outliers. The problem with that, is that markers are assigned as a rowstate to rows, not to cells within columns. Therefore, one does not have the ability to do a global assignment of outlier markers for each column.
Much appreciated, Jim !
I was about to give up and just settle on Substring() or Left().
Your solution works perfectly.
- Randy
hi rabelardo,
txnelson is right on both topics. but, if you would like to set the marker for univariate outliers (bruteforce style) you can try the following.
it can be an advantage (or disadvantage) that any row that is an outlier in any column will now have the special mark. you could modify this to mark the outliers in specific columns as you go along with the analysis.
Good Luck!
Ron
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// get column list
cols = dt << get column names( Continuous ); //List all Numeric columns
For( i = 1, i <= N Items( cols ), i++,
out = Eval(
Parse(
"dt << get rows where(
dt:" || Char( cols[i] ) || " > Col Quantile( dt:" || Char( cols[i] ) || ", 0.75 ) + 1.5 * (Col Quantile( dt:" ||
Char( cols[i] ) || ", 0.75 ) - Col Quantile( dt:" || Char( cols[i] ) || ", 0.25 )) | dt:" || Char( cols[i] ) || " <
Col Quantile( dt:"
|| Char( cols[i] ) || ", 0.25 ) - 1.5 * (Col Quantile( dt:" || Char( cols[i] ) || ", 0.75 ) - Col Quantile( dt:" || Char( cols[i] ) ||
", 0.25 )));"
)
)`;
selected = dt << select rows( out );
selected << Markers( "*" );
selected << clear select;
);
// see what it looks like
Distribution(
Continuous Distribution( Column( :height ), Show Percents( 1 ), Customize Summary Statistics( Sum( 1 ) ) ),
Continuous Distribution( Column( :weight ), Show Percents( 1 ), Customize Summary Statistics( Sum( 1 ) ) ),
SendToReport(
Dispatch( {"Distributions", "height"}, "Distrib Histogram", FrameBox, {DispatchSeg( Hist Seg( 1 ), Histogram Color( 53 ) )} ),
Dispatch( {"Distributions", "weight"}, "Distrib Histogram", FrameBox, {DispatchSeg( Hist Seg( 1 ), Histogram Color( 53 ) )} )
)
);
Cool script, Ron !
I've tried it on Fit Y by X, Distribution & Graph Builder and it worked like a charm *
Really thankful and admire your expertise in JMP, Ron & Jim.
a more sophisticated version:
Names Default To Here( 1 );
// make a sample data table
dt = New Table( "Data",
add rows( 10000 ),
New Column( "it", Numeric, Ordinal, Format( "Best", 8 ), Formula( Row() ) ),
New Column( "Batch", Numeric, Ordinal, Format( "Best", 8 ), formula( Random Integer( 110 ) ) ),
New Column( "y1", Numeric, Continues, Format( "Best", 8 ), formula( Random Lognormal( :Batch / 10, 5 ) ) ),
New Column( "y2", Numeric, Continues, Format( "Best", 8 ), formula( Random Lognormal( :Batch / 10, 5 ) ) ),
New Column( "Out",
Numeric,
Ordinal,
Format( "Best", 8 ),
Set Property( "Value Colors", {0 = -13977687, 1 = -4222943} ),
Color Cell by Value,
formula( 0 )
),
New Column( "Outy1",
Numeric,
Ordinal,
Format( "Best", 8 ),
Set Property( "Value Colors", {0 = -13977687, 1 = -4222943} ),
Color Cell by Value,
formula( 0 )
),
New Column( "Outy2",
Numeric,
Ordinal,
Format( "Best", 8 ),
Set Property( "Value Colors", {0 = -13977687, 1 = -4222943} ),
Color Cell by Value,
formula( 0 )
),
);
dt << run formulas;
Column( dt, "it" ) << delete formula;
Column( dt, "Batch" ) << delete formula;
Column( dt, "y1" ) << delete formula;
Column( dt, "y2" ) << delete formula;
Column( dt, "Out" ) << delete formula;
Column( dt, "Outy1" ) << delete formula;
Column( dt, "Outy2" ) << delete formula;
// now we get to work
// make a list of batches
byvarlist = Associative Array( dt:Batch ) << get keys;
// make a list of columns
cols = dt << get column names( Continuous ); //List all Numeric columns
// loop through the batches
For( i = 1, i <= N Items( byvarlist ), i++,
// select each batch
dt << select where( dt:Batch == byvarlist[i] );
// subset each batch
subdt = dt << Subset( Selected Rows( 1 ) );
// clear selection for smoth workflow
dt << clear select;
// finde outliers in each variable within each batch
For( ii = 1, ii <= N Items( cols ), ii++,
out = Eval(
Parse(
"subdt << get rows where(
subdt:" || Char( cols[ii] ) || " > Col Quantile( subdt:" || Char( cols[ii] ) ||
", 0.75 ) + 1.5 * (Col Quantile( subdt:" || Char( cols[ii] ) || ", 0.75 ) - Col Quantile( subdt:" || Char( cols[ii] ) ||
", 0.25 )) | subdt:" || Char( cols[ii] ) || " <
Col Quantile( subdt:" || Char( cols[ii] ) ||
", 0.25 ) - 1.5 * (Col Quantile( subdt:" || Char( cols[ii] ) || ", 0.75 ) - Col Quantile( subdt:" || Char( cols[ii] ) ||
", 0.25 )));"
)
)`;
Eval( Parse( "column (subdt, \!"Out" || Char( cols[ii] || "\!" )[" || Char( out ) || "] = 1;" ) ) );
// also indicate the global outlier
Column( subdt, "Out" )[out] = 1;
//wait (0.00001);
);
// update original data table with new information for the batch
subdt << delete columns( "Batch" );
subdt << delete columns( cols );
dt << Update( With( subdt ), Match Columns( :it = :it ) );
Close( subdt, No Save );
//Wait( 0.00001 );
);
dt << color by column( :Out );
another non recommended way to do what you want is by changing the preferences for selected rows in graphs. i do not recommend working with selections since some other commands will use the selection by default instead of all the rows in the table. but you can still try this:
Names Default To Here( 1 );
//Caution: Changing a preference will
//affect the default behavior of JMP.
Preferences( Marker Selection Mode ( 1 ) ); // you can do this from the file>> preferences menu as well, only need to do it once.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// get column list
cols = dt << get column names( Continuous ); //List all Numeric columns
For( i = 1, i <= N Items( cols ), i++,
out = Eval(
Parse(
"dt << get rows where(
dt:" || Char( cols[i] ) || " > Col Quantile( dt:" || Char( cols[i] ) || ", 0.75 ) + 1.5 * (Col Quantile( dt:" ||
Char( cols[i] ) || ", 0.75 ) - Col Quantile( dt:" || Char( cols[i] ) || ", 0.25 )) | dt:" || Char( cols[i] ) || " <
Col Quantile( dt:"
|| Char( cols[i] ) || ", 0.25 ) - 1.5 * (Col Quantile( dt:" || Char( cols[i] ) || ", 0.75 ) - Col Quantile( dt:" || Char( cols[i] ) ||
", 0.25 )));"
)
)`;
selected = dt << select rows( out );
selected << Markers( 17 );
//selected << Marker size ( 5 ); // no such option
);
// see what it looks like
new window ("Output",
h list box (
Distribution(
Continuous Distribution( Column( :height ), Show Percents( 1 ), Customize Summary Statistics( Sum( 1 ) ) ),
Continuous Distribution( Column( :weight ), Show Percents( 1 ), Customize Summary Statistics( Sum( 1 ) ) ),
SendToReport(
Dispatch( {"Distributions", "height"}, "Distrib Histogram", FrameBox, {DispatchSeg( Hist Seg( 1 ), Histogram Color( 53 ) )} ),
Dispatch( {"Distributions", "weight"}, "Distrib Histogram", FrameBox, {DispatchSeg( Hist Seg( 1 ), Histogram Color( 53 ) )} )
)
),
Bivariate( Y( :height ), X( :weight ) )
)
);
Not better perhaps, but as an alternative, you could also use 'Month()' and define a value label property:
New Table( "Month of Date",
Add Rows( 1 ),
New Column( "Date",
Numeric,
"Continuous",
Format( "m/d/y", 12 ),
Input Format( "m/d/y" ),
Formula( Today() )
),
New Column( "Month",
Numeric,
"Continuous",
Format( "Best", 12 ),
Formula( Month( :Date ) ),
Value Labels(
{1 = "January", 2 = "February", 3 = "March", 4 = "April", 5 = "May", 6
= "June", 7 = "July", 8 = "August", 9 = "September", 10 = "October", 11
= "November", 12 = "December"}
),
Use Value Labels( 1 )
)
)
Works like a VLookup(false) function.
Thanks, Ian !