cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
The Discovery Summit 2025 Call for Content is open! Submit an abstract today to present at our premier analytics conference.
Choose Language Hide Translation Bar
View Original Published Thread

Summarize Columns that contain more than one unique value

SpannerHead
Level V

I have a data table and I want to summarize the columns that contain more than one unique value including missing rows.  The columns with all rows the same are boring and I want to look at the more interesting ones only.


Slán



SpannerHead
2 ACCEPTED SOLUTIONS

Accepted Solutions
SpannerHead
Level V


Re: Summarize Columns that contain more than one unique value

Nice Add In Jarmo!  I need to tinker around with that but it looks great.  Here's the script I came up with to summarize any column that has > 1 unique entry by wafer.

 

 
dt = Current Data Table();
colList = Current Data Table() << Get Column Names( String );

dcols = {};

For( i = 1, i <= N Items( colList ), i++,
	If( Contains( colList[i], " / " ),
		Insert Into( dcols, colList[i] )
	)
);
Show( dcols );

Split_Cols = {};
Insert Into( Split_Cols, :wafer );

For( j = 1, j <= N Items( dcols ), j++,
summarize(unique_values = by( Column(dcols[j]) ));
if (N Items(unique_values) > 1, Insert Into( Split_Cols, dcols[j] ) );		
);


Show(Split_cols);



Current Data Table() << Summary(
	Group( Split_cols ),
	Freq( "None" ),
	Weight( "None" )
);

Slán



SpannerHead

View solution in original post

jthi
Super User


Re: Summarize Columns that contain more than one unique value

I suggest using  For Each, Filter Each and Transform Each over for loop most of the time as they are faster and easier to read. But they can be more annoying to debug and deal with due to their speed if there is an issue AND they do have one nasty bug/feature". Luckily that bug/"feature" isn't present in simple case such as this(Issue with Function() default local and Each functions ). I'm not sure if you wanted to consider missing values as "valid" values so I did leave Remove From commented out inside the Filter Each loop

 

Names Default To Here(1);

dt = Current Data Table();

collist = dt << Get Column Names("String");

valid_cols = Filter Each({colname}, collist,
	Summarize(dt, vals = By(Eval(colname)));
	//Remove From(vals, Contains(vals, "")); // drop missing values if necessary
	N Items(vals) > 1;
);

dt_summary = dt << Summary(
	Group(valid_cols),
	Freq("None"),
	Weight("None"),
	//Link to original data table(0),
	Output Table("Non-Unique Summary")
);

Here is script to create simple demo table

 

dt = New Table("demo",
	Compress File When Saved(1),
	New Column("Column 1", Character, "Nominal", Set Values({"a", "a", "a", "a"})),
	New Column("Column 2", Character, "Nominal", Set Values({"b", "b", "b", ""})),
	New Column("Column 3", Character, "Nominal", Set Values({"", "", "", ""})),
	New Column("Column 4", Character, "Nominal", Set Values({"c", "d", "", ""})),
	New Column("Column 5", Character, "Nominal", Set Values({"c", "d", "e", "f"}))
);

 

That add-in was created before JMP had Columns manager and I still prefer it over Columns Manager most of the time. I assume they do have different use cases and Columns Manager tries to replace Columns Viewer platform with some nice extra features and Analyse Columns got its inspiration from pandas profiling (seems to be renamed to ydata profiling) as I had to be able to quickly check how my data looked like in terms of summary statistics.

 

 

-Jarmo

View solution in original post

5 REPLIES 5


Re: Summarize Columns that contain more than one unique value

There are several ways to do this, but I think the easiest is the following:

Go to Cols > Column Manager

The resulting table will have a column with N Unique that tells you how many unique items are in each column. It also will have a column with N Missing which tells you how many missing values are in that column.

 

Unfortunately, N Unique is not automatically calculated for continuous columns, but the red pop-up menu offers the option to Force the Calculation for all Numeric columns.

 

You can also use that table to select columns with only one unique value. Those columns are then selected in the original table so you can hide and exclude them so you will not be forced to see those "non-interesting" columns.

Dan Obermiller
SpannerHead
Level V


Re: Summarize Columns that contain more than one unique value

Luddite I am, JMP17 I'm afraid.  Thanks, I think I can script this just wondered if someone had already cracked it.


Slán



SpannerHead
jthi
Super User


Re: Summarize Columns that contain more than one unique value

I don't really like "advertising" for my add-ins but maybe Analyse Columns could be helpful? I the public version should work (and work also with JMP17)). Simple script with a loop can also easily remove/exclude/hide/or whatever handle such columns.

-Jarmo
SpannerHead
Level V


Re: Summarize Columns that contain more than one unique value

Nice Add In Jarmo!  I need to tinker around with that but it looks great.  Here's the script I came up with to summarize any column that has > 1 unique entry by wafer.

 

 
dt = Current Data Table();
colList = Current Data Table() << Get Column Names( String );

dcols = {};

For( i = 1, i <= N Items( colList ), i++,
	If( Contains( colList[i], " / " ),
		Insert Into( dcols, colList[i] )
	)
);
Show( dcols );

Split_Cols = {};
Insert Into( Split_Cols, :wafer );

For( j = 1, j <= N Items( dcols ), j++,
summarize(unique_values = by( Column(dcols[j]) ));
if (N Items(unique_values) > 1, Insert Into( Split_Cols, dcols[j] ) );		
);


Show(Split_cols);



Current Data Table() << Summary(
	Group( Split_cols ),
	Freq( "None" ),
	Weight( "None" )
);

Slán



SpannerHead
jthi
Super User


Re: Summarize Columns that contain more than one unique value

I suggest using  For Each, Filter Each and Transform Each over for loop most of the time as they are faster and easier to read. But they can be more annoying to debug and deal with due to their speed if there is an issue AND they do have one nasty bug/feature". Luckily that bug/"feature" isn't present in simple case such as this(Issue with Function() default local and Each functions ). I'm not sure if you wanted to consider missing values as "valid" values so I did leave Remove From commented out inside the Filter Each loop

 

Names Default To Here(1);

dt = Current Data Table();

collist = dt << Get Column Names("String");

valid_cols = Filter Each({colname}, collist,
	Summarize(dt, vals = By(Eval(colname)));
	//Remove From(vals, Contains(vals, "")); // drop missing values if necessary
	N Items(vals) > 1;
);

dt_summary = dt << Summary(
	Group(valid_cols),
	Freq("None"),
	Weight("None"),
	//Link to original data table(0),
	Output Table("Non-Unique Summary")
);

Here is script to create simple demo table

 

dt = New Table("demo",
	Compress File When Saved(1),
	New Column("Column 1", Character, "Nominal", Set Values({"a", "a", "a", "a"})),
	New Column("Column 2", Character, "Nominal", Set Values({"b", "b", "b", ""})),
	New Column("Column 3", Character, "Nominal", Set Values({"", "", "", ""})),
	New Column("Column 4", Character, "Nominal", Set Values({"c", "d", "", ""})),
	New Column("Column 5", Character, "Nominal", Set Values({"c", "d", "e", "f"}))
);

 

That add-in was created before JMP had Columns manager and I still prefer it over Columns Manager most of the time. I assume they do have different use cases and Columns Manager tries to replace Columns Viewer platform with some nice extra features and Analyse Columns got its inspiration from pandas profiling (seems to be renamed to ydata profiling) as I had to be able to quickly check how my data looked like in terms of summary statistics.

 

 

-Jarmo