Solved: Summarize Columns that contain more than one unique value

Report Inappropriate Content · Jan 30, 2025 06:06 PM

I have a data table and I want to summarize the columns that contain more than one unique value including missing rows. The columns with all rows the same are boring and I want to look at the more interesting ones only.

Slán

SpannerHead

SpannerHead · Jan 31, 2025 04:49 PM

Nice Add In Jarmo! I need to tinker around with that but it looks great. Here's the script I came up with to summarize any column that has > 1 unique entry by wafer.

 
dt = Current Data Table();
colList = Current Data Table() << Get Column Names( String );

dcols = {};

For( i = 1, i <= N Items( colList ), i++,
	If( Contains( colList[i], " / " ),
		Insert Into( dcols, colList[i] )
	)
);
Show( dcols );

Split_Cols = {};
Insert Into( Split_Cols, :wafer );

For( j = 1, j <= N Items( dcols ), j++,
summarize(unique_values = by( Column(dcols[j]) ));
if (N Items(unique_values) > 1, Insert Into( Split_Cols, dcols[j] ) );		
);


Show(Split_cols);



Current Data Table() << Summary(
	Group( Split_cols ),
	Freq( "None" ),
	Weight( "None" )
);

Slán

SpannerHead

View solution in original post

jthi · Feb 1, 2025 01:32 AM

I suggest using For Each, Filter Each and Transform Each over for loop most of the time as they are faster and easier to read. But they can be more annoying to debug and deal with due to their speed if there is an issue AND they do have one nasty bug/feature". Luckily that bug/"feature" isn't present in simple case such as this(Issue with Function() default local and Each functions ). I'm not sure if you wanted to consider missing values as "valid" values so I did leave Remove From commented out inside the Filter Each loop

Names Default To Here(1);

dt = Current Data Table();

collist = dt << Get Column Names("String");

valid_cols = Filter Each({colname}, collist,
	Summarize(dt, vals = By(Eval(colname)));
	//Remove From(vals, Contains(vals, "")); // drop missing values if necessary
	N Items(vals) > 1;
);

dt_summary = dt << Summary(
	Group(valid_cols),
	Freq("None"),
	Weight("None"),
	//Link to original data table(0),
	Output Table("Non-Unique Summary")
);

Here is script to create simple demo table

dt = New Table("demo",
	Compress File When Saved(1),
	New Column("Column 1", Character, "Nominal", Set Values({"a", "a", "a", "a"})),
	New Column("Column 2", Character, "Nominal", Set Values({"b", "b", "b", ""})),
	New Column("Column 3", Character, "Nominal", Set Values({"", "", "", ""})),
	New Column("Column 4", Character, "Nominal", Set Values({"c", "d", "", ""})),
	New Column("Column 5", Character, "Nominal", Set Values({"c", "d", "e", "f"}))
);

That add-in was created before JMP had Columns manager and I still prefer it over Columns Manager most of the time. I assume they do have different use cases and Columns Manager tries to replace Columns Viewer platform with some nice extra features and Analyse Columns got its inspiration from pandas profiling (seems to be renamed to ydata profiling) as I had to be able to quickly check how my data looked like in terms of summary statistics.

-Jarmo

View solution in original post

Dan_Obermiller · Jan 30, 2025 08:08 PM

There are several ways to do this, but I think the easiest is the following:

Go to Cols > Column Manager

The resulting table will have a column with N Unique that tells you how many unique items are in each column. It also will have a column with N Missing which tells you how many missing values are in that column.

Unfortunately, N Unique is not automatically calculated for continuous columns, but the red pop-up menu offers the option to Force the Calculation for all Numeric columns.

You can also use that table to select columns with only one unique value. Those columns are then selected in the original table so you can hide and exclude them so you will not be forced to see those "non-interesting" columns.

Dan Obermiller

SpannerHead · Jan 31, 2025 10:36 AM

Luddite I am, JMP17 I'm afraid. Thanks, I think I can script this just wondered if someone had already cracked it.

Slán

SpannerHead

jthi · Jan 31, 2025 10:51 AM

I don't really like "advertising" for my add-ins but maybe Analyse Columns could be helpful? I the public version should work (and work also with JMP17)). Simple script with a loop can also easily remove/exclude/hide/or whatever handle such columns.

-Jarmo

SpannerHead · Jan 31, 2025 04:49 PM

Nice Add In Jarmo! I need to tinker around with that but it looks great. Here's the script I came up with to summarize any column that has > 1 unique entry by wafer.

 
dt = Current Data Table();
colList = Current Data Table() << Get Column Names( String );

dcols = {};

For( i = 1, i <= N Items( colList ), i++,
	If( Contains( colList[i], " / " ),
		Insert Into( dcols, colList[i] )
	)
);
Show( dcols );

Split_Cols = {};
Insert Into( Split_Cols, :wafer );

For( j = 1, j <= N Items( dcols ), j++,
summarize(unique_values = by( Column(dcols[j]) ));
if (N Items(unique_values) > 1, Insert Into( Split_Cols, dcols[j] ) );		
);


Show(Split_cols);



Current Data Table() << Summary(
	Group( Split_cols ),
	Freq( "None" ),
	Weight( "None" )
);

Slán

SpannerHead

jthi · Feb 1, 2025 01:32 AM

I suggest using For Each, Filter Each and Transform Each over for loop most of the time as they are faster and easier to read. But they can be more annoying to debug and deal with due to their speed if there is an issue AND they do have one nasty bug/feature". Luckily that bug/"feature" isn't present in simple case such as this(Issue with Function() default local and Each functions ). I'm not sure if you wanted to consider missing values as "valid" values so I did leave Remove From commented out inside the Filter Each loop

Names Default To Here(1);

dt = Current Data Table();

collist = dt << Get Column Names("String");

valid_cols = Filter Each({colname}, collist,
	Summarize(dt, vals = By(Eval(colname)));
	//Remove From(vals, Contains(vals, "")); // drop missing values if necessary
	N Items(vals) > 1;
);

dt_summary = dt << Summary(
	Group(valid_cols),
	Freq("None"),
	Weight("None"),
	//Link to original data table(0),
	Output Table("Non-Unique Summary")
);

Here is script to create simple demo table

dt = New Table("demo",
	Compress File When Saved(1),
	New Column("Column 1", Character, "Nominal", Set Values({"a", "a", "a", "a"})),
	New Column("Column 2", Character, "Nominal", Set Values({"b", "b", "b", ""})),
	New Column("Column 3", Character, "Nominal", Set Values({"", "", "", ""})),
	New Column("Column 4", Character, "Nominal", Set Values({"c", "d", "", ""})),
	New Column("Column 5", Character, "Nominal", Set Values({"c", "d", "e", "f"}))
);

That add-in was created before JMP had Columns manager and I still prefer it over Columns Manager most of the time. I assume they do have different use cases and Columns Manager tries to replace Columns Viewer platform with some nice extra features and Analyse Columns got its inspiration from pandas profiling (seems to be renamed to ydata profiling) as I had to be able to quickly check how my data looked like in terms of summary statistics.

-Jarmo

Summarize Columns that contain more than one unique value

Re: Summarize Columns that contain more than one unique value

Re: Summarize Columns that contain more than one unique value

Re: Summarize Columns that contain more than one unique value

Re: Summarize Columns that contain more than one unique value

Re: Summarize Columns that contain more than one unique value

Re: Summarize Columns that contain more than one unique value

Re: Summarize Columns that contain more than one unique value