- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Summarize Columns that contain more than one unique value
I have a data table and I want to summarize the columns that contain more than one unique value including missing rows. The columns with all rows the same are boring and I want to look at the more interesting ones only.
Slán
SpannerHead
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Summarize Columns that contain more than one unique value
Nice Add In Jarmo! I need to tinker around with that but it looks great. Here's the script I came up with to summarize any column that has > 1 unique entry by wafer.
dt = Current Data Table();
colList = Current Data Table() << Get Column Names( String );
dcols = {};
For( i = 1, i <= N Items( colList ), i++,
If( Contains( colList[i], " / " ),
Insert Into( dcols, colList[i] )
)
);
Show( dcols );
Split_Cols = {};
Insert Into( Split_Cols, :wafer );
For( j = 1, j <= N Items( dcols ), j++,
summarize(unique_values = by( Column(dcols[j]) ));
if (N Items(unique_values) > 1, Insert Into( Split_Cols, dcols[j] ) );
);
Show(Split_cols);
Current Data Table() << Summary(
Group( Split_cols ),
Freq( "None" ),
Weight( "None" )
);
Slán
SpannerHead
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Summarize Columns that contain more than one unique value
I suggest using For Each, Filter Each and Transform Each over for loop most of the time as they are faster and easier to read. But they can be more annoying to debug and deal with due to their speed if there is an issue AND they do have one nasty bug/feature". Luckily that bug/"feature" isn't present in simple case such as this(Issue with Function() default local and Each functions ). I'm not sure if you wanted to consider missing values as "valid" values so I did leave Remove From commented out inside the Filter Each loop
Names Default To Here(1);
dt = Current Data Table();
collist = dt << Get Column Names("String");
valid_cols = Filter Each({colname}, collist,
Summarize(dt, vals = By(Eval(colname)));
//Remove From(vals, Contains(vals, "")); // drop missing values if necessary
N Items(vals) > 1;
);
dt_summary = dt << Summary(
Group(valid_cols),
Freq("None"),
Weight("None"),
//Link to original data table(0),
Output Table("Non-Unique Summary")
);
Here is script to create simple demo table
dt = New Table("demo",
Compress File When Saved(1),
New Column("Column 1", Character, "Nominal", Set Values({"a", "a", "a", "a"})),
New Column("Column 2", Character, "Nominal", Set Values({"b", "b", "b", ""})),
New Column("Column 3", Character, "Nominal", Set Values({"", "", "", ""})),
New Column("Column 4", Character, "Nominal", Set Values({"c", "d", "", ""})),
New Column("Column 5", Character, "Nominal", Set Values({"c", "d", "e", "f"}))
);
That add-in was created before JMP had Columns manager and I still prefer it over Columns Manager most of the time. I assume they do have different use cases and Columns Manager tries to replace Columns Viewer platform with some nice extra features and Analyse Columns got its inspiration from pandas profiling (seems to be renamed to ydata profiling) as I had to be able to quickly check how my data looked like in terms of summary statistics.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Summarize Columns that contain more than one unique value
There are several ways to do this, but I think the easiest is the following:
Go to Cols > Column Manager
The resulting table will have a column with N Unique that tells you how many unique items are in each column. It also will have a column with N Missing which tells you how many missing values are in that column.
Unfortunately, N Unique is not automatically calculated for continuous columns, but the red pop-up menu offers the option to Force the Calculation for all Numeric columns.
You can also use that table to select columns with only one unique value. Those columns are then selected in the original table so you can hide and exclude them so you will not be forced to see those "non-interesting" columns.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Summarize Columns that contain more than one unique value
Luddite I am, JMP17 I'm afraid. Thanks, I think I can script this just wondered if someone had already cracked it.
Slán
SpannerHead
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Summarize Columns that contain more than one unique value
I don't really like "advertising" for my add-ins but maybe Analyse Columns could be helpful? I the public version should work (and work also with JMP17)). Simple script with a loop can also easily remove/exclude/hide/or whatever handle such columns.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Summarize Columns that contain more than one unique value
Nice Add In Jarmo! I need to tinker around with that but it looks great. Here's the script I came up with to summarize any column that has > 1 unique entry by wafer.
dt = Current Data Table();
colList = Current Data Table() << Get Column Names( String );
dcols = {};
For( i = 1, i <= N Items( colList ), i++,
If( Contains( colList[i], " / " ),
Insert Into( dcols, colList[i] )
)
);
Show( dcols );
Split_Cols = {};
Insert Into( Split_Cols, :wafer );
For( j = 1, j <= N Items( dcols ), j++,
summarize(unique_values = by( Column(dcols[j]) ));
if (N Items(unique_values) > 1, Insert Into( Split_Cols, dcols[j] ) );
);
Show(Split_cols);
Current Data Table() << Summary(
Group( Split_cols ),
Freq( "None" ),
Weight( "None" )
);
Slán
SpannerHead
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Summarize Columns that contain more than one unique value
I suggest using For Each, Filter Each and Transform Each over for loop most of the time as they are faster and easier to read. But they can be more annoying to debug and deal with due to their speed if there is an issue AND they do have one nasty bug/feature". Luckily that bug/"feature" isn't present in simple case such as this(Issue with Function() default local and Each functions ). I'm not sure if you wanted to consider missing values as "valid" values so I did leave Remove From commented out inside the Filter Each loop
Names Default To Here(1);
dt = Current Data Table();
collist = dt << Get Column Names("String");
valid_cols = Filter Each({colname}, collist,
Summarize(dt, vals = By(Eval(colname)));
//Remove From(vals, Contains(vals, "")); // drop missing values if necessary
N Items(vals) > 1;
);
dt_summary = dt << Summary(
Group(valid_cols),
Freq("None"),
Weight("None"),
//Link to original data table(0),
Output Table("Non-Unique Summary")
);
Here is script to create simple demo table
dt = New Table("demo",
Compress File When Saved(1),
New Column("Column 1", Character, "Nominal", Set Values({"a", "a", "a", "a"})),
New Column("Column 2", Character, "Nominal", Set Values({"b", "b", "b", ""})),
New Column("Column 3", Character, "Nominal", Set Values({"", "", "", ""})),
New Column("Column 4", Character, "Nominal", Set Values({"c", "d", "", ""})),
New Column("Column 5", Character, "Nominal", Set Values({"c", "d", "e", "f"}))
);
That add-in was created before JMP had Columns manager and I still prefer it over Columns Manager most of the time. I assume they do have different use cases and Columns Manager tries to replace Columns Viewer platform with some nice extra features and Analyse Columns got its inspiration from pandas profiling (seems to be renamed to ydata profiling) as I had to be able to quickly check how my data looked like in terms of summary statistics.