Run Analyse Columns for all columns.
Quickly check whether there are columns which have most values missing
or where the values are mostly the same (you can re-order by clicking on a column header).
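Outside JMP, the same two quick checks can be sketched in a few lines of Python. This is only an illustration of the idea, not how Analyse Columns is implemented; the function name `column_profile` and the sample columns are made up for the example, and missing values are assumed to be stored as `None`.

```python
from collections import Counter

def column_profile(values):
    """Return (fraction missing, share of the most common non-missing value)."""
    n = len(values)
    present = [v for v in values if v is not None]  # assumption: None marks a missing cell
    missing_frac = (n - len(present)) / n if n else 0.0
    if not present:
        return missing_frac, 0.0
    top_count = Counter(present).most_common(1)[0][1]
    return missing_frac, top_count / len(present)

# Hypothetical example columns, not taken from the demo data set
columns = {
    "mostly_missing": [None, None, None, 1.2],
    "mostly_same": [5, 5, 5, 5, 5, 5, 5, 7],
    "varied": [1, 2, 3, 4],
}
profiles = {name: column_profile(vals) for name, vals in columns.items()}

for name, (missing_frac, top_share) in profiles.items():
    print(f"{name}: missing={missing_frac:.0%}, most common value share={top_share:.0%}")
```

Sorting columns by either number gives the same kind of ordering you get by clicking on the header in the report.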
Select some of the columns in which most of the values are the same and use Distribution and Subset to check what they look like.
As the values are mostly the same, these columns might not be that useful in further analysis. For demo purposes, we will use Hide&Exclude to remove them from the analysis.
The next quick check could be to see whether there are Continuous columns which should possibly be recoded as Nominal or Ordinal. Again, Distribution and Subset are good quick tools for this (look, for example, for version numbers, ID numbers and such). Here the Continuous columns don't seem to contain values of that kind.
Next we will check whether nines have possibly been used in place of missing values, and there seem to be quite a few columns like that.
Analyse Columns looks for the highest absolute nines in a column and uses those as Nines; it won't drop them based on quantiles or such. Some of these columns have a quite interesting situation where there are values larger than the Nines. Again, we use Distribution and Subset to explore them in more detail.
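To make the "highest absolute nines" idea concrete, here is a rough Python sketch. It is my own reading of the rule described above, not the add-in's actual code: a value counts as a nines code if it is integer-valued and written only with the digit 9, and the candidate with the largest absolute value wins. The helper names and the sample values are hypothetical.

```python
def is_all_nines(value):
    """True for integer-valued numbers written only with 9s, e.g. 9, 99, 9999, -999."""
    if value is None or value != int(value):
        return False
    return set(str(abs(int(value)))) == {"9"}

def find_nines_code(values):
    """Return the all-nines value with the largest absolute value, or None if there is none."""
    candidates = [v for v in values if is_all_nines(v)]
    return max(candidates, key=abs) if candidates else None

# Hypothetical column: 9999 is used as a code, but a real value is larger than it
values = [12.5, 9999, 15000.0, 99, None, 9999]
code = find_nines_code(values)

# Values beyond the code are the "interesting situation" worth a closer look
larger = [v for v in values if v is not None and abs(v) > abs(code)]
print(code, larger)
```

Because the rule does not look at quantiles, a code such as 9999 is still picked up when the column also holds genuine values above it.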
The distributions seem quite fine:
Next we create a subset with those columns and take a closer look. These 9999 rows and missing values seem quite suspicious to me.
For example, column 30N1_4X20_HFEPEAK*VA10U contains the value 9999 in 55 rows, which doesn't feel completely normal to me. Let's create a summary table of that column and order it by N Rows.
This would require more knowledge of the process, but if I had to make a guess, these are failed measurements / missing values, even though they are not even close to the largest values in the column.
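A summary table ordered by N Rows is essentially a value count sorted in descending order. As a small sketch of that idea in Python (the values below are invented for illustration and are not the contents of the column discussed above):

```python
from collections import Counter

# Hypothetical measurements with a repeated 9999 code mixed in
measurements = [101.2, 9999, 98.7, 9999, 250000.0, 9999, 98.7]

# Equivalent of a summary table grouped by value and ordered by N Rows (descending)
summary = Counter(measurements).most_common()

for value, n_rows in summary:
    print(f"{value:>10}  N Rows = {n_rows}")
```

A single value that repeats far more often than its neighbours, while sitting well below the column maximum, is the pattern that makes a code such as 9999 stand out.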
For demo purposes we conclude that those are missing values and use Set Nines Missing to exclude them WITHOUT losing data. After using Set Nines Missing, we should use Refresh Selected to recalculate the summary statistics for those columns.
Before:
After:
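The "without losing data" part can be pictured like this: the raw values stay in the column, and the nines are only registered as codes that the analysis skips. The Python sketch below illustrates that idea under my own assumptions; the `Column` class and its attributes are hypothetical and are not how the add-in or JMP stores this.

```python
from statistics import mean

class Column:
    """Hypothetical column that keeps raw values and a separate set of missing codes."""

    def __init__(self, values):
        self.values = list(values)
        self.missing_codes = set()

    def analysis_values(self):
        # Skip real missing cells (None) and anything registered as a missing code
        return [v for v in self.values
                if v is not None and v not in self.missing_codes]

col = Column([10.0, 12.0, 9999, 11.0, 9999])

before = mean(col.analysis_values())   # the 9999 codes still distort the mean
col.missing_codes.add(9999)            # "set nines missing": register the code only
after = mean(col.analysis_values())    # "refresh": recompute the statistic

print(before, after, col.values)
```

The raw list still contains both 9999 entries afterwards, so the decision can be reversed later by removing the code from `missing_codes`.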
Next we could take a look at first-order autocorrelation to see whether the data is non-random and has some "row based" dependencies. There are quite a few columns with high autocorrelation, Start Time being the obvious one. We select some of the high-autocorrelation columns and use Time Series to see what is going on.
It seems that there could be some dependencies which are caused by time.
There are still quite a few checks we could do, such as looking for correlations (we should clean outliers first, for example with Explore Outliers) or using Model Driven Multivariate Control Charts to look for interesting patterns, but I think we have enough to demonstrate what can be done with Analyse Columns for now.
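For reference, first-order (lag 1) autocorrelation can be computed with the usual sample estimator shown below. This is a generic textbook sketch in Python; I'm assuming, not claiming, that the add-in reports something equivalent, and the two example series are invented.

```python
def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation: sum of products of neighbouring deviations
    from the mean, divided by the total sum of squared deviations."""
    n = len(values)
    m = sum(values) / n
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        return 0.0  # constant column: no variation, so nothing to correlate
    num = sum((values[i] - m) * (values[i + 1] - m) for i in range(n - 1))
    return num / denom

# A steadily drifting series: each row resembles the previous one
trend = list(range(1, 11))
# A series that flips sign on every row
alternating = [1, -1] * 5

r_trend = lag1_autocorrelation(trend)
r_alt = lag1_autocorrelation(alternating)
print(round(r_trend, 3), round(r_alt, 3))  # prints 0.7 -0.9
```

Values near zero suggest that row order carries little information, while values near +1, as for a timestamp column or a slow drift, point to a dependency on row order or time.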